Former Member
Cruncher
Joined: May 22, 2018
Post Count: 0
Re: New Beta Test - July 21, 2017 [ Issues Thread ]

Batch 58 is showing the same odd checkpointing behaviour. This unit has reached 12.5% progress and its properties show no checkpoint. The checkpoint_debug flag is now on, but the event log has no checkpoint message from this task yet. stderr.txt ends with this (so far):

Setting up checkpointing ...
BOINC:: Worker startup.
Starting job S_0001
Finished job S_0001 in 615.359 seconds
Starting job S_0002

This is BETA_beta26_00000058_0376_0.

"Checkpoint at most every" is still set to 300 sec.

At 20% progress, properties gives a checkpoint time, event log has this:

09/08/2017 20:36:10 | World Community Grid | [checkpoint] result BETA_beta26_00000058_0376_0 checkpointed

and stderr has this appended:

Finished job S_0002 in 481.844 seconds
Starting job S_0003

At this stage, boinc_checkpoint_count.txt appeared with content "1".

It looks like structure 1 is not triggering a checkpoint (or at least no evidence of one).
[Aug 9, 2017 7:43:20 PM]
Former Member
Cruncher
Joined: May 22, 2018
Post Count: 0

Yesterday my WinXP box got some beta WUs, but the runtime had been updated so they failed to load. I patched the runtime to set the subsystem flags to 5 instead of 6 and waited. A few hours ago I got some more, and they ran clean and validated.

Please patch the runtime before release so that XP users don't have to know how to patch things themselves.
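
For anyone wondering what such a patch might look like, here is a minimal Python sketch. It assumes that "subsystem flags" means the MajorSubsystemVersion/MinorSubsystemVersion fields in the PE optional header; the post does not say exactly which bytes were changed, so treat that as a guess. The function names are made up for illustration, and the header checksum is not recomputed.

```python
import struct

# Offset of MajorSubsystemVersion inside the PE optional header.
# It is the same for PE32 and PE32+ images.
SUBSYSTEM_VERSION_OFFSET = 48


def _optional_header_offset(image: bytes) -> int:
    """Locate the optional header of a PE image held in memory."""
    if image[:2] != b"MZ":
        raise ValueError("not a DOS/PE image")
    # e_lfanew (offset 0x3C) points at the "PE\0\0" signature.
    (pe_offset,) = struct.unpack_from("<I", image, 0x3C)
    if image[pe_offset:pe_offset + 4] != b"PE\0\0":
        raise ValueError("PE signature not found")
    # Skip the 4-byte signature and the 20-byte COFF file header.
    return pe_offset + 4 + 20


def read_subsystem_version(image: bytes) -> tuple:
    """Return (major, minor) subsystem version of the image."""
    base = _optional_header_offset(image)
    return struct.unpack_from("<HH", image, base + SUBSYSTEM_VERSION_OFFSET)


def patch_subsystem_version(image: bytes, major: int, minor: int) -> bytes:
    """Return a copy of the image with the subsystem version rewritten."""
    base = _optional_header_offset(image)
    patched = bytearray(image)
    struct.pack_into("<HH", patched, base + SUBSYSTEM_VERSION_OFFSET, major, minor)
    return bytes(patched)
```

Usage would be along the lines of reading the executable with `open(path, "rb").read()`, calling `patch_subsystem_version(data, 5, 1)`, and writing the result to a copy. The minor version is not stated in the post, so the `1` is an assumption (XP is NT 5.1). Work on a backup, since BOINC verifies downloaded files and may re-fetch a modified one.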
[Aug 9, 2017 8:32:44 PM]
Former Member
Cruncher
Joined: May 22, 2018
Post Count: 0

After completion of 9 of the 10 structures in BETA_beta26_00000058_0376_0, boinc_checkpoint_count.txt contained 8 and checkpoint_debug logged 8 checkpoints. It does seem that only structure 1 is failing to trigger a checkpoint. Could a variable be initialised wrongly, or not at all?

Another question: does the final structure trigger a checkpoint, i.e. just before workunit completion?
Observation: BETA_beta26_00000058_0376_0 did not show a checkpoint in the event log just before completion, but a different unit, BETA_beta26_00000058_0099_0 (which finished seconds before 0376), did:

09/08/2017 21:47:35 | World Community Grid | [checkpoint] result BETA_beta26_00000058_0099_0 checkpointed
09/08/2017 21:47:37 | World Community Grid | Message from task: 0
09/08/2017 21:47:38 | World Community Grid | Computation for task BETA_beta26_00000058_0099_0 finished

0099 showed a total of 9 checkpoints; it too was missing the one after structure 1.
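
The manual tally above (finished structures in stderr.txt versus checkpoint messages in the event log) could be automated with something like the sketch below. The regular expressions are based only on the log lines quoted in this thread, and the function names are made up for illustration; other client versions may format these lines differently.

```python
import re
from collections import Counter

# Patterns taken from the lines quoted in this thread.
CHECKPOINT_RE = re.compile(r"\[checkpoint\] result (\S+) checkpointed")
FINISHED_RE = re.compile(r"Finished job (S_\d+) in [\d.]+ seconds")


def count_checkpoints(event_log_lines):
    """Count checkpoint_debug messages per result name."""
    counts = Counter()
    for line in event_log_lines:
        match = CHECKPOINT_RE.search(line)
        if match:
            counts[match.group(1)] += 1
    return counts


def finished_jobs(stderr_lines):
    """List the structures that stderr.txt reports as finished."""
    jobs = []
    for line in stderr_lines:
        match = FINISHED_RE.search(line)
        if match:
            jobs.append(match.group(1))
    return jobs
```

If `len(finished_jobs(...))` is larger than the checkpoint count for the same result, at least one finished structure had no checkpoint reported, which is the pattern seen on both units here.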
[Aug 9, 2017 9:03:35 PM]
Skivelitis2
Advanced Cruncher
USA
Joined: Mar 21, 2015
Post Count: 113

Yesterday my WinXP box got some beta WUs, but the runtime had been updated so they failed to load. I patched the runtime to set the subsystem flags to 5 instead of 6 and waited. A few hours ago I got some more, and they ran clean and validated.

Please patch the runtime before release so that XP users don't have to know how to patch things themselves.

+1
The only unit from the current batch on Win XP 32 failed. Four on LM 18.2 are so far so good: 3 complete and validated, 1 in progress.
[Aug 9, 2017 10:02:48 PM]
Seoulpowergrid
Veteran Cruncher
Joined: Apr 12, 2013
Post Count: 818

Unsure if this is useful, as the failures weren't on my machine, but one beta WU downloaded and completed on my machine (Mac) and failed on a Windows host and a Linux host.

BETA_beta26_00000056_1204
Opening Workunit Status and clicking Error for the Linux host gets me this:
<core_client_version>7.6.31</core_client_version>
<![CDATA[
<message>
app_version download error: couldn't get input files:
<file_xfer_error>
<file_name>wcgrid_beta26_rosetta_7.10_x86_64-pc-linux-gnu</file_name>
<error_code>-224 (permanent HTTP error)</error_code>
<error_message>permanent HTTP error</error_message>
</file_xfer_error>
<file_xfer_error>
<file_name>beta26_image03_7.10.tga</file_name>
<error_code>-224 (permanent HTTP error)</error_code>
<error_message>permanent HTTP error</error_message>
</file_xfer_error>
<file_xfer_error>
<file_name>beta26_image04_7.10.tga</file_name>
<error_code>-224 (permanent HTTP error)</error_code>
<error_message>permanent HTTP error</error_message>
</file_xfer_error>
<file_xfer_error>
<file_name>beta26_image05_7.10.tga</file_name>
<error_code>-224 (permanent HTTP error)</error_code>
<error_message>permanent HTTP error</error_message>
</file_xfer_error>

</message>
]]>

And the same procedure for the Microsoft Windows 8.1 host gets me:
<core_client_version>7.2.47</core_client_version>
<![CDATA[
<message>
app_version download error: couldn't get input files:
<file_xfer_error>
<file_name>wcgrid_beta26_rosetta_7.10_windows_intelx86</file_name>
<error_code>-224 (permanent HTTP error)</error_code>
<error_message>permanent HTTP error</error_message>
</file_xfer_error>
<file_xfer_error>
<file_name>wcgrid_beta26_gfx_7.10_windows_intelx86</file_name>
<error_code>-224 (permanent HTTP error)</error_code>
<error_message>permanent HTTP error</error_message>
</file_xfer_error>

</message>
]]>


Same deal with
BETA_beta26_00000058_0694
The Windows box looks like the same client (the OS version is the same: Enterprise x64 Edition, (06.03.9600.00)) and the contents of the Error are the same.
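
In case it helps anyone comparing these reports across hosts, here is a rough Python sketch that pulls the file names and error codes out of an error fragment like the ones pasted above. It uses a regular expression rather than an XML parser because the pasted text is not a well-formed XML document on its own. The function name is made up for illustration.

```python
import re

# Matches one <file_xfer_error> entry as it appears in the pasted fragments.
XFER_ERROR_RE = re.compile(
    r"<file_xfer_error>\s*"
    r"<file_name>(?P<name>[^<]+)</file_name>\s*"
    r"<error_code>(?P<code>-?\d+)\s*\((?P<text>[^)]*)\)</error_code>"
)


def failed_downloads(fragment):
    """Return (file_name, error_code) pairs found in an error fragment."""
    return [
        (match.group("name"), int(match.group("code")))
        for match in XFER_ERROR_RE.finditer(fragment)
    ]
```

Running it on the Linux and Windows fragments separately makes it easy to see that every failing file reports -224, and which files are platform-specific.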
[Aug 10, 2017 2:05:05 AM]
Crystal Pellet
Veteran Cruncher
Joined: May 21, 2008
Post Count: 1323

All the betas so far on Windows are valid.

On Linux64 (Device 2078465) however it seems to be batch dependent:
Batch 57: 14 errors got signal 11, no valids
Batch 58: 14 Valids 0 errors
Batch 59: 10 errors got signal 11, no valids
Batch 60: 10 errors got signal 11, no valids
Batch 61: 4 in progress and running fine
[Aug 10, 2017 7:51:00 AM]
Doublec
Advanced Cruncher
France
Joined: Aug 25, 2006
Post Count: 58

On Windows 8.1 Pro (64-bit):

Result name: BETA_beta26_00000058_1952_0
<core_client_version>7.6.33</core_client_version>
<![CDATA[
<message>
app_version download error: couldn't get input files:
<file_xfer_error>
<file_name>wcgrid_beta26_rosetta_7.10_windows_intelx86</file_name>
<error_code>-224 (permanent HTTP error)</error_code>
<error_message>permanent HTTP error</error_message>
</file_xfer_error>

</message>
]]>
[Aug 10, 2017 10:00:14 AM]
armstrdj
Former World Community Grid Tech
Joined: Oct 21, 2004
Post Count: 695

Tony, you are correct: there is a bug with checkpointing of the first structure of a run. The checkpoint is being taken, but it looks like the appropriate BOINC API call to signal that it has been done is not being made. Thanks for tracking this down; I am working on the fix.

Thanks,
armstrdj
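
To illustrate the kind of bug described above, here is a toy Python model. It is not the application's code and does not use the BOINC API; `FakeClient` and `run_jobs` are hypothetical stand-ins. In the real C/C++ API the notification is `boinc_checkpoint_completed()`, and how the first structure ends up skipping it is not stated in this thread, so the `skip_first_report` switch is purely an assumption for demonstration.

```python
class FakeClient:
    """Hypothetical stand-in for the client side of the checkpoint handshake."""

    def __init__(self):
        self.reported = []

    def checkpoint_completed(self, job):
        # Plays the role of boinc_checkpoint_completed(): the client only
        # knows about a checkpoint once the application reports it.
        self.reported.append(job)


def run_jobs(jobs, client, skip_first_report=False):
    """Write a checkpoint after every job; optionally forget to report the first."""
    written = []
    for index, job in enumerate(jobs):
        written.append(job)  # checkpoint state is saved for every job
        if skip_first_report and index == 0:
            continue  # the suspected bug: saved but never reported
        client.checkpoint_completed(job)
    return written


jobs = ["S_%04d" % n for n in range(1, 11)]
client = FakeClient()
written = run_jobs(jobs, client, skip_first_report=True)
missing = [job for job in written if job not in client.reported]
```

With ten structures this model writes ten checkpoints but reports only nine, with `S_0001` the one missing, which matches the counts reported for unit 0099 earlier in the thread.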
[Aug 10, 2017 12:25:00 PM]
marist_college
Advanced Cruncher
USA
Joined: Mar 30, 2005
Post Count: 107

We are still getting download errors (-224) and MD5 hash errors (-119) on some of the beta WUs. These are happening with various BOINC client versions on both Windows and Mac.

Current results status view for beta:
~75 error (the two error types listed above)
~885 valid
~525 in progress
[Aug 10, 2017 3:20:57 PM]
Former Member
Cruncher
Joined: May 22, 2018
Post Count: 0

Summary of all WUs received to date:
13 WUs received, 3 in progress, 3 errored

Host       WU 1    WU 2    WU 3   WU 4   OS        CPU
starbase1  valid   valid   valid  valid  fedora25  AMD FX8350
starbase2  valid   -----   -----  -----  fedora25  AMD FX8350
starbase3  error   inprog  -----  -----  fedora26  AMD PhenomII
oneof4     error   error   -----  -----  SL.el7    AMD APU
twoof4     inprog  -----   -----  -----  fedora25  AMD APU
threeof4   valid   inprog  -----  -----  fedora25  AMD APU
fourof4    valid   -----   -----  -----  fedora25  AMD APU

The 3 errored WUs failed immediately, and all Windows wingmen validated.
[Aug 10, 2017 5:21:14 PM]