World Community Grid Forums
Thread Status: Active Total posts in this thread: 171
Former Member
Cruncher Joined: May 22, 2018 Post Count: 0 Status: Offline |
Batch 58 is showing the same odd checkpointing behaviour. This unit has reached 12.5% progress, properties show no checkpoint, checkpoint_debug flag is on now but no such message from this task in the event log yet, but stderr.txt gives this at the end (so far):
Setting up checkpointing ...
BOINC:: Worker startup.
Starting job S_0001
Finished job S_0001 in 615.359 seconds
Starting job S_0002

This is BETA_beta26_00000058_0376_0. "Checkpoint at most every" is still set to 300 sec.

At 20% progress, properties gives a checkpoint time, and the event log has this:

09/08/2017 20:36:10 | World Community Grid | [checkpoint] result BETA_beta26_00000058_0376_0 checkpointed

and stderr has this appended:

Finished job S_0002 in 481.844 seconds
Starting job S_0003

At this stage, boinc_checkpoint_count.txt appeared with content "1". It looks like structure 1 is not triggering a checkpoint (or at least there is no evidence of one). |
||
|
Former Member
Cruncher Joined: May 22, 2018 Post Count: 0 Status: Offline |
Yesterday my WinXP box got some beta WUs, but the runtime had been updated so they failed to load. I patched the runtime to set the subsystem flags to 5 instead of 6 and waited. A few hours ago I got some more, and they ran clean and validated.
Please patch the runtime before release so that XP users don't have to know how to patch things themselves. |
||
|
Former Member
Cruncher Joined: May 22, 2018 Post Count: 0 Status: Offline |
After completion of 9 out of the 10 structures in BETA_beta26_00000058_0376_0, boinc_checkpoint_count.txt was 8 and checkpoint_debug gave 8 instances of a checkpoint. It does seem that just structure 1 is failing to trigger a checkpoint. A variable being initialised wrongly or not at all?

Another question: does the final structure trigger a checkpoint, i.e. just before workunit completion?

Observation: BETA_beta26_00000058_0376_0 did not show a checkpoint in the event log just before completion, but a different unit, BETA_beta26_00000058_0099_0 (which finished seconds before 0376), did show a checkpoint just before completion:

09/08/2017 21:47:35 | World Community Grid | [checkpoint] result BETA_beta26_00000058_0099_0 checkpointed
09/08/2017 21:47:37 | World Community Grid | Message from task: 0
09/08/2017 21:47:38 | World Community Grid | Computation for task BETA_beta26_00000058_0099_0 finished

0099 showed a total of 9 checkpoints, also missing the one after structure 1.
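The comparison described in this post (checkpoints reported in the event log versus structures finished in stderr.txt) can be sketched in a few lines of Python. This is only a rough illustration: the function names are made up for the example, and the two patterns are based solely on the log lines quoted in this thread, so other client versions may format them differently.

```python
import re
from collections import Counter

# Event-log lines quoted in this thread look like:
# 09/08/2017 20:36:10 | World Community Grid | [checkpoint] result NAME checkpointed
CHECKPOINT_RE = re.compile(r"\[checkpoint\] result (\S+) checkpointed")

# stderr.txt lines quoted in this thread look like:
# Finished job S_0002 in 481.844 seconds
FINISHED_RE = re.compile(r"Finished job (S_\d+) in [\d.]+ seconds")


def count_checkpoints(event_log: str) -> Counter:
    """Count checkpoint messages per task name in event-log text."""
    return Counter(CHECKPOINT_RE.findall(event_log))


def finished_structures(stderr_text: str) -> list:
    """Return the job labels (S_0001, ...) reported as finished in stderr text."""
    return FINISHED_RE.findall(stderr_text)


def missing_checkpoints(event_log: str, stderr_text: str, task: str) -> int:
    """Finished structures minus checkpoints reported for the given task."""
    return len(finished_structures(stderr_text)) - count_checkpoints(event_log)[task]
```

Fed the event log and stderr.txt excerpts from the first post, `missing_checkpoints` would report how many finished structures had no matching `[checkpoint]` message for that task.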
||
|
Skivelitis2
Advanced Cruncher USA Joined: Mar 21, 2015 Post Count: 113 Status: Offline |
Quote:
Yesterday my WinXP box got some beta WUs, but the runtime had been updated so they failed to load. I patched the runtime to set the subsystem flags to 5 instead of 6 and waited. A few hours ago I got some more, and they ran clean and validated. Please patch the runtime before release so that XP users don't have to know how to patch things themselves.

+1
The only unit from the current batch on Win XP 32 failed. Four on LM 18.2, so far so good: 3 complete and validated, 1 in progress. |
||
|
Seoulpowergrid
Veteran Cruncher Joined: Apr 12, 2013 Post Count: 818 Status: Offline |
Unsure if this is useful, as it wasn't on my machine, but one beta WU was able to download, run and complete on my machine (Mac) yet failed on a Windows and a Linux machine: BETA_beta26_00000056_1204

Opening Workunit Status and clicking Error for the Linux result gets me this:

<core_client_version>7.6.31</core_client_version>
<![CDATA[
<message>
app_version download error: couldn't get input files:
<file_xfer_error>
  <file_name>wcgrid_beta26_rosetta_7.10_x86_64-pc-linux-gnu</file_name>
  <error_code>-224 (permanent HTTP error)</error_code>
  <error_message>permanent HTTP error</error_message>
</file_xfer_error>
<file_xfer_error>
  <file_name>beta26_image03_7.10.tga</file_name>
  <error_code>-224 (permanent HTTP error)</error_code>
  <error_message>permanent HTTP error</error_message>
</file_xfer_error>
<file_xfer_error>
  <file_name>beta26_image04_7.10.tga</file_name>
  <error_code>-224 (permanent HTTP error)</error_code>
  <error_message>permanent HTTP error</error_message>
</file_xfer_error>
<file_xfer_error>
  <file_name>beta26_image05_7.10.tga</file_name>
  <error_code>-224 (permanent HTTP error)</error_code>
  <error_message>permanent HTTP error</error_message>
</file_xfer_error>
</message>
]]>

And the same procedure for the Microsoft Windows 8.1 result gets me:

<core_client_version>7.2.47</core_client_version>
<![CDATA[
<message>
app_version download error: couldn't get input files:
<file_xfer_error>
  <file_name>wcgrid_beta26_rosetta_7.10_windows_intelx86</file_name>
  <error_code>-224 (permanent HTTP error)</error_code>
  <error_message>permanent HTTP error</error_message>
</file_xfer_error>
<file_xfer_error>
  <file_name>wcgrid_beta26_gfx_7.10_windows_intelx86</file_name>
  <error_code>-224 (permanent HTTP error)</error_code>
  <error_message>permanent HTTP error</error_message>
</file_xfer_error>
</message>
]]>

Same deal with BETA_beta26_00000058_0694. The Windows box looks like the same client (the OS version is the same, Enterprise x64 Edition, (06.03.9600.00)) and the contents of the error are the same.
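For anyone comparing several of these error dumps, here is a small Python sketch that pulls the file names and numeric error codes out of the pasted text. It assumes only the `<file_xfer_error>` layout shown in this post; the function name is invented for the example, and a regular expression is used because the pasted dump as a whole is not a well-formed XML document.

```python
import re

# One <file_xfer_error> entry, as pasted above: a file name followed by an
# error code such as "-224 (permanent HTTP error)".
XFER_ERROR_RE = re.compile(
    r"<file_xfer_error>\s*"
    r"<file_name>([^<]+)</file_name>\s*"
    r"<error_code>(-?\d+)[^<]*</error_code>",
    re.S,
)


def failed_downloads(error_text: str) -> list:
    """Return (file_name, numeric_error_code) pairs found in the error text."""
    return [(name, int(code)) for name, code in XFER_ERROR_RE.findall(error_text)]
```

Running `failed_downloads` over the dumps from different hosts makes it easy to see whether the same files fail everywhere or only some platform-specific ones.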
||
|
Crystal Pellet
Veteran Cruncher Joined: May 21, 2008 Post Count: 1323 Status: Offline |
All the betas so far on Windows are valid.
On Linux64 (Device 2078465), however, it seems to be batch dependent:
Batch 57: 14 errors got signal 11, no valids
Batch 58: 14 valids, 0 errors
Batch 59: 10 errors got signal 11, no valids
Batch 60: 10 errors got signal 11, no valids
Batch 61: 4 in progress and running fine |
||
|
Doublec
Advanced Cruncher France Joined: Aug 25, 2006 Post Count: 58 Status: Offline |
On Windows 8.1 Pro (64-bit):
Result name: BETA_beta26_00000058_1952_0

<core_client_version>7.6.33</core_client_version>
<![CDATA[
<message>
app_version download error: couldn't get input files:
<file_xfer_error>
  <file_name>wcgrid_beta26_rosetta_7.10_windows_intelx86</file_name>
  <error_code>-224 (permanent HTTP error)</error_code>
  <error_message>permanent HTTP error</error_message>
</file_xfer_error>
</message>
]]> |
||
|
armstrdj
Former World Community Grid Tech Joined: Oct 21, 2004 Post Count: 695 Status: Offline |
Tony, you are correct: there is a bug with checkpointing the first structure of a run. The checkpoint is being taken, but it looks like the application is not making the appropriate BOINC API call to signal that it has been done. Thanks for tracking this down; I am working on the fix.
Thanks, armstrdj |
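To illustrate the symptom described here, below is a small Python simulation. It is not the beta application's code. `FakeClient` and `run_structures` are hypothetical names, and the two methods only stand in for the real BOINC API calls `boinc_time_to_checkpoint()` and `boinc_checkpoint_completed()`, which a C/C++ science application would use. The `signal_first` switch is an assumption made purely to reproduce the counts reported earlier in the thread; the actual cause of the bug is not stated here.

```python
class FakeClient:
    """Hypothetical stand-in for the client side of the BOINC checkpoint API."""

    def __init__(self):
        self.signalled = 0  # what boinc_checkpoint_count.txt would show

    def time_to_checkpoint(self) -> bool:
        # Stand-in for boinc_time_to_checkpoint(); always agrees in this sketch.
        return True

    def checkpoint_completed(self) -> None:
        # Stand-in for boinc_checkpoint_completed().
        self.signalled += 1


def run_structures(n: int, client: FakeClient, signal_first: bool = True) -> list:
    """Process n structures; return the structures whose state was saved."""
    saved = []
    for i in range(1, n + 1):
        # ... compute structure i here ...
        if client.time_to_checkpoint():
            saved.append(i)  # the checkpoint itself is always written
            if i > 1 or signal_first:
                client.checkpoint_completed()  # tell the client it happened
    return saved


buggy, fixed = FakeClient(), FakeClient()
run_structures(9, buggy, signal_first=False)  # symptom: first signal skipped
run_structures(9, fixed)                      # every checkpoint is signalled
print(buggy.signalled, fixed.signalled)
```

With nine structures processed, the run that skips the first signal reports one checkpoint fewer than the run that signals every time, which is the pattern seen in boinc_checkpoint_count.txt above.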
||
|
marist_college
Advanced Cruncher USA Joined: Mar 30, 2005 Post Count: 107 Status: Offline |
We are still getting download errors (-224) and md5 hash errors (-119) on some of the beta WUs. These are happening with various BOINC client versions on both Windows and Mac.
Current results status view for beta:
~75 error (the 2 errors listed above)
~885 valid
~525 in progress |
||
|
Former Member
Cruncher Joined: May 22, 2018 Post Count: 0 Status: Offline |
Summary of all WUs received to date:
13 WUs received, 3 in progress, 3 errored

starbase1  valid   valid   valid  valid  fedora25  AMD-FX8350
starbase2  valid   -----   -----  -----  fedora25  AMD FX8350
starbase3  error   inprog  -----  -----  fedora26  AMD PhenomII
oneof4     error   error   -----  -----  SL.el7    AMD APU
twoof4     inprog  -----   -----  -----  fedora25  AMD APU
threeof4   valid   inprog  -----  -----  fedora25  AMD APU
fourof4    valid   -----   -----  -----  fedora25  AMD APU

The 3 errored WUs failed immediately, and all Windows wingmen validated. |
||
|
|
![]() |