World Community Grid Forums
Category: Completed Research | Forum: The Clean Energy Project - Phase 2 | Thread: Strange behavior
Thread Status: Active Total posts in this thread: 5
Chris311
Cruncher Germany Joined: Jan 3, 2016 Post Count: 20 Status: Offline Project Badges: |
Hi,
My computer was working on this WU: E236329_ 119_ S.220.C19H16O3S2Se1.XRHIWRQHDJFIQN-UHFFFAOYSA-N.8_ s1_ 14_ 0-- At 90% / 16 hours the computer crashed, and after it came back online all progress was lost. At the time of the crash, 3 other WUs (2x MCM and 1x ugm) were running, and their progress wasn't lost. Now the odd thing: after roughly one and a half hours the WU was finished and uploaded ("valid"). Why did the first try take so long, while the second try finished after one and a half hours? Have a great Sunday. Cheers, Chris |
||
|
SekeRob
Master Cruncher Joined: Jan 7, 2013 Post Count: 2741 Status: Offline |
Regrettably, normal. Progress is measured as hours run out of a maximum of 18, so 16 hours is about 90%. If the task was still in job #0 [of 8 jobs in a task], no progress checkpoint could have been saved, so on restart all is lost.
As for the weird/valid completion in 1.5 hours, post a copy of the result log [valid status link]. That can tell us more [but not necessarily all that happened]. |
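To make the arithmetic concrete, here is a minimal sketch of the "hours run out of a maximum of 18" rule described above. The function name and the clamping to 100% are assumptions for illustration, not the project's actual code.

```python
MAX_HOURS = 18.0  # runtime cap quoted in the post above


def progress_percent(hours_run, max_hours=MAX_HOURS):
    """Progress as a share of the runtime cap, clamped to 100% (assumed)."""
    return min(hours_run / max_hours, 1.0) * 100.0


print(f"{progress_percent(16):.1f}%")  # 16 / 18, i.e. roughly 90%
```

Under this rule the percentage only reflects elapsed time, so it says nothing about how many of the 8 jobs have been completed or checkpointed.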
||
|
Chris311
Cruncher Germany Joined: Jan 3, 2016 Post Count: 20 Status: Offline Project Badges: |
Thank you for the info.
Here is the result log:
Result Name: E236329_ 119_ S.220.C19H16O3S2Se1.XRHIWRQHDJFIQN-UHFFFAOYSA-N.8_ s1_ 14_ 0--
<core_client_version>7.6.22</core_client_version>
<![CDATA[
<stderr_txt>
INFO: No state to restore. Start from the beginning.
[15:28:50] Number of jobs = 8
[15:28:50] Starting job 0,CPU time has been restored to 0.000000.
INFO: No state to restore. Start from the beginning.
[12:12:06] Number of jobs = 8
[12:12:06] Starting job 0,CPU time has been restored to 0.000000.
Application exited with RC = 0x1
[14:22:50] Finished Job #0
[14:22:50] Starting job 1,CPU time has been restored to 5303.812500.
[14:22:50] Skipping Job #1
[14:22:50] Starting job 2,CPU time has been restored to 5303.812500.
[14:22:50] Skipping Job #2
[14:22:50] Starting job 3,CPU time has been restored to 5303.812500.
[14:22:50] Skipping Job #3
[14:22:50] Starting job 4,CPU time has been restored to 5303.812500.
[14:22:50] Skipping Job #4
[14:22:50] Starting job 5,CPU time has been restored to 5303.812500.
[14:22:50] Skipping Job #5
[14:22:50] Starting job 6,CPU time has been restored to 5303.812500.
[14:22:50] Skipping Job #6
[14:22:50] Starting job 7,CPU time has been restored to 5303.812500.
[14:22:50] Skipping Job #7
14:22:51 (3540): called boinc_finish
</stderr_txt>
]]> |
||
|
SekeRob
Master Cruncher Joined: Jan 7, 2013 Post Count: 2741 Status: Offline |
Yes, logs like that have been posted before: the "Application exited with RC = 0x1" line followed by "Finished Job #0". We have not learned why these are declared valid. In your case, the log shows the task was started at 15:28, then restarted the next day at 12:12 without any actual progress having been recorded, and finished in short order. Maybe the launch point on restart is not the same? Most weird.
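For anyone who wants to check a log like this quickly, here is a rough sketch that counts finished and skipped jobs and converts the restored CPU time to hours. The `summarize` helper and its regular expressions are assumptions based only on the line formats visible in the log quoted in this thread; other application versions may print something different.

```python
import re

# Excerpt of the stderr log quoted earlier in this thread.
LOG = """\
[12:12:06] Starting job 0,CPU time has been restored to 0.000000.
Application exited with RC = 0x1
[14:22:50] Finished Job #0
[14:22:50] Starting job 1,CPU time has been restored to 5303.812500.
[14:22:50] Skipping Job #1
[14:22:50] Starting job 2,CPU time has been restored to 5303.812500.
[14:22:50] Skipping Job #2
"""


def summarize(log_text):
    """Return (finished job numbers, skipped job numbers, CPU hours)."""
    events = re.findall(r"(Finished|Skipping) Job #(\d+)", log_text)
    finished = [int(n) for kind, n in events if kind == "Finished"]
    skipped = [int(n) for kind, n in events if kind == "Skipping"]
    # Largest "restored to" value is taken as the CPU time spent (assumption).
    cpu = [float(s) for s in re.findall(r"restored to (\d+\.\d+)", log_text)]
    return finished, skipped, max(cpu) / 3600.0


finished, skipped, hours = summarize(LOG)
print(finished, skipped, f"{hours:.2f} h")
```

On the full log this would report job 0 as finished and jobs 1 to 7 as skipped, with 5303.8125 s of CPU time, which is about 1.47 hours and matches the hour and a half reported for the second run.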
|
||
|
Chris311
Cruncher Germany Joined: Jan 3, 2016 Post Count: 20 Status: Offline Project Badges: |
Thanks again.
I just want to make sure that no important data is lost, since the first try of this WU was calculating for 16 h and the second ran for only 1 1/2 h. Maybe someone can crunch that WU again? Cheers, Chris |
||
|
|