Thread Status: Active
Total posts in this thread: 5
This topic has been viewed 831 times and has 4 replies.
Chris311
Cruncher
Germany
Joined: Jan 3, 2016
Post Count: 20
Strange behavior

Hi,

my computer was working on this WU:

E236329_119_S.220.C19H16O3S2Se1.XRHIWRQHDJFIQN-UHFFFAOYSA-N.8_s1_14_0--

At 90% (16 hours) the computer crashed, and when it came back online all progress on this WU was lost. At the time of the crash three other WUs (2x MCM and 1x ugm) were running, and their progress wasn't lost.

Now the odd thing: after roughly one and a half hours the WU was finished and uploaded ("valid").

Why did the first try take so long, while the second try finished after one and a half hours?

Have a great Sunday.

Cheers,
Chris

[Mar 13, 2016 8:34:05 AM]
SekeRob
Master Cruncher
Joined: Jan 7, 2013
Post Count: 2741

Regrettably, this is normal. Progress is measured as hours run out of a maximum of 18, so 16 hours is about 90% (16/18 ≈ 89%). If the task was still in job #0 (of the 8 jobs in a task), no progress checkpoint was saved, because none could be, so on restart everything is lost.

As for the weird but valid completion in 1.5 hours, post a copy of the result log (via the link on the "valid" status). That can tell us more, but not necessarily everything that happened.
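The progress arithmetic and the "all is lost" behavior described above can be sketched in a few lines. This is a minimal model, not the project's actual code: it assumes, as the reply says, that the progress estimate is simply hours run divided by a maximum of 18, and it assumes (hypothetically) that state is checkpointed only when one of a task's jobs finishes. The function names are illustrative.

```python
# Assumption (from the reply above): progress = hours run / 18-hour maximum.
MAX_HOURS = 18.0


def estimated_progress(hours_run: float) -> float:
    """Fraction shown as progress, capped at 100%."""
    return min(hours_run / MAX_HOURS, 1.0)


def hours_lost_on_crash(hours_run: float, job_finish_hours: list[float]) -> float:
    """Run time lost on a crash, assuming (hypothetically) that a checkpoint
    is written only when a job finishes. `job_finish_hours` lists the run
    time, in hours, at which each completed job finished."""
    last_checkpoint = max(
        (h for h in job_finish_hours if h <= hours_run), default=0.0
    )
    return hours_run - last_checkpoint


if __name__ == "__main__":
    print(f"{estimated_progress(16):.0%}")  # prints 89%
    print(hours_lost_on_crash(16, []))      # still in job #0: prints 16.0
```

Under this model, a crash while job #0 is still running loses every hour run so far, which matches the 16 hours lost here; once job #0 has finished, only the time since the last finished job would be lost.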
[Mar 13, 2016 9:22:57 AM]
Chris311

Thank you for the info.

Here is the result log:

Result Name: E236329_119_S.220.C19H16O3S2Se1.XRHIWRQHDJFIQN-UHFFFAOYSA-N.8_s1_14_0--
<core_client_version>7.6.22</core_client_version>
<![CDATA[
<stderr_txt>
INFO: No state to restore. Start from the beginning.
[15:28:50] Number of jobs = 8
[15:28:50] Starting job 0,CPU time has been restored to 0.000000.
INFO: No state to restore. Start from the beginning.
[12:12:06] Number of jobs = 8
[12:12:06] Starting job 0,CPU time has been restored to 0.000000.
Application exited with RC = 0x1
[14:22:50] Finished Job #0
[14:22:50] Starting job 1,CPU time has been restored to 5303.812500.
[14:22:50] Skipping Job #1
[14:22:50] Starting job 2,CPU time has been restored to 5303.812500.
[14:22:50] Skipping Job #2
[14:22:50] Starting job 3,CPU time has been restored to 5303.812500.
[14:22:50] Skipping Job #3
[14:22:50] Starting job 4,CPU time has been restored to 5303.812500.
[14:22:50] Skipping Job #4
[14:22:50] Starting job 5,CPU time has been restored to 5303.812500.
[14:22:50] Skipping Job #5
[14:22:50] Starting job 6,CPU time has been restored to 5303.812500.
[14:22:50] Skipping Job #6
[14:22:50] Starting job 7,CPU time has been restored to 5303.812500.
[14:22:50] Skipping Job #7
14:22:51 (3540): called boinc_finish

</stderr_txt>
]]>

[Mar 13, 2016 10:18:40 AM]
SekeRob

Yes, logs like that have been posted before: an "Application exited with RC = ..." line followed by "Finished Job #0". We have not learned why these are declared valid. In your case, the log shows the task was started at 15:28, then restarted the next day at 12:12 without any actual progress having been recorded, and it finished in short order. Maybe the launch point on restart is not the same? Most weird.
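To check other result logs for the same pattern, a small script can pull out the relevant lines. This is a hypothetical helper, not an official project or BOINC tool; `summarize` is an illustrative name, and the regular expressions only match the line formats seen in the log posted in this thread.

```python
import re

# The stderr_txt section of the result log posted above, verbatim.
LOG = """\
INFO: No state to restore. Start from the beginning.
[15:28:50] Number of jobs = 8
[15:28:50] Starting job 0,CPU time has been restored to 0.000000.
INFO: No state to restore. Start from the beginning.
[12:12:06] Number of jobs = 8
[12:12:06] Starting job 0,CPU time has been restored to 0.000000.
Application exited with RC = 0x1
[14:22:50] Finished Job #0
[14:22:50] Starting job 1,CPU time has been restored to 5303.812500.
[14:22:50] Skipping Job #1
[14:22:50] Starting job 2,CPU time has been restored to 5303.812500.
[14:22:50] Skipping Job #2
[14:22:50] Starting job 3,CPU time has been restored to 5303.812500.
[14:22:50] Skipping Job #3
[14:22:50] Starting job 4,CPU time has been restored to 5303.812500.
[14:22:50] Skipping Job #4
[14:22:50] Starting job 5,CPU time has been restored to 5303.812500.
[14:22:50] Skipping Job #5
[14:22:50] Starting job 6,CPU time has been restored to 5303.812500.
[14:22:50] Skipping Job #6
[14:22:50] Starting job 7,CPU time has been restored to 5303.812500.
[14:22:50] Skipping Job #7
14:22:51 (3540): called boinc_finish
"""


def summarize(log: str) -> dict:
    """Extract restarts, exit codes, finished/skipped jobs and CPU time.
    Patterns are assumptions based only on the log format shown above."""
    # (\d+(?:\.\d+)?) stops before the sentence-ending period after the number.
    cpu = [float(x) for x in
           re.findall(r"CPU time has been restored to (\d+(?:\.\d+)?)", log)]
    return {
        "fresh_starts": log.count("No state to restore"),
        "exit_codes": re.findall(r"Application exited with RC = (0x[0-9A-Fa-f]+)", log),
        "finished_jobs": [int(n) for n in re.findall(r"Finished Job #(\d+)", log)],
        "skipped_jobs": [int(n) for n in re.findall(r"Skipping Job #(\d+)", log)],
        "cpu_hours": round(max(cpu, default=0.0) / 3600, 2),
    }


if __name__ == "__main__":
    for key, value in summarize(LOG).items():
        print(f"{key}: {value}")
```

On the posted log this reports two fresh starts, one exit code (0x1), job 0 finished, jobs 1 to 7 skipped, and about 1.47 CPU hours, consistent with the roughly one and a half hours of the second try. It only surfaces the pattern; it says nothing about why such results are validated.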
[Mar 13, 2016 10:29:57 AM]
Chris311

Thanks again.

I just want to make sure that no important data is lost, since the first try of this WU calculated for 16 h and the second ran for only 1.5 h. Maybe someone can crunch that WU again?

Cheers,
Chris

[Mar 13, 2016 11:02:34 AM]