Thread Status: Active
Total posts in this thread: 3
This topic has been viewed 622 times and has 2 replies
svincent
Advanced Cruncher
Joined: Jan 3, 2009
Post Count: 53
Workunit exceeding time limit

Workunit E224187_316_I.62.C55H32N4O3.00409680.3.set1d06 https://secure.worldcommunitygrid.org/ms/devi....do?workunitId=1164901898 failed after 18 hours of processing time because the CPU time limit had been exceeded. The same thing happened 4 separate times previously.

The last checkpoint was at 19 minutes: looks like the next major stage got stuck.

Suggest aborting this workunit.
[Aug 6, 2014 5:55:03 PM]
branjo
Master Cruncher
Slovakia
Joined: Jun 29, 2012
Post Count: 1892
Re: Workunit exceeding time limit

Hi svincent,

Unfortunately, we are not able to see your WU via the link. The better way to show us what is going on is to post a screenshot.

You say it "failed": do you mean it has the "Error" status in your "My Contribution: Results Status"?

I ask because the limit for CEP2 WUs is exactly 18 hours. After that they are "pulled off" the computer and the remaining, uncrunched work is assigned to other crunchers.

Cheers and good luck
----------------------------------------

Crunching@Home since January 13 2000. Shrubbing@Home since January 5 2006

[Aug 6, 2014 7:29:48 PM]
svincent
Advanced Cruncher
Joined: Jan 3, 2009
Post Count: 53
Re: Workunit exceeding time limit

Hi branjo,

Some of these issues are being taken up in other threads; there seem to be quite a few problems right now.

A bit of a tangent, but the link works ok for me: I suppose it must be something related to the https protocol. Why the results of a workunit should be secure is a bit of a mystery to me.

Yes, it had the status set to "Error". It seemed that the long early stage (Stage 3 IIRC) failed to complete: it spent 17 hours and 40 minutes without recording a checkpoint. It's a big molecule with 55 carbon atoms, but still no bigger than many that have completed successfully in the past. This is on a 3 GHz Haswell.

Not quite sure what screenshot would be useful, so here are the last few lines of the result log:

[10:31:29] Qink name = gesman
[10:31:32] Qink name = scfman
Killing job because cpu time limit has been exceeded. 1167.866175||63632.582531||0.000000
[10:38:58] Finished Job #2
10:39:02 (30239): called boinc_finish

</stderr_txt>
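
For what it's worth, the numbers in the "Killing job" line seem consistent with the 18-hour limit. This is only a rough sketch: I am assuming (it is a guess, not something documented) that the first field is the CPU seconds saved at the last checkpoint and the second is the CPU seconds spent since then. The helper name parse_kill_line is made up for illustration.

```python
import re

def parse_kill_line(line):
    """Return the three '||'-separated numbers in the kill message as floats."""
    match = re.search(r"(\d+\.\d+)\|\|(\d+\.\d+)\|\|(\d+\.\d+)", line)
    if match is None:
        raise ValueError("no '||'-separated timing fields found")
    return tuple(float(group) for group in match.groups())

line = ("Killing job because cpu time limit has been exceeded. "
        "1167.866175||63632.582531||0.000000")

# ASSUMPTION: field 1 = CPU seconds at the last checkpoint,
#             field 2 = CPU seconds since that checkpoint.
# The meaning of the third field is unknown to me, so it is ignored.
checkpointed, since_checkpoint, _ = parse_kill_line(line)
total_hours = (checkpointed + since_checkpoint) / 3600

print(f"at last checkpoint: {checkpointed / 60:.1f} min")
print(f"since checkpoint:   {since_checkpoint / 3600:.2f} h")
print(f"total:              {total_hours:.2f} h")
```

Under that reading the first number is about 19 minutes, the second about 17 hours 40 minutes, and together they come to almost exactly 18 hours, which matches what I described above.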
[Aug 7, 2014 3:26:12 AM]