Former Member
Cruncher
Joined: May 22, 2018
Post Count: 0
Long WU erroring out and still being reissued

This WU runs for many days and times out before finishing. It has timed out 3 times and is being reissued again. It is recorded as an error.

https://secure.worldcommunitygrid.org/ms/devi...s.do?workunitId=262300428

Result Name: dg01_b185_pr23b1_2--
<core_client_version>6.2.14</core_client_version>
<![CDATA[
<message>
Maximum CPU time exceeded
</message>
<stderr_txt>
Calling gridPlatform.init()
INFO: No state to restore. Start from the beginning.

----------------------------------------
[Edit 2 times, last edit by Former Member at Apr 6, 2011 10:19:08 AM]
[Apr 6, 2011 9:30:18 AM]
sk..
Master Cruncher
Joined: Mar 22, 2007
Post Count: 2324
Re: Long WU erroring out and still being reissued

Workunit Status

Project Name: Discovering Dengue Drugs - Together - Phase 2
Created: 27/03/11
Name: dg01_b185_pr23b1
Minimum Quorum: 2
Replication: 2


Result Name | App Version Number | Status | Sent Time | Time Due / Return Time | CPU Time (hours) | Claimed / Granted BOINC Credit

Your secure link will not show all the details to other users, it only works fully for yourself.

It was issued 10 days ago, so it has timed out.
Either you have a large cache or your computer was not on long enough to finish the task. You might want to check your BOINC configuration:

Have you ticked Leave Applications In Memory (LAIM)?
Do you run tasks while you are using your computer?

So you might want to reduce your cache, select LAIM, and allow tasks to run while you are using the system.
----------------------------------------
[Edit 1 times, last edit by skgiven at Apr 6, 2011 9:42:32 AM]
[Apr 6, 2011 9:40:57 AM]
Former Member
Cruncher
Joined: May 22, 2018
Post Count: 0
Re: Long WU erroring out and still being reissued

I run 24/7 on a Q9400. None of my tasks are due before 8/4/2011, this WU included. Applications are left in memory. The WU was issued 8 days ago and still had 2 days of run time. It was due for completion before the 10-day cutoff!

All 3 WU errored out:

Result Name           App Version  Status  Sent Time          Time Due / Return Time  CPU Time (hours)  Claimed / Granted BOINC Credit
dg01_b185_pr23b1_2--  640          Error   01/04/11 07:25:04  05/04/11 15:35:13       43.58             713.4 / 0.0
dg01_b185_pr23b1_0--  640          Error   29/03/11 08:11:20  01/04/11 07:07:15       52.91             541.2 / 0.0
dg01_b185_pr23b1_1--  640          Error   29/03/11 08:11:08  06/04/11 08:56:00       35.78             789.4 / 0.0


The WU errored out after 43, 52, and 35 hours.

Mine was the 35-hour one and, if I remember correctly, it was just over 45% done.

I have been keeping a close eye on the WU because it was so far beyond the parameters of the usual DDDT2 WU.
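
For a rough sense of scale, the figures above can be extrapolated. This is only a sketch: it assumes progress is linear in CPU time, which may well not hold for this WU, and `projected_total_hours` is a hypothetical helper, not anything provided by BOINC.

```python
def projected_total_hours(cpu_hours: float, fraction_done: float) -> float:
    """Linearly extrapolate total CPU hours from the progress made so far."""
    if not 0.0 < fraction_done <= 1.0:
        raise ValueError("fraction_done must be in (0, 1]")
    return cpu_hours / fraction_done


# Figures from the post above: 35.78 CPU hours at roughly 45% done.
total = projected_total_hours(35.78, 0.45)
remaining = total - 35.78
print(f"projected total: {total:.1f} h, remaining: {remaining:.1f} h")
```

Under that linear assumption the WU would have needed roughly 80 CPU hours in total, with about 44 still to go when it was stopped.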
----------------------------------------
[Edit 5 times, last edit by Former Member at Apr 6, 2011 11:10:11 AM]
[Apr 6, 2011 9:52:50 AM]
Crystal Pellet
Veteran Cruncher
Joined: May 21, 2008
Post Count: 1322
Re: Long WU erroring out and still being reissued

It was issued 10 days ago, so it has timed out.
Either you have a large cache or you did not have your computer on long enough to finish the task. You might want to check your Boinc configuration:

Have you ticked Leave Applications In Memory?
Do you run tasks when you are using your computer?

So you Might want to reduce your cache, select LAIM and allow tasks to run when you are using the system.

Have a close look at this:
 Maximum CPU time exceeded

[Apr 6, 2011 10:07:23 AM]
kateiacy
Veteran Cruncher
USA
Joined: Jan 23, 2010
Post Count: 1027
Re: Long WU erroring out and still being reissued

I'm getting ready to leave for work and don't have time to look up the link for you, so here's a quick explanation that probably isn't exactly right in some details.

When each WU loads, the BOINC agent has an idea of how much CPU time it is going to require on that machine. If the WU then runs for more than 10 (?) times that expected time, the BOINC agent assumes that something has gone wrong (it's stuck in an endless loop or the like) and stops it with the "Maximum CPU time exceeded" error. The reason is to prevent a bad WU from tying up a machine endlessly.

It looks as if this particular WU has triggered that on all 3 hosts on which it has run.
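
The check described above can be sketched roughly as follows. This illustrates the idea only and is not BOINC's actual code: the function name, the `LIMIT_FACTOR` constant, and the 4-hour estimate are all assumptions made for the example, and the factor of 10 is the uncertain figure from the explanation above.

```python
LIMIT_FACTOR = 10.0  # assumed multiplier; the exact value is uncertain


def exceeds_cpu_limit(cpu_hours_used: float, estimated_hours: float,
                      factor: float = LIMIT_FACTOR) -> bool:
    """Return True once a task has used more than factor x its estimate."""
    return cpu_hours_used > factor * estimated_hours


# Hypothetical estimate of 4 hours, so the assumed cutoff is 40 CPU hours.
print(exceeds_cpu_limit(35.78, 4.0))  # below the assumed cutoff
print(exceeds_cpu_limit(43.58, 4.0))  # above the assumed cutoff
```

Because the estimate is made per machine, a cutoff of this kind would land at a different CPU time on each host.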

(Edited to correct a typo)
----------------------------------------
[Edit 1 times, last edit by kateiacy at Apr 6, 2011 12:36:00 PM]
[Apr 6, 2011 12:34:39 PM]
jfpz
Cruncher
Joined: Apr 7, 2005
Post Count: 8
Re: Long WU erroring out and still being reissued

Hmm, I think you are onto something regarding some of the WUs in this project. I've just been looking at one unit with 24 CPU hours (Dell server, Xeon 5160, 64-bit Linux), and the progress has been stuck at 7% for several hours; time remaining is 44 hours and climbing. Those numbers don't make sense, so it might be time to send it to the great CPU in the sky.
[Apr 8, 2011 1:04:15 AM]
nanoprobe
Master Cruncher
Classified
Joined: Aug 29, 2008
Post Count: 2998
Re: Long WU erroring out and still being reissued

I'm seeing the same runtime problems on dg01. I was gone for the weekend and came home to find one had been running for 46 hours at 0% complete, and several others had run for nearly 28 hours and were still not complete, all on different machines. I probably shouldn't have, but I aborted the 46-hour one and all the other 28-hour ones in cache.
----------------------------------------
[Edit 1 times, last edit by nanoprobe at Apr 11, 2011 7:08:29 PM]
[Apr 11, 2011 7:00:28 PM]