World Community Grid Forums
Category: Completed Research | Forum: The Clean Energy Project - Phase 2 | Thread: wingman's weird error
Thread Status: Active | Total posts in this thread: 6
kateiacy
Veteran Cruncher | USA | Joined: Jan 23, 2010 | Post Count: 1027
Here's the result log from a wingman's WU:

Result Log
Result Name: E202868_572_C.27.C23H13NOS2.00074626.0.set1d06_1--
<core_client_version>6.10.58</core_client_version>
<![CDATA[
<stderr_txt>
INFO: No state to restore. Start from the beginning.
[22:12:46] Number of jobs = 16
[22:12:46] Starting job 0,CPU time has been restored to 0.000000.
[22:12:46] Starting new Job
[22:12:46] Qink name = fldman
[22:12:47] Qink name = gesman
[22:12:47] Qink name = scfman
Killing job because cpu time limit has been exceeded.
0.000000||1266874890.181581||0.000000
[22:12:47] Finished Job #0
22:12:47 (8428): called boinc_finish
</stderr_txt>
]]>

CPU time limit exceeded in 1 second?? It's even odder because the matching Results Status entry shows 8.74 hrs of CPU time:

E202868_572_C.27.C23H13NOS2.00074626.0.set1d06_1-- 640 Error 8/5/11 18:37:29 8/6/11 03:22:04 8.74 66.1 / 0.0

I noticed this because my WU came up as inconclusive: it had completed all 16 jobs, which obviously doesn't match this!
Former Member
Cruncher | Joined: May 22, 2018 | Post Count: 0
Hello kateiacy,
Logs like that make me think that a reboot would be a good idea for your wingman, in case a memory bit has flipped. [Shrug]
Lawrence
KWSN - A Shrubbery
Master Cruncher | Joined: Jan 8, 2006 | Post Count: 1585
Just a possibility, but your wingperson may have changed their system time. BOINC doesn't always deal well with that happening.
----------------------------------------
Distributed computing volunteer since September 27, 2000
Former Member
Cruncher | Joined: May 22, 2018 | Post Count: 0
I don't know if this is related, but I still have a problem with CEP2 aborting jobs after a reboot. I don't know whether there are issues resuming from a checkpoint or something else, but it becomes a huge slowdown when 4 jobs try to start at the same time as Windows services, etc. When this happens, my system is basically unusable for 10-15 minutes after the reboot. It's gotten to the point where I suspend BOINC, reboot, cut the number of CPUs to two, let those two start, and then set the number of CPUs back to 4. This project is a real pain, and if I didn't think it was so important, I wouldn't be crunching for it. As it is, I may take a break for a while after 2 years of crunching.
Former Member
Cruncher | Joined: May 22, 2018 | Post Count: 0
Hi pleskinen,
1. BOINC can be set to delay the start of computing so that bootup goes much quicker. This is done by adding or editing the option <start_delay>60</start_delay> in the cc_config.xml file; the value is in seconds.
2. I don't know which client you run. Some shut down too quickly on Vista/W7, damaging the WUs. If you have 6.10.58, that is not an issue. When shutting down, you can always stop the BOINC service first so the client has time to save the tasks in progress.
3. CEP2 is opt-in, with a default of 1 per machine, because its WUs are very demanding, certainly when you also want to use the computer without it interfering [you can always let BOINC pause automatically when there's user input]. You don't have to play with the number of CPUs: you can set the number of CEP2 tasks assigned to a machine and let the rest run on something lighter, such as HCC, HCMD2, or C4CW.

Let us know which route you'd like to take and we'll go from there.
--//--
edit: bootup [Edit 1 times, last edit by Former Member at Aug 15, 2011 1:29:24 PM]
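For anyone who hasn't edited cc_config.xml before, here is a minimal sketch of the start_delay tip from point 1. It assumes the file does not exist yet and is created in the BOINC data directory; if you already have a cc_config.xml, add only the start_delay line inside its existing <options> section rather than replacing the file.

```xml
<!-- cc_config.xml (sketch): make the BOINC client wait before it starts computing -->
<cc_config>
  <options>
    <!-- delay in seconds after the client starts; 60 is the value suggested in point 1 -->
    <start_delay>60</start_delay>
  </options>
</cc_config>
```

The delay only matters when the client starts, so it takes effect from the next boot onward.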
Former Member
Cruncher | Joined: May 22, 2018 | Post Count: 0
I appreciate the tips. I have added the start_delay parameter to my cc_config.xml. If I can remember to 'snooze' BOINC before reboots, I think things will be much better. I hate to run fewer than 4 WUs at a time since I have 4 physical cores (I could probably even run 5 or 6), but the occasional dumping of all 4 WUs at the same time has prevented me from running more.

One other unavoidable problem occurs on a reboot when BOINC decides that a few units need "High priority": it will basically start 4 of those and leave the others suspended. While it will pick the others up eventually, you still get the LONG start that comes from starting 4 new units at the same time. I will keep an eye on things and see if the problem recurs even when I 'snooze' before rebooting. [Edit 1 times, last edit by Former Member at Aug 15, 2011 7:42:59 PM]