World Community Grid Forums
Category: Completed Research Forum: The Clean Energy Project - Phase 2 Forum Thread: exited with zero status |
Thread Status: Active Total posts in this thread: 26
Former Member
Cruncher Joined: May 22, 2018 Post Count: 0 Status: Offline |
I couldn't find a thread on this when I searched.
I have been noticing that the CEP tasks keep getting reset to zero. I tried resetting the project but that didn't help. The tasks get a little way through and then every one of the CEP tasks resets to 0% and starts again. An example is below:
1/23/2012 6:59:39 PM World Community Grid Task E205404_655_C.31.C27H18N2SSi.00217312.1.set1d06_0 exited with zero status but no 'finished' file
1/23/2012 6:59:39 PM World Community Grid If this happens repeatedly you may need to reset the project.
What's happening? I used to only run 2 CEP tasks at once but I recently upped it to 8 at a time. I used to not have an issue with running that many. Did something change that makes it impossible to run that many? I'm going to drop it back down to 2 and see what happens.
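To see whether the restarts hit every CEP task or only a few, one option is to count the "no 'finished' file" messages per task in a saved copy of the client messages. The following is only a minimal sketch: the regular expression assumes the message wording quoted in this post, and `count_restarts` is a hypothetical helper, not part of BOINC.

```python
import re
from collections import Counter

# Assumed message shape, copied from the log line quoted in the post.
RESTART_RE = re.compile(
    r"Task (?P<task>\S+) exited with zero status but no 'finished' file"
)


def count_restarts(log_text: str) -> Counter:
    """Count how many times each task logged the restart message."""
    return Counter(m.group("task") for m in RESTART_RE.finditer(log_text))


if __name__ == "__main__":
    sample = (
        "1/23/2012 6:59:39 PM World Community Grid Task "
        "E205404_655_C.31.C27H18N2SSi.00217312.1.set1d06_0 "
        "exited with zero status but no 'finished' file\n"
        "1/23/2012 6:59:39 PM World Community Grid If this happens "
        "repeatedly you may need to reset the project.\n"
    )
    for task, hits in count_restarts(sample).most_common():
        print(task, hits)
```

A task that shows up many times is restarting repeatedly; many different tasks with one hit each at the same timestamp points more toward something system-wide.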
||
|
Former Member
Cruncher Joined: May 22, 2018 Post Count: 0 Status: Offline |
I've seen it before when I hadn't made an exception in an antivirus solution for the BOINC/WCG data directory - i.e., a file sharing/locking conflict.
Then again, I've also seen it when I had failing RAID sets (because the drivers and/or hardware weren't up to snuff...not to mention names, but I still marvell at that particular failure). I'd check system/event logs first. |
||
|
Former Member
Cruncher Joined: May 22, 2018 Post Count: 0 Status: Offline |
Well, I tried it with just 2 tasks at a time and it's still failing. Looks like CEP has turned useless on my machine. HCC is still buzzing right along with no issues.
|
||
|
Former Member
Cruncher Joined: May 22, 2018 Post Count: 0 Status: Offline |
A very busy system could cause this too [and CEP2 will do that on underpowered systems]. One more reason why this science is opt-in and limited to 1 task at a time.
There's a Start Here topic on this, by the way. Look in the stickied index [over all forums]. --//--
||
|
David Autumns
Ace Cruncher UK Joined: Nov 16, 2004 Post Count: 11062 Status: Offline Project Badges: |
Looks like I lost 4 results overnight for the same reason.
Currently have this project running on 3 machines with 12 cores between them and my stats have been down. Now I know why. Will keep on crunching.
Please let my results always be returned ^ | Little prayer
Dave
||
|
TPCBF
Master Cruncher USA Joined: Jan 2, 2011 Post Count: 1842 Status: Offline Project Badges: |
Not sure if this fits into this thread, but one of my hosts (out of the 4 that I selected to run CEP2) terminated all CEP2 tasks thrown at it within a matter of minutes:
Result Name: E206033_543_C.22.C16H10OS2SeSi2.01441855.0.set1d06_0
Starting BOINC client version 6.10.58 for windows_x86_64
What gives? This is a machine that is currently sitting idle, and should not be "underpowered" either. GFAM WUs on this machine work just fine, and so do the CEP2 WUs on the other 3 hosts that have that project in their selection...
Ralf
[Edit 1 times, last edit by TKH at Feb 16, 2012 1:56:12 PM]
||
|
Former Member
Cruncher Joined: May 22, 2018 Post Count: 0 Status: Offline |
Hello TPCBF,
Search found 28 posts with exit code 195 for CEP2. No cause for this problem was ever found while checking the computer, but, when checked, Results Status showed every computer with the same work unit errored out. So my guess is that the problem is caused by the work unit in combination with the program. The computer is fine. Please keep track of these work units in Results Status and tell me whether my guess holds.
Lawrence
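The check described here (did every computer that received the same work unit error out?) can be sketched in a few lines. This is an illustration only: it assumes result names end in a replica suffix such as `_0` or `_1` on top of the work unit name, the `"error"`/`"valid"` outcome labels are hypothetical, and the data would have to be copied by hand from Results Status.

```python
import re
from collections import defaultdict


def workunit_of(result_name: str) -> str:
    """Strip the trailing replica suffix (_0, _1, ...) from a result name."""
    return re.sub(r"_\d+$", "", result_name)


def suspect_workunits(results):
    """Return work units where two or more replicas exist and all errored.

    `results` is an iterable of (result_name, outcome) pairs; the outcome
    strings "error" and "valid" are placeholder labels for this sketch.
    """
    by_wu = defaultdict(list)
    for name, outcome in results:
        by_wu[workunit_of(name)].append(outcome)
    return sorted(
        wu
        for wu, outcomes in by_wu.items()
        if len(outcomes) >= 2 and all(o == "error" for o in outcomes)
    )
```

If a work unit lands in that list, the work unit itself is the likelier culprit; if the other replicas validated, the host deserves a closer look.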
||
|
TPCBF
Master Cruncher USA Joined: Jan 2, 2011 Post Count: 1842 Status: Offline Project Badges: |
Well, after 5 WUs in a row failed on that box, I switched the device profile to one that doesn't include CEP2. Can give it a try again once the weekend comes along, hopefully...
Maybe I didn't search right, but I didn't see anything that fit, that's why I mentioned it. 3 other hosts enabled for CEP2 at the same time work just fine, the only real difference obvious to me being that those are running 32bit Windows XP Pro/Windows Server 2003, while the "problematic" one is Windows 7 64bit...
Ralf
||
|
Former Member
Cruncher Joined: May 22, 2018 Post Count: 0 Status: Offline |
Dear TPCBF,
This is a strange problem and we are not quite sure what to make of it. If it persists, please post again and maybe the IBM-WCG team can chime in.
Best wishes from
Your Harvard CEP team
||
|
Former Member
Cruncher Joined: May 22, 2018 Post Count: 0 Status: Offline |
From an off-line discussion: an option is being looked at to cap the maximum number of "intense" jobs automatically. Something that members can configure in their device profile and that the client would of course have as a feature... the next client after the one after the following one, who knows. It would require both server-side and client-side coding changes. Then the client could have as many CEP2 tasks in the buffer as it likes, but if you have set e.g. 2 intense concurrent on a quad, the client would only let 2 run at a time. Of course you'd need other work too, else the remaining cores would idle.
Something along these lines, then everyone can discover which setting avoids the heartbeat/exit-zero-status errors, and really forget about it for longer :D --//--
Edit: This would be over and above the already available [and by me used] non-BOINC CPU % load control. Mine is set at 40% on Linux and pauses the client whenever the load is greater... better to lose a few minutes than lose whole tasks.
[Edit 1 times, last edit by Former Member at Feb 22, 2012 5:13:03 PM]
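To make the idea concrete, here is a rough sketch of how such a cap could behave. Nothing here is actual BOINC client code: `Task`, `pick_running` and `should_pause` are hypothetical names, and the 40% default simply mirrors the setting mentioned in this post.

```python
from dataclasses import dataclass


@dataclass
class Task:
    name: str
    intense: bool  # True for heavy tasks such as CEP2 in this sketch


def pick_running(queue, cores, max_intense):
    """Choose which buffered tasks run, capping concurrent intense ones."""
    running = []
    intense_running = 0
    for task in queue:
        if len(running) == cores:
            break  # every core already has a task
        if task.intense:
            if intense_running >= max_intense:
                continue  # leave it in the buffer for later
            intense_running += 1
        running.append(task)
    return running


def should_pause(non_boinc_load_pct, threshold_pct=40.0):
    """Pause crunching while non-BOINC CPU load exceeds the threshold."""
    return non_boinc_load_pct > threshold_pct
```

With four CEP2 tasks and two lighter ones buffered on a quad and `max_intense=2`, the sketch runs two of each. With only CEP2 in the buffer it runs two and leaves two cores idle, which is the "you'd need other work too" point.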
||
|