Thread Status: Active
Total posts in this thread: 26
This topic has been viewed 1869 times and has 25 replies
Former Member
Cruncher
Joined: May 22, 2018
Post Count: 0
Status: Offline
exited with zero status

I couldn't find a thread on this when I searched.
I have been noticing that the CEP tasks keep getting reset to zero. I tried resetting the project but that didn't help. The tasks get a little way through and then every one of the CEP tasks resets to 0% and starts again. An example is below:

1/23/2012 6:59:39 PM World Community Grid Task E205404_655_C.31.C27H18N2SSi.00217312.1.set1d06_0 exited with zero status but no 'finished' file
1/23/2012 6:59:39 PM World Community Grid If this happens repeatedly you may need to reset the project.
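For anyone who wants to see how often this is happening, here is a rough sketch that counts the "exited with zero status" messages per task in a saved copy of the client messages. The regular expression is assumed from the two sample lines above only; other client versions may word the message differently, and the function name is made up for illustration.

```python
import re
from collections import Counter

# Pattern assumed from the sample message quoted above; not taken from BOINC docs.
PREMATURE_EXIT = re.compile(
    r"Task (?P<task>\S+) exited with zero status but no 'finished' file"
)


def count_premature_exits(lines):
    """Return a Counter mapping task name -> number of premature-exit messages."""
    counts = Counter()
    for line in lines:
        match = PREMATURE_EXIT.search(line)
        if match:
            counts[match.group("task")] += 1
    return counts


# The two log lines from the post above, used as sample input.
sample = [
    "1/23/2012 6:59:39 PM World Community Grid Task "
    "E205404_655_C.31.C27H18N2SSi.00217312.1.set1d06_0 "
    "exited with zero status but no 'finished' file",
    "1/23/2012 6:59:39 PM World Community Grid "
    "If this happens repeatedly you may need to reset the project.",
]

for task, hits in count_premature_exits(sample).items():
    print(task, hits)
```

A task that shows up many times is being restarted repeatedly, which matches the "resets to 0%" symptom described above.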


What's happening? I used to only run 2 CEP tasks at once but I recently upped it to 8 at a time. I used to not have an issue with running that many. Did something change that makes it impossible to run that many?
I'm going to drop it back down to 2 and see what happens.
[Jan 24, 2012 12:52:25 AM]
Former Member
Cruncher
Joined: May 22, 2018
Post Count: 0
Status: Offline
Re: exited with zero status

I've seen it before when I hadn't made an exception in an antivirus solution for the BOINC/WCG data directory - i.e., a file sharing/locking conflict.

Then again, I've also seen it when I had failing RAID sets (because the drivers and/or hardware weren't up to snuff...not to mention names, but I still marvell at that particular failure). I'd check system/event logs first.
[Jan 24, 2012 9:33:22 PM]
Former Member
Cruncher
Joined: May 22, 2018
Post Count: 0
Status: Offline
Re: exited with zero status

Well, I tried it with just 2 tasks at a time and it's still failing. Looks like CEP has turned useless on my machine. HCC is still buzzing right along with no issues.
[Jan 24, 2012 10:04:57 PM]
Former Member
Cruncher
Joined: May 22, 2018
Post Count: 0
Status: Offline
Re: exited with zero status

A very busy system could cause this too [and CEP2 will do that on underpowered systems]. One more reason why this science is opt-in and 1 task at a time.

There's a Start Here topic on this btw. Look in the stickied index [over all forums].

--//--
[Jan 24, 2012 10:26:42 PM]
David Autumns
Ace Cruncher
UK
Joined: Nov 16, 2004
Post Count: 11062
Status: Offline
Re: exited with zero status

Looks like I lost 4 results overnight for the same reason

Currently have this project running on 3 machines with 12 cores between them and my stats have been down

Now I know why

Will keep on crunching

praying Please let my results always be returned
^
|
Little prayer

Dave
[Feb 4, 2012 8:11:39 AM]
TPCBF
Master Cruncher
USA
Joined: Jan 2, 2011
Post Count: 1842
Status: Offline
confused Re: exited with zero status

Not sure if this fits into this thread but one (out of 4 that I selected to run CEP2) of my hosts terminated all CEP2 tasks thrown at it within a matter of minutes:
Result Name: E206033_543_C.22.C16H10OS2SeSi2.01441855.0.set1d06_0--
<core_client_version>6.10.58</core_client_version>
<![CDATA[
<message>
- exit code 195 (0xc3)
</message>
<stderr_txt>
INFO: No state to restore. Start from the beginning.
[13:06:46] Number of jobs = 16
[13:06:46] Starting job 0,CPU time has been restored to 0.000000.
Application exited with RC = 0x9
[13:09:40] Finished Job #0
13:09:40 (3872): called boinc_finish

</stderr_txt>
]]>
What gives?
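When comparing several failed results, it can help to pull the two codes out of the stderr text mechanically. The sketch below is only an illustration: the patterns are assumed from the excerpt above, the function name is hypothetical, and it says nothing about what the codes mean.

```python
import re


def parse_exit_codes(stderr_text):
    """Extract the reported exit code and the application's RC, if present."""
    found = {}
    exit_match = re.search(r"exit code (\d+) \((0x[0-9a-fA-F]+)\)", stderr_text)
    if exit_match:
        decimal_code = int(exit_match.group(1))
        hex_code = int(exit_match.group(2), 16)
        found["exit_code"] = decimal_code
        # Sanity check that the decimal and hex forms agree (195 == 0xc3).
        found["hex_matches"] = decimal_code == hex_code
    rc_match = re.search(r"Application exited with RC = (0x[0-9a-fA-F]+)", stderr_text)
    if rc_match:
        found["app_rc"] = int(rc_match.group(1), 16)
    return found


# Lines copied from the result quoted above.
excerpt = """
- exit code 195 (0xc3)
[13:06:46] Starting job 0,CPU time has been restored to 0.000000.
Application exited with RC = 0x9
"""

print(parse_exit_codes(excerpt))
```

Running this over each failed result makes it easy to see whether every failure reports the same pair of codes.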
This is a machine that is currently sitting idle, and should not be "underpowered" either
		Starting BOINC client version 6.10.58 for windows_x86_64
log flags: file_xfer, sched_ops, task
Libraries: libcurl/7.19.7 OpenSSL/0.9.8l zlib/1.2.3
Running as a daemon
Data directory: C:\ProgramData\BOINC
Running under account boinc_master
Processor: 2 GenuineIntel Intel(R) Pentium(R) CPU B940 @ 2.00GHz [Family 6 Model 42 Stepping 7]
Processor: 256.00 KB cache
Processor features: fpu vme de pse tsc msr pae mce cx8 apic sep mtrr pge mca cmov pat pse36
clflush dts acpi mmx fxsr sse sse2 ss htt tm pni ssse3 cx16 sse4_1 sse4_2 nx lm tm2 popcnt pbe
OS: Microsoft Windows 7: Home Premium x64 Edition, Service Pack 1, (06.01.7601.00)
Memory: 3.95 GB physical, 7.90 GB virtual
Disk: 455.34 GB total, 377.40 GB free
Local time is UTC -8 hours
No usable GPUs found
World Community Grid URL http://www.worldcommunitygrid.org/; Computer ID 1757947; resource share 100
World Community Grid General prefs: from World Community Grid (last modified 14-Feb-2012 18:58:00)
World Community Grid Computer location: work
General prefs: using separate prefs for work
Preferences:
max memory usage when active: 2021.93MB
max memory usage when idle: 3032.89MB
max disk usage: 10.00GB
don't use GPU while active
suspend work if non-BOINC CPU load exceeds 50 %

GFAM WUs on this machine work just fine, so do the CEP2 WUs on the other 3 hosts that have that project in their selection...

Ralf confused

----------------------------------------
[Edit 1 times, last edit by TKH at Feb 16, 2012 1:56:12 PM]
[Feb 16, 2012 1:20:28 AM]
Former Member
Cruncher
Joined: May 22, 2018
Post Count: 0
Status: Offline
Re: exited with zero status

Hello TPCBF,
Search found 28 posts with exit code 195 for CEP2. No cause for this problem was ever found while checking the computer, but when checked, Results Status showed that every computer given the same work unit errored out. So my guess is that the problem is caused by the work unit in combination with the program. The computer is fine.

smile Please keep track of these work units in Results Status and tell me if I am a clown or a idea
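The reasoning above (if every computer fails the same work unit, suspect the work unit rather than the computer) can be written down as a small check. This is just a sketch: the tuples would have to be copied by hand from Results Status, and the work unit names, host names, and verdict labels below are invented for illustration.

```python
from collections import defaultdict


def classify_work_units(results):
    """results: iterable of (work_unit, host, outcome), outcome 'valid' or 'error'."""
    outcomes_by_wu = defaultdict(list)
    for work_unit, _host, outcome in results:
        outcomes_by_wu[work_unit].append(outcome)

    verdicts = {}
    for work_unit, outcomes in outcomes_by_wu.items():
        errors = sum(1 for outcome in outcomes if outcome == "error")
        if errors == 0:
            verdicts[work_unit] = "ok"
        elif len(outcomes) < 2:
            # A single failed copy cannot separate host from work unit.
            verdicts[work_unit] = "inconclusive"
        elif errors == len(outcomes):
            verdicts[work_unit] = "suspect work unit"
        else:
            verdicts[work_unit] = "suspect host"
    return verdicts


# Made-up sample data, not real results.
sample_results = [
    ("wu_a", "host_1", "error"),
    ("wu_a", "host_2", "error"),
    ("wu_b", "host_1", "error"),
    ("wu_b", "host_3", "valid"),
    ("wu_c", "host_1", "error"),
]

print(classify_work_units(sample_results))
```

If most failed work units come out as "suspect work unit", that supports the guess above; a run of "suspect host" verdicts for one machine points the other way.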

Lawrence
[Feb 16, 2012 6:40:31 AM]
TPCBF
Master Cruncher
USA
Joined: Jan 2, 2011
Post Count: 1842
Status: Offline
Re: exited with zero status

> Hello TPCBF,
> Search found 28 posts with exit code 195 for CEP2. No cause for this problem was ever found while checking the computer, but, when checked, Results Status showed every computer with the same work unit errored out. So my guess is that the problem is caused by the work unit in combination with the program. The computer is fine.
>
> smile Please keep track of these work units in Results Status and tell me if I am a clown or a idea
>
> Lawrence

Well, after 5 WUs in a row failed on that box, I switched the device profile to one that doesn't include CEP2. Can give it a try again once weekend comes along, hopefully...

Maybe didn't search right, but didn't see anything that fit, that's why I mentioned it.
3 other hosts enabled for CEP2 at the same time work just fine; the only real difference obvious to me is that those are running 32-bit Windows XP Pro/Windows Server 2003, while the "problematic" one is Windows 7 64-bit...

Ralf
[Feb 16, 2012 5:43:58 PM]
Former Member
Cruncher
Joined: May 22, 2018
Post Count: 0
Status: Offline
Re: exited with zero status

Dear TPCBF,
this is a strange problem and we are not quite sure what to make of it. If it persists, please post again and maybe the IBM-WCG team can chime in.
Best wishes from
Your Harvard CEP team
[Feb 22, 2012 4:59:11 PM]
Former Member
Cruncher
Joined: May 22, 2018
Post Count: 0
Status: Offline
Re: exited with zero status

From an off-line discussion: an option is being looked at to automatically cap the maximum number of "intense" jobs. It would be something that members can configure in their device profile, and the client would of course need to have it as a feature... the next client after the one after the following one, who knows. It would require both server-side and client-side code changes. The client could then hold as many CEP2 tasks in the buffer as it likes, but if you have set e.g. 2 concurrent intense tasks on a quad, the client would only let 2 run at a time. Of course you'd need other work too, else the remaining cores would idle.

Something along these lines, then everyone can do the discovery what setting avoids heartbeat/exit zero status, and really forget for longer :D
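To make the idea concrete, here is a toy sketch of how such a cap might pick which buffered tasks get to run. It is purely hypothetical: no such client setting is described in this thread, and the function, the task names, and the "intense" flag are all assumptions for illustration.

```python
def pick_running_tasks(queued, cores, max_intense):
    """Choose which buffered tasks run.

    queued: list of (task_name, is_intense) in the order the client would start them.
    cores: number of cores available for crunching.
    max_intense: cap on intense tasks running at once (the hypothetical setting).
    """
    running = []
    intense_running = 0
    for name, is_intense in queued:
        if len(running) == cores:
            break  # every core is already busy
        if is_intense:
            if intense_running >= max_intense:
                continue  # stays in the buffer until an intense slot frees up
            intense_running += 1
        running.append(name)
    return running


# Quad core, at most 2 intense tasks at once (names are made up).
mixed_buffer = [
    ("cep2_a", True), ("cep2_b", True), ("cep2_c", True),
    ("hcc_a", False), ("hcc_b", False),
]
print(pick_running_tasks(mixed_buffer, cores=4, max_intense=2))

# With only intense work in the buffer, two cores sit idle.
cep2_only = [("cep2_a", True), ("cep2_b", True), ("cep2_c", True), ("cep2_d", True)]
print(pick_running_tasks(cep2_only, cores=4, max_intense=2))
```

The second call shows the point made above about needing other work in the buffer: with nothing but CEP2 queued, the cap leaves the remaining cores idle.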

--//--

edit: This would be over and above the already available [and by me used] non-BOINC CPU % load control. Mine is set at 40% on Linux and pauses the client whenever the load is greater... better to lose a few minutes than lose whole tasks.
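The load control mentioned in the edit boils down to a simple comparison. The sketch below assumes the non-BOINC load is the total CPU load minus the share used by BOINC tasks; how the client actually measures it is not stated in this thread, and the function name and arguments are made up.

```python
def should_suspend(total_cpu_percent, boinc_cpu_percent, threshold_percent):
    """Return True when the non-BOINC CPU load exceeds the configured threshold."""
    # Assumption: non-BOINC load = total load minus what BOINC itself is using.
    non_boinc_load = max(0.0, total_cpu_percent - boinc_cpu_percent)
    return non_boinc_load > threshold_percent


# With the 40% threshold from the post above:
print(should_suspend(95.0, 50.0, 40.0))  # other programs use 45%
print(should_suspend(80.0, 50.0, 40.0))  # other programs use 30%
```

The first case would pause crunching and the second would not, which is the "lose a few minutes rather than whole tasks" trade-off.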
----------------------------------------
[Edit 1 times, last edit by Former Member at Feb 22, 2012 5:13:03 PM]
[Feb 22, 2012 5:10:48 PM]