Thread Status: Active
Total posts in this thread: 8
This topic has been viewed 681 times and has 7 replies
Former Member
Cruncher
Joined: May 22, 2018
Post Count: 0
Status: Offline
exited with zero status....

After a few years away I am trying again to support The Clean Energy Project, and I have run into the same error that drove me out.

Why do I get this error after exactly 4 hours of running time?

26-3-2012 7:53:39 World Community Grid Restarting task E206781_766_C.25.C21H14N2SSi.00949546.3.set1d06_0 using cep2 version 640
26-3-2012 11:46:08 World Community Grid Task E206781_766_C.25.C21H14N2SSi.00949546.3.set1d06_0 exited with zero status but no 'finished' file
26-3-2012 11:46:08 World Community Grid If this happens repeatedly you may need to reset the project.
26-3-2012 11:46:08 World Community Grid Restarting task E206781_766_C.25.C21H14N2SSi.00949546.3.set1d06_0 using cep2 version 640
26-3-2012 15:44:55 World Community Grid Task E206781_766_C.25.C21H14N2SSi.00949546.3.set1d06_0 exited with zero status but no 'finished' file
26-3-2012 15:44:55 World Community Grid If this happens repeatedly you may need to reset the project.
26-3-2012 15:44:55 World Community Grid Restarting task E206781_766_C.25.C21H14N2SSi.00949546.3.set1d06_0 using cep2 version 640
26-3-2012 19:43:43 World Community Grid Task E206781_766_C.25.C21H14N2SSi.00949546.3.set1d06_0 exited with zero status but no 'finished' file
26-3-2012 19:43:43 World Community Grid If this happens repeatedly you may need to reset the project.
26-3-2012 19:43:43 World Community Grid Restarting task E206781_766_C.25.C21H14N2SSi.00949546.3.set1d06_0 using cep2 version 640
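
For anyone who wants to check the timing, the interval between each restart and the following exit can be computed from the log lines above. This is a minimal sketch in Python, assuming the day-month-year timestamp format shown in the log; the function name `run_intervals` is made up for illustration.

```python
from datetime import datetime

# Log lines copied from the post above (the "If this happens repeatedly" lines are left out).
LOG = """\
26-3-2012 7:53:39 World Community Grid Restarting task E206781_766_C.25.C21H14N2SSi.00949546.3.set1d06_0 using cep2 version 640
26-3-2012 11:46:08 World Community Grid Task E206781_766_C.25.C21H14N2SSi.00949546.3.set1d06_0 exited with zero status but no 'finished' file
26-3-2012 11:46:08 World Community Grid Restarting task E206781_766_C.25.C21H14N2SSi.00949546.3.set1d06_0 using cep2 version 640
26-3-2012 15:44:55 World Community Grid Task E206781_766_C.25.C21H14N2SSi.00949546.3.set1d06_0 exited with zero status but no 'finished' file
26-3-2012 15:44:55 World Community Grid Restarting task E206781_766_C.25.C21H14N2SSi.00949546.3.set1d06_0 using cep2 version 640
26-3-2012 19:43:43 World Community Grid Task E206781_766_C.25.C21H14N2SSi.00949546.3.set1d06_0 exited with zero status but no 'finished' file
26-3-2012 19:43:43 World Community Grid Restarting task E206781_766_C.25.C21H14N2SSi.00949546.3.set1d06_0 using cep2 version 640
"""


def run_intervals(log_text):
    """Return the time between each 'Restarting task' line and the next 'exited' line."""
    intervals = []
    started = None
    for line in log_text.splitlines():
        parts = line.split(" ", 2)          # date, time, rest of the message
        if len(parts) < 3:
            continue                        # skip blank or malformed lines
        stamp = datetime.strptime(parts[0] + " " + parts[1], "%d-%m-%Y %H:%M:%S")
        message = parts[2]
        if "Restarting task" in message:
            started = stamp
        elif "exited with zero status" in message and started is not None:
            intervals.append(stamp - started)
            started = None
    return intervals


for delta in run_intervals(LOG):
    print(delta)
```

On this log the three runs each come out a little under four hours.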

Please advise,

Jaap
[Mar 26, 2012 6:58:24 PM]
Dataman
Ace Cruncher
Joined: Nov 16, 2004
Post Count: 4865
Status: Offline
Re: exited with zero status....

----------------------------------------


[Mar 26, 2012 7:03:17 PM]
Former Member
Cruncher
Joined: May 22, 2018
Post Count: 0
Status: Offline
Re: exited with zero status....

Most probably because your system is not up to it [which is why this science is opt-in, so volunteers can decide whether it runs or not].

Sorry

--//--
[Mar 26, 2012 7:03:55 PM]
Rickjb
Veteran Cruncher
Australia
Joined: Sep 17, 2006
Post Count: 666
Status: Offline
Re: exited with zero status....

I think this is how this error occurs:
The error occurs when the computer pauses for a while waiting for some kind of resource, usually related to hardware. The main boinc.exe program regularly asks each science task for some kind of status information. If the science task can't respond in the time allowed because the system has paused, boinc.exe assumes that it has died and terminates it.
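
The timeout idea described above can be sketched as a simple watchdog. This is only an illustration of the general pattern in Python, not BOINC's actual code: the class name `HeartbeatWatchdog`, the 30-second timeout, and the direction in which the messages travel are all assumptions.

```python
import time


class HeartbeatWatchdog:
    """Treat a peer as dead if no heartbeat arrives within timeout_s seconds."""

    def __init__(self, timeout_s, clock=time.monotonic):
        self.timeout_s = timeout_s
        self.clock = clock              # injectable so the logic can be tried without sleeping
        self.last_beat = clock()

    def beat(self):
        """Record that a heartbeat was received just now."""
        self.last_beat = self.clock()

    def is_alive(self):
        """True while the most recent heartbeat is no older than the timeout."""
        return self.clock() - self.last_beat <= self.timeout_s


# Simulated clock: the system "pauses" for 45 s while waiting on the disk.
now = [0.0]
dog = HeartbeatWatchdog(timeout_s=30.0, clock=lambda: now[0])  # 30 s is an assumed value

now[0] = 10.0
dog.beat()
print(dog.is_alive())   # True: a heartbeat has just arrived

now[0] = 55.0           # 45 s of silence, longer than the timeout
print(dog.is_alive())   # False: the watchdog would now give up on the peer
```

The point of the sketch is that the watchdog cannot tell a crashed peer from one that is merely stalled on I/O; both look like silence.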

A slow or overcommitted hard drive can trigger this condition. CEP2 is very good at overloading your hard drive. Each CEP2 WU contains 16 sub-jobs that run one at a time. When each one starts, it creates a huge number of small data files, and the hard drive may not keep up, especially if it is already busy, e.g. running a virus scan, starting up another CEP2 WU on a multi-core computer, or swapping tasks out to the pagefile because there is insufficient RAM.

CEP2 tasks also generate a huge number of "page faults", where small chunks of data are exchanged between RAM and the pagefile. If only 1-2 CEP2 WUs are running, the HDD activity LED won't show much activity, I think because most of it is handled by the operating system's disk cache. However, on multi-core systems (4 cores and up) that are running CEP2 on most cores, the HDD will become quite busy and the whole machine will take micro-sleeps waiting for the HDD. And then when a new CEP2 job starts making all those little files ...
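
A rough way to see how a drive copes with many small files is to time it. The Python sketch below is a generic test, not a reproduction of what CEP2 writes: the function name `time_small_files`, the file count, and the file size are arbitrary assumptions.

```python
import os
import tempfile
import time


def time_small_files(count=500, size_bytes=512):
    """Create, sync and then delete `count` small files; return the elapsed seconds."""
    payload = b"x" * size_bytes
    start = time.perf_counter()
    with tempfile.TemporaryDirectory() as tmp:
        for i in range(count):
            path = os.path.join(tmp, f"chunk_{i:05d}.dat")
            with open(path, "wb") as fh:
                fh.write(payload)
                fh.flush()
                os.fsync(fh.fileno())   # ask the OS to push the data to the disk
        # leaving the 'with' block deletes the directory and everything in it
    return time.perf_counter() - start


elapsed = time_small_files()
print(f"500 small files took {elapsed:.2f} s")
```

Running it once on an idle machine and again during a virus scan or while several tasks are active gives a feel for how much the drive slows down under load.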

I had these errors with DDDT2 at one stage, on a machine which had a slow "Green" HDD that I think was defective from new. Although it always passed diagnostics and never gave a hard error, it would occasionally go into super-slow mode and these errors would happen. The HDD activity LED would always be lit brightly and continuously at these times.

Waiting for some other kind of resource, such as a network connection, might also cause the problem.

If your computer has multiple cores, I would run a mixture of WCG projects and run CEP2 on fewer than 50% of the cores. I aim to run CEP2 on 25% of cores. If the errors still occur, try to observe what happens just before an error: check the behaviour of the HDD activity LED, and check how quickly the system responds when you click on different windows on the screen.
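
As a worked example of the 25% rule of thumb, here is a small Python sketch. The function name `cep2_task_limit` and the choice to round down with a floor of one task are assumptions for illustration; how the limit is actually applied in the client is not covered here.

```python
def cep2_task_limit(total_cores, fraction=0.25):
    """Cores to give CEP2: `fraction` of the total, rounded down, but at least 1."""
    if total_cores < 1:
        raise ValueError("total_cores must be at least 1")
    return max(1, int(total_cores * fraction))


for cores in (4, 6, 8, 16):
    print(cores, "cores ->", cep2_task_limit(cores), "CEP2 task(s)")
```

Note that on a 2-core machine the floor of one task already means 50% of the cores, so the rule only really applies from 4 cores upwards.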

Hope this helps.
----------------------------------------
[Edit 2 times, last edit by Rickjb at Mar 27, 2012 7:10:27 AM]
[Mar 27, 2012 6:54:42 AM]
Former Member
Cruncher
Joined: May 22, 2018
Post Count: 0
Status: Offline
Re: exited with zero status....

Thanks for the support. I will leave this project and try something else. My machine was state of the art in 2004.
Jaap
[Mar 27, 2012 7:29:31 AM]
Former Member
Cruncher
Joined: May 22, 2018
Post Count: 0
Status: Offline
Re: exited with zero status....

Dear PCHooft,
sorry that CEP2 does not seem to be the right fit for your hardware - but the other WCG projects are also great!
Best wishes
Your Harvard CEP team
[Mar 28, 2012 3:04:50 PM]
Former Member
Cruncher
Joined: May 22, 2018
Post Count: 0
Status: Offline
Re: exited with zero status....

That sounds like a good explanation. However, I have been running my 6-core machine for quite a while and have only fairly recently seen these start to pop up every now and then. I haven't really babysat it, but does it kill the whole unit or just restart it from the last checkpoint? If it kills the whole unit, perhaps some minor programming tweak could make it restart from the last checkpoint instead.

Aaron
[Mar 29, 2012 8:12:09 PM]
Former Member
Cruncher
Joined: May 22, 2018
Post Count: 0
Status: Offline
Re: exited with zero status....

Start babysitting. For the first 100 burps [heartbeat losses], the task goes back to its last checkpoint [a serious loss of time on the 3rd job of 16]. At the 101st such heartbeat loss, the task is deemed broken and sent home. The Result Log will record a heartbeat problem.
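
The rule described above can be modelled in a few lines of Python. This is a toy model of the policy as stated in this post, not the client's real code; the class `Task` and its fields are invented for illustration, and only the figure of 100 comes from the post.

```python
MAX_HEARTBEAT_LOSSES = 100   # figure taken from the post above


class Task:
    """Toy model: roll back to the checkpoint on heartbeat loss, fail on the 101st."""

    def __init__(self):
        self.progress = 0      # work done so far (arbitrary units)
        self.checkpoint = 0    # progress saved at the last checkpoint
        self.losses = 0        # heartbeat losses seen so far
        self.failed = False

    def work(self, units):
        self.progress += units

    def save_checkpoint(self):
        self.checkpoint = self.progress

    def heartbeat_lost(self):
        self.losses += 1
        if self.losses > MAX_HEARTBEAT_LOSSES:
            self.failed = True                 # task is deemed broken
        else:
            self.progress = self.checkpoint    # work since the checkpoint is lost


task = Task()
task.work(5)
task.save_checkpoint()
task.work(3)
task.heartbeat_lost()
print(task.progress, task.failed)   # 5 False: the 3 units after the checkpoint are gone
```

The cost of each loss is whatever was computed since the last checkpoint, which is why a loss late in a long sub-job hurts so much.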

--//--

Edit: The not-so-shy can look in the stderr.txt file in the task's slot while the job is running when they see this happening. (And with foresight, the developers have hidden the message log from over-concerned parents: the Messages tab was removed and replaced with a Notices tab that generates a pop-up and a blinking red button if user intervention is required. To quote the developer yesterday: "The event log is for developers and testers. Information for normal users goes in the other parts of the GUI. If we're not showing something important, the solution is to figure out where to show it.") The zero-status message comes with "If this happens repeatedly...", i.e. if it only happens occasionally, move on.
----------------------------------------
[Edit 1 times, last edit by Former Member at Mar 30, 2012 6:48:10 AM]
[Mar 30, 2012 6:40:46 AM]