Thread Status: Active
Total posts in this thread: 10
This topic has been viewed 769 times and has 9 replies
imakuni
Advanced Cruncher
Joined: Jun 11, 2009
Post Count: 90
Status: Offline
Multiple tasks failing (exit with zero stats but no finished file)

As the title says. It's been happening for a few days on one of my machines. The majority of the tasks end up with this error, although eventually one or two WUs succeed.

I already tried resetting the project, as well as detaching from WCG. Neither has worked. The client is set to use all available disk space (and there is plenty), so it's not that either.

It never happened with MCM, and when I was running CEP (around a month ago), it was doing just fine.
----------------------------------------

Want to have an image of yourself like this on? Check this thread: https://secure.worldcommunitygrid.org/forums/wcg/viewthread_thread,29840
[Jun 22, 2015 1:36:23 PM]
Seoulpowergrid
Veteran Cruncher
Joined: Apr 12, 2013
Post Count: 799
Status: Offline
Re: Multiple tasks failing (exit with zero stats but no finished file)

I've been having multiple errors for CEP2 these days on multiple machines (Win 7 and a recent Linux/Ubuntu install). What part of the code should I look at to see if I've been having the same errors as you've had?
----------------------------------------

[Jun 22, 2015 3:02:10 PM]
Former Member
Cruncher
Joined: May 22, 2018
Post Count: 0
Status: Offline
Re: Multiple tasks failing (exit with zero stats but no finished file)

All the answers and possible mitigation actions are covered extensively in this forum. The default is 1 at a time! Repeat: 1 at a time when opting in. Only if your hardware/OS is tuned to handle more can you run more.
[Jun 22, 2015 3:50:19 PM]
imakuni
Advanced Cruncher
Joined: Jun 11, 2009
Post Count: 90
Status: Offline
Re: Multiple tasks failing (exit with zero stats but no finished file)

All the answers and possible mitigation actions are covered extensively in this forum. The default is 1 at a time! Repeat: 1 at a time when opting in. Only if your hardware/OS is tuned to handle more can you run more.

How about a 16-thread (8-core) Xeon with 16 GB of RAM and a 2 TB HD? Shouldn't that be enough to handle extra WUs?

After all, it used to, around a month ago (when I was running CEP on that machine). But now it seems to be failing for some reason, and I DOUBT it is because of weak hardware. I also have MANY other weaker machines that can run multiple CEP tasks at a time, and none of them are showing that problem.

Lastly, if this is "extensively covered in this forum", I suppose it wouldn't be that hard to copy one of those many links and post it here, no?

We appreciate the help, but we'll be waiting for a proper response.
----------------------------------------

[Edit 1 times, last edit by imakuni at Jun 22, 2015 6:54:38 PM]
[Jun 22, 2015 6:53:55 PM]
petehardy
Senior Cruncher
USA
Joined: May 4, 2007
Post Count: 318
Status: Offline
Re: Multiple tasks failing (exit with zero stats but no finished file)

How about a 16-thread (8-core) Xeon with 16 GB of RAM and a 2 TB HD? Shouldn't that be enough to handle extra WUs?


You need your BOINC data directory to be on a RAM disk or an SSD.
----------------------------------------

"Patience is a virtue", I can't wait to learn it!
[Jun 22, 2015 9:20:16 PM]
Former Member
Cruncher
Joined: May 22, 2018
Post Count: 0
Status: Offline
Re: Multiple tasks failing (exit with zero stats but no finished file)

There's a pop-up discussion started by the BOINC developer regarding a project of WCG that causes him failures, on Windows. He has not mentioned which one, but it may be related to an API used in compiling the science prior to Oct. 31, 2014. The techs might like to get in touch with David Anderson to ascertain exactly which one [zombie processes if LAIM (Leave Applications In Memory) is -NOT- set and BOINC is suspended, which is when sciences are supposed to unload]. In the case of CEP2 it has always been strongly recommended to run with LAIM -ON-, but I have not heard of zombie processes related to CEP2 before. My thinking goes towards any of the sciences that have a controller/stager part and a worker, with the controller exiting but not the worker.

Edit: Not CEP2 but VINA, which narrows it down to OET1 and makes FAHV an academic case, if it happens to that one.
----------------------------------------
[Edit 1 times, last edit by Former Member at Jun 22, 2015 10:14:35 PM]
[Jun 22, 2015 9:40:18 PM]
Yarensc
Advanced Cruncher
USA
Joined: Sep 24, 2011
Post Count: 134
Status: Offline
Re: Multiple tasks failing (exit with zero stats but no finished file)


How about a 16-thread (8-core) Xeon with 16 GB of RAM and a 2 TB HD? Shouldn't that be enough to handle extra WUs?


Your powerful computer might actually be more of a problem than a benefit for your specific problem. Since it can run so many tasks at once and (presumably) quicker than normal, there's a good chance multiple tasks are trying to write to your one hard drive at the same time. This can result in a timeout where a task crashes because it thinks the system has crashed, when in reality there are just 2 or 3 tasks ahead of it in line for the drive.

CEP2 has very large files (which is why it has a default of 1 at a time, as SekeRob pointed out). This problem can be mitigated by running on an SSD or RAM disk, as petehardy pointed out, or by setting up an app_config.xml file to limit the number running concurrently and staggering their start. If you don't want to go through the work for either of those, just reset your profile to one at a time and fill the other 15 threads with other projects.

setting up max_concurrent with app_config -> https://secure.worldcommunitygrid.org/forums/...ead,37845_offset,0#487614
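
For anyone who prefers to see it spelled out, here is a minimal sketch of what such an app_config.xml could look like, generated with a few lines of Python. The app name "cep2" is taken from the BOINCTasks history lines posted in this thread, and the limit of 2 is only an illustrative assumption, not a recommended value. The `build_app_config` helper is hypothetical, written just for this example.

```python
import xml.etree.ElementTree as ET


def build_app_config(app_name: str, max_concurrent: int) -> str:
    """Return the text of an app_config.xml limiting one app's concurrent tasks."""
    root = ET.Element("app_config")
    app = ET.SubElement(root, "app")
    ET.SubElement(app, "name").text = app_name
    ET.SubElement(app, "max_concurrent").text = str(max_concurrent)
    ET.indent(root)  # pretty-print; needs Python 3.9 or newer
    return ET.tostring(root, encoding="unicode")


# Assumed values: app short name "cep2", at most 2 running at once.
config_xml = build_app_config("cep2", 2)
print(config_xml)
```

The printed text would then be saved as app_config.xml in the WCG project folder inside the BOINC data directory (normally projects/www.worldcommunitygrid.org, but check your own install), after which the client has to re-read its config files or be restarted. Staggering the task starts is a separate step that this sketch does not cover.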
[Jun 23, 2015 3:53:00 PM]
Former Member
Cruncher
Joined: May 22, 2018
Post Count: 0
Status: Offline
Re: Multiple tasks failing (exit with zero stats but no finished file)

An interesting observation here is that CEP2 on my 4770 at 1900 MHz instead of the regular 3700 MHz [summer heat dictated] provided several percentage points better efficiency [now 98-99%], implying the HD is not keeping up when going full out.

Sample of last few recorded by BOINCTasks history:

7.00 cep2 E231218_479_S.244.C20F6H10N4S2.CECRYFGIAHSAQJ-UHFFFAOYSA-N.11_s1_14_0 07:21:37 (07:15:28) 6/23/2015 5:21:46 PM 6/23/2015 5:25:46 PM 98,61 Reported: OK * 207.66 MB 415.66 MB
7.00 cep2 E231217_429_S.240.C32H34N2.ZIQIFKYUGFEYKA-UHFFFAOYSA-N.6_s1_14_0 14:08:18 (13:58:06) 6/23/2015 10:02:21 AM 6/23/2015 10:06:22 AM 98,80 Reported: OK * 281.31 MB 476.03 MB
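
As a rough check on those efficiency figures, here is a small sketch that recomputes them from the two times in each line. It assumes the first time is elapsed (wall-clock) time and the one in parentheses is CPU time, and that efficiency is simply CPU time divided by elapsed time. The function names are made up for this example.

```python
def hms_to_seconds(hms: str) -> int:
    """Convert an HH:MM:SS string to a number of seconds."""
    hours, minutes, seconds = (int(part) for part in hms.split(":"))
    return hours * 3600 + minutes * 60 + seconds


def cpu_efficiency(elapsed: str, cpu: str) -> float:
    """CPU time as a percentage of elapsed time (assumed definition)."""
    return 100.0 * hms_to_seconds(cpu) / hms_to_seconds(elapsed)


# (elapsed, CPU) pairs copied from the two history lines above.
samples = [("07:21:37", "07:15:28"), ("14:08:18", "13:58:06")]
for elapsed, cpu in samples:
    print(f"{elapsed} elapsed, {cpu} CPU -> {cpu_efficiency(elapsed, cpu):.2f}%")
# 07:21:37 elapsed, 07:15:28 CPU -> 98.61%
# 14:08:18 elapsed, 13:58:06 CPU -> 98.80%
```

Both results agree with the 98,61 and 98,80 columns in the history lines (shown there with a decimal comma), so the gap between elapsed and CPU time is where any waiting on the disk would show up.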
[Jun 23, 2015 5:25:16 PM]
armstrdj
Former World Community Grid Tech
Joined: Oct 21, 2004
Post Count: 695
Status: Offline
Re: Multiple tasks failing (exit with zero stats but no finished file)

There's a pop-up discussion started by the BOINC developer regarding a project of WCG that causes him failures, on Windows.


SekeRob, is this on one of the BOINC dev email lists?

Thanks,
armstrdj
[Jun 30, 2015 3:22:15 PM]
Former Member
Cruncher
Joined: May 22, 2018
Post Count: 0
Status: Offline
Re: Multiple tasks failing (exit with zero stats but no finished file)

Yes, the noted Dr. A commented on the client alpha mailing list and pointed at a VINA app of WCG. Then another user wrote that another project had got it from WCG (Find?) and that it was not checkpointing, which to me points the finger at the first OET, which did not do so either.
[Jun 30, 2015 3:38:01 PM]