World Community Grid Forums
Thread Status: Active Total posts in this thread: 11
Rickjb
Veteran Cruncher Australia Joined: Sep 17, 2006 Post Count: 666 Status: Offline
Maybe this post should go in the DDDT-2 Run time Ranges thread, but it doesn't really belong there. And there may be other WUs that behave strangely, so let's have a new thread.

erlc_d002_pr89b1_1 | Valid | 26/03/10 08:20:59 | 26/03/10 09:53:00 | 0.43 | 6.9 / 7.0
erlc_d002_pr89b1_0 | Valid | 26/03/10 08:20:57 | 26/03/10 11:57:06 | 0.27 | 7.2 / 7.0 (mine)

This WU progressed to an indicated 30% complete or so, then suddenly terminated normally. My device (Q9650 @ 3.9GHz) has crunched 2 other "pr" WUs in around 0.8h, and is currently running 1 other with an extrapolated completion time of about 0.8h too.

The wingman has claimed similar credit for this short WU, so he probably did a similar amount of work on it, and probably experienced early termination too.
----------------------------------------
[Edit 1 times, last edit by Rickjb at Mar 26, 2010 12:22:43 PM]
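As a rough cross-check of the figures in the post above, the short result's CPU time can be extrapolated to a full run and compared with the roughly 0.8h seen for other "pr" WUs. This is only a sketch: it assumes the progress indicator advances linearly with CPU time, which the thread does not confirm, and the function name is made up for illustration.

```python
# Figures quoted in the post; the 30% is the poster's own approximate reading.
cpu_hours_at_exit = 0.27   # CPU time of the short result, in hours
progress_at_exit = 0.30    # indicated progress when the task ended
typical_pr_hours = 0.8     # runtime of other "pr" WUs on the same device


def extrapolate_full_runtime(elapsed_hours: float, progress_fraction: float) -> float:
    """Project the full runtime, ASSUMING progress is linear in CPU time."""
    if not 0 < progress_fraction <= 1:
        raise ValueError("progress_fraction must be in (0, 1]")
    return elapsed_hours / progress_fraction


projected_hours = extrapolate_full_runtime(cpu_hours_at_exit, progress_at_exit)
difference_hours = projected_hours - typical_pr_hours

print(f"projected full runtime: {projected_hours:.2f} h")
print(f"difference from typical: {difference_hours:+.2f} h")
```

Under that assumption the projection comes out near the typical runtime, which fits the reading that the WU stopped early rather than ran unusually fast.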
||
|
Sekerob
Ace Cruncher Joined: Jul 24, 2005 Post Count: 20043 Status: Offline |
These sciences checkpoint every 2% of progress, i.e. if you activate <checkpoint_debug> in the cc_config.xml and set a very short write-to-disk interval of 30 seconds, you might be able to find out, if you wish to of course.

Personally, I wish all checkpoint lines (even those that the client setting does not permit writing to the message log) were stored in the result log. It's only slightly more info, but quite comforting when we laymen do some self-diagnostics... i.e. we should see 50 of them.
----------------------------------------
WCG
Please help to make the Forums an enjoyable experience for All! |
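The counting idea above (2% per checkpoint, so 50 checkpoints for a complete run) could be sketched as below. The wording of the checkpoint message and the sample log lines are assumptions made up for illustration, not output copied from a real client, so the marker string should be adjusted to whatever the message log actually shows.

```python
def expected_checkpoints(interval_percent: float) -> int:
    """Checkpoints written by a task that runs to 100% at a fixed interval."""
    if interval_percent <= 0:
        raise ValueError("interval_percent must be positive")
    return int(100 // interval_percent)


def count_checkpoints(log_lines, task_name: str) -> int:
    """Count the log lines that report a checkpoint for one task."""
    # ASSUMED message wording; check it against your own message log.
    marker = f"result {task_name} checkpointed"
    return sum(1 for line in log_lines if marker in line)


def progress_reached(checkpoints_seen: int, interval_percent: float) -> float:
    """Progress (in percent) implied by the number of checkpoints seen."""
    return checkpoints_seen * interval_percent


# HYPOTHETICAL log: 15 checkpoint lines for one task, plus one for another task.
sample_log = [
    "26/03/2010|World Community Grid|[checkpoint_debug] result erlc_d002_pr89b1_0 checkpointed"
    for _ in range(15)
]
sample_log.append(
    "26/03/2010|World Community Grid|[checkpoint_debug] result some_other_task_0 checkpointed"
)

seen = count_checkpoints(sample_log, "erlc_d002_pr89b1_0")
print(f"{seen} of {expected_checkpoints(2)} checkpoints seen, "
      f"about {progress_reached(seen, 2):.0f}% progress")
```

Seeing fewer lines than `expected_checkpoints(2)` on a result that finished normally would suggest it ended early at roughly the implied progress.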
||
|
Rickjb
Veteran Cruncher Australia Joined: Sep 17, 2006 Post Count: 666 Status: Offline
Sek: I have checkpoint_debug turned off for most devices, and will leave counting checkpoints in logs to someone with more time on their hands.

Some of the contributors to your DDDT-2 Run time Ranges thread might find some early-terminators and thus identify which WU types are affected. That would make searching through results and log files much easier.

[OT but related]: A suggestion: have Type A WUs checkpoint more often than every 2%. That would reduce crunching time lost due to restarts.
||
|
uplinger
Former World Community Grid Tech Joined: May 23, 2005 Post Count: 3952 Status: Offline
Sek,

Checkpointing on type A work units is every 2%. But on type B and C it is much, much quicker. It tries to checkpoint every x times through the loops, which I believe ends up around every 10 seconds...

Rickjb,

The early termination is OK. It is one of the positive negatives I have talked about in other threads. Multiple checks were done to make sure both you and your wingman encountered similar situations, so although the run was unfortunately short, it did provide information for the researchers.

Thanks,
-Uplinger
||
|
uplinger
Former World Community Grid Tech Joined: May 23, 2005 Post Count: 3952 Status: Offline
Rickjb,

Sorry, due to a limitation of the dynamics loop, we are not able to make it checkpoint more often than every 2%. We have tried, and that was the best we were able to achieve.

-Uplinger
||
|
Sekerob
Ace Cruncher Joined: Jul 24, 2005 Post Count: 20043 Status: Offline |
"... to someone with more time on their hands."
Thanks for reminding me.
||
|
X-Files 27
Senior Cruncher Canada Joined: May 21, 2007 Post Count: 391 Status: Offline
"Sorry due to a limitation of the dynamics loop, we are not able to make it checkpoint sooner than every 2%. We have tried and that was the best we were able to achieve."

I also have this dilemma of losing some precious time due to:
a) BOINC running its periodic benchmark
b) switching applications
c) EDF mode
||
|
Sekerob
Ace Cruncher Joined: Jul 24, 2005 Post Count: 20043 Status: Offline |
Point A) is fixed in a coming 6.10 version, I hope in the one that WCG is going to recommend. During the 30-second benchmark the sciences will then not be unloaded. Generally I've got LAIM on, but I see that for some that is not an option. Switching apps is done at checkpoints anyway, i.e. lossless, which leaves EDF... which should be rare with caches that are not excessively sized **.

edit: ** and in the current high-variability environment of jobs for DDDT2, HCMD2 and HFCC, that means keeping it near 1.00 or lower.
----------------------------------------
[Edit 1 times, last edit by Sekerob at Mar 26, 2010 3:11:15 PM]
||
|
Sekerob
Ace Cruncher Joined: Jul 24, 2005 Post Count: 20043 Status: Offline |
"Sek, Checkpointing on type A work units is every 2%. But on type B and C it is much much quicker. It tries to checkpoint every x times through the loops, which I believe ends up around 10 seconds... Rickjb, The early termination is ok. It is one of the positive negatives I have talked about in other threads. Multiple checks were done to make sure both you and your wingman encountered similar situations, so unfortunately the run was short, it did provide information for the researchers. Thanks, -Uplinger"

Thx for connecting the dots I've missed. Glad I got the WTD set on 5 minutes, else the message log would get truly overlong. As per mikaok's comment yesterday, it's also good to be able to eliminate and in this case even move on quicker.
||
|
Rickjb
Veteran Cruncher Australia Joined: Sep 17, 2006 Post Count: 666 Status: Offline
WU that ended early with Error -161

These WUs have been discussed in the problematic batches? thread. The techs & scientists are working on the problem, but it's still happening.

From the log file of my example (2 previous wingmen had the same):
> <file_name>erlc_e019_pda004_0_2</file_name>
> <error_code>-161</error_code>
> </file_xfer_error>

No-one else has quoted the corresponding message from the BOINC client's Messages tab:
27/03/2010 7:03:51 PM|World Community Grid|Computation for task erlc_e019_pda004_3 finished
27/03/2010 7:03:51 PM|World Community Grid|Output file erlc_e019_pda004_3_2 for task erlc_e019_pda004_3 absent

Times are UTC+11.
HTH - Rick
----------------------------------------
[Edit 2 times, last edit by Rickjb at Mar 27, 2010 9:35:24 AM]
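For anyone wanting to scan a saved copy of the Messages tab for this symptom, a minimal sketch follows. It assumes the pipe-separated `timestamp|project|message` layout and the "Output file ... absent" wording exactly as quoted in the post above; the function and type names are invented for illustration, and other client versions may word the message differently.

```python
import re
from typing import NamedTuple, Optional


class AbsentOutput(NamedTuple):
    timestamp: str
    project: str
    output_file: str
    task: str


# Message wording taken from the two lines quoted in the post above.
_ABSENT = re.compile(r"^Output file (\S+) for task (\S+) absent$")


def parse_absent_output(line: str) -> Optional[AbsentOutput]:
    """Return details if the line reports a missing output file, else None."""
    parts = line.strip().split("|", 2)
    if len(parts) != 3:
        return None
    timestamp, project, message = parts
    match = _ABSENT.match(message.strip())
    if match is None:
        return None
    return AbsentOutput(timestamp, project, match.group(1), match.group(2))


messages = [
    "27/03/2010 7:03:51 PM|World Community Grid|Computation for task erlc_e019_pda004_3 finished",
    "27/03/2010 7:03:51 PM|World Community Grid|Output file erlc_e019_pda004_3_2 for task erlc_e019_pda004_3 absent",
]

hits = [hit for hit in map(parse_absent_output, messages) if hit is not None]
for hit in hits:
    print(f"{hit.timestamp}: task {hit.task} is missing {hit.output_file}")
```

Collecting the task names this way makes it easier to match them against the results listed as Error on the Result Status page.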
||
|