World Community Grid Forums
Category: Completed Research
Forum: The Clean Energy Project - Phase 2 Forum
Thread: 5 or 6 hours of work lost
Thread Status: Active Total posts in this thread: 9
bozo the clone
Cruncher Joined: Aug 29, 2008 Post Count: 1 Status: Offline Project Badges: |
A task estimated to run for over 11 hours had been running for over 5 hours when I had to shut down BOINC Manager. When I started BOINC Manager again, the other tasks resumed where they had left off. The Clean Energy Project task restarted from the beginning.
|
||
|
Former Member
Cruncher Joined: May 22, 2018 Post Count: 0 Status: Offline |
This is normal for CEP: its checkpoints are very far apart. If you do not run 24/7, it is best to avoid CEP tasks.
|
||
|
sean0118
Cruncher Australia Joined: Feb 7, 2010 Post Count: 33 Status: Offline Project Badges: |
Does putting your computer into standby or hibernation mode avoid this issue? I have been scared to try, to be honest.
|
||
|
Former Member
Cruncher Joined: May 22, 2018 Post Count: 0 Status: Offline |
Dear sean0118,
Sleep and hibernation are fine. If you search this forum for this issue you'll find a very detailed discussion. Best wishes, Your Harvard CEP team |
||
|
Loui_h20
Cruncher Joined: Aug 12, 2013 Post Count: 25 Status: Offline |
Looks like the problem has been fixed in 7.2.28!!!!
Time after time, watching 7 WUs reset because 1 WU has finished, losing 7 x ~10+ hours of work per thread, is not fun. After updating, CEP2 is now running 8 threads on an i7-3770 with no problems!!!... for now ;-). |
||
|
ludarp
Cruncher Joined: Nov 5, 2011 Post Count: 2 Status: Offline Project Badges: |
"Sleep and hibernation are fine"
Although... if you (against the recommendations for CEP2) have the option "Leave applications in memory" (LAIM) off, BOINC (7.0.64) terminates its apps just before sleeping. (I would understand that for hibernation, since there is less RAM to write to disk, but why for sleep?) I've just tested and reconfirmed this.
"Looks like the problem has been fixed in 7.2.28!!!!"
Great news! I'll be sure to check out the sleep behaviour after I upgrade. (For the sake of curiosity... I do run with LAIM on.) I've had WUs that didn't checkpoint at all between 0h24m and 11h30m (out of ~12h total)! So this setting really matters, e.g. for us folk who just crunch during the day. :-)
[Edit 1 times, last edit by ludarp at Nov 24, 2013 11:10:29 PM] |
||
|
Loui_h20
Cruncher Joined: Aug 12, 2013 Post Count: 25 Status: Offline |
Sorry, but it seems that after the fresh BOINC update it worked fine for a while, then went back to its "normal" mode of resetting work units :-( |
||
|
Former Member
Cruncher Joined: May 22, 2018 Post Count: 0 Status: Offline |
With LAIM on, suspend BOINC first [Activity menu > Suspend] before hibernating or sleeping. After powering up, once the system is fully up, set BOINC back to "Run always" or "Run based on preferences", whichever is your normal operating mode for BOINC. I've asked the devs for the same delay function that kicks in at boot time, the <start_delay> option, to apply on resume as well, but as is often the case, there was resistance. My hibernate file is usually 6-7 GB and the VM 16 GB, so it takes time to resume and for the disk to come to rest, which is when CEP2 hangs occur due to heavy disk I/O [long enough to get a checkpoint reset... hours lost].
When a CEP2 task finishes, zipping of the results is ongoing, along with the [quite CPU-hungry] networking. At the same time a new CEP2 task is trying to do the model set-up, which is heavy too, so you risk a system overload... too much storage I/O, which is when heartbeat/zero status issues develop [a hated function of BOINC that the devs themselves have wanted to replace for years but have not]. Remedy: reduce concurrent CEP2 tasks to half the cores. It's easily controlled with the <max_concurrent> option in an app_config.xml file with the 7.0.4x and higher clients [search forum]. My octo runs 4 and I get 99% efficiency out of them at that. The other cores do MCM or FA@H, or anything else non-WCG if the mood is like that. [Edit 1 times, last edit by Former Member at Dec 4, 2013 12:30:24 PM] |
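For readers who have not used app_config.xml before, here is a minimal sketch of the <max_concurrent> remedy described above, limiting CEP2 to 4 tasks at once as on the 8-core machine mentioned. The application name "cep2" is an assumption (check the <app> entries in your own client_state.xml for the exact short name), and the value 4 is only an example of "half the cores".

```xml
<!-- app_config.xml: place in the World Community Grid project folder
     inside the BOINC data directory. -->
<app_config>
  <app>
    <!-- Assumed short name of the CEP2 application; verify it in client_state.xml. -->
    <name>cep2</name>
    <!-- Run at most 4 CEP2 tasks at the same time (example: half of 8 cores). -->
    <max_concurrent>4</max_concurrent>
  </app>
</app_config>
```

After saving the file, have the client re-read its config files from BOINC Manager, or restart the client, for the limit to take effect.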
||
|
CandymanWCG
Senior Cruncher Romania Joined: Dec 20, 2010 Post Count: 421 Status: Offline Project Badges: |
"Sorry but it seems that after the fresh boinc update it worked fine for while then went back to "normal" reseting work units mode :-("
Loui_h20, I was just about to ask what you were on about, but then reality beat me to it. What you probably experienced was a fluke: you must have restarted BOINC or your machine just after CEP2 had checkpointed. So, for now and probably for a long time to come, running 24/7 and/or hibernate/sleep with LAIM on are the only options for this project. Cheers!
Knowledge is limited. Imagination encircles the world! - Albert Einstein |
||
|
|