Thread Status: Active
Total posts in this thread: 9
This topic has been viewed 1102 times and has 8 replies
bozo the clone
Cruncher
Joined: Aug 29, 2008
Post Count: 1
Status: Offline
5 or 6 hours of work lost

A task estimated to run for over 11 hours had been running for over 5 hours when I had to shut down BOINC Manager. When I started BOINC Manager again, the other tasks resumed where they had left off, but the Clean Energy Project task started over from the beginning.
[Nov 19, 2013 6:04:34 AM]
Former Member
Cruncher
Joined: May 22, 2018
Post Count: 0
Status: Offline
Re: 5 or 6 hours of work lost

This is normal for CEP: its checkpoints are very far apart. If you do not run 24/7, it is best to avoid these tasks ;-)
[Nov 19, 2013 6:05:52 AM]
sean0118
Cruncher
Australia
Joined: Feb 7, 2010
Post Count: 33
Status: Offline
Re: 5 or 6 hours of work lost

Does putting your computer into standby or hibernation mode avoid this issue? To be honest, I have been too scared to try.
[Nov 19, 2013 10:04:22 PM]
Former Member
Cruncher
Joined: May 22, 2018
Post Count: 0
Status: Offline
Re: 5 or 6 hours of work lost

Dear sean0118,
Sleep and hibernation are fine. If you search this forum for this issue, you'll find a very detailed discussion.
Best wishes,
Your Harvard CEP team
[Nov 19, 2013 11:39:12 PM]
Loui_h20
Cruncher
Joined: Aug 12, 2013
Post Count: 25
Status: Offline
Re: 5 or 6 hours of work lost

Looks like the problem has been fixed in 7.2.28!!!!

Watching 7 WUs reset time after time because 1 WU has finished, losing 7 x ~10+ hours of per-thread work, is not fun.

After updating, CEP2 is now running 8 threads on an i7-3770 with no problems!!!...for now ;-).
----------------------------------------

[Nov 22, 2013 5:40:45 PM]
ludarp
Cruncher
Joined: Nov 5, 2011
Post Count: 2
Status: Offline
Re: 5 or 6 hours of work lost

Quote: Sleep and hibernation are fine

Although... if you have the option "Leave applications in memory" (LAIM) off, against the recommendations for CEP2, BOINC (7.0.64) terminates its apps just before sleeping. (I would understand this for hibernation, since there is less RAM to write to disk, but why for sleep??) I've just tested and reconfirmed this.

Quote: Looks like the problem has been fixed in 7.2.28!!!!

Great news! I'll be sure to check out the sleep behaviour after I upgrade. (For the sake of curiosity... I do run with LAIM on.)
I've had WUs that didn't checkpoint at all between 0h24m and 11h30m (out of ~12h total)! So this setting really matters, e.g. for us folk who just crunch during the day. :-)
----------------------------------------
[Edit 1 times, last edit by ludarp at Nov 24, 2013 11:10:29 PM]
[Nov 24, 2013 11:09:08 PM]
Loui_h20
Cruncher
Joined: Aug 12, 2013
Post Count: 25
Status: Offline
Re: 5 or 6 hours of work lost

Sorry, but it seems that after the fresh BOINC update it worked fine for a while, then went back to its "normal" mode of resetting work units :-(
----------------------------------------

[Dec 4, 2013 12:09:05 PM]
Former Member
Cruncher
Joined: May 22, 2018
Post Count: 0
Status: Offline
Re: 5 or 6 hours of work lost

With LAIM on, suspend BOINC first [Activity menu > Suspend] before hibernating or sleeping. After powering up, once the system is fully up, set BOINC back to "Run always" or "Run based on preferences", whichever is your normal operating mode for BOINC. I've asked the devs for the same delay function to kick in on resume as at boot time, the <start_delay> option, but as is often the case, I met resistance. My hibernate file is usually 6-7 GB and the VM 16 GB, so it takes time to resume and for the disk to come to rest, which is when CEP2 hangs occur due to heavy disk I/O [long enough to cause a reset to the last checkpoint... hours lost].

When a CEP2 task finishes, zipping is ongoing, along with the [quite CPU-hungry] networking. At the same time a new CEP2 task is trying to do its model set-up, which is heavy too, so you risk a system overload... too much storage I/O, which is when heartbeat/zero status issues develop [a hated function of BOINC that the devs themselves have wanted to replace for years but have not]. Remedy: reduce concurrent CEP2 tasks to half the cores. This is easily controlled with the <max_concurrent> option in an app_config.xml file with the 7.0.4x and higher clients [search the forum]. My octo runs 4, and I get 99% efficiency out of them at that. The other cores do MCM or FA@H, or anything else non-WCG if the mood takes me.
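
As a rough sketch of the <max_concurrent> remedy described above, the Python snippet below builds an app_config.xml that caps CEP2 at half the machine's cores. The app name "cep2" and the helper names half_the_cores and build_app_config are assumptions for illustration and do not come from this thread, so check the app name your own client reports before relying on it.

```python
import os


def half_the_cores(cores=None):
    """Return half the logical cores (rounded down), but never less than 1."""
    if cores is None:
        cores = os.cpu_count() or 1  # os.cpu_count() may return None
    return max(1, cores // 2)


def build_app_config(app_name, max_concurrent):
    """Return the text of an app_config.xml limiting one app's concurrent tasks."""
    return "\n".join([
        "<app_config>",
        "  <app>",
        f"    <name>{app_name}</name>",  # assumed app short name
        f"    <max_concurrent>{max_concurrent}</max_concurrent>",
        "  </app>",
        "</app_config>",
    ]) + "\n"


if __name__ == "__main__":
    # "cep2" is an assumed short name for the Clean Energy Project app.
    print(build_app_config("cep2", half_the_cores()))
```

The printed text would be saved as app_config.xml in the World Community Grid project folder inside the BOINC data directory, after which the client should pick it up on a restart or when told to re-read its config files. On an 8-thread machine the sketch yields a limit of 4, the same split as the "octo runs 4" setup above.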
----------------------------------------
[Edit 1 times, last edit by Former Member at Dec 4, 2013 12:30:24 PM]
[Dec 4, 2013 12:29:02 PM]
CandymanWCG
Senior Cruncher
Romania
Joined: Dec 20, 2010
Post Count: 421
Status: Offline
Re: 5 or 6 hours of work lost

Quote: Sorry, but it seems that after the fresh BOINC update it worked fine for a while, then went back to its "normal" mode of resetting work units :-(


Loui_h20, I was just about to ask what you were on about, but then reality beat me to it. What you experienced was probably a fluke: you must have restarted BOINC or your machine just after CEP2 had checkpointed. So, for now and probably for a long time to come, running 24/7 and/or using hibernate/sleep with LAIM on are the only options for this project.

Cheers!
----------------------------------------
Knowledge is limited. Imagination encircles the world! - Albert Einstein



[Dec 4, 2013 2:42:04 PM]