Thread Status: Active
Total posts in this thread: 17
This topic has been viewed 2528 times and has 16 replies
Former Member
Cruncher
Joined: May 22, 2018
Post Count: 0
Recovery from checkpoints does not always work

Hi,

An observation. I have a laptop that I use for HDC using BOINC. If I suspend a task, it invariably starts from 0% when I restart it.

To be fair, this has happened more than once, and the WU seems more susceptible to going back to 0% when it has actually completed 50% or more (not sure about the 25% to 50% completed zone). 'Young' WUs seem to restart from almost where they left off.

This is doubly annoying: after losing an hour or more of crunch time, when you finally get to complete the WU, you are then penalised because it becomes a statistical outlier. (angry)

Edit: note that the percentage complete returns to zero, yet the run time continues from where it left off... (confused)

Jonathan.
----------------------------------------
[Edit 2 times, last edit by Former Member at Nov 27, 2006 5:39:35 PM]
[Nov 27, 2006 5:37:54 PM]
Sekerob
Ace Cruncher
Joined: Jul 24, 2005
Post Count: 20043
Re: Recovery from checkpoints does not always work

Hi,

The program developers are well aware of the checkpointing behaviour and the regression to an earlier checkpoint. Checkpoints can lie quite far apart, particularly on slower machines, but that is a trade-off against writing very big files to disk very frequently (see the last paragraph below). In both restart cases, the WUs were set back to about 9%, whereas before the BOiNC shutdown they were above 51%.
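To put rough numbers on that trade-off, here is a minimal Python sketch. It is not how HDC or BOINC actually schedule checkpoints. It assumes an interruption is equally likely at any moment between two checkpoints, and the function names, intervals and write cost are hypothetical values chosen only for illustration.

```python
def expected_loss_minutes(checkpoint_interval_min):
    """Average work lost per interruption, assuming the interruption is
    equally likely at any moment between two checkpoints (an assumption)."""
    return checkpoint_interval_min / 2.0


def write_overhead_fraction(checkpoint_interval_min, write_cost_min):
    """Fraction of wall-clock time spent writing checkpoints, assuming each
    write takes write_cost_min minutes (a hypothetical figure)."""
    return write_cost_min / (checkpoint_interval_min + write_cost_min)


if __name__ == "__main__":
    # Hypothetical intervals: frequent checkpoints lose little work on a
    # restart but spend a larger share of time writing to disk.
    for interval in (5, 30, 120):
        loss = expected_loss_minutes(interval)
        overhead = write_overhead_fraction(interval, write_cost_min=1)
        print(f"interval={interval} min  expected loss={loss} min  "
              f"write overhead={overhead:.1%}")
```

Under these assumptions, widening the interval cuts the time spent writing but raises the average loss per restart in proportion, which matches the large setbacks reported on slower machines.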

The solution is to activate the hibernation feature. On restart, your work unit will take off exactly where it left off. Several helpful posts were made as to how to set this up. My machine, sitting on a UPS btw, goes into hibernation after 5 minutes or so and resumes right from the second it left off. Hibernation, as opposed to Standby mode, does not use a drop of juice: it's a full memory state write to disk.

As to that penalization: are you using the 5.7.x development version for this to happen, i.e. the non-loss of CPU time? It's a balancing act, I suppose, and I don't know if it will stay in. I myself had 2 borderline outliers, which in fact helped to pull the median up above the credit that would have been granted under the pre-outlier policy. I have already noted the observation today. I don't know if it's an alpha BOiNC feature or something designed into the science application.

cheers
----------------------------------------
WCG Global & Research > Make Proposal Help: Start Here!
Please help to make the Forums an enjoyable experience for All!
----------------------------------------
[Edit 4 times, last edit by Sekerob at Nov 27, 2006 6:17:10 PM]
[Nov 27, 2006 5:52:26 PM]
Former Member
Cruncher
Joined: May 22, 2018
Post Count: 0
Re: Recovery from checkpoints does not always work

Hi Sekerob...

Thanks for the info - I'll give hibernation a shot. (wink)

Using Vanilla (or geriatric??) 5.2.13 on this system...

Jonathan.
----------------------------------------
[Edit 1 times, last edit by Former Member at Nov 27, 2006 6:04:15 PM]
[Nov 27, 2006 6:00:47 PM]
Sekerob
Ace Cruncher
Joined: Jul 24, 2005
Post Count: 20043
Re: Recovery from checkpoints does not always work

Okay, great. On hibernation there are good posts here, including a revelation by LawrenceHardin: if you don't see hibernation after setting it up, holding the Shift key will make it show in the Windows(?) shutdown menu.

For Genome Comparison and likely HPF2, when it restarts on BOiNC, you will probably need at least version 5.4, as it transmits using compression features.

Here's the vanilla download page: http://boinc.berkeley.edu/download_all.php

5.2.13 is geriatric, or old if you will.

ciao
[Nov 27, 2006 6:15:03 PM]
Kaoh
Cruncher
ROC,Taiwan(NOT PRC)
Joined: Nov 20, 2005
Post Count: 18
Re: Recovery from checkpoints does not always work

I also ran into this problem,
but I use BOINC 5.4.11...
After suspending and restarting BOINC,
my HDC progress went from 66.2% back to 3.6%...
That felt terrible,
so I aborted it...
----------------------------------------
[Edit 2 times, last edit by Kaoh at Nov 30, 2006 12:18:46 AM]
[Nov 30, 2006 12:16:18 AM]
Former Member
Cruncher
Joined: May 22, 2018
Post Count: 0
Re: Recovery from checkpoints does not always work

I have also come across this problem. I am using BOINC 5.4.11. Until recently it could pick up its HDC work from where it had checkpointed, but now it will often start crunching the WU from the beginning. Maybe something is wrong with the newer WUs...
[Nov 30, 2006 9:32:00 PM]
Former Member
Cruncher
Joined: May 22, 2018
Post Count: 0
Re: Recovery from checkpoints does not always work

I have never had any problems with BOINC 5.4.11.

If I exit WCG cancer, there is a delay while the work unit gets 're-started', but I have never lost more than 5% once it is completely back on.

Are you all giving it a few minutes to get fully re-started?
[Nov 30, 2006 11:33:03 PM]
Sekerob
Ace Cruncher
Joined: Jul 24, 2005
Post Count: 20043
Re: Recovery from checkpoints does not always work

It is essential to snooze BOiNC first. Then close BOiNC and give it time to write whatever there is to write to disk, then shut down. You will need to experiment with whether the 'keep in memory' option in the prefs should be on or off; not all PCs respond the same.

Also, the latest HDC versions retain the CPU time spent up to shutdown, even when progress is lost due to a return to a previous checkpoint. This increases the claim: sometimes that makes the result an outlier, sometimes it helps to bring up the median and get more credit, compared with having been run in one go.

Again, proper hibernation works. Standby most likely will not, as it does not write all the RAM in use to disk.

The WCG/UD agent is pretty good at it too.
[Dec 1, 2006 7:25:15 AM]
Former Member
Cruncher
Joined: May 22, 2018
Post Count: 0
Re: Recovery from checkpoints does not always work

As a (probably unwelcome) aside, the 5-day BOINC benchmark also causes WU restarts for HDC.

I have observed that the task seems to be divided into 8, with a change in memory utilisation at every 12.5% multiple. This would seem to be the best time to checkpoint if there is a concern about disk space.
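A minimal Python sketch of that idea, assuming checkpoints were written exactly at each stage boundary. This is the suggestion above, not confirmed HDC behaviour, and the function names are hypothetical. The 66.2% figure is the progress reported earlier in this thread.

```python
import math


def stage_boundaries(stages=8):
    """Progress percentages at which each stage would end (12.5, 25.0, ...)."""
    return [100.0 * i / stages for i in range(1, stages + 1)]


def last_boundary(progress_pct, stages=8):
    """Most recent stage boundary at or below the given progress."""
    step = 100.0 / stages
    return math.floor(progress_pct / step) * step


def loss_on_restart(progress_pct, stages=8):
    """Progress lost if the task resumed from the last stage boundary
    (assumes a checkpoint exists at every boundary, which is hypothetical)."""
    return progress_pct - last_boundary(progress_pct, stages)


if __name__ == "__main__":
    print(stage_boundaries())
    print(last_boundary(66.2), loss_on_restart(66.2))
```

With 8 stages, a task interrupted at 66.2% would fall back to the 62.5% boundary under this assumption, and the loss could never reach a full 12.5%.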

Wish the app developers would throw in a fix for this.....

Jonathan.
[Dec 1, 2006 7:06:27 PM]
Sekerob
Ace Cruncher
Joined: Jul 24, 2005
Post Count: 20043
Re: Recovery from checkpoints does not always work

Yes, there used to be only 4 when HDC started, one every 25%, but the latest series have more distinct points of progress: I have seen 37.500, 62.500, 75.000 and 87.500. I have not seen 12.500, but assume it's there.

As for the benchmark, that indeed happens, but only if the 'keep in memory' option is not switched on. If it is switched on, there is no loss.

There are checkpoints whenever possible; it is, though, an elusive bug that causes progress loss during unidentified moments of conflict.
[Dec 1, 2006 7:22:27 PM]