Thread Status: Active
Total posts in this thread: 7
Nigel Trewartha
Cruncher
Germany
Joined: Jan 7, 2006
Post Count: 10
Status: Offline
Clean Energy 2 does not save results.

This is basically addressed to the Energy Project 2 developers.


I had Clean Energy Project 2 running (two tasks, each about 23 hours long) for about 3 hours, but after installing a new driver I needed to reboot the system. Result: those hours of work were lost.
I assume the only way to overcome this is to keep the computer on for about 25 hours.

I choose not to leave my computer or router on during the night for security reasons.

OK, I have now blocked this project and would like to ask the C.E. 2 developers to save the results every 5 minutes or so.

This is a great pity, since I would like to support this project.




BD
[Sep 6, 2014 2:34:57 PM]
Former Member
Cruncher
Joined: May 22, 2018
Post Count: 0
Status: Offline
Re: Clean Energy 2 does not save results.

Nigel,

The problem you describe is one of the reasons that CEP2 is opt-in. The work units (WUs) are made up of a number of job steps (at present, typically eight). Checkpoints are only taken between job steps, and the first one can be very long running; there are even instances of machines that have not completed the first step when the job is cancelled by the 18-hour time limit set for this project.
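To make "checkpoints are only taken between job steps" concrete, here is a minimal Python sketch. It is not CEP2's actual code: the `run_step` callback, the step list, and the JSON checkpoint file are all hypothetical, chosen only to show why an interruption in the middle of a long step throws away everything done on that step.

```python
import json
import os


def run_work_unit(steps, checkpoint_path, run_step):
    """Run steps in order, recording progress only after a step finishes.

    `steps`, `checkpoint_path` and `run_step` are hypothetical stand-ins,
    not anything taken from the CEP2 application.
    """
    done = 0
    if os.path.exists(checkpoint_path):
        # Resume after the last step that completed before the interruption.
        with open(checkpoint_path) as f:
            done = json.load(f)["steps_done"]

    for i in range(done, len(steps)):
        # If the machine goes down inside this call, the whole step is redone.
        run_step(steps[i])
        # Write to a temporary file first so a crash cannot leave a
        # half-written checkpoint behind.
        tmp = checkpoint_path + ".tmp"
        with open(tmp, "w") as f:
            json.dump({"steps_done": i + 1}, f)
        os.replace(tmp, checkpoint_path)

    return len(steps) - done  # number of steps executed in this run
```

With this scheme, a work unit whose first step takes many hours has nothing to resume from until that step ends, which matches the behaviour described above.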

Volunteers have always asked whether more frequent checkpoints are possible (as all the other WCG projects have), but the techs have always said that it is not feasible due to the nature of the processing required by this project.

That's just the way it is. So if you do not leave your machine(s) on for an extended period then, yes, CEP2 is probably not for you.

If you have a very strong personal affinity with the CEP2 project, and you are willing to lose a lot of time, then go for it -- but you probably need to monitor the WU(s) before you turn your machine off to see if you have any that have not checkpointed since you started your machine. If there are any such, then those WUs are so long running that you probably need to abort them and hope that their replacements will run quicker.
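One way to do that monitoring without clicking through every task is to read the text that `boinccmd --get_tasks` prints. The Python sketch below is only an illustration: it assumes each task block contains a `name:` line followed later by a `checkpoint CPU time:` line, and the sample text is made up, so compare it with what your own client prints before relying on it.

```python
def tasks_without_checkpoint(report):
    """Return the names of tasks whose checkpoint CPU time is zero.

    `report` is text in the layout assumed above; the field names are an
    assumption about the client's output, not a guaranteed format.
    """
    names = []
    current = None
    for raw in report.splitlines():
        line = raw.strip()
        if line.startswith("name:"):
            # "WU name:" lines do not match because they start with "WU".
            current = line.split(":", 1)[1].strip()
        elif line.startswith("checkpoint CPU time:") and current is not None:
            seconds = float(line.split(":", 1)[1])
            if seconds == 0.0:
                names.append(current)
            current = None
    return names


# Made-up sample in the assumed layout: task_a has never checkpointed.
SAMPLE = """\
1) -----------
   name: task_a
   checkpoint CPU time: 0.000000
   current CPU time: 5400.000000
2) -----------
   name: task_b
   checkpoint CPU time: 3600.000000
   current CPU time: 4000.000000
"""

print(tasks_without_checkpoint(SAMPLE))
```

Any task it lists would lose all of its progress if the machine were switched off at that moment.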

As I said, that's just the way it is.
[Sep 6, 2014 3:11:15 PM]
gb009761
Master Cruncher
Scotland
Joined: Apr 6, 2005
Post Count: 2955
Status: Offline
Re: Clean Energy 2 does not save results.

Hi Nigel, yes, unfortunately what Apis has said is very true: these CEP2 WUs can (and often do) run for very long periods without a checkpoint.

If you're just shutting your computer/router down at night for security reasons, would it be possible to turn off only the router, leaving your computer running and processing the CEP2 WUs offline, and thus removing access to your computer from the Internet?
[Sep 6, 2014 4:28:04 PM]
Former Member
Cruncher
Joined: May 22, 2018
Post Count: 0
Status: Offline
Re: Clean Energy 2 does not save results.

A very technical solution is to run BOINC off a RAM disk that is snapshot-imaged every so many minutes, then restore the image on boot and continue. It is similar to hibernating, but with a little loss. I don't know whether VM environments have snapshot functionality. People who run CERN's LHC/ATLAS work might be able to tell, since it requires a VM setup.
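As a rough idea of what the periodic imaging could look like, here is a small Python sketch that copies a folder to a snapshot location. The function name and both paths are hypothetical, and it assumes the folder is not being written to while the copy runs. It copies files only, not the memory of a running program.

```python
import os
import shutil


def snapshot(src_dir, dest_dir):
    """Copy src_dir to dest_dir, replacing any previous snapshot.

    Both paths are placeholders for a RAM-disk folder and an on-disk
    folder. The copy is made next to the destination first, so an
    interrupted copy leaves the previous snapshot in place.
    """
    tmp = dest_dir + ".partial"
    if os.path.exists(tmp):
        shutil.rmtree(tmp)           # discard a copy that never finished
    shutil.copytree(src_dir, tmp)    # full copy of the current contents
    if os.path.exists(dest_dir):
        shutil.rmtree(dest_dir)      # drop the old snapshot
    os.rename(tmp, dest_dir)         # publish the new snapshot
```

Calling `snapshot(...)` from a scheduled task every so many minutes, and copying the snapshot back on boot, would give the restore-and-continue behaviour described above for the files in that folder.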

If the CEP people are interested in increasing throughput and overcoming the checkpoint-interval drawback, they may want to invest some time in this and spread the know-how. The agent now also comes in a version with a VM package included. See http://boinc.berkeley.edu/wiki/VirtualBox, which actually answered my question: 'VM apps are automatically "restartable". The contents of the VM are written to disk every few minutes, and if your computer is turned off for a while, the application can restart close to where it left off.'

There's so much that can be done, so little that is applied.
[Sep 6, 2014 4:59:23 PM]
Yarensc
Advanced Cruncher
USA
Joined: Sep 24, 2011
Post Count: 134
Status: Offline
Re: Clean Energy 2 does not save results.

"If the CEP people are interested in increasing throughput and overcoming the checkpoint-interval drawback, they may want to invest some time in this and spread the know-how."


The problem with this would be the extra memory needed to run the VM on what is already the biggest memory hog on WCG. Well, that and the time it would take for the CEP folk to develop.

It would be nice though, that would bring a lot of people to CEP who can't run it now.
[Sep 7, 2014 6:45:41 PM]
DadX
Advanced Cruncher
Joined: Sep 9, 2006
Post Count: 56
Status: Offline
Re: Clean Energy 2 does not save results.

lavaflow,
Do you know of free (or inexpensive), reliable and supported ramdisk software for Windows 7 and later?
[Sep 8, 2014 5:10:08 PM]
Jim1348
Veteran Cruncher
USA
Joined: Jul 13, 2009
Post Count: 1066
Status: Offline
Re: Clean Energy 2 does not save results.

"A very technical solution is to run BOINC off a RAM disk that is snapshot-imaged every so many minutes, then restore the image on boot and continue. It is similar to hibernating, but with a little loss. I don't know whether VM environments have snapshot functionality. People who run CERN's LHC/ATLAS work might be able to tell, since it requires a VM setup."

If you use an SSD, the problem is that CEP2 has a very high write rate: over 1 TB/day for 8 cores of an Ivy Bridge, for example. I use a ramdisk to protect my SSD, and taking a snapshot image every few minutes would defeat the purpose of that. Twice an hour would probably not be bad if you are running CEP2 on only a single core.
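For a sense of scale, the 1 TB/day figure can be turned into a sustained rate with a few lines of Python. This is only back-of-the-envelope arithmetic: it assumes a decimal terabyte (10^12 bytes) and that the writes are spread evenly over the day and over the 8 cores, neither of which is stated above.

```python
TB = 10**12                      # decimal terabyte, an assumption
SECONDS_PER_DAY = 24 * 60 * 60


def write_rate_mb_per_s(bytes_per_day):
    """Average write rate in MB/s for a given daily write volume."""
    return bytes_per_day / SECONDS_PER_DAY / 10**6


total = write_rate_mb_per_s(1 * TB)   # all 8 cores together
per_core = total / 8                  # assumes an even split across cores

print(f"{total:.1f} MB/s total, {per_core:.2f} MB/s per core")
# prints "11.6 MB/s total, 1.45 MB/s per core"
```

So the figure works out to roughly 11.6 MB/s written around the clock, or about 1.45 MB/s for each core under that even-split assumption.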

The least expensive ramdisk I know of that works (a lot of the free ones don't) is the one by Dataram. You could get along with their free version (4GB or less) if you run maybe 4 or fewer cores on CEP2. I like Primo Ramdisk by Romex Software a little better, but it costs more.

NOTE1: I should have also pointed out that either of the above ramdisks can save their state to disk when you shut down, so you don't really need to do a periodic save at all.

NOTE2: However, the ramdisk will only save whatever you put on there (i.e., the BOINC folder), but will not save the state of the working program memory, and so will not solve the checkpointing problem. The Hibernate function might do that, but I have not checked it.
----------------------------------------
[Edit 3 times, last edit by Jim1348 at Sep 16, 2014 2:46:54 PM]
[Sep 8, 2014 7:26:15 PM]