World Community Grid Forums
Category: Completed Research Forum: The Clean Energy Project - Phase 2 Forum Thread: Clear Energy 2 does not safe results. |
Thread Status: Active Total posts in this thread: 7
Nigel Trewartha
Cruncher Germany Joined: Jan 7, 2006 Post Count: 10 Status: Offline Project Badges: |
This is basically addressed to the Energy Project 2 developers.
I had the Clean Energy Project 2 running (two tasks, each about 23 hours) for about 3 hours, but after installing a new driver I needed to reboot the system. Result: the last hours were lost. I assume the only way to overcome this is to keep the computer online for about 25 hours. I choose not to leave my computer or router on during the night for security reasons. OK, I have now blocked this project and would like to ask the CEP2 developers to save the results every 5 minutes or so. This is a great pity, since I would like to support this project. BD |
||
|
Former Member
Cruncher Joined: May 22, 2018 Post Count: 0 Status: Offline |
Nigel,
The problem you describe is one of the reasons that CEP2 is opt-in. The work units (WUs) are made up of a number of job steps (at present, typically eight). Checkpoints are only taken between job steps, and the first one can be very long running; there are even instances of machines that have not completed the first step when the job is cancelled by the 18-hour time limit set for this project. Volunteers have often asked whether more frequent checkpoints are possible (as all the other WCG projects have), but the techs have always said that it is not feasible due to the nature of the processing required by this project. That's just the way it is. So if you do not leave your machine(s) on for an extended period then, yes, CEP2 is probably not for you. If you have a very strong personal affinity with the CEP2 project, and you are willing to lose a lot of time, then go for it -- but you probably need to monitor the WU(s) before you turn your machine off to see if you have any that have not checkpointed since you started your machine. If there are any such, then those WUs are so long running that you probably need to abort them and hope that their replacements will run quicker. As I said, that's just the way it is. |
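To make the effect of step-boundary checkpoints concrete, here is a minimal sketch in Python. It is not CEP2 or BOINC code; the function name and the step durations are made up for illustration, and it only assumes what the post says: progress is saved when a job step completes and at no other time.

```python
def work_lost_on_shutdown(step_hours, uptime_hours):
    """Return (hours_kept, hours_lost) if the machine stops after uptime_hours.

    Assumption taken from the post above: a checkpoint is written only
    when a job step completes, so work inside an unfinished step is lost.
    """
    kept = 0.0
    for duration in step_hours:
        if kept + duration <= uptime_hours:
            kept += duration  # step finished, checkpoint written
        else:
            break  # the shutdown falls inside this step
    ran = min(uptime_hours, sum(step_hours))
    return kept, ran - kept


# Hypothetical work unit: eight steps with a long first step.
steps = [5.0, 1.0, 1.0, 1.0, 1.0, 1.0, 1.0, 1.0]

print(work_lost_on_shutdown(steps, 3.0))  # shutdown during step 1
print(work_lost_on_shutdown(steps, 6.5))  # shutdown during step 3
```

With these invented numbers, stopping after 3 hours keeps nothing, while stopping after 6.5 hours keeps the 6 hours covered by the first two steps and loses only the last half hour.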
||
|
gb009761
Master Cruncher Scotland Joined: Apr 6, 2005 Post Count: 2955 Status: Offline Project Badges: |
Hi Nigel, yes, unfortunately what Apis has said is very true: these CEP2 WUs can (and often do) run for very long periods without a checkpoint.
If you're just shutting your computer/router down at night for security reasons, would it be possible to turn off only the router (leaving your computer running and processing the CEP2 WUs offline), thus removing access to your computer from the Internet? |
||
|
Former Member
Cruncher Joined: May 22, 2018 Post Count: 0 Status: Offline |
A very technical solution is running BOINC off a RAM disk that's snapshot-imaged every so many minutes. Then on boot, restore the image and continue. Similar to hibernating, but with a little bit of loss. I don't know if VM environments have snapshot functionality. People who run CERN's LHC/ATLAS work might be able to tell, since it requires a VM setup.
If the CEP people are interested in increasing the throughput and overcoming the checkpoint interval drawback, they may want to invest some time in this and spread the know-how. The agent now also comes with a VM package version included. See http://boinc.berkeley.edu/wiki/VirtualBox, which actually gave me the answer to the question with 'VM apps are automatically "restartable". The contents of the VM are written to disk every few minutes, and if your computer is turned off for a while, the application can restart close to where it left off.' There's so much that can be done, so little that is applied. |
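The snapshot-and-restore idea for a RAM disk folder could be sketched roughly as below. This is only an illustration under stated assumptions: the function names and directory layout are hypothetical, nothing here is part of BOINC, and it copies files only, so it does not capture the memory of a running program. In practice the client would presumably need to be stopped or suspended while the copy runs so the files are consistent.

```python
import shutil
import tempfile
from pathlib import Path


def snapshot(live_dir: Path, snapshot_dir: Path) -> None:
    """Copy live_dir to snapshot_dir, replacing any older snapshot."""
    staging = snapshot_dir.with_name(snapshot_dir.name + ".tmp")
    if staging.exists():
        shutil.rmtree(staging)
    shutil.copytree(live_dir, staging)  # write the new copy first
    if snapshot_dir.exists():
        shutil.rmtree(snapshot_dir)
    staging.rename(snapshot_dir)  # then swap it into place


def restore(snapshot_dir: Path, live_dir: Path) -> None:
    """Recreate live_dir from the last snapshot, e.g. after a reboot."""
    if live_dir.exists():
        shutil.rmtree(live_dir)
    shutil.copytree(snapshot_dir, live_dir)


def demo() -> str:
    """Simulate: snapshot, more work, power loss, restore."""
    with tempfile.TemporaryDirectory() as root:
        live = Path(root) / "ramdisk" / "boinc"  # hypothetical RAM disk folder
        saved = Path(root) / "disk" / "boinc_snapshot"
        live.mkdir(parents=True)
        saved.parent.mkdir(parents=True)

        (live / "state.txt").write_text("step 1 done")
        snapshot(live, saved)
        (live / "state.txt").write_text("step 2 half done")  # not snapshotted

        shutil.rmtree(live)  # power loss wipes the RAM disk
        restore(saved, live)
        return (live / "state.txt").read_text()


print(demo())
```

The demo restores the state as of the last snapshot, so whatever was written afterwards is the "little bit of loss" mentioned above. A scheduler would call `snapshot` at the chosen interval and `restore` once at boot.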
||
|
Yarensc
Advanced Cruncher USA Joined: Sep 24, 2011 Post Count: 134 Status: Offline Project Badges: |
"If the CEP people are interested in increasing the throughput and overcoming the checkpoint interval drawback, they may want to invest some time in this and spread the know-how."
The problem with this would be the extra memory needed to run the VM on what is already the biggest memory hog on WCG. Well, that and the time it would take for the CEP folk to develop it. It would be nice though; that would bring a lot of people to CEP who can't run it now. |
||
|
DadX
Advanced Cruncher Joined: Sep 9, 2006 Post Count: 56 Status: Offline Project Badges: |
lavaflow,
Do you know of free (or inexpensive), reliable and supported ramdisk software for Windows 7 and later? |
||
|
Jim1348
Veteran Cruncher USA Joined: Jul 13, 2009 Post Count: 1066 Status: Offline Project Badges: |
"A very technical solution is running BOINC off a RAM disk that's snapshot-imaged every so many minutes. Then on boot, restore the image and continue. Similar to hibernating, but with a little bit of loss. I don't know if VM environments have snapshot functionality. People who run CERN's LHC/ATLAS work might be able to tell, since it requires a VM setup."
If you use an SSD, the problem is that CEP2 has a very high write rate: over 1 TB/day for 8 cores of an Ivy Bridge, for example. I use a ramdisk to protect my SSD, and taking a snapshot image every few minutes would defeat the purpose of that. Probably twice an hour would not be bad, if you are running CEP2 on only a single core.
The least expensive ramdisk I know of that works (a lot of the free ones don't) is the one by Dataram. You could get along with their free version (4 GB or less) if you run maybe 4 or fewer cores on CEP2. I like Primo Ramdisk by Romex Software a little better, but it costs more.
NOTE1: I should have also pointed out that either of the above ramdisks can save its state to disk when you shut down, so you don't really need to do a periodic save at all.
NOTE2: However, the ramdisk will only save whatever you put on it (i.e., the BOINC folder), but will not save the state of the working program memory, and so will not solve the checkpointing problem. The Hibernate function might do that, but I have not checked it.
[Edit 3 times, last edit by Jim1348 at Sep 16, 2014 2:46:54 PM] |
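A rough back-of-the-envelope calculation shows why frequent snapshots could cancel out the benefit of the ramdisk. This sketch assumes, purely for illustration, that every snapshot rewrites a full 4 GB image (the free Dataram size mentioned above) rather than only the changed data; the function name is made up.

```python
def snapshot_writes_gb_per_day(image_gb, interval_minutes):
    """Daily SSD writes if a full image of image_gb is saved every interval."""
    snapshots_per_day = 24 * 60 / interval_minutes
    return image_gb * snapshots_per_day


# Assumed 4 GB image, saved every 5 minutes versus twice an hour.
print(snapshot_writes_gb_per_day(4, 5))   # 1152.0
print(snapshot_writes_gb_per_day(4, 30))  # 192.0
```

Under that assumption, a snapshot every 5 minutes writes about as much per day as the roughly 1 TB/day quoted for CEP2 on 8 cores, while twice an hour cuts it to about a sixth of that.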
||