World Community Grid Forums
Category: Completed Research
Forum: The Clean Energy Project - Phase 2 Forum
Thread: Best response to local computer disaster is????
Thread Status: Active Total posts in this thread: 6
Former Member
Cruncher Joined: May 22, 2018 Post Count: 0 Status: Offline |
I lost a disk in a striped SSD RAID set - had to restore from image to get back up. Since that box was running 7 active tasks and had the equivalent number "waiting to start", they all went "back in time".
So I "Reset Project". a) Proper behaviour? It returned me a bunch of results in "detached" status...perhaps I should have allowed the restored-from-image tasks to complete and just taken the "Too lates" (or whatever)? b) Anything else I should have done (besides apologize profusely to wingmen)? |
||
|
Former Member
Cruncher Joined: May 22, 2018 Post Count: 0 Status: Offline |
Hello ibsteve2u,
The answer is a. You told the server you would not be running the obsolete work units, dumped them without wasting computer time, and got started on new work units. This is exactly the sort of situation that 'Reset Project' is supposed to deal with. Please award yourself a gold star! Lawrence |
||
|
kffitzgerald
Senior Cruncher USA Joined: Jan 29, 2011 Post Count: 222 Status: Offline Project Badges: |
If you are going to use RAID, it would be better to use a striped set with parity (RAID 5). Granted, it uses an additional drive, but in your case all you would have had to do is replace the dead drive, with no restore required, and no data would have been lost or delayed.
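To illustrate why a parity stripe can survive a single dead drive, here is a minimal Python sketch of XOR parity. It is a toy model only: the block contents and the `xor_blocks` helper are made up for illustration, and a real RAID 5 controller rotates parity across the drives and works on much larger stripes.

```python
from functools import reduce


def xor_blocks(blocks):
    """XOR equal-length byte blocks together, byte by byte."""
    return bytes(reduce(lambda a, b: a ^ b, column) for column in zip(*blocks))


# Three hypothetical data blocks, one per data drive.
data = [b"WCG1", b"WCG2", b"WCG3"]

# The parity block is the XOR of all data blocks.
parity = xor_blocks(data)

# Pretend the drive holding data[1] died: XOR the survivors with the
# parity block to rebuild the missing block.
rebuilt = xor_blocks([data[0], data[2], parity])
```

Here `rebuilt` equals the lost block, which is why replacing the dead drive and letting the array rebuild is enough. A plain stripe (RAID 0) has no parity block, so there is nothing to rebuild from.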
|
||
|
Former Member
Cruncher Joined: May 22, 2018 Post Count: 0 Status: Offline |
Quote: Hello ibsteve2u, The answer is a. You told the server you would not be running the obsolete work units, dumped them without wasting computer time, and got started on new work units. This is exactly the sort of situation that 'Reset Project' is supposed to deal with. Please award yourself a gold star! Lawrence
Thanks! That is reassuring, especially as I have been on both the giving and receiving end of the situation where somebody pushes "a button" here, and it causes a " " way over there. |
||
|
Former Member
Cruncher Joined: May 22, 2018 Post Count: 0 Status: Offline |
Quote: If you are going to use RAID, it would be better to use a striped set with parity (RAID 5). Granted, it uses an additional drive, but in your case all you would have had to do is replace the dead drive, with no restore required, and no data would have been lost or delayed.
IMO, it should not have been an issue...I was somewhat surprised when the Microsoft backup software in Windows 7 Ultimate x64 told me that I could not skip restore of the 4-disk RAID 10 data set where BOINC/WCG is running if I wanted to restore a system image to the 2-disk RAID 0 system disk where the O/S resides.
In hindsight, I presume Microsoft defines a "system image" to include programs and/or the pagefile and/or temporary/scratch directories, all of which I had either split off or pushed off entirely onto the data set to save space and reduce I/Os on the SSD stripe set.
I conclude that it was rather rude of Intel to limit the number of SATA ports on the ICH10 to six when it is apparently obvious to anyone that you need at least seven. |
||
|
Former Member
Cruncher Joined: May 22, 2018 Post Count: 0 Status: Offline |
Hi,
Each client-server connection increments a counter ** on each successful handshake, on both sides, and the value on the server is matched against the value on the client. When you restored, you went back to a state the servers were not in agreement with. Even if you had not reset the project and had run those jobs to the end, things would not have been in sync on the first reconnect, very probably resulting in wasted time, i.e. you did well to just move on. The moment the *detached* occurred, the task will have been reassigned in rush mode, i.e. the wingman would be seeing a new partner reporting within 48 hours, most often quicker and maybe even sooner, depending on how big your lost buffer was ;>)
** This is the counter value: <rpc_seqno>14814</rpc_seqno>
--//--
[Edit 1 times, last edit by Former Member at Sep 16, 2011 11:29:12 AM] |
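The counter check described above can be sketched roughly as follows. This is a toy model in Python, not the actual BOINC scheduler code: the `ToyScheduler` and `ToyClient` classes, the host name, and the "in_sync"/"out_of_sync" labels are all assumptions made for illustration. Only the idea of a per-host sequence number comes from the post.

```python
from dataclasses import dataclass, field


@dataclass
class ToyScheduler:
    """Hypothetical server that remembers the last counter seen per host."""
    last_seqno: dict = field(default_factory=dict)

    def handle_request(self, host_id, rpc_seqno):
        known = self.last_seqno.get(host_id, 0)
        if rpc_seqno < known:
            # The client is reporting an older counter than the server has
            # on record, e.g. after a restore from a disk image.
            return "out_of_sync"
        self.last_seqno[host_id] = rpc_seqno
        return "in_sync"


@dataclass
class ToyClient:
    """Hypothetical client that bumps its counter on every contact."""
    host_id: str
    rpc_seqno: int = 0

    def contact(self, server):
        self.rpc_seqno += 1
        return server.handle_request(self.host_id, self.rpc_seqno)


server = ToyScheduler()
client = ToyClient("restored-box")

# Five normal contacts: both sides advance together.
history = [client.contact(server) for _ in range(5)]

# Restore from an image taken after the second contact: the client's
# counter goes "back in time" while the server's record does not.
restored = ToyClient("restored-box", rpc_seqno=2)
after_restore = restored.contact(server)
```

In this model the restored client reports counter 3 while the server remembers 5, so the mismatch shows up on the first reconnect whether or not the old tasks were run to completion.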
||