Thread Status: Active
Total posts in this thread: 6
This topic has been viewed 375 times and has 5 replies
Former Member
Cruncher
Joined: May 22, 2018
Post Count: 0
Status: Offline
Best response to local computer disaster is????

I lost a disk in a striped SSD RAID set - had to restore from image to get back up. Since that box was running 7 active tasks and had the equivalent number "waiting to start", they all went "back in time".

So I "Reset Project".

a) Proper behaviour? It returned me a bunch of results in "detached" status...perhaps I should have allowed the restored-from-image tasks to complete and just taken the "Too lates" (or whatever)?

b) Anything else I should have done (besides apologize profusely to wingmen)?
[Sep 15, 2011 12:29:49 PM]
Former Member
Cruncher
Joined: May 22, 2018
Post Count: 0
Status: Offline
Re: Best response to local computer disaster is????

Hello ibsteve2u,
The answer is a. You told the server you would not be running the obsolete work units, dumped them without wasting computer time, and got started on new work units. This is exactly the sort of situation that 'Reset Project' is supposed to deal with.

Please award yourself a gold star!

biggrin
Lawrence
[Sep 15, 2011 10:10:10 PM]
kffitzgerald
Senior Cruncher
USA
Joined: Jan 29, 2011
Post Count: 222
Status: Offline
Re: Best response to local computer disaster is????

If you are going to use RAID, it would be better to use a striped set with parity (RAID 5). Granted, it uses an additional drive, but in your case all you would have had to do is replace the dead drive, with no restore required, and no data would have been lost or delayed.
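
To illustrate why a parity stripe survives a single dead drive, here is a minimal sketch of the XOR idea behind RAID 5. It is only an illustration under simplifying assumptions: the three data blocks and the helper name `xor_blocks` are made up for this example, and a real RAID 5 controller works on fixed-size stripes and rotates the parity across all drives rather than keeping it on one.

```python
from functools import reduce


def xor_blocks(blocks):
    """XOR equal-length byte blocks together, byte by byte."""
    return bytes(reduce(lambda a, b: a ^ b, column) for column in zip(*blocks))


# Three hypothetical data blocks, one per data drive in a single stripe.
data = [b"BOINC", b"tasks", b"saved"]

# The parity block is the XOR of all data blocks in the stripe.
parity = xor_blocks(data)

# Simulate losing the second drive, then rebuild its block from the
# surviving data blocks plus the parity block.
survivors = [data[0], data[2], parity]
rebuilt = xor_blocks(survivors)

print(rebuilt == data[1])
```

Because XOR is its own inverse, combining the survivors with the parity cancels everything except the missing block, which is why swapping in a new drive is enough and no restore from image is needed.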
[Sep 16, 2011 10:41:10 AM]
Former Member
Cruncher
Joined: May 22, 2018
Post Count: 0
Status: Offline
Re: Best response to local computer disaster is????

Lawrence wrote:
Hello ibsteve2u,
The answer is a. You told the server you would not be running the obsolete work units, dumped them without wasting computer time, and got started on new work units. This is exactly the sort of situation that 'Reset Project' is supposed to deal with.

Please award yourself a gold star!

biggrin
Lawrence

Thanks! That is reassuring, especially as I have been on both the giving and receiving end of "somebody pushes a button here, and it causes crying way over there."
[Sep 16, 2011 11:05:00 AM]
Former Member
Cruncher
Joined: May 22, 2018
Post Count: 0
Status: Offline
Re: Best response to local computer disaster is????

kffitzgerald wrote:
If you are going to use RAID, it would be better to use a striped set with parity (RAID 5). Granted, it uses an additional drive, but in your case all you would have had to do is replace the dead drive, with no restore required, and no data would have been lost or delayed.

IMO, it should not have been an issue...I was somewhat surprised when the Microsoft backup software in Windows 7 Ultimate x64 told me that I could not skip restore of the 4-disk RAID 10 data set where BOINC/WCG is running if I wanted to restore a system image to the 2-disk RAID 0 system disk where the O/S resides.

In hindsight, I presume Microsoft defines a "system image" to include programs and/or the pagefile and/or temporary/scratch directories, all of which I had either split off or pushed off entirely onto the data set to save space and reduce I/Os on the SSD stripe set.

I conclude that it was rather rude of Intel to limit the number of SATA ports on the ICH10 to six when it is apparently obvious to anyone that you need at least seven.
[Sep 16, 2011 11:19:40 AM]
Former Member
Cruncher
Joined: May 22, 2018
Post Count: 0
Status: Offline
Re: Best response to local computer disaster is????

Hi,

Each client-server connection updates a counter (**) on every successful handshake, on both sides, and the server's value is matched against the client's. When you restored, you went back to a state the servers were not in agreement with. Even if you had not reset the project and had run those jobs to the end, things would not have been in sync on the first reconnect, very probably resulting in wasted time. In other words, you did well to just move on.

The moment the *detached* status occurred, the task will have been reassigned in rush mode, i.e. the wingman would be seeing a new partner reporting within 48 hours, most often quicker, and maybe even sooner depending on how big your lost buffer was ;>)

** This is the counter value: <rpc_seqno>14814</rpc_seqno>
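
As a rough sketch of the idea only (not the actual server code), the check could look something like this. The `<rpc_seqno>` element and the value 14814 are from the line above; the function names, the "client is behind the server" rule, and the 25-handshake gap are assumptions made purely for illustration.

```python
import xml.etree.ElementTree as ET


def client_seqno(fragment: str) -> int:
    """Read the counter from an <rpc_seqno> element like the one quoted above."""
    return int(ET.fromstring(fragment).text)


def went_back_in_time(client_value: int, server_value: int) -> bool:
    """Assumed rule: a client whose counter is behind the server's was restored."""
    return client_value < server_value


# Counter as the server last saw it (value taken from the post).
server_value = client_seqno("<rpc_seqno>14814</rpc_seqno>")

# Hypothetical: the image was taken 25 handshakes before the disk failed.
restored_value = server_value - 25

print(went_back_in_time(restored_value, server_value))
```

A restored image carries an older counter than the one the server remembers, so the mismatch shows up on the very first reconnect, whether or not the restored tasks are run to completion.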

--//--
----------------------------------------
[Edit 1 times, last edit by Former Member at Sep 16, 2011 11:29:12 AM]
[Sep 16, 2011 11:27:58 AM]