World Community Grid Forums
Thread Status: Active Total posts in this thread: 117
Former Member
Cruncher Joined: May 22, 2018 Post Count: 0 Status: Offline |
Hi Lawrence.
Those of us who have to shut down our rigs overnight can't avoid this problem, as it seems that a cold restart is what causes these tasks to go invalid. Plus, if you get some of the longer ones (6+ hrs) that are around, you can't do anything about it. The shorter ones, at around 2+ hrs, always validate for me, as long as the wingman has a good result. |
||
|
Former Member
Cruncher Joined: May 22, 2018 Post Count: 0 Status: Offline |
Thanks for the info Lawrence. Hopefully it will get ironed out soon.
Crunch on........................... |
||
|
Sgt.Joe
Ace Cruncher USA Joined: Jul 4, 2006 Post Count: 7655 Status: Offline |
I also have hosts which run 24/7 and a number of the units show the message:
12:42:38 (17202): No heartbeat from client for 30 sec - exiting
12:42:38 (17202): timer handler: client dead, exiting
Some become valid and some don't. The message, for me anyway, has nothing to do with rebooting.
Cheers
Sgt. Joe
*Minnesota Crunchers* |
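For anyone who wants to check their own result logs for the same message, here is a minimal sketch in Python. It assumes you have copied a task's stderr text into a string; the function name `find_heartbeat_exits` and the regular expression are illustrative only and are built from the two lines quoted above, not from any BOINC documentation.

```python
import re

# Pattern derived from the quoted log line; the exact format is an assumption
# based on that single sample (time, process id in parentheses, then message).
HEARTBEAT_RE = re.compile(
    r"^(?P<time>\d{2}:\d{2}:\d{2}) \((?P<pid>\d+)\): "
    r"No heartbeat from client for (?P<secs>\d+) sec - exiting"
)

def find_heartbeat_exits(stderr_text):
    """Return (time, pid, seconds) for each 'No heartbeat' line found."""
    hits = []
    for line in stderr_text.splitlines():
        m = HEARTBEAT_RE.match(line.strip())
        if m:
            hits.append((m.group("time"), int(m.group("pid")), int(m.group("secs"))))
    return hits

sample = (
    "12:42:38 (17202): No heartbeat from client for 30 sec - exiting\n"
    "12:42:38 (17202): timer handler: client dead, exiting"
)
print(find_heartbeat_exits(sample))
```

Counting how many of the flagged tasks later went invalid versus valid would show whether the message actually tracks the problem.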
||
|
rbotterb
Senior Cruncher United States Joined: Jul 21, 2005 Post Count: 401 Status: Offline |
I'm not running 24/7 with my laptop - maybe 12+ hours a day, but I generally shut down at night. I've only had 2 or so MCM1 WUs that have ended up invalid. The rest of the WUs I've been crunching the last few weeks seem to complete OK.
I'm running on a Win 7 machine - 4 core. Maybe this issue with lots of invalids is more tied to some operating systems than others.... |
||
|
Mumak
Senior Cruncher Joined: Dec 7, 2012 Post Count: 477 Status: Offline |
I too can confirm that there's something rotten here. I have restarted one machine, and WUs that are returned now are moved to Pending Verification:
MCM1_ 0000306_ 7173_ 0-- https://secure.worldcommunitygrid.org/ms/devi...s.do?workunitId=908188916
MCM1_ 0000306_ 7138_ 1-- https://secure.worldcommunitygrid.org/ms/devi...s.do?workunitId=908188907
MCM1_ 0000306_ 1890_ 0-- https://secure.worldcommunitygrid.org/ms/devi...s.do?workunitId=908189870
MCM1_ 0000306_ 5474_ 1-- https://secure.worldcommunitygrid.org/ms/devi...s.do?workunitId=908192594
So this is probably the beginning of the path to h***...
*edited to appropriate forum content - ErikaT
[Edit 3 times, last edit by ErikaT at Dec 2, 2013 1:05:51 PM] |
||
|
Steve W
Advanced Cruncher Joined: Dec 9, 2005 Post Count: 110 Status: Offline |
I'm sure that this is something the techs will look into when they get back from their long Thanksgiving weekend.
Not being able to stop an MCM work unit for fear of it going invalid could well put a number of people off this project, and I know a number of people are already opting out until this is fixed. I've only had one invalid result that I know of so far, and it was on a short WU, but even so that is 2 hours that could have been run on a reliable project. |
||
|
Former Member
Cruncher Joined: May 22, 2018 Post Count: 0 Status: Offline |
Hi Steve W,
This only happens 'some' of the time. I have not had any problems since the initial beta. Other members report problems when they have to restart from a checkpoint. Why? Under what conditions? We will eventually find out. I can't help, since my computer is not having any trouble.
Lawrence |
||
|
Former Member
Cruncher Joined: May 22, 2018 Post Count: 0 Status: Offline |
Had a power outage last Friday, which caused -all- 6 MCM tasks running on devices without battery backup to go down the PVal > PVer > Invalid path. Seeing the drop in the project's average hourly credit [from 25 to 22], it implies there is a relatively high number of results that only get the 50% points of quorum. A response on the matter is overdue [what's wrong with a simple ack, dear techs?]... we've been tossing this around on the forum now for a good 11 days. First report Nov. 21: http://www.worldcommunitygrid.org/forums/wcg/...ead,35851_offset,0#439815
Regarding rbotterb's comment: if you can hibernate/sleep the device instead of shutting it down, you avoid the restart issue [provided LAIM, Leave application in memory, is on, though it's not required for hibernation... this takes an in-situ memory snapshot, stores it to disk, and reloads it on power-up]. My 2 Win7 machines are very good at this. Tested, and no resume issue... MCMs go valid.
edit: spell
[Edit 1 times, last edit by Former Member at Dec 2, 2013 10:11:36 AM] |
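A rough back-of-the-envelope check of the credit drop from 25 to 22 mentioned above, sketched in Python. It assumes, purely for illustration, that the whole drop comes from results receiving 50% credit and that every result would otherwise earn the same credit per hour; the function name `implied_partial_fraction` is made up for this example.

```python
def implied_partial_fraction(before, after, partial=0.5):
    """Fraction f of results at reduced credit, solving
    before * (1 - f * (1 - partial)) = after for f."""
    return (1 - after / before) / (1 - partial)

fraction = implied_partial_fraction(25, 22)
print(round(fraction, 2))  # 0.24
```

Under those assumptions roughly a quarter of results would be getting half credit. The real share could be quite different if other factors moved the average.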
||
|
Former Member
Cruncher Joined: May 22, 2018 Post Count: 0 Status: Offline |
Meanwhile, more members may have been resorting to the triple-M way [Micro Management Methodology], as a new daily record for total run time was set at 213.4 years, AND the mean credit per hour rebounded... to 23.61 yesterday.
Myself, I've reshuffled profiles, and am running 4 MCM next to 4 CEP2 on a battery-backup supported device in a VBox version of BOINC [7.2.33, personally, -not- recommended]. Not as efficient, but a whole lot more useful than generating invalid results. |
||
|
Former Member
Cruncher Joined: May 22, 2018 Post Count: 0 Status: Offline |
I only run MCM now on 24/7 machines; the other machines, which get switched off or have prefs set to Suspend, are on FAAH.
|
||
|