World Community Grid Forums
Thread Status: Active Total posts in this thread: 117
Steve W
Advanced Cruncher Joined: Dec 9, 2005 Post Count: 110 Status: Offline |
"I only run MCM now on 27/7 machines..."
I wish I could run my machines 27 hours a day |
||
|
Former Member
Cruncher Joined: May 22, 2018 Post Count: 0 Status: Offline |
He's in England... they're off this planet anyhow, but, you only need to increase your CPU clock speed by 12.5% to do 27 hours of work in 24. ;P
|
||
|
Former Member
Cruncher Joined: May 22, 2018 Post Count: 0 Status: Offline |
"He's in England... they're off this planet anyhow"
Pfffffffffffffffffffffffffffft! |
||
|
Former Member
Cruncher Joined: May 22, 2018 Post Count: 0 Status: Offline |
Hi.
Maybe something was changed on the validator today. I checked the tasks I returned this morning that had run times over 9 hrs and had been restarted this morning, and they had validated with the wing person. |
||
|
rbotterb
Senior Cruncher United States Joined: Jul 21, 2005 Post Count: 401 Status: Offline |
My MCM1 WUs over the last couple of days have ranged from 5.3 to almost 16 hrs in execution. That works OK for my laptop, but I can't run them on my other two family PCs, since those don't get used enough to finish 10+ hr WUs within the 10-day period.
I noticed this morning that one of my MCM1 WUs in PV mode had a replication of 4 and a quorum of 2. I wonder if that is a sign that this project will be going to larger replication groups in the future. If so, we'll need to get used to longer PV lists as we all become wingmen for each other..... |
||
|
Former Member
Cruncher Joined: May 22, 2018 Post Count: 0 Status: Offline |
Not probable [that would be 3 steps back to the very beginning when WCG ran BOINC with 4 copies, quorum 3]. Could be extra copies were pushed to expedite validation on the reticent tasks. Post a copy of the quorum detail which will show the 'sent' timestamps and if they all went within the hour or if the extra copies were generated later. The replication number just increments with each copy. Here's one of mine with 3:
Project Name: Mapping Cancer Markers
Created: 11/25/2013 02:02:20
Name: MCM1_0000310_4987
Minimum Quorum: 2
Replication: 3

| Result Name | App Version Number | Status | Sent Time | Time Due / Return Time | CPU Time / Elapsed Time (hours) | Claimed / Granted BOINC Credit |
|---|---|---|---|---|---|---|
| MCM1_0000310_4987_2-- | - | In Progress | 12/3/13 08:53:06 | 12/5/13 08:53:06 | 0.00 | 0.0 / 0.0 |
| MCM1_0000310_4987_0-- | 726 | Pending Verification | 11/26/13 08:52:44 | 11/28/13 04:57:27 | 3.90 | 115.0 / 0.0 |
| MCM1_0000310_4987_1-- | 726 | Pending Verification | 11/26/13 08:52:35 | 12/3/13 08:52:57 | 7.95 | 163.6 / 0.0 |

And here 1 with 4:

Project Name: Mapping Cancer Markers
Created: 11/27/2013 14:17:10
Name: MCM1_0000151_5156
Minimum Quorum: 2
Replication: 4

| Result Name | App Version Number | Status | Sent Time | Time Due / Return Time | CPU Time / Elapsed Time (hours) | Claimed / Granted BOINC Credit |
|---|---|---|---|---|---|---|
| MCM1_0000151_5156_3-- | - | In Progress | 12/3/13 07:13:24 | 12/5/13 07:13:24 | 0.00 | 0.0 / 0.0 |
| MCM1_0000151_5156_2-- | 726 | Pending Verification | 11/29/13 09:19:11 | 12/3/13 07:13:10 | 3.89 | 75.2 / 0.0 |
| MCM1_0000151_5156_0-- | 726 | Pending Verification | 11/28/13 14:00:36 | 11/29/13 09:18:43 | 7.45 | 127.7 / 0.0 |
| MCM1_0000151_5156_1-- | 726 | Pending Verification | 11/28/13 14:00:31 | 11/29/13 05:03:26 | 2.96 | 57.2 / 0.0 |

And to bust the rule, 3 copies, but a replication of 2 [suspect errors without are not counted]:

Project Name: Mapping Cancer Markers
Created: 11/22/2013 02:56:34
Name: MCM1_0000219_7097
Minimum Quorum: 2
Replication: 2

| Result Name | App Version Number | Status | Sent Time | Time Due / Return Time | CPU Time / Elapsed Time (hours) | Claimed / Granted BOINC Credit |
|---|---|---|---|---|---|---|
| MCM1_0000219_7097_2-- | - | In Progress | 12/2/13 20:22:03 | 12/4/13 20:22:03 | 0.00 | 0.0 / 0.0 |
| MCM1_0000219_7097_1-- | 726 | Error | 11/22/13 20:21:40 | 12/2/13 20:22:14 | 0.00 | 0.0 / 0.0 |
| MCM1_0000219_7097_0-- | 726 | Pending Validation | 11/22/13 20:21:28 | 11/26/13 10:10:57 | 15.01 | 380.0 / 0.0 |

ATM all the MCM tasks I have [5 on 1 UPS supported device] are repair/verification jobs (running low cache simply because when there's a new app to come, the device can be switched quickest [autonomously] to running the new version).

edit: As you may notice, these were all 'created' further back.

[Edit 1 times, last edit by Former Member at Dec 3, 2013 2:52:29 PM] |
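
The "did they all go within the hour" check suggested above can be done by eye, but here is a minimal Python sketch of it, using the sent times from the replication-4 example (MCM1_0000151_5156). The function name and the one-hour window are assumptions made for illustration, not anything WCG or BOINC provides.

```python
from datetime import datetime, timedelta

# Sent times copied from the MCM1_0000151_5156 quorum detail in this post.
SENT = {
    "MCM1_0000151_5156_0": "11/28/13 14:00:36",
    "MCM1_0000151_5156_1": "11/28/13 14:00:31",
    "MCM1_0000151_5156_2": "11/29/13 09:19:11",
    "MCM1_0000151_5156_3": "12/3/13 07:13:24",
}


def split_initial_and_later(sent, window=timedelta(hours=1)):
    """Split result names into copies sent within `window` of the
    earliest copy and copies sent after that window (hypothetical helper)."""
    times = {
        name: datetime.strptime(stamp, "%m/%d/%y %H:%M:%S")
        for name, stamp in sent.items()
    }
    first = min(times.values())
    initial = sorted(n for n, t in times.items() if t - first <= window)
    later = sorted(n for n, t in times.items() if t - first > window)
    return initial, later


initial, later = split_initial_and_later(SENT)
print("sent with the first batch:", initial)
print("generated later:", later)
```

For this workunit the _0 and _1 copies fall in the first batch and the _2 and _3 copies were generated later, which fits the "extra copies pushed afterwards" reading rather than a replication of 4 from the start.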
||
|
Former Member
Cruncher Joined: May 22, 2018 Post Count: 0 Status: Offline |
To continue on, one of the 5 'repair' tasks got server aborted, as the no-reply turned up [late, but not too late]:

Workunit Status
Project Name: Mapping Cancer Markers
Created: 11/22/2013 14:50:44
Name: MCM1_0000234_5149
Minimum Quorum: 2
Replication: 2

| Result Name | App Version Number | Status | Sent Time | Time Due / Return Time | CPU Time / Elapsed Time (hours) | Claimed / Granted BOINC Credit |
|---|---|---|---|---|---|---|
| MCM1_0000234_5149_2-- | 726 | Server Aborted | 12/3/13 14:52:33 | 12/3/13 15:31:26 | 0.00 | 0.0 / 0.0 |
| MCM1_0000234_5149_0-- | 726 | Valid | 11/23/13 14:52:10 | 12/3/13 15:11:59 | 2.50 | 37.8 / 66.6 |
| MCM1_0000234_5149_1-- | 726 | Valid | 11/23/13 14:52:01 | 11/24/13 09:54:39 | 2.22 | 95.5 / 66.6 |

Actual brand new work, just received from batch 0000354, is normal repro 2.

Workunit Status
Project Name: Mapping Cancer Markers
Created: 12/02/2013 01:09:11
Name: MCM1_0000354_5513
Minimum Quorum: 2
Replication: 2

| Result Name | App Version Number | Status | Sent Time | Time Due / Return Time | CPU Time / Elapsed Time (hours) | Claimed / Granted BOINC Credit |
|---|---|---|---|---|---|---|
| MCM1_0000354_5513_0-- | - | In Progress | 12/3/13 16:14:41 | 12/13/13 16:14:41 | 0.00 | 0.0 / 0.0 |
| MCM1_0000354_5513_1-- | - | In Progress | 12/3/13 16:14:36 | 12/13/13 16:14:36 | 0.00 | 0.0 / 0.0 |

Think this puts the case to rest... just quorum 2 by default for MCM. Doing anything higher at the start would be bluntly silly [except for Beta/Rush batch testing]. Off-line crunchers would not be reachable for server aborts, leading to excess copies being computed. Could make a case that Beta should [a hateful word] only be sent to devices with a frequent connect history. |
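
To make the off-line cruncher point concrete, here is a toy Python model. It is only a sketch under loud assumptions: it is not the BOINC or WCG server logic, and the `Copy` class, the `host_online` flag, the result names, and the replication of 4 are all hypothetical.

```python
from dataclasses import dataclass


@dataclass
class Copy:
    name: str
    status: str                # e.g. "valid" or "in progress"
    host_online: bool = True   # hypothetical: host contacts the server in time


def classify_extra_copies(copies, quorum=2):
    """Once `quorum` copies are valid, split the still-running copies into
    those a server abort could reach and those that get computed anyway."""
    valid = sum(c.status == "valid" for c in copies)
    if valid < quorum:
        return [], []          # quorum not met yet, nothing is surplus
    running = [c for c in copies if c.status == "in progress"]
    abortable = [c.name for c in running if c.host_online]
    wasted = [c.name for c in running if not c.host_online]
    return abortable, wasted


# Hypothetical workunit started with 4 copies and a quorum of 2.
copies = [
    Copy("WU_0", "valid"),
    Copy("WU_1", "valid"),
    Copy("WU_2", "in progress", host_online=True),
    Copy("WU_3", "in progress", host_online=False),
]
abortable, wasted = classify_extra_copies(copies)
print("can be server aborted:", abortable)
print("computed for nothing:", wasted)
```

In this made-up case WU_2 can be aborted, while WU_3 sits on an off-line host and is crunched to the end even though the quorum is already met.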
||
|
ZipSpeed
Cruncher Joined: Feb 16, 2011 Post Count: 33 Status: Offline |
Man, I should have looked in the forums first to figure out why many of my tasks were going invalid! Spent days stress testing my rigs looking for instability and didn't think that it could be software related. Oh well, at least I know my rigs are stable.
[Edit 1 times, last edit by ZipSpeed at Dec 4, 2013 3:07:18 PM] |
||
|
verheyde
Cruncher Belgium Joined: Dec 7, 2004 Post Count: 25 Status: Offline |
If a cold restart is causing the problem, then I wonder why I only get a few invalids at a time. Not all the tasks running when my machine has to be restarted get to the invalid status. The processor is an Intel i7, which means that 8 threads are available. In that case I would expect the 8 MCM tasks to suffer in the same way. But I only see a few invalids.
And as I restart infrequently, I would also expect to see the invalid issue only when the machine actually had to be restarted, which is not the case. I see a steady "stream" of invalids, even when my machine has not been rebooted. (I suspend the machine when moving between locations. I never hibernate it.) During weekends my machine does not get suspended, and I still have the impression that I see invalids. I hope the techs find out the logic behind the failure soon. |
||
|
Former Member
Cruncher Joined: May 22, 2018 Post Count: 0 Status: Offline |
Have you got LAIM (Leave Applications In Memory) checked when you suspend?
|
||
|