World Community Grid Forums
Category: Completed Research Forum: The Clean Energy Project - Phase 2 Forum Thread: You have to be kidding me... |
Thread Status: Active Total posts in this thread: 19
jonnieb-uk
Ace Cruncher England Joined: Nov 30, 2011 Post Count: 6105 Status: Offline Project Badges: |
After a work unit is in 'error' status a computer has its 'trusted' status removed and its work units must be checked by a wingman. Any subsequent work units returned will be put in PVer status for this check. Further work units sent to the computer will also be sent to a wingman. This will continue until 'trusted' status is achieved again.

Given the increased incidence of Error and PVer when crunching CEP2, can the techs reduce this to the standard 3-day deadline for repair work?

Seems the repair deadline has been changed to 3.5 days.
----------------------------------------
[Edit 1 times, last edit by jonnieb-uk at Aug 4, 2014 12:25:15 PM]
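The trust rule quoted above can be sketched as a small state tracker. This is only an illustration, not the actual server-side logic: the class and method names are hypothetical, and the threshold of 20 consecutive valid results is an assumption taken from the figure given in a reply in this thread.

```python
from dataclasses import dataclass

# Assumption: number of consecutive valid results needed for 'trusted' status,
# based on the "last 20+" figure mentioned in this thread.
REQUIRED_VALID_STREAK = 20


@dataclass
class HostTrust:
    """Hypothetical per-host, per-application trust tracker."""

    valid_streak: int = 0

    @property
    def trusted(self) -> bool:
        return self.valid_streak >= REQUIRED_VALID_STREAK

    def record(self, outcome: str) -> None:
        """Update the streak with one result outcome ('valid' or anything else)."""
        if outcome == "valid":
            self.valid_streak += 1
        else:
            # An error result resets the streak, so trust is lost.
            self.valid_streak = 0

    def needs_wingman(self) -> bool:
        """Results from an untrusted host must be checked by a wingman."""
        return not self.trusted


host = HostTrust(valid_streak=25)
host.record("error")           # one error removes 'trusted' status
for _ in range(19):
    host.record("valid")       # 19 valid results are not yet enough
still_checked = host.needs_wingman()
host.record("valid")           # the 20th valid result restores trust
```

Under these assumptions a single error costs the host at least 20 further wingman-checked results before its work is accepted on its own again.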
||
|
Former Member
Cruncher Joined: May 22, 2018 Post Count: 0 Status: Offline |
'Trusted' or 'reliable' status is tracked at science app level and is maintained by always having the last 20+ results in a row rated valid. This includes results from before a problem occurred that had been waiting on a wingman.

Regarding the comment about results that had faultily gone to error and then went valid after re-validation: all those would have counted against the 20. Sadly though, those that had already gone out up front with a wingman still go to waste in this respect; there is no retroactive reset. How many days, months or years worth of computing time went to the bin this way is the jackpot question.

Repairs have for a long time been at 35 percent of the original deadline, as posted by, probably, keithing reed, former technician. The 30 percent applied only briefly. And still today we're waiting on repairs getting at least the same deadline date as the original. At the ministry of silly walks, repairs most of the time are due before the original. With an initial distribution of 2 you can have one copy with a 10 day deadline and a repair, sent out the next day after a wingman failure, due in 3.5 days. Net, the repairs are very often waiting on the original.
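The deadline arithmetic in the post above can be checked with a short sketch. It assumes the 10 day original deadline and the 35 percent repair fraction quoted in the post; the function names and the example timestamps are illustrative only.

```python
from datetime import datetime, timedelta

ORIGINAL_DEADLINE = timedelta(days=10)  # original deadline quoted in the post
REPAIR_FRACTION = 0.35                  # "35 percent of the original deadline"


def repair_deadline(original: timedelta = ORIGINAL_DEADLINE,
                    fraction: float = REPAIR_FRACTION) -> timedelta:
    """Length of the repair deadline as a fraction of the original one."""
    return original * fraction


def due_dates(original_sent: datetime, repair_sent: datetime):
    """Return (original_due, repair_due) for one original and one repair copy."""
    original_due = original_sent + ORIGINAL_DEADLINE
    repair_due = repair_sent + repair_deadline()
    return original_due, repair_due


# Illustrative scenario: the repair goes out one day after the original copy.
sent = datetime(2014, 8, 1, 12, 0, 0)
original_due, repair_due = due_dates(sent, sent + timedelta(days=1))
print(repair_deadline())           # length of the repair deadline
print(repair_due < original_due)   # is the repair due before the original?
```

With these figures the repair is due 4.5 days after the original was sent, while the original copy has 10 days, which is the situation the post describes: a finished repair can sit waiting on the original.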
||
|
Thyme Lawn
Cruncher Joined: Dec 9, 2008 Post Count: 46 Status: Offline Project Badges: |
I've had a series of E224* tasks which have failed with "RC = 0xc0000005" in job 1 and skipped jobs 2 to 15.

[22:26:44] Number of jobs = 16

I returned one of these tasks at 06:50:43 on 1st August which is PV, and tasks with the same processing pattern returned earlier than that were being validated. That seems to have changed since the validator was modified. I've returned 2 similarly afflicted tasks today which were both marked as error and have just downloaded an E224*_6 task which, based on the 6 preceding failures, I'm sure will go the same way. If the change is due to the validator update I guess the wingman for my PV task will be marked as an error after it's reported.
----------------------------------------
"The ultimate test of a moral society is the kind of world that it leaves to its children." - Dietrich Bonhoeffer
|
||
|
uplinger
Former World Community Grid Tech Joined: May 23, 2005 Post Count: 3952 Status: Offline Project Badges: |
I have lowered the total number of errors allowed for CEP2. We should not have 9 copies sent out again.

Thanks,
-Uplinger
||
|
cjslman
Master Cruncher Mexico Joined: Nov 23, 2004 Post Count: 2082 Status: Offline Project Badges: |
I have two more CEP2 WUs that seem to be kaput. One has already plowed through its 10 victims and the other one is getting started:

E224268_ 617_ I.68.C54H29N5O9.00404372.1.set1d06_ 8-- 640 Error 8/5/14 10:00:09 8/5/14 15:17:57 0.90 30.7 / 30.7
E224268_ 617_ I.68.C54H29N5O9.00404372.1.set1d06_ 9-- 640 Error 8/5/14 09:55:31 8/5/14 15:38:55 2.14 31.0 / 31.0
E224268_ 617_ I.68.C54H29N5O9.00404372.1.set1d06_ 7-- 640 Error 8/4/14 07:59:10 8/5/14 08:10:03 0.85 25.2 / 25.2
E224268_ 617_ I.68.C54H29N5O9.00404372.1.set1d06_ 6-- 640 Error 8/4/14 07:45:53 8/5/14 04:37:51 1.03 53.0 / 53.0
E224268_ 617_ I.68.C54H29N5O9.00404372.1.set1d06_ 5-- 640 Error 8/3/14 08:53:06 8/4/14 07:37:20 1.01 31.9 / 31.9
E224268_ 617_ I.68.C54H29N5O9.00404372.1.set1d06_ 4-- 640 Error 8/3/14 08:52:38 8/3/14 13:12:26 0.69 24.5 / 24.5
E224268_ 617_ I.68.C54H29N5O9.00404372.1.set1d06_ 3-- 640 Error 8/2/14 20:21:55 8/3/14 08:43:59 1.51 27.3 / 27.3 <-me
E224268_ 617_ I.68.C54H29N5O9.00404372.1.set1d06_ 2-- 640 Error 8/2/14 14:12:09 8/2/14 16:03:59 0.89 30.5 / 30.5
E224268_ 617_ I.68.C54H29N5O9.00404372.1.set1d06_ 1-- 640 Error 8/2/14 14:02:35 8/3/14 02:33:12 0.78 37.6 / 37.6
E224268_ 617_ I.68.C54H29N5O9.00404372.1.set1d06_ 0-- 640 Error 8/1/14 15:39:15 8/2/14 14:00:08 1.15 44.1 / 44.1

E224265_ 958_ I.68.C54H28N4O10.00241615.1.set1d06_ 4-- - In Progress 8/5/14 23:16:59 8/15/14 23:16:59 0.00 0.0 / 0.0
E224265_ 958_ I.68.C54H28N4O10.00241615.1.set1d06_ 3-- 640 Error 8/5/14 18:37:46 8/5/14 23:11:10 0.00 0.0 / 0.0
E224265_ 958_ I.68.C54H28N4O10.00241615.1.set1d06_ 2-- - In Progress 8/5/14 18:31:19 8/15/14 18:31:19 0.00 0.0 / 0.0
E224265_ 958_ I.68.C54H28N4O10.00241615.1.set1d06_ 1-- 640 Error 8/1/14 15:49:16 8/1/14 19:45:36 1.58 28.4 / 0.0 <-me
E224265_ 958_ I.68.C54H28N4O10.00241615.1.set1d06_ 0-- 640 Error 8/1/14 15:40:00 8/5/14 15:35:34 1.10 38.3 / 0.0

I don't see any errors in the Results Log:

Result Log
Result Name: E224268_ 617_ I.68.C54H29N5O9.00404372.1.set1d06_ 3--
<core_client_version>7.2.47</core_client_version>
<![CDATA[
<stderr_txt>
INFO: No state to restore. Start from the beginning.
[00:58:50] Number of jobs = 16
[00:58:50] Starting job 0,CPU time has been restored to 0.000000.
[01:20:51] Finished Job #0
[01:20:51] Starting job 1,CPU time has been restored to 1141.896120.
Application exited with RC = 0xc0000005
[02:42:45] Finished Job #1
[02:42:45] Starting job 2,CPU time has been restored to 5430.597611.
[02:42:45] Skipping Job #2
[02:42:45] Starting job 3,CPU time has been restored to 5430.597611.
[02:42:45] Skipping Job #3
[02:42:45] Starting job 4,CPU time has been restored to 5430.597611.
[02:42:45] Skipping Job #4
[02:42:45] Starting job 5,CPU time has been restored to 5430.597611.
[02:42:45] Skipping Job #5
[02:42:45] Starting job 6,CPU time has been restored to 5430.597611.
[02:42:45] Skipping Job #6
[02:42:45] Starting job 7,CPU time has been restored to 5430.597611.
[02:42:45] Skipping Job #7
[02:42:45] Starting job 8,CPU time has been restored to 5430.597611.
[02:42:45] Skipping Job #8
[02:42:45] Starting job 9,CPU time has been restored to 5430.597611.
[02:42:45] Skipping Job #9
[02:42:45] Starting job 10,CPU time has been restored to 5430.597611.
[02:42:45] Skipping Job #10
[02:42:45] Starting job 11,CPU time has been restored to 5430.597611.
[02:42:45] Skipping Job #11
[02:42:45] Starting job 12,CPU time has been restored to 5430.597611.
[02:42:45] Skipping Job #12
[02:42:45] Starting job 13,CPU time has been restored to 5430.597611.
[02:42:45] Skipping Job #13
[02:42:45] Starting job 14,CPU time has been restored to 5430.597611.
[02:42:45] Skipping Job #14
[02:42:45] Starting job 15,CPU time has been restored to 5430.597611.
[02:42:45] Skipping Job #15
02:42:49 (2564): called boinc_finish
</stderr_txt>
]]>

Posted all the above if it's any help to anybody.

CJSL
Crunching for a brighter tomorrow...
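For anyone wanting to spot this pattern across many results, the stderr log above can be summarized with a short script. This is a minimal sketch: the function name and the regular expressions are my own, written against the log lines shown in this thread, and the sample text is an abbreviated excerpt of that log.

```python
import re

# One pattern per log event of interest, matched in order of appearance.
EVENT = re.compile(
    r"Starting job (?P<start>\d+),"
    r"|Application exited with RC = (?P<rc>0x[0-9a-fA-F]+)"
    r"|Skipping Job #(?P<skip>\d+)"
)


def summarize(stderr_text: str) -> dict:
    """Return which jobs exited with a return code and which were skipped."""
    current_job = None
    failed = {}    # job number -> return code string
    skipped = []   # job numbers that were skipped
    for m in EVENT.finditer(stderr_text):
        if m.group("start") is not None:
            current_job = int(m.group("start"))
        elif m.group("rc") is not None:
            # The RC line follows the "Starting job N" line of the failing job.
            failed[current_job] = m.group("rc")
        else:
            skipped.append(int(m.group("skip")))
    return {"failed": failed, "skipped": skipped}


sample = """\
[00:58:50] Starting job 0,CPU time has been restored to 0.000000.
[01:20:51] Finished Job #0
[01:20:51] Starting job 1,CPU time has been restored to 1141.896120.
Application exited with RC = 0xc0000005
[02:42:45] Finished Job #1
[02:42:45] Starting job 2,CPU time has been restored to 5430.597611.
[02:42:45] Skipping Job #2
[02:42:45] Starting job 3,CPU time has been restored to 5430.597611.
[02:42:45] Skipping Job #3
"""
summary = summarize(sample)
print(summary)
```

Because the pattern scans the whole text rather than line by line, it also works when the log has been pasted as a single run-on line.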
||
|
Former Member
Cruncher Joined: May 22, 2018 Post Count: 0 Status: Offline |
I got one. I'm the tenth to run it; all the others erred.
It only ran one job and skipped the rest, see below.

E224270_ 215_ I.66.C55H35N5O6.00310371.1.set1d06_ 9-- 640 Pending Validation 8/6/14 03:30:43 8/6/14 05:37:19 0.71 37.4 / 0.0

Result Log
Result Name: E224270_ 215_ I.66.C55H35N5O6.00310371.1.set1d06_ 9--
<core_client_version>7.0.27</core_client_version>
<![CDATA[
<stderr_txt>
INFO: No state to restore. Start from the beginning.
[14:27:57] Number of jobs = 16
[14:27:57] Starting job 0,CPU time has been restored to 0.000000.
[14:28:00] Starting new Job
[14:28:01] Qink name = fldman
[14:28:02] Qink name = gesman
[14:28:02] Qink name = scfman
[14:41:02] Qink name = anlman
[14:41:20] End of Job
[14:41:21] Finished Job #0
[14:41:21] Starting job 1,CPU time has been restored to 738.748000.
[14:41:21] Starting new Job
[14:41:21] Qink name = fldman
[14:41:24] Qink name = gesman
[14:41:25] Qink name = scfman
[15:03:11] Qink name = anlman
Application exited with RC = 0x8b
[15:12:11] Finished Job #1
[15:12:11] Starting job 2,CPU time has been restored to 2478.512000.
[15:12:11] Skipping Job #2
[15:12:11] Starting job 3,CPU time has been restored to 2478.512000.
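The two return codes seen in this thread can be decoded with a small helper. This is a sketch under stated assumptions: 0xc0000005 is the Windows NTSTATUS value for an access violation, and 0x8b (139) is read here using the common POSIX convention of 128 plus a signal number, with Linux signal numbering. Whether the science application reports its return code in exactly that way is an assumption, and the function name is hypothetical.

```python
# Windows NTSTATUS codes relevant to this thread.
KNOWN_NTSTATUS = {0xC0000005: "STATUS_ACCESS_VIOLATION"}

# A few Linux signal numbers (assumption: the host uses Linux numbering).
LINUX_SIGNALS = {6: "SIGABRT", 9: "SIGKILL", 11: "SIGSEGV"}


def describe_rc(rc_hex: str) -> str:
    """Give a best-guess description of a return code printed as hex."""
    rc = int(rc_hex, 16)
    if rc in KNOWN_NTSTATUS:
        return f"Windows {KNOWN_NTSTATUS[rc]}"
    if 128 < rc < 256:
        # Convention: exit status 128 + N means "terminated by signal N".
        signum = rc - 128
        name = LINUX_SIGNALS.get(signum, f"signal {signum}")
        return f"128 + {signum} ({name})"
    return f"unrecognized return code {rc}"


for code in ("0xc0000005", "0x8b"):
    print(code, "->", describe_rc(code))
```

If that reading is right, both failures are invalid memory accesses in job 1, one reported by a Windows host and one by a Linux host, which would point at the work unit rather than at any single machine.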
||
|
jonnieb-uk
Ace Cruncher England Joined: Nov 30, 2011 Post Count: 6105 Status: Offline Project Badges: |
It seems that the deadline for CEP2 repair work has been moved back out to 10 days rather than 35%.

E225052_ 949_ S.252.C31H23N5O1.XKLYIVBOTGFSMG-UHFFFAOYSA-N.4_ s1_ 14_ 1-- - In Progress 06/08/14 21:25:13 16/08/14 21:25:13 0.00 0.0 / 0.0
E225052_ 949_ S.252.C31H23N5O1.XKLYIVBOTGFSMG-UHFFFAOYSA-N.4_ s1_ 14_ 0-- 640 Pending Verification 05/08/14 20:46:52 06/08/14 21:15:07 6.72 261.0 / 0.0
||
|
uplinger
Former World Community Grid Tech Joined: May 23, 2005 Post Count: 3952 Status: Offline Project Badges: |
jonnieb,

You are correct. The jobs sent out right now do not have the reliable setting on them. We are still working through member computers that are not marked reliable because of the validation issues last week. This is what was causing CEP2 to appear out of work when it actually wasn't. I thought we had cleared them the other day, then got bitten by them again, hence users seeing no work available. I'm going to let this run for a bit to hopefully get more reliable hosts for CEP2.

Thanks,
-Uplinger
||
|
jonnieb-uk
Ace Cruncher England Joined: Nov 30, 2011 Post Count: 6105 Status: Offline Project Badges: |
Keith,

Thanks for the explanation.