Thread Status: Active
Total posts in this thread: 19
This topic has been viewed 1601 times and has 18 replies
jonnieb-uk
Ace Cruncher
England
Joined: Nov 30, 2011
Post Count: 6105
Status: Offline
Re: You have to be kidding me...

After a work unit is in 'error' status a computer has its 'trusted' status removed and its work units must be checked by a wingman. Any subsequent work units returned will be put in Pver status for this check. Further work units sent to the computer will also be sent to a wingman. This will continue until 'trusted' status is achieved again.


And the WUs sent out for checking by a wingman have a 10 day deadline!

Given the increased incidence of Error and P/Ver when crunching CEP2, can the techs reduce this to the standard 3-day deadline for repair work?


Seems the repair deadline has been changed to 3.5 days :)
----------------------------------------

To Join follow this link: Join the UK Team All Welcome! UK Team thread
----------------------------------------
[Edit 1 times, last edit by jonnieb-uk at Aug 4, 2014 12:25:15 PM]
[Aug 4, 2014 11:38:42 AM]
Former Member
Cruncher
Joined: May 22, 2018
Post Count: 0
Status: Offline
Re: You have to be kidding me...

'Trusted' or 'reliable' status is held at science-app level and is maintained by always having the last 20+ results consecutively rated valid. This includes results from before a problem occurred that had been waiting on a wingman.
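
A minimal sketch of the rule as described above, not the actual server logic: the class name `HostTrust`, the outcome strings, and the exact threshold of 20 are all assumptions made for illustration.

```python
from collections import defaultdict

# "Last 20+" comes from the post; the exact threshold is an assumption.
REQUIRED_STREAK = 20


class HostTrust:
    """Hypothetical model: consecutive valid results per science app for one host."""

    def __init__(self):
        # app name -> number of consecutive valid results
        self.streak = defaultdict(int)

    def record(self, app, outcome):
        # Any non-valid outcome (for example "error") resets that app's streak only.
        if outcome == "valid":
            self.streak[app] += 1
        else:
            self.streak[app] = 0

    def is_trusted(self, app):
        return self.streak[app] >= REQUIRED_STREAK

    def needs_wingman(self, app):
        # Results from an untrusted host are checked against a second copy.
        return not self.is_trusted(app)


host = HostTrust()
for _ in range(20):
    host.record("cep2", "valid")
print(host.is_trusted("cep2"))     # True
host.record("cep2", "error")
print(host.needs_wingman("cep2"))  # True
```

Because the streak is kept per app in this sketch, one error in CEP2 would not affect the host's standing on another project.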

Regarding the comment about results that had wrongly gone to error and then went valid after re-validation: all of those would have counted toward the 20. Sadly, though, those that had already gone out up front with a wingman still go to waste in this respect; there is no retroactive reset. How many days, months or years' worth of computing time went in the bin this way is the jackpot question.

Repairs have for a long time been set at 35 percent of the original deadline, as posted by, probably, keithing reed, former technician. The 30 percent figure applied only briefly. And still today we're waiting for repairs to get at least the same due date as the original. At the ministry of silly walks, repairs are most of the time due before the original. With an initial distribution of 2, you can have one copy with a 10-day deadline and, after a wingman fails, a repair that went out the next day and is due in 3.5 days; the net result is that the repairs are very often waiting on the original.
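
To put numbers on that, here is a small sketch of the arithmetic, assuming a 10-day original deadline and a repair window of 35 percent of it. The function name `due_date` and the send times are made up for the example.

```python
from datetime import datetime, timedelta

ORIGINAL_DEADLINE = timedelta(days=10)
REPAIR_PERCENT = 35  # repairs get 35 percent of the original deadline


def due_date(sent, is_repair):
    """Due date for a copy sent at `sent` (illustrative helper, not project code)."""
    if is_repair:
        window = ORIGINAL_DEADLINE * REPAIR_PERCENT / 100
    else:
        window = ORIGINAL_DEADLINE
    return sent + window


# Hypothetical send times: the repair goes out one day after the original.
original_sent = datetime(2014, 8, 1, 12, 0)
repair_sent = original_sent + timedelta(days=1)

original_due = due_date(original_sent, is_repair=False)
repair_due = due_date(repair_sent, is_repair=True)

print(original_due)               # 2014-08-11 12:00:00
print(repair_due)                 # 2014-08-06 00:00:00
print(repair_due < original_due)  # True
```

With these numbers the repair comes back up to five and a half days before the remaining original copy is even due.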
[Aug 4, 2014 3:15:39 PM]
Thyme Lawn
Cruncher
Joined: Dec 9, 2008
Post Count: 46
Status: Offline
Re: You have to be kidding me...

I've had a series of E224* tasks which have failed with "RC = 0xc0000005" in job 1 and skipped jobs 2 to 15.
[22:26:44] Number of jobs = 16
[22:26:44] Starting job 0,CPU time has been restored to 0.000000.
[23:18:12] Finished Job #0
[23:18:12] Starting job 1,CPU time has been restored to 805.500000. Application exited with RC = 0xc0000005
[01:42:33] Finished Job #1
[01:42:33] Starting job 2,CPU time has been restored to 4117.640625.
[01:42:33] Skipping Job #2
I returned one of these tasks at 06:50:43 on 1st August and it is in PV; tasks with the same processing pattern that were returned earlier than that were being validated.

That seems to have changed since the validator was modified. I've returned 2 similarly afflicted tasks today which were both marked as error and have just downloaded an E224*_6 task which, based on the 6 preceding failures, I'm sure will go the same way.

If the change is due to the validator update I guess the wingman for my PV task will be marked as an error after it's reported.
----------------------------------------
"The ultimate test of a moral society is the kind of world that it leaves to its children." - Dietrich Bonhoeffer
[Aug 4, 2014 9:12:57 PM]
uplinger
Former World Community Grid Tech
Joined: May 23, 2005
Post Count: 3952
Status: Offline
Re: You have to be kidding me...

I have lowered the total number of errors allowed for CEP2. We should not have 9 copies sent out again.

Thanks,
-Uplinger
[Aug 6, 2014 1:53:20 AM]
cjslman
Master Cruncher
Mexico
Joined: Nov 23, 2004
Post Count: 2082
Status: Offline
Re: You have to be kidding me...

I have two more CEP2 WUs that seem to be kaput. One has already plowed through its 10 victims and the other one is getting started:
E224268_ 617_ I.68.C54H29N5O9.00404372.1.set1d06_ 8-- 640 Error 8/5/14 10:00:09 8/5/14 15:17:57 0.90 30.7 / 30.7
E224268_ 617_ I.68.C54H29N5O9.00404372.1.set1d06_ 9-- 640 Error 8/5/14 09:55:31 8/5/14 15:38:55 2.14 31.0 / 31.0
E224268_ 617_ I.68.C54H29N5O9.00404372.1.set1d06_ 7-- 640 Error 8/4/14 07:59:10 8/5/14 08:10:03 0.85 25.2 / 25.2
E224268_ 617_ I.68.C54H29N5O9.00404372.1.set1d06_ 6-- 640 Error 8/4/14 07:45:53 8/5/14 04:37:51 1.03 53.0 / 53.0
E224268_ 617_ I.68.C54H29N5O9.00404372.1.set1d06_ 5-- 640 Error 8/3/14 08:53:06 8/4/14 07:37:20 1.01 31.9 / 31.9
E224268_ 617_ I.68.C54H29N5O9.00404372.1.set1d06_ 4-- 640 Error 8/3/14 08:52:38 8/3/14 13:12:26 0.69 24.5 / 24.5
E224268_ 617_ I.68.C54H29N5O9.00404372.1.set1d06_ 3-- 640 Error 8/2/14 20:21:55 8/3/14 08:43:59 1.51 27.3 / 27.3 <-me
E224268_ 617_ I.68.C54H29N5O9.00404372.1.set1d06_ 2-- 640 Error 8/2/14 14:12:09 8/2/14 16:03:59 0.89 30.5 / 30.5
E224268_ 617_ I.68.C54H29N5O9.00404372.1.set1d06_ 1-- 640 Error 8/2/14 14:02:35 8/3/14 02:33:12 0.78 37.6 / 37.6
E224268_ 617_ I.68.C54H29N5O9.00404372.1.set1d06_ 0-- 640 Error 8/1/14 15:39:15 8/2/14 14:00:08 1.15 44.1 / 44.1


E224265_ 958_ I.68.C54H28N4O10.00241615.1.set1d06_ 4-- - In Progress 8/5/14 23:16:59 8/15/14 23:16:59 0.00 0.0 / 0.0
E224265_ 958_ I.68.C54H28N4O10.00241615.1.set1d06_ 3-- 640 Error 8/5/14 18:37:46 8/5/14 23:11:10 0.00 0.0 / 0.0
E224265_ 958_ I.68.C54H28N4O10.00241615.1.set1d06_ 2-- - In Progress 8/5/14 18:31:19 8/15/14 18:31:19 0.00 0.0 / 0.0
E224265_ 958_ I.68.C54H28N4O10.00241615.1.set1d06_ 1-- 640 Error 8/1/14 15:49:16 8/1/14 19:45:36 1.58 28.4 / 0.0 <-me
E224265_ 958_ I.68.C54H28N4O10.00241615.1.set1d06_ 0-- 640 Error 8/1/14 15:40:00 8/5/14 15:35:34 1.10 38.3 / 0.0


I don't see any errors in the Results Log:

Result Log

Result Name: E224268_ 617_ I.68.C54H29N5O9.00404372.1.set1d06_ 3--
<core_client_version>7.2.47</core_client_version>
<![CDATA[
<stderr_txt>
INFO: No state to restore. Start from the beginning.
[00:58:50] Number of jobs = 16
[00:58:50] Starting job 0,CPU time has been restored to 0.000000.
[01:20:51] Finished Job #0
[01:20:51] Starting job 1,CPU time has been restored to 1141.896120.
Application exited with RC = 0xc0000005
[02:42:45] Finished Job #1
[02:42:45] Starting job 2,CPU time has been restored to 5430.597611.
[02:42:45] Skipping Job #2
[02:42:45] Starting job 3,CPU time has been restored to 5430.597611.
[02:42:45] Skipping Job #3
[02:42:45] Starting job 4,CPU time has been restored to 5430.597611.
[02:42:45] Skipping Job #4
[02:42:45] Starting job 5,CPU time has been restored to 5430.597611.
[02:42:45] Skipping Job #5
[02:42:45] Starting job 6,CPU time has been restored to 5430.597611.
[02:42:45] Skipping Job #6
[02:42:45] Starting job 7,CPU time has been restored to 5430.597611.
[02:42:45] Skipping Job #7
[02:42:45] Starting job 8,CPU time has been restored to 5430.597611.
[02:42:45] Skipping Job #8
[02:42:45] Starting job 9,CPU time has been restored to 5430.597611.
[02:42:45] Skipping Job #9
[02:42:45] Starting job 10,CPU time has been restored to 5430.597611.
[02:42:45] Skipping Job #10
[02:42:45] Starting job 11,CPU time has been restored to 5430.597611.
[02:42:45] Skipping Job #11
[02:42:45] Starting job 12,CPU time has been restored to 5430.597611.
[02:42:45] Skipping Job #12
[02:42:45] Starting job 13,CPU time has been restored to 5430.597611.
[02:42:45] Skipping Job #13
[02:42:45] Starting job 14,CPU time has been restored to 5430.597611.
[02:42:45] Skipping Job #14
[02:42:45] Starting job 15,CPU time has been restored to 5430.597611.
[02:42:45] Skipping Job #15
02:42:49 (2564): called boinc_finish

</stderr_txt>
]]>
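
For anyone wanting to scan a pile of these logs, a rough sketch of a parser follows. It assumes the stderr lines look like the ones posted in this thread; the function name `summarize` and the regular expressions are my own, not part of BOINC or the CEP2 application. As far as I know, 0xc0000005 is the Windows access-violation status code.

```python
import re

# Patterns modeled on the log lines quoted in this thread (an assumption).
JOB_START = re.compile(r"Starting job (\d+),")
JOB_SKIP = re.compile(r"Skipping Job #(\d+)")
EXIT_CODE = re.compile(r"Application exited with RC = (0x[0-9a-fA-F]+)")


def summarize(stderr_text):
    """Return (failed_job, exit_code, skipped_jobs) for one stderr log."""
    current = None      # job most recently started
    failed_job = None   # first job that reported a non-zero exit
    exit_code = None
    skipped = []
    for line in stderr_text.splitlines():
        m = JOB_START.search(line)
        if m:
            current = int(m.group(1))
        # Checked after JOB_START so it also works when both are on one line.
        m = EXIT_CODE.search(line)
        if m and failed_job is None:
            failed_job, exit_code = current, m.group(1)
        m = JOB_SKIP.search(line)
        if m:
            skipped.append(int(m.group(1)))
    return failed_job, exit_code, skipped


sample = """\
[01:20:51] Starting job 1,CPU time has been restored to 1141.896120.
Application exited with RC = 0xc0000005
[02:42:45] Finished Job #1
[02:42:45] Starting job 2,CPU time has been restored to 5430.597611.
[02:42:45] Skipping Job #2
"""
print(summarize(sample))  # (1, '0xc0000005', [2])
```

Run over the full log above, this would report job 1 as the failure with jobs 2 to 15 skipped.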


Posted all the above if it's any help to anybody.

CJSL

Crunching for a brighter tomorrow...
----------------------------------------
I follow the Gimli philosophy: "Keep breathing. That's the key. Breathe."
Join The Cahuamos Team


[Aug 6, 2014 2:09:27 AM]
Former Member
Cruncher
Joined: May 22, 2018
Post Count: 0
Status: Offline
Re: You have to be kidding me...

I got one; I'm the tenth to run it, and all the others errored.

It only ran one job and skipped the rest; see below.

E224270_ 215_ I.66.C55H35N5O6.00310371.1.set1d06_ 9-- 640 Pending Validation 8/6/14 03:30:43 8/6/14 05:37:19 0.71 37.4 / 0.0


Result Log

Result Name: E224270_ 215_ I.66.C55H35N5O6.00310371.1.set1d06_ 9--
<core_client_version>7.0.27</core_client_version>
<![CDATA[
<stderr_txt>
INFO: No state to restore. Start from the beginning.
[14:27:57] Number of jobs = 16
[14:27:57] Starting job 0,CPU time has been restored to 0.000000.
[14:28:00] Starting new Job
[14:28:01] Qink name = fldman
[14:28:02] Qink name = gesman
[14:28:02] Qink name = scfman
[14:41:02] Qink name = anlman
[14:41:20] End of Job
[14:41:21] Finished Job #0
[14:41:21] Starting job 1,CPU time has been restored to 738.748000.
[14:41:21] Starting new Job
[14:41:21] Qink name = fldman
[14:41:24] Qink name = gesman
[14:41:25] Qink name = scfman
[15:03:11] Qink name = anlman
Application exited with RC = 0x8b
[15:12:11] Finished Job #1
[15:12:11] Starting job 2,CPU time has been restored to 2478.512000.
[15:12:11] Skipping Job #2
[15:12:11] Starting job 3,CPU time has been restored to 2478.512000.
[Aug 6, 2014 5:47:39 AM]
jonnieb-uk
Ace Cruncher
England
Joined: Nov 30, 2011
Post Count: 6105
Status: Offline
Re: You have to be kidding me...

It seems that the deadline for CEP2 repair work has been moved back out to 10 days rather than 35%.

E225052_ 949_ S.252.C31H23N5O1.XKLYIVBOTGFSMG-UHFFFAOYSA-N.4_ s1_ 14_ 1-- - In Progress 06/08/14 21:25:13 16/08/14 21:25:13 0.00 0.0 / 0.0
E225052_ 949_ S.252.C31H23N5O1.XKLYIVBOTGFSMG-UHFFFAOYSA-N.4_ s1_ 14_ 0-- 640 Pending Verification 05/08/14 20:46:52 06/08/14 21:15:07 6.72 261.0 / 0.0
[Aug 6, 2014 10:58:57 PM]
uplinger
Former World Community Grid Tech
Joined: May 23, 2005
Post Count: 3952
Status: Offline
Re: You have to be kidding me...

jonnieb,

You are correct. The jobs being sent out right now do not have the reliable setting on them. We are still working through member computers that are not marked reliable because of the validation issues last week. This is what was causing CEP2 to appear to be out of work when it actually wasn't. I thought we had cleared them the other day, then we got bitten by them again, hence users seeing no work available. I'm going to let this run for a bit to hopefully get more reliable hosts for CEP2.

Thanks,
-Uplinger
[Aug 7, 2014 5:06:18 AM]
jonnieb-uk
Ace Cruncher
England
Joined: Nov 30, 2011
Post Count: 6105
Status: Offline
Re: You have to be kidding me...

Keith
Thanks for the explanation :)
[Aug 7, 2014 9:10:11 AM]