World Community Grid Forums
Thread Status: Active Total posts in this thread: 52
Author |
|
Sekerob
Ace Cruncher Joined: Jul 24, 2005 Post Count: 20043 Status: Offline |
Dataman,
No 10 of the 19 'inconclusive' results meet the following criteria:
1. Have all the same control hash/check number.
2. Do not all fill the 10 different parts that make up the whole.

It's very well possible that several copies of the same 1/10th segment agree, but the validation program at this time does not look at that; it needs all 10 parts to be complete before it is able to dismiss the invalids. The way I understand it, the algorithm is not able to say, e.g., that 7 have the same overall hash and 3 don't, and therefore only those 3 need a backup calculation. Probabilities would make it an extreme outside chance for that not to be true. I think knreed explained it somewhere in US-English.

Added 2 comments:
A. As for the task switching, it's odd within the same project (WCG), but I presume that after it determined the deadline will be met (?), it went back to 'which job needs least time to complete' (another logic of BOINC). What was the remaining time on the DDDT when that happened? Watch out: if you receive a batch with the exact same deadline (the maximum is 10), BOINC will do the jobs with the shortest projected completion times first.
B. The initial replication number is, I think, not correct. It's 10. There's a known bug where each extra copy adds to that figure.

Il Consigliere della Comunità
WCG
Please help to make the Forums an enjoyable experience for All!
[Edit 3 times, last edit by Sekerob at Oct 29, 2007 4:56:51 PM] |
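The shortcut Sekerob says the validator does not take (treat the copies that share the most common hash as agreeing, and re-run only the rest) can be sketched as follows. This is only an illustration of that idea, not WCG's actual validation code; the function name and the hash strings are hypothetical.

```python
from collections import Counter

def results_needing_rerun(hashes):
    """Return the positions of results whose hash differs from the most common one.

    Hypothetical helper: 'hashes' is a list with one control hash per returned copy.
    """
    if not hashes:
        return []
    # Find the hash value that the largest number of copies agree on.
    majority_hash, _count = Counter(hashes).most_common(1)[0]
    # Every copy that disagrees with it would need a backup calculation.
    return [i for i, h in enumerate(hashes) if h != majority_hash]

# Made-up example matching the "7 agree, 3 don't" case from the post.
example = ["abc"] * 7 + ["x1", "x2", "x3"]
outliers = results_needing_rerun(example)
```

With the made-up input above, only the three disagreeing copies are flagged. According to the post, the real validator instead waits until all 10 parts are complete.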
||
|
Dataman
Ace Cruncher Joined: Nov 16, 2004 Post Count: 4865 Status: Offline |
Thanks, as usual, Sekerob. That makes sense. It is an "interesting" way of doing things. I'll keep running them. |
||
|
Former Member
Cruncher Joined: May 22, 2018 Post Count: 0 Status: Offline |
And another:
ach1_ 5_ 35_ 1--: 10.38 hours |
||
|
Former Member
Cruncher Joined: May 22, 2018 Post Count: 0 Status: Offline |
Two more
ach1_ 3_ 5_ 10--: 10.64 hours
ach1_ 6_ 68_ 2--: 9.32 hours |
||
|
Dataman
Ace Cruncher Joined: Nov 16, 2004 Post Count: 4865 Status: Offline |
Sekerob wrote: "What was the remaining time on the DDDT when that happened? Watch that if you receive a batch with the exact same deadline (the maximum is 10), BOINC will do the jobs with shortest projected completion times first."

Sorry Sek, I missed your addendum. I did not record the time, but I am rather sure the DDDT had >75% time remaining. Since then I have run a lot of them and have had no problems with them at all. Cheers! |
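The ordering rule in Sekerob's addendum (among jobs with the same deadline, shortest projected completion time first) can be sketched like this. It is a minimal illustration of the rule as stated in the post, not BOINC's actual scheduler; the task records, field names, and dates are all hypothetical.

```python
from datetime import datetime

def run_order(tasks):
    """Sort hypothetical task records by deadline, then by estimated hours left."""
    return sorted(tasks, key=lambda t: (t["deadline"], t["hours_left"]))

# Made-up batch: three tasks share one deadline, one task is due later.
same_deadline = datetime(2007, 11, 5, 12, 0)
later_deadline = datetime(2007, 11, 8, 12, 0)
batch = [
    {"name": "task_a", "deadline": same_deadline, "hours_left": 10.4},
    {"name": "task_b", "deadline": later_deadline, "hours_left": 1.0},
    {"name": "task_c", "deadline": same_deadline, "hours_left": 3.2},
    {"name": "task_d", "deadline": same_deadline, "hours_left": 7.5},
]
ordered_names = [t["name"] for t in run_order(batch)]
```

In this sketch the three tasks with the shared deadline run in order of increasing estimated time, and the task with the later deadline runs last even though it is the shortest.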
||
|
Former Member
Cruncher Joined: May 22, 2018 Post Count: 0 Status: Offline |
A couple more:
ach1_ 9_ 67_ 10--: 9.50 hours
ach1_ 6_ 26_ 15--: 10.72 hours |
||
|
Former Member
Cruncher Joined: May 22, 2018 Post Count: 0 Status: Offline |
Here are a few more:
ach1_ 9_ 31_ 6--
ach1_ 16_ 79_ 10--
ach1_ 14_ 50_ 28--
ach1_ 12_ 56_ 0--
ach1_ 9_ 15_ 17--
ach1_ 9_ 67_ 10-- |
||
|
Former Member
Cruncher Joined: May 22, 2018 Post Count: 0 Status: Offline |
Here is one more:
WU ach1_14_72: 48 copies (45 Too Late, 2 Error, 1 No Reply)
[Edit 1 times, last edit by Former Member at Feb 16, 2008 10:46:55 AM] |
||
|
Sekerob
Ace Cruncher Joined: Jul 24, 2005 Post Count: 20043 Status: Offline |
hi h.hett,
This thread is/was reporting jobs that went out with a bang and errors, so is yours one of the 'too late' cases or the fail?

Got 7 in a row after months of infinite repeats that none were available and being sent alternate work. 5 validated so far, 1 is pending and 1 is still crunching, so I don't quite fathom why they crash so frequently for esoteric17. Do note though that, like the cancer jobs, they generate massive amounts of page faults (6.5 billion on 1 job) and run very slowly on what I thought was still a fair system (P4 with 512 KB L2 cache). It only improved somewhat when allowing more RAM (1.3 GB when in use) and increasing the "write to disk" interval to 15 minutes.

Found a few oddities in the distribution algorithm which I'll report to the technicians. The deadlines for backup work are not consistent, for one. The last rush job got finished about 30 minutes before deadline and had to be uploaded manually, as the client did not volunteer this.

ttyl
WCG
Please help to make the Forums an enjoyable experience for All! |
||
|
Former Member
Cruncher Joined: May 22, 2018 Post Count: 0 Status: Offline |
Sekerob wrote: "Got 7 in a row after months of infinite repeats that none were available and being sent alternate work. 5 validated so far, 1 is pending and 1 is still crunching, so don't quite fathom why they crash so frequently for esoteric17."

Not just me! (See the earlier post by uplinger saying the same thing happened to him.) And I may be giving the wrong impression here: I do have a number of hosts, and I have many AC@H results which are valid. I'm just noticing the errors more because statistically I receive more WUs than the average user, so I'll see more errors (I am one of the few with an AC@H badge).

116 AC@H WUs crunched since 11/13:
9 error (6 of which are the exit code 95/line 296 of wrf_io.f error)
96 valid
1 in progress
9 invalid
1 inconclusive

So, this error is only 5% of my WUs, which is why many folks wouldn't see it: if they don't get many of these in the first place, it's not likely one will fall into the 5%. Not the hugest issue and it doesn't bother me. I'm just reporting the WUs as they come along. |
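The 5% figure and the "few WUs, few sightings" argument can be checked with a little arithmetic. The counts below are the ones given in the post; the chance-of-seeing formula is an added assumption (each WU failing independently at the same rate), which the post does not claim.

```python
# Counts as reported in the post.
counts = {"error": 9, "valid": 96, "in progress": 1, "invalid": 9, "inconclusive": 1}
total = sum(counts.values())          # all AC@H WUs crunched since 11/13
wrf_io_errors = 6                     # the exit code 95 / wrf_io.f subset of the errors
rate = wrf_io_errors / total          # share of WUs hit by this specific error

def chance_of_seeing_error(n_work_units, per_unit_rate):
    """Probability of at least one such error in n WUs.

    Assumes, for illustration only, that WUs fail independently at the same rate.
    """
    return 1 - (1 - per_unit_rate) ** n_work_units

print(f"total WUs: {total}, error share: {rate:.1%}")
print(f"chance of at least one in 5 WUs: {chance_of_seeing_error(5, rate):.0%}")
```

Under that independence assumption, someone who only receives a handful of these WUs would more often than not see no such error at all, which is consistent with the point made in the post.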
||
|