World Community Grid Forums
Thread Status: Active | Total posts in this thread: 76
Rickjb
Veteran Cruncher | Australia | Joined: Sep 17, 2006 | Post Count: 666 | Status: Offline
@martin64: "This would require changes in the client - definitely more effort and more complicated than just changing the distribution process on the server side..."

I question your use of the word "definitely" without knowing more, especially about the server side. I have seen a flow diagram of it somewhere, some time, and it has makers, breakers, shakers, squashers, despatchers, validators, hoppers, etc, etc. I don't know whether it runs as one process or many, on one machine or more, but changing it may not be simple. The science applications are one process each, and WCG has the source code and compiles them all. Let's go with the BOINC benchmark results, which can be read from client_state.xml, I think. Something like:
    // WU startup code (proposed):
    WU_fpops = read from WU file(s);            // estimated work in this WU
    p_fpops  = read from client_state.xml;      // host's benchmarked FLOPS
    p_iops   = read from client_state.xml;      // host's benchmarked IOPS
    device_benchmark = (p_fpops + p_iops) / 2;  // (say)
    cutoff_time = WU_fpops / device_benchmark * units_scaling;
    // Replace the current instance(s) of "6 hours" with cutoff_time.

[Edit]: knreed replied while I was editing this reply ... [Resolved]

[Edit]: @MovieMan - Until Kevin said they are now matching devices, I was going to warn you guys at XS that 85% of your longer HCMD2 WUs could get junked if they happened to come up against a wingman with a slow machine like a P4 with HT.

[Edit 3 times, last edit by Rickjb at Oct 29, 2009 12:04:39 PM]
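Fleshing that out - a minimal runnable sketch of what I mean, not BOINC code: it assumes the <p_fpops>/<p_iops> tags the client writes into client_state.xml, and WU_fpops and units_scaling below are made-up placeholders (the real per-WU estimate would be something like BOINC's rsc_fpops_est):

    // Minimal sketch only, not BOINC code: read the host benchmarks out of
    // client_state.xml and turn a per-WU FLOP estimate into a cutoff time.
    #include <cstdio>
    #include <cstdlib>
    #include <fstream>
    #include <sstream>
    #include <string>

    // Crude extraction of a flat numeric tag such as <p_fpops>...</p_fpops>.
    double read_tag(const std::string& xml, const std::string& tag) {
        const std::string open = "<" + tag + ">";
        const size_t pos = xml.find(open);
        return pos == std::string::npos
                   ? 0.0
                   : std::atof(xml.c_str() + pos + open.size());
    }

    int main() {
        std::ifstream f("client_state.xml");
        std::stringstream ss;
        ss << f.rdbuf();
        const std::string xml = ss.str();

        const double p_fpops = read_tag(xml, "p_fpops");  // benchmarked FLOPS
        const double p_iops  = read_tag(xml, "p_iops");   // benchmarked IOPS
        const double device_benchmark = (p_fpops + p_iops) / 2.0;  // (say)

        const double WU_fpops = 1.0e13;    // placeholder; would come from the WU
        const double units_scaling = 1.0;  // placeholder tuning constant
        if (device_benchmark > 0.0)
            std::printf("cutoff_time = %.2f s\n",
                        WU_fpops / device_benchmark * units_scaling);
        return 0;
    }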
mreuter80
Advanced Cruncher | Joined: Oct 2, 2006 | Post Count: 83 | Status: Offline
"We will get the vast majority of the benefit though from the matching that is now taking place."

So, it is implemented already? ... cool!
martin64
Senior Cruncher | Germany | Joined: May 11, 2009 | Post Count: 445 | Status: Offline
Rickjb wrote: "I question your use of the word 'definitely' without knowing more, especially about the server side."

What more do I have to know about the server side, other than knreed having implemented the pairing? I think it's great that something has been done; now let's see what the outcome is. Thanks, knreed!

Regards, Martin
Former Member
Cruncher | Joined: May 22, 2018 | Post Count: 0 | Status: Offline
martin64 wrote: "What more do I have to know about the server side, other than knreed having implemented the pairing"

Perhaps we should look not at how much we don't really know about the client and server code, but celebrate how really freaking smart knreed is, if this is what he does when he needs a break.

[Edit 1 times, last edit by Former Member at Oct 29, 2009 7:31:05 PM]
knreed
Former World Community Grid Tech | Joined: Nov 8, 2004 | Post Count: 4504 | Status: Offline
martin64 wrote: "What more do I have to know about the server side, other than knreed having implemented the pairing"
Former Member wrote: "perhaps we should look not at how much we don't really know about the client and server code but celebrate how really freaking smart knreed is, if this is what he does when he needs a break"

I appreciate the thanks, but credit is really due to David Anderson. We discussed this last week at the BOINC Workshop, and my proposed implementation was significantly more complex. He was able to cut through the noise and propose a much easier implementation. That made it doable in my 'break'.

Also - although this mechanism is in effect for all projects except HPF2 and Rice, it really only plays a significant role for projects that use redundancy. In other words, only HCC and HCMD2 will really see any changes due to this policy.

[Edit 1 times, last edit by knreed at Oct 30, 2009 3:08:31 PM]
Rickjb
Veteran Cruncher | Australia | Joined: Sep 17, 2006 | Post Count: 666 | Status: Offline
@knreed: Until your post on 30/10 14:07 UTC ("I appreciate the thanks, but credit is really due to David Anderson. ..."), I did not realise that device speed matching was being applied to projects other than HCMD2.
One effect could be that overclaimers will tend to be matched with other overclaimers and vice versa, reducing the levelling effect of claim-averaging.

[Edit]: Statistically, look for correlation between credits awarded per hour and cruncher speed, and whether it is worse than before. Many of the machines running 64-bit BOINC clients, which overclaim, will also be faster machines, and they will tend to be matched together.

I hope that you are using separate historical performance parameters for each project, based on actual performance and not BOINC benchmarks, since for some machines, performance relative to the fleet depends very much on the project. For example, my Athlon 64 X2 4200+ (previously) overclaimed by about 15-20% on FAAH, but underclaimed by about 10% on HCMD2. Tracing with perfmonitor showed the cache success rate dropping to about 87% on FAAH, but staying very close to 100% on HCMD2 and during BOINC benchmarks. With their big caches, my Intel Yorkfield quads do not suffer this problem with FAAH, so they underclaim on FAAH and make claims that are closer to average on HCMD2.

So how are we going? Here are some of my few valid quorum-of-2 results from WUs sent out after the change. My project mixes are limited.

Format: Project - my hours / wingman's hours. Where an HCMD2 device hit the 6 h limit, I also give ( my credits awarded / wingman's credits awarded ).

3 of Intel C2Q (3 MB cache/CPU core):
HCMD2 2.63 / 3.52; HCC 2.53 / 3.87; HCC 2.57 / 3.89; HCC 2.55 / 3.30; FAAH 3.08 / 4.40;
HCMD2 6.91 / 6.01 ( 194.0 / 99.1 ); HCMD2 9.87 / 6.00 ( 255.6 / 124.1 )

Athlon 64 X2 (512 kB cache/CPU core):
HCMD2 3.44 / 3.21; HCMD2 3.12 / 3.01; HCMD2 6.01 / 6.01 ( 94.3 / 100.3 ); HCMD2 6.01 / 6.01 ( 98.2 / 98.5 )

CPU times seem to be well-matched for the A64, but not for the Intels. Two of the Intels are much faster (~4 GHz) than the fleet average, and it could be that the slower wingmen assigned to them were the fastest candidates wanting work at the time. Previously, in HCMD2 6 h cutoffs, the A64 usually got more credits than the wingman, but now it may be a small amount of the wingman's time that gets wasted. Knreed's fix is working well for the AMD. The lucky fact that HCMD2 is the one project where it lives up to its benchmarks is probably helping.

[Edit]: "My jury" is still out on the Yorkfields and, probably, other high-end non-HT Intels. It's the single-split WUs, where one cruncher quits at 6 h but the other continues, that caused the biggest wastage before, and they will continue to do so when they occur (see the long HCMD2 results above). The better the speed-matching, the less likely these will be. The matching is good for my AMD, but not for the Intels.

As for the WUs getting shorter as we progress, I haven't seen this yet. Virgin parents seem to come in batches, and apart from a few of these today (1 Nov), all I've crunched for about a week have been kids, grand-kids (lots), and great-grand-kids (2). The average WU length decreases with each generation, so there may have been fewer long WUs than usual. Stork Reed might bring the parents into the world, but for every pair of kid WUs and their offspring, there was a mother WU that experienced an early termination (and a few had fathers that got "eaten"). In other words, there must have been lots of 6 h terminations a little while ago to have made all these child WUs.

[Edit 7 times, last edit by Rickjb at Nov 2, 2009 9:35:23 AM]
JmBoullier
Former Community Advisor | Normandy - France | Joined: Jan 26, 2007 | Post Count: 3715 | Status: Offline
That seems to work pretty well indeed.
I have just checked 15 HCMD2 child WUs downloaded to my quad around yesterday noon (UTC), and:
- there is no Pending Validation;
- runtimes are very close inside each quorum (no cutoff for those child WUs);
- crediting discrepancies seem to be much reduced.

It looks like it has been a very good move, Kevin. Well done!

Jean.

Edit: Added "to my quad".

[Edit 1 times, last edit by JmBoullier at Oct 31, 2009 11:22:01 AM]
Mysteron347
Senior Cruncher | Australia | Joined: Apr 28, 2007 | Post Count: 179 | Status: Offline
Rickjb: I'd interpret your results differently.
Since the runtime of your machine AND of your partner's machine were BOTH less than 6 hours in all instances bar one, each assigned task ran to completion. Each processed ALL of the remaining structures from the set, hence NO work whatever was lost.

In the single case where BOTH ran for 6 hours, there appears to be a marginal difference, represented by the differential points (claimed, I believe - awarded should be the same for both). It is this MARGINAL differential where the work-leak is occurring. As I indicated, I have seen what appears to be a 4:1 ratio in the past - but the matching exercise seems to have tightened that up.

I'd suggest that unit was a parent - the exact number of structures actually processed doesn't appear to be available to mere mortals, whereas if it is a later generation AND runs to completion, then the number of structures processed can be derived from the numbering system (a sketch of what I mean follows).
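A rough sketch of that derivation, on the assumption - mine, not anything WCG has published - that the two numbers before the trailing copy digit in a child result's name are the first and last structure indices:

    // Guesswork, not official: if a child result name ends in
    // ..._<first>_<last>_<copy>, read <first>/<last> as structure indices.
    #include <cstdio>
    #include <string>
    #include <vector>

    int main() {
        // A real HCMD2 child result name (from later in this thread).
        const std::string name =
            "CMD2_0147-TPM1A.clustersOccur-1D1J_D.clustersOccur_2_133484_138393_0";

        // Split on '_' and keep the fields.
        std::vector<std::string> parts;
        size_t start = 0, pos;
        while ((pos = name.find('_', start)) != std::string::npos) {
            parts.push_back(name.substr(start, pos - start));
            start = pos + 1;
        }
        parts.push_back(name.substr(start));  // trailing copy number, e.g. "0"

        const long first = std::stol(parts[parts.size() - 3]);  // 133484
        const long last  = std::stol(parts[parts.size() - 2]);  // 138393
        std::printf("structures %ld to %ld -> %ld processed, if the guess holds\n",
                    first, last, last - first + 1);              // 4910
        return 0;
    }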
martin64
Senior Cruncher | Germany | Joined: May 11, 2009 | Post Count: 445 | Status: Offline
Mysteron347 wrote: "In the single case where BOTH ran for 6 hours, there appears to be a marginal difference represented by the differential points (claimed, I believe - awarded should be the same for both.)"

It's the difference in the AWARDED points that counts. Only if the WU is cut off is there a difference in the amount of crunching done, resulting in different awarded points. If the amount of work done is the same for both, the number of awarded points will be the same, too.

Regards, Martin
Rickjb
Veteran Cruncher | Australia | Joined: Sep 17, 2006 | Post Count: 666 | Status: Offline
Single-split WUs are Still Happening and Wasting CPU Time
Intel Yorkfield quad @ 3.3 GHz with slow memory - not an xtreme machine:

CMD2_0147-TPM1A.clustersOccur-1D1J_D.clustersOccur_2_133484_138393_0 | 614 | Valid | 31/10/09 09:04:18 | 1/11/09 07:06:30 | 6.01 | 117.9 / 99.1
CMD2_0147-TPM1A.clustersOccur-1D1J_D.clustersOccur_2_133484_138393_1 | 614 | Valid | 31/10/09 09:01:34 | 1/11/09 05:22:16 | 6.91 | 157.0 / 194.0 | Me - wastage = 94.9 credits

CMD2_0148-TPM1A.clustersOccur-1I7X_A.clustersOccur_534_1 | 614 | Valid | 1/11/09 10:51:50 | 2/11/09 02:48:04 | 6.00 | 139.4 / 124.1
CMD2_0148-TPM1A.clustersOccur-1I7X_A.clustersOccur_534_0 | 614 | Valid | 1/11/09 10:51:31 | 2/11/09 08:50:25 | 9.87 | 224.1 / 255.6 | Me - wastage = 131.5 credits

Another WU that took 9.15 h is Pending Validation. For more comments, see my most recent post above, which includes edits.

@Mysteron347 and martin64: Yes, you are right, and the New Anderson-Reed System is working well where the devices are well matched, but not all devices are being well matched. I think there's a non-linear relationship between device mismatch and wastage. If both devices cut off at 6 h (a "double-split"), the loss is the speed difference multiplied by 6 h. If only the slower one cuts off (a "single-split"), the wastage is that amount plus the speed of the faster device multiplied by the extra time that it crunches. These are the bad ones, and the probability of them happening increases as the device mismatch increases. A rough sketch of this arithmetic follows at the end of this post.

Device matching is working well with my AMD, and it has had no single-splits. Device matching is not working well for my Intel quads, and you can see the results.
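The sketch - the structures-per-hour speeds are made up for illustration; only the 3.87 h of extra crunching comes from the 9.87 / 6.00 result above:

    // Back-of-envelope model of the two cutoff cases described above.
    // Speeds are in structures per hour; the HCMD2 cutoff is 6 h.
    #include <cstdio>

    const double CUTOFF_H = 6.0;

    // Double-split: both devices stop at 6 h, so the faster device's output
    // beyond what the slower one reached is lost.
    double double_split_waste(double fast_sph, double slow_sph) {
        return (fast_sph - slow_sph) * CUTOFF_H;
    }

    // Single-split: the slower device stops at 6 h and the faster one crunches
    // on for extra_h, so add the faster device's speed times the extra time.
    double single_split_waste(double fast_sph, double slow_sph, double extra_h) {
        return double_split_waste(fast_sph, slow_sph) + fast_sph * extra_h;
    }

    int main() {
        const double fast = 40.0, slow = 25.0;  // hypothetical structures/hour
        std::printf("double-split waste: %.1f structures\n",
                    double_split_waste(fast, slow));               // 90.0
        std::printf("single-split waste: %.1f structures\n",
                    single_split_waste(fast, slow, 9.87 - 6.00));  // 244.8
        return 0;
    }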