Total posts in this thread: 76
This topic has been viewed 531227 times and has 75 replies.
Rickjb
Veteran Cruncher
Australia
Joined: Sep 17, 2006
Post Count: 666
Re: Parents, children, grandchildren WUs - how does it work?

@martin64: "This would require changes in the client - definitely more effort and more complicated than just changing the distribution process on the server side..." I question your use of the word "definitely" without knowing more, especially about the server side. I saw a flow diagram of it somewhere a while ago, and it has makers, breakers, shakers, squashers, despatchers, validators, hoppers, etc. I don't know whether it runs as one or many processes on one or more machines, but changing it may not be simple. Each science application is a single process, and WCG has the source code and compiles them all. Let's go with the BOINC benchmark results, which I think can be read from client_state.xml:
# WU startup code (sketch):
def cutoff_time(wu_fpops, p_fpops, p_iops, units_scaling=1.0):
    # wu_fpops: work estimate read from the WU file(s)
    # p_fpops, p_iops: BOINC benchmark results read from client_state.xml
    device_benchmark = (p_fpops + p_iops) / 2  # (say)
    return wu_fpops / device_benchmark * units_scaling
# Replace the current instance(s) of "6 hours" with cutoff_time(...).
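
A minimal sketch of the client_state.xml lookup, assuming the benchmark results sit in `<host_info>` as `<p_fpops>` and `<p_iops>` (floating-point and integer ops per second) and that a WU's work estimate is the `<rsc_fpops_est>` of its `<workunit>` entry. The helper names and the sample values are hypothetical; the cutoff uses the same (p_fpops + p_iops)/2 averaging as above and converts seconds to hours.

```python
import xml.etree.ElementTree as ET

def read_host_benchmarks(xml_text):
    """Return (p_fpops, p_iops) from the <host_info> element."""
    root = ET.fromstring(xml_text)
    host = root.find("host_info")
    return float(host.findtext("p_fpops")), float(host.findtext("p_iops"))

def read_wu_fpops(xml_text, wu_name):
    """Return <rsc_fpops_est> of the named <workunit> (assumed to be the WU's work estimate)."""
    root = ET.fromstring(xml_text)
    for wu in root.findall("workunit"):
        if wu.findtext("name") == wu_name:
            return float(wu.findtext("rsc_fpops_est"))
    raise KeyError(wu_name)

def cutoff_hours(wu_fpops, p_fpops, p_iops):
    device_benchmark = (p_fpops + p_iops) / 2  # ops per second, averaged as above
    return wu_fpops / device_benchmark / 3600.0  # seconds -> hours

# Made-up sample, trimmed to the elements used here.
sample = """<client_state>
  <host_info>
    <p_fpops>2500000000.0</p_fpops>
    <p_iops>7500000000.0</p_iops>
  </host_info>
  <workunit>
    <name>example_wu</name>
    <rsc_fpops_est>72000000000000.0</rsc_fpops_est>
  </workunit>
</client_state>"""

p_fpops, p_iops = read_host_benchmarks(sample)
print(cutoff_hours(read_wu_fpops(sample, "example_wu"), p_fpops, p_iops))  # 4.0
```

On a real host the text would come from the client_state.xml in the BOINC data directory, and the benchmark values differ from machine to machine.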

[Edit]: knreed replied while I was editing this reply ... [Resolved]
[Edit]: @MovieMan - Until Kevin said they are now matching devices, I was going to warn you guys at XS that 85% of your longer HCMD2 WUs could get junked if they happened to come up against a wingman with a slow machine like a P4 with HT.
----------------------------------------
[Edit 3 times, last edit by Rickjb at Oct 29, 2009 12:04:39 PM]
[Oct 29, 2009 11:12:57 AM]
mreuter80
Advanced Cruncher
Joined: Oct 2, 2006
Post Count: 83
Re: Parents, children, grandchildren WUs - how does it work?

Quote:
We will get the vast majority of the benefit though from the matching that is now taking place.


So, it is implemented already?
[Oct 29, 2009 11:24:47 AM]
martin64
Senior Cruncher
Germany
Joined: May 11, 2009
Post Count: 445
Re: Parents, children, grandchildren WUs - how does it work?

Quote (Rickjb):
I question your use of the word "definitely" without knowing more, especially about the server side.

What more do I have to know about the server side, other than knreed having implemented the pairing "during a break"?

I think it's great that something has been done, now let's see what the outcome is. Thanks, knreed!

Regards,
Martin
----------------------------------------

[Oct 29, 2009 3:20:37 PM]
Former Member
Cruncher
Joined: May 22, 2018
Post Count: 0
Re: Parents, children, grandchildren WUs - how does it work?

Quote (martin64):
What more do I have to know about the server side, other than knreed having implemented the pairing "during a break"?
Perhaps, rather than dwelling on how little we really know about the client and server code, we should celebrate how really freaking smart knreed is, if this is what he does when he needs a break.
----------------------------------------
[Edit 1 times, last edit by Former Member at Oct 29, 2009 7:31:05 PM]
[Oct 29, 2009 7:30:44 PM]
knreed
Former World Community Grid Tech
Joined: Nov 8, 2004
Post Count: 4504
Re: Parents, children, grandchildren WUs - how does it work?

Quote (martin64):
What more do I have to know about the server side, other than knreed having implemented the pairing "during a break"?
Quote (Former Member):
perhaps we should look at not how much we don't really know about the client and server code but celebrate how really freaking smart geeky knreed is if this is what he does when he needs a break


I appreciate the thanks, but credit is really due to David Anderson. We discussed this last week at the BOINC Workshop and my proposed implementation was significantly more complex. He was able to cut through the noise and propose a much easier implementation. That made it doable in my 'break'.

Also - although this mechanism is in effect for all projects except for HPF2 and Rice, it really only plays a significant role for projects that use redundancy. In other words, only HCC and HCMD2 will really see any changes due to this policy.
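
The thread doesn't say how the matching works internally. Purely as an illustration of the idea, here is one simple way a scheduler could pair devices for a redundant WU: the first host to get the WU fixes a speed bucket, and later replicas go only to hosts in the same bucket. The bucket edges, the relative-speed measure and the `Matcher` class are hypothetical, not the WCG or BOINC scheduler code.

```python
def speed_bucket(speed, edges):
    """Index of the first edge that `speed` is below, or len(edges) if faster than all."""
    for i, edge in enumerate(edges):
        if speed < edge:
            return i
    return len(edges)

class Matcher:
    """Hypothetical device matching for redundant WUs."""

    def __init__(self, edges):
        self.edges = edges   # ascending speed boundaries, e.g. relative to the fleet average
        self.wu_bucket = {}  # WU name -> bucket of the first host that received it

    def can_send(self, wu_name, host_speed):
        bucket = speed_bucket(host_speed, self.edges)
        # The first request for a WU claims the bucket; later requests must match it.
        first = self.wu_bucket.setdefault(wu_name, bucket)
        return first == bucket

m = Matcher(edges=[0.8, 1.25])   # slow < 0.8 <= average < 1.25 <= fast
print(m.can_send("wu_1", 1.6))   # True: first replica, fast bucket
print(m.can_send("wu_1", 0.7))   # False: a slow host is refused
print(m.can_send("wu_1", 1.4))   # True: another fast host
```

A real scheduler would also need a fallback for when no matching host is asking for work, which may be one way a fast machine still ends up with a slower wingman.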
----------------------------------------
[Edit 1 times, last edit by knreed at Oct 30, 2009 3:08:31 PM]
[Oct 30, 2009 3:07:43 PM]
Rickjb
Veteran Cruncher
Australia
Joined: Sep 17, 2006
Post Count: 666
Re: Parents, children, grandchildren WUs - how does it work?

@knreed: Until your post on 30/10 14:07 UTC ("I appreciate the thanks, but credit is really due to David Anderson. ..."), I did not realise that device speed matching was being applied to projects other than HCMD2.
One effect could be that overclaimers will tend to be matched with other overclaimers, and underclaimers with underclaimers, reducing the levelling effect of claim-averaging.
[Edit]: Statistically, look for a correlation between credits awarded per hour and cruncher speed, and check whether it is worse than before.
Many of the machines running 64-bit BOINC clients, which overclaim, will also be faster machines, so they will tend to be matched together.
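
A toy illustration of the levelling point, assuming (as the post implies) that each result in a quorum of 2 is awarded the average of the two claims; the claim values are made up. The `pearson` helper is one way to run the correlation check from the [Edit] line, e.g. with credits awarded per hour as one sequence and a device speed measure as the other.

```python
from statistics import mean

def awarded(claims):
    # Assumption: each result in the quorum is awarded the mean of the claims.
    return mean(claims)

def pearson(xs, ys):
    """Pearson correlation coefficient of two equal-length sequences."""
    mx, my = mean(xs), mean(ys)
    cov = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    sx = sum((x - mx) ** 2 for x in xs) ** 0.5
    sy = sum((y - my) ** 2 for y in ys) ** 0.5
    return cov / (sx * sy)

# Made-up claims for one WU: an overclaimer claims 120, an underclaimer 80.
print(awarded([120, 80]))    # 100: mixed pairing levels both results
print(awarded([120, 120]))   # 120: two overclaimers keep the inflated value
```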

I hope that you are using separate historical performance parameters for each project, based on actual performance rather than BOINC benchmarks, since for some machines performance relative to the fleet depends very much on the project.
For example, my Athlon 64 X2 4200+ (previously) overclaimed by about 15-20% on FAAH but underclaimed by about 10% on HCMD2. Tracing with perfmonitor showed the cache hit rate dropping to about 87% on FAAH, but staying very close to 100% on HCMD2 and during the BOINC benchmarks. With their big caches, my Intel Yorkfield quads do not suffer this problem with FAAH, so they underclaim on FAAH and make claims closer to average on HCMD2.
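
A minimal sketch of what separate per-project performance parameters could look like: a hypothetical tracker that keeps an exponential moving average of work done per CPU hour for each (host, project) pair, fed from returned results rather than benchmarks. The class name, the `alpha` smoothing weight and the choice of work measure are all assumptions.

```python
class ProjectSpeedTracker:
    """Hypothetical per-(host, project) speed estimates built from actual results."""

    def __init__(self, alpha=0.2):
        self.alpha = alpha  # weight given to the newest result
        self.speed = {}     # (host, project) -> estimated work per CPU hour

    def record(self, host, project, work_done, cpu_hours):
        observed = work_done / cpu_hours
        key = (host, project)
        if key not in self.speed:
            self.speed[key] = observed
        else:
            old = self.speed[key]
            self.speed[key] = old + self.alpha * (observed - old)

    def estimate(self, host, project):
        """Current estimate, or None if this host has no results for the project yet."""
        return self.speed.get((host, project))

t = ProjectSpeedTracker(alpha=0.5)
t.record("a64", "HCMD2", work_done=30.0, cpu_hours=2.0)  # 15 per hour
t.record("a64", "HCMD2", work_done=50.0, cpu_hours=2.0)  # 25 per hour
print(t.estimate("a64", "HCMD2"))  # 20.0
print(t.estimate("a64", "FAAH"))   # None: FAAH speed is tracked separately
```

Keeping FAAH and HCMD2 separate means a cache-sensitive machine like the A64 described above would be rated on each project by how it actually performs there.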

So how are we going? Here are the few valid quorum-of-2 results I have from WUs sent out after the change. My project mixes are limited.
Format: Project - my hours / wingman's hours
Where an HCMD2 device hit the 6h limit, I also give ( my credits awarded / wingman's credits awarded ).
3 Intel C2Q machines (3MB cache per CPU core):
HCMD2 2.63 / 3.52
HCC 2.53 / 3.87
HCC 2.57 / 3.89
HCC 2.55 / 3.30
FAAH 3.08 / 4.40
HCMD2 6.91 / 6.01 ( 194.0 / 99.1 )
HCMD2 9.87 / 6.00 ( 255.6 / 124.1 )
Athlon 64 X2 (512KB cache per CPU core):
HCMD2 3.44 / 3.21
HCMD2 3.12 / 3.01
HCMD2 6.01 / 6.01 ( 94.3 / 100.3 )
HCMD2 6.01 / 6.01 ( 98.2 / 98.5 )
HCMD2 6.03 / 6.03 ( 92.5 / 101.9 )
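
One way to put a number on "well-matched" is the ratio of the longer to the shorter runtime in each quorum, using only quorums where both results finished before the 6h cutoff. The hours below are copied from the list above; the filtering rule and the function name are illustrative assumptions.

```python
CUTOFF_HOURS = 6.0

# (project, my hours, wingman's hours), copied from the list above
intel = [("HCMD2", 2.63, 3.52), ("HCC", 2.53, 3.87), ("HCC", 2.57, 3.89),
         ("HCC", 2.55, 3.30), ("FAAH", 3.08, 4.40),
         ("HCMD2", 6.91, 6.01), ("HCMD2", 9.87, 6.00)]
a64 = [("HCMD2", 3.44, 3.21), ("HCMD2", 3.12, 3.01), ("HCMD2", 6.01, 6.01),
       ("HCMD2", 6.01, 6.01), ("HCMD2", 6.03, 6.03)]

def completed_ratios(pairs, cutoff=CUTOFF_HOURS):
    """Longer/shorter runtime for quorums where both results finished before the
    cutoff (1.0 = perfectly matched). Cut-off pairs are skipped because the 6h
    limit hides the speed difference."""
    return [max(mine, theirs) / min(mine, theirs)
            for _, mine, theirs in pairs if mine < cutoff and theirs < cutoff]

print([round(r, 2) for r in completed_ratios(intel)])
print([round(r, 2) for r in completed_ratios(a64)])
```

On these numbers the Intel pairs come out at roughly 1.3 to 1.5 and the A64 pairs at under 1.1, which is consistent with the observation that follows.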

CPU times seem to be well-matched for the A64, but not for the Intels.
2 of the Intels are much faster (~4GHz) than the fleet average, and it could be that the slower wingmen assigned to them were the fastest candidates wanting work at the time.
Previously, when HCMD2 WUs hit the 6h cutoff, the A64 usually got more credits than its wingman; now it may be a small amount of the wingman's time that gets wasted instead. Knreed's fix is working well for the AMD. The lucky fact that HCMD2 is the one project where the A64 lives up to its benchmarks is probably helping.
[Edit]: "My jury" is still out on the Yorkfields and probably on other high-end non-HT Intels.
It's the single-split WUs, where one cruncher quits at 6h but the other continues, that caused the biggest wastage before, and they will continue to do so when they occur. See the single-split HCMD2 results above (those where my time exceeds 6h). The better the speed-matching, the less likely these will be. The matching is good for my AMD, but not for the Intels.
As for the WUs getting shorter as we progress, I haven't seen this yet. Virgin parents seem to come in batches, and apart from a few of these today (1 Nov), everything I've crunched for about a week has been kids, grand-kids (lots) and great-grand-kids (2). The average WU length decreases with each generation, so there may have been fewer long WUs than usual. Stork Reed might bring the parents into the world, but for every pair of kid WUs and their offspring, there was a mother WU that was terminated early (and a few had fathers that got "eaten"). In other words, there must have been lots of 6h terminations a little while ago to have produced all these child WUs.
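
The point that average WU length falls with each generation can be illustrated with a toy model. It assumes, as the post suggests but does not spell out, that a WU cut off at 6h has its unprocessed structures split evenly into two child WUs, and it measures work in hours on a single reference machine; real HCMD2 splitting may work differently, and `generations` is a hypothetical helper.

```python
CUTOFF_HOURS = 6.0

def generations(work_hours, cutoff=CUTOFF_HOURS):
    """Toy model: a WU runs until it finishes or reaches the cutoff; if cut off, its
    unprocessed work is split evenly into two child WUs. Returns one list of actual
    run lengths (hours) per generation: parent, kids, grand-kids, ..."""
    gens, current = [], [work_hours]
    while current:
        lengths, children = [], []
        for w in current:
            lengths.append(min(w, cutoff))
            if w > cutoff:
                half = (w - cutoff) / 2
                children += [half, half]
        gens.append(lengths)
        current = children
    return gens

for depth, lengths in enumerate(generations(20.0)):
    print(depth, len(lengths), sum(lengths) / len(lengths))
```

With 20 hours of work, the parent and both kids each run the full 6h, while the four grand-kids only have half an hour each.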
----------------------------------------
[Edit 7 times, last edit by Rickjb at Nov 2, 2009 9:35:23 AM]
[Oct 31, 2009 10:07:21 AM]
JmBoullier
Former Community Advisor
Normandy - France
Joined: Jan 26, 2007
Post Count: 3715
Device matching

That seems to work pretty well indeed.
I have just checked 15 HCMD2 child WUs downloaded to my quad around noon yesterday (UTC), and
- there is no Pending Validation
- runtimes are very close inside each quorum (no cut off for those child WUs)
- crediting discrepancies seem to be much reduced.

It looks like it has been a very good move, Kevin.
Well done! Jean.

Edit: Added to my quad
----------------------------------------
Team--> Decrypthon -->Statistics/Join -->Thread
----------------------------------------
[Edit 1 times, last edit by JmBoullier at Oct 31, 2009 11:22:01 AM]
[Oct 31, 2009 11:19:22 AM]
Mysteron347
Senior Cruncher
Australia
Joined: Apr 28, 2007
Post Count: 179
Re: Device matching

Rickjb: I'd interpret your results differently.

Since the run times of your machine AND your partner's machine were BOTH less than 6 hours in all instances bar one, the assigned task was run to completion. Each processed ALL of the remaining structures in the set, hence NO work whatsoever was lost.

In the single case where BOTH ran for 6 hours, there appears to be a marginal difference represented by the differential points (claimed, I believe; awarded should be the same for both). It is this MARGINAL differential where the work-leak is occurring. As I indicated, I have seen what appears to be a 4:1 ratio in the past, but the matching exercise seems to have tightened that up. I'd suggest that unit was a parent: the exact number of structures actually processed doesn't appear to be available to mere mortals, whereas if a WU is later-generation AND runs to completion, the number of structures processed can be derived from the numbering system.
[Oct 31, 2009 4:25:47 PM]
martin64
Senior Cruncher
Germany
Joined: May 11, 2009
Post Count: 445
Re: Device matching

Quote (Mysteron347):
In the single case where BOTH ran for 6 hours, there appears to be a marginal difference represented by the differential points (claimed, I believe - awarded should be the same for both.)

It's the difference in the AWARDED points that counts. Only if the WU is cut off is there a difference in the amount of crunching done, resulting in different awarded points. If the amount of work done is the same for both, the number of awarded points will be the same, too.

Regards,
Martin
----------------------------------------

[Oct 31, 2009 8:26:47 PM]
Rickjb
Veteran Cruncher
Australia
Joined: Sep 17, 2006
Post Count: 666
Re: Parents, children, grandchildren WUs - how does it work?

Single-split WUs are Still Happening and Wasting CPU Time
Intel Yorkfield quad @ 3.3 GHz with slow memory - not an xtreme machine:
---
CMD2_0147-TPM1A.clustersOccur-1D1J_D.clustersOccur_2_133484_138393_0 | 614 | Valid | 31/10/09 09:04:18 | 1/11/09 07:06:30 | 6.01 | 117.9 / 99.1
CMD2_0147-TPM1A.clustersOccur-1D1J_D.clustersOccur_2_133484_138393_1 | 614 | Valid | 31/10/09 09:01:34 | 1/11/09 05:22:16 | 6.91 | 157.0 / 194.0 | Me; wastage = 94.9 credits
---
CMD2_0148-TPM1A.clustersOccur-1I7X_A.clustersOccur_534_1 | 614 | Valid | 1/11/09 10:51:50 | 2/11/09 02:48:04 | 6.00 | 139.4 / 124.1
CMD2_0148-TPM1A.clustersOccur-1I7X_A.clustersOccur_534_0 | 614 | Valid | 1/11/09 10:51:31 | 2/11/09 08:50:25 | 9.87 | 224.1 / 255.6 | Me; wastage = 131.5 credits
---
Another WU that took 9.15h is Pending Validation.
For more comments, see my most recent post, above, which includes edits.
---
@Mysteron347 and martin64: Yes, you are right, and the New Anderson-Reed System is working well where the devices are well matched, but not all devices are being well matched.

I think there's a non-linear relationship between device mismatch and wastage. If both devices cut off at 6h (a "double-split"), the loss is the speed difference multiplied by 6h. If only the slower one cuts off ("single-split"), the wastage is that amount, plus the speed of the faster device multiplied by the extra time that it crunches. These are the bad ones, and the probability of them happening increases as the device mismatch increases.
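
The wastage model in the previous paragraph can be written out directly. A minimal sketch, assuming credits are proportional to work done, so that each device's speed can be expressed as credits per CPU hour; `wastage_credits` is a hypothetical helper, and the figures in the example come from the result table above.

```python
def wastage_credits(fast_rate, slow_rate, fast_hours, cutoff=6.0):
    """Credits' worth of work done by the faster device that its wingman did not match.

    fast_rate, slow_rate: credits per CPU hour for the faster and slower devices
    fast_hours: how long the faster device ran (equal to cutoff for a double-split)
    """
    double_split_loss = (fast_rate - slow_rate) * cutoff
    extra_hours = max(0.0, fast_hours - cutoff)  # > 0 only for a single-split
    return double_split_loss + fast_rate * extra_hours

# From the table: 255.6 credits in 9.87 h versus the wingman's 124.1 credits in 6.00 h.
print(round(wastage_credits(255.6 / 9.87, 124.1 / 6.00, 9.87), 1))  # 131.5
```

This reproduces the 131.5 credits quoted above. The 6.91 h result gives about 95.1 against the quoted 94.9, the small gap coming from the wingman running 6.01 h rather than exactly 6 h.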
Device matching is working well with my AMD, and it has had no single-splits.
Device matching is not working well for my Intel quads, and you can see the results.
[Nov 2, 2009 10:16:46 AM]