World Community Grid Forums
Thread Status: Active · Total posts in this thread: 3268
Mike.Gibson
Ace Cruncher · England · Joined: Aug 23, 2007 · Post Count: 12398 · Status: Offline
leloft,

Your cache would be managed much better by setting project limits to 1 unit more than each of your app_config.xml settings. You will not run out of any of them that way and you will not have any backlog: just 1 spare for each project to tide you over while the next is downloading.

I presume that your 23 cores are to allow for OPN GPU.

Mike
---
leloft
Cruncher · Joined: Jun 8, 2017 · Post Count: 23 · Status: Offline
> Your cache would be managed much better by setting project limits to 1 unit more than each of your app_config.xml settings. You will not run out of any of them that way and you will not have any backlog: just 1 spare for each project to tide you over while the next is downloading.

Thanks for the feedback. The use of profiles and app_config.xml alone has not helped with the backlog. The overloaded cache came about from a huge discrepancy between estimated and actual run times: the cache was loaded with 48 ARP units with estimated times of between 13 and 26 hours that were taking more than 70, and 48 OPN units estimated at between 2 and 6 hours that took up to 22 hours each. The 115 units were supposed to be 3 days' work, with deadlines of about 6 days per unit.

Restricting the units to 12/24 in app_config.xml has resulted in estimated times decreasing faster than the time remaining to deadline as the units are processed. At 9am today a unit had an estimated time (54h) equal to its deadline (54h); 6 hours later the estimate had been reduced to 12 hours before the deadline, a net gain of 6 hours. This saved the unit from being aborted: BOINC was processing one unit with an estimate/deadline of 22/40 while the 54/54 unit above was waiting. I have had to use boinccmd --task sequentially to suspend and/or resume all 24 tasks to get the units running (and then I re-read cc_config and had to do it all over again!). My idea of a priority task doesn't seem to be the same as BOINC's. I'd be very interested to know how BOINC decides that a unit is high priority, but if it cannot tell that a unit needs to be started before its estimated time equals its deadline, that's surely a bug.

> I presume that your 23 cores are to allow for OPN GPU.

Not intentionally. It's got an old Quadro (K2000) card with nouveau drivers (Nvidia drivers have always caused me problems) and I'd like to have a go at GPU crunching just to see whether the reality matches the hype. But as I cannot afford to lose graphics capability, I haven't plucked up the courage to risk it. The 12/6/5 in app_config.xml was so that the machine had 24 cores available to process 23 units, in the hope that it might speed things up a bit.

Many thanks
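For readers following along, a 12/6/5 split like the one described can be expressed with per-app `max_concurrent` entries in app_config.xml. This is only a sketch: the short names `arp1`, `opn1` and `mcm1` are illustrative placeholders, and the real app names for your host should be taken from client_state.xml.

```xml
<app_config>
   <!-- Illustrative app names; check client_state.xml for the real short names -->
   <app>
      <name>arp1</name>
      <max_concurrent>12</max_concurrent>  <!-- at most 12 ARP tasks running -->
   </app>
   <app>
      <name>opn1</name>
      <max_concurrent>6</max_concurrent>   <!-- at most 6 OPN tasks running -->
   </app>
   <app>
      <name>mcm1</name>
      <max_concurrent>5</max_concurrent>   <!-- at most 5 MCM tasks running -->
   </app>
</app_config>
```

After editing the file, the client needs to re-read its configuration (Options → Read config files, or `boinccmd --read_cc_config`) for the limits to take effect.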
---
Mike.Gibson
Ace Cruncher · England · Joined: Aug 23, 2007 · Post Count: 12398 · Status: Offline
The use of the Project Limits was not meant to help with your existing backlog. It was meant to prevent it from recurring.

The cache settings are susceptible to fluctuations in crunching times, but the Project Limits are not. If you set the Project Limits to 1 unit more for each project than you have in app_config.xml, then you will only ever have 1 spare for each project.

I doubt that you will see any difference in speed by cutting to 23, and you then lay yourself open to possible shortages in a specific project. I would set app_config.xml to 25, as long as you restrict ARP to 12.

Mike
---
Dayle Diamond
Senior Cruncher · Joined: Jan 31, 2013 · Post Count: 452 · Status: Offline
We are trying to return these units as quickly as possible - why are we encouraging keeping any spare units?
If there's a shortage of ARP work because every task is currently being crunched, that's a success. If there's a shortage of ARP work because the tasks are sitting around in queues, not crunching, that's a setback. |
---
maeax
Advanced Cruncher · Joined: May 2, 2007 · Post Count: 142 · Status: Offline
> We are trying to return these units as quickly as possible - why are we encouraging keeping any spare units? If there's a shortage of ARP work because every task is currently being crunched, that's a success. If there's a shortage of ARP work because the tasks are sitting around in queues, not crunching, that's a setback.

Yes, using app_config.xml to define a high number of ARP units sitting in wait status is not the best solution. BOINC has no problem getting you work for the default 0.5 days, mixed from all WCG projects.

AMD Ryzen Threadripper PRO 3995WX 64 cores / AMD Radeon (TM) Pro W6600. OS: Win11 Pro
---
Mike.Gibson
Ace Cruncher · England · Joined: Aug 23, 2007 · Post Count: 12398 · Status: Offline
Dayle,

1 spare on a multicore machine is not a queue. It just tides you over from when one finishes to when the next is downloaded. However, sometimes it takes a bit longer to get one, so the spare keeps you crunching fully.

Having a fifth unit on an eight-thread machine which crunches 4 ARP at a time means that the spare only has about 6 hours to wait. Larger machines still only need 1 spare, and the wait time for the spare on a 24-thread machine would be down to about 20 minutes.

It is the much larger queues that are the problem.

Mike

[Edited 1 time, last edit by Mike.Gibson at Aug 13, 2021 2:24:15 PM]
---
knreed
Former World Community Grid Tech · Joined: Nov 8, 2004 · Post Count: 4504 · Status: Offline
Latest stats:

Average Generation: 82.4
Pace (average time to complete a generation): 4.1 days (7-day average)

[table: first_indexed / generation / num_units_currently_on_generation / num_units_completed_last_day — data not preserved]
---
Mike.Gibson
Ace Cruncher · England · Joined: Aug 23, 2007 · Post Count: 12398 · Status: Offline
Thank you, Kevin.

As 080 is the latest generation labelled 'priority', I will base this response on that.

There have been 33,894 units validated in generations up to and including 080 in the last 3 days, out of 51,226 returned. There are now 44,222 units remaining to be crunched in those generations, out of a total of 200,054 up to generation 087 (22%).

The stragglers are catching up, but the total is moving up. However, those generation 001s are still stuck.

Mike

[Edited 1 time, last edit by Mike.Gibson at Aug 13, 2021 7:21:55 PM]
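The 22% figure quoted above is simple to check with shell integer arithmetic, using the counts given in the post:

```shell
#!/bin/sh
# Share of units still to crunch in the priority generations:
# 44,222 remaining out of 200,054 total (figures from the post above).
remaining=44222
total=200054

# Integer percentage; multiplying before dividing keeps the precision.
echo $(( remaining * 100 / total ))
```

The truncated integer result matches the 22% in the post (the exact value is about 22.1%).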
---
adriverhoef
Master Cruncher · The Netherlands · Joined: Apr 3, 2009 · Post Count: 2167 · Status: Offline
Mike, you posted:

> 1 spare on a multicore machine is not a queue. It just tides you over from when one finishes to when the next is downloaded.

Agreed.

> Having a fifth unit on an eight thread machine which crunches 4 ARP at a time means that the spare only has about 6 hours to wait.

It depends on the duration of the tasks. If tasks last 24 hours on average, then you are right. If they last 16 hours, a fifth one would be waiting 4 hours. It's a simple formula: duration per ARP1 task / number of running ARP1 tasks. So, if you have 4 running tasks that last 12 hours on average, the fifth one would have to wait only 12 / 4 = 3 hours.

> Larger machines still only need 1 spare and the wait time for the spare on a 24 thread machine would be down to about 20 minutes.

Your assumption was that each task runs for 24 hours (four running tasks, a six-hour wait for the fifth one), so the thirteenth one (the spare on a 24-thread machine crunching 12 ARP at a time) would be waiting 24 hours / 12 running tasks = 2 hours. I don't know how you arrive at a wait time of only 20 minutes on a 24-thread machine.
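The formula above (spare wait ≈ average task duration / number of concurrently running tasks) can be sanity-checked with shell arithmetic. The durations are the averages debated in this thread, not measurements:

```shell
#!/bin/sh
# Spare-unit wait time = average task duration / number of tasks crunching at once.
# Works in whole minutes to stay within POSIX integer arithmetic.

wait_minutes() {
    duration_min=$1   # average ARP task duration, in minutes
    running=$2        # ARP tasks running concurrently
    echo $(( duration_min / running ))
}

wait_minutes $(( 24 * 60 )) 4    # 8-thread box, 24 h tasks  -> 360 min (6 hours)
wait_minutes $(( 12 * 60 )) 4    # 12 h tasks                -> 180 min (3 hours)
wait_minutes $(( 24 * 60 )) 12   # 24-thread box, 24 h tasks -> 120 min (2 hours)
wait_minutes $((  8 * 60 )) 12   # 8 h tasks on 12 threads   -> 40 min
```

The last line corresponds to the 8-hour figure Mike gives in his follow-up below: 8 hours across 12 running tasks is 40 minutes, not 20.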
---
Mike.Gibson
Ace Cruncher · England · Joined: Aug 23, 2007 · Post Count: 12398 · Status: Offline
I did not assume that all units would take 24 hours, or say that. I suggested that an 8-thread machine would take about 24 hours, but assumed that a 24-thread machine would be quicker: more like 8 hours. However, I should have said 40 minutes rather than 20; I accidentally divided by 24 instead of 12.

These times are based on comments made in these forums.

Mike