Thread Status: Active
Total posts in this thread: 3268
This topic has been viewed 3150983 times and has 3267 replies
Mike.Gibson
Ace Cruncher
England
Joined: Aug 23, 2007
Post Count: 12398
Status: Offline
Re: Work Available

leloft

Your cache would be managed much better by setting project limits to 1 unit more than each of your app_config.xml settings. You will not run out of any of them that way and you will not have any backlog - just 1 spare for each project to tide you over while the next is downloading.
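
As a rough illustration of the "one more than app_config" rule described above, the sketch below derives a per-project limit from a set of concurrency caps. The application names and numbers are hypothetical placeholders, not values taken from anyone's actual app_config.xml.

```python
# Hypothetical per-project concurrency caps, standing in for the
# max_concurrent values someone might have in app_config.xml.
# Names and numbers are illustrative assumptions only.
max_concurrent = {"arp1": 12, "opn1": 6, "mcm1": 5}


def project_limits(caps, spare=1):
    """Return a limit per project: the running cap plus `spare` extra units."""
    return {app: running + spare for app, running in caps.items()}


limits = project_limits(max_concurrent)
for app, limit in limits.items():
    print(f"{app}: run {max_concurrent[app]} at once, project limit {limit}")
```

With one spare per project, a finished unit is replaced from the spare while the next download is in progress.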

I presume that your 23 cores is to allow for OPN GPU.

Mike
[Aug 12, 2021 2:28:15 PM]
leloft
Cruncher
Joined: Jun 8, 2017
Post Count: 23
Status: Offline
Re: Work Available


"Your cache would be managed much better by setting project limits to 1 unit more than each of your app_config.xml settings. You will not run out of any of them that way and you will not have any backlog - just 1 spare for each project to tide you over while the next is downloading."

Thanks for the feedback. The use of profiles and app_config alone has not helped with the backlog. The overloaded cache came about from a huge discrepancy between estimated and actual times: the cache was loaded with 48 ARP units with estimated times of between 13 and 26h that were taking >70h, and the 48 OPN units were estimated at between 2 and 6h but took up to 22h each. The 115 units were supposed to be 3 days' work, with deadlines of ~6d for each unit.
Restricting the units to 12 / 24 in app_config has resulted in estimated times decreasing faster than the time remaining to deadline as the units are processed. At 9am today a unit had an est time (54h) equal to the deadline (54h); 6 hours later the est time had been reduced to 12 hours before the deadline, a net gain of 6 hours. This has saved the unit from being aborted: BOINC was processing one unit with an est/deadline of 22/40 while the above 54/54 unit was waiting. I have had to sequentially use boinccmd --task to suspend and/or resume on all of the 24 tasks to get the units running (and then I re-ran read_cc_config and had to do it all over again!). My idea of a priority task doesn't seem to be the same as BOINC's. I'd be very interested to know how BOINC knows a unit is a high priority one, but if it cannot tell that a unit needs to be started before its est time is equal to the deadline, that's a bug, surely.
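
To make the est/deadline comparison above concrete, here is a minimal sketch that ranks tasks by slack (time to deadline minus estimated remaining time). It is not BOINC's actual scheduling logic, which is not described in this thread; the task names are hypothetical and the numbers are the 22/40 and 54/54 pairs quoted above.

```python
from dataclasses import dataclass


@dataclass
class Task:
    name: str              # hypothetical label, not a real task name
    est_hours: float       # estimated remaining run time
    deadline_hours: float  # time left until the deadline

    @property
    def slack(self):
        """Hours of margin left if the task started right now."""
        return self.deadline_hours - self.est_hours


tasks = [Task("unit_a", 22, 40), Task("unit_b", 54, 54)]

# Least slack first: the task with no margin should be started soonest.
by_urgency = sorted(tasks, key=lambda t: t.slack)
for t in by_urgency:
    print(f"{t.name}: est {t.est_hours}h, deadline {t.deadline_hours}h, slack {t.slack}h")
```

Under this simple ordering the 54/54 unit comes before the 22/40 unit, which is the order that had to be forced by hand here.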

"I presume that your 23 cores is to allow for OPN GPU."

Not intentionally. It's got an old Quadro (K2000) card with nouveau drivers (Nvidia drivers have always caused me problems), and I'd like to have a go at GPU crunching just to see if the reality matches the hype. But as I cannot afford to lose graphics capability, I haven't plucked up the courage to risk it. The 12/6/5 in app_config was so that the machine had 24 cores available to process 23 units in the hope that it might speed things up a bit.
Many thanks
[Aug 12, 2021 5:52:45 PM]
Mike.Gibson
Ace Cruncher
England
Joined: Aug 23, 2007
Post Count: 12398
Status: Offline
Re: Work Available

The use of the Project Limits was not meant to help with your existing backlog. It was meant to prevent it from recurring.

The cache settings are susceptible to fluctuations in crunching times but the Project Limits are not. If you set the Project Limits to 1 unit more for each project than you have in app_config.xml then you will only ever have 1 spare for each project.

I doubt that you will see any difference in speed by cutting to 23. You then lay yourself open to any possible shortages in a specific project. I would set app_config.xml to 25 as long as you restrict ARP to 12.

Mike
[Aug 12, 2021 7:06:08 PM]
Dayle Diamond
Senior Cruncher
Joined: Jan 31, 2013
Post Count: 452
Status: Offline
Re: Work Available

We are trying to return these units as quickly as possible - why are we encouraging keeping any spare units?
If there's a shortage of ARP work because every task is currently being crunched, that's a success. If there's a shortage of ARP work because the tasks are sitting around in queues, not crunching, that's a setback.
[Aug 13, 2021 5:11:05 AM]
maeax
Advanced Cruncher
Joined: May 2, 2007
Post Count: 142
Status: Offline
Re: Work Available

"We are trying to return these units as quickly as possible - why are we encouraging keeping any spare units?
If there's a shortage of ARP work because every task is currently being crunched, that's a success. If there's a shortage of ARP work because the tasks are sitting around in queues, not crunching, that's a setback."

Yes, using app_config with your own setting for a high number of ARP tasks in waiting status is not the best solution.
BOINC has no problem getting you 0.5 days of work by default, mixed from all WCG projects.
----------------------------------------
AMD Ryzen Threadripper PRO 3995WX 64-Cores/ AMD Radeon (TM) Pro W6600. OS Win11pro
[Aug 13, 2021 9:09:15 AM]
Mike.Gibson
Ace Cruncher
England
Joined: Aug 23, 2007
Post Count: 12398
Status: Offline
Re: Work Available

Dayle

1 spare on a multicore machine is not a queue. It just tides you over from when one finishes to when the next is downloaded. However, sometimes it takes a bit longer to get one, so the spare keeps you crunching fully.

Having a fifth unit on an eight thread machine which crunches 4 ARP at a time means that the spare only has about 6 hours to wait. Larger machines still only need 1 spare and the wait time for the spare on a 24 thread machine would be down to about 20 minutes.

It is the much larger queues that are the problem.

Mike
----------------------------------------
[Edit 1 times, last edit by Mike.Gibson at Aug 13, 2021 2:24:15 PM]
[Aug 13, 2021 2:20:12 PM]
knreed
Former World Community Grid Tech
Joined: Nov 8, 2004
Post Count: 4504
Status: Offline
Re: Work Available

Latest stats:

Average Generation: 82.4
Pace (average time to complete a generation): 4.1 days (7-day average)
first_indexed       generation num_units_currently_on_generation num_units_completed_last_day 
------------------- ---------- --------------------------------- ----------------------------
2019-10-01 22:26:53 000
2019-10-30 18:58:54 001 6
2019-12-08 11:56:25 002
2020-01-12 02:02:34 003
2020-02-08 03:43:00 004
2020-02-24 06:27:42 005
2020-03-09 17:38:25 006
2020-03-17 08:44:19 007
2020-03-23 20:52:24 008
2020-04-01 14:39:46 009
2020-04-12 08:29:32 010
2020-04-21 02:41:36 011
2020-05-02 03:16:28 012
2020-05-10 13:29:40 013
2020-05-22 10:46:51 014
2020-06-02 21:07:48 015
2020-06-20 20:53:08 016
2020-07-01 12:31:12 017
2020-07-09 18:39:23 018
2020-07-18 16:08:31 019
2020-07-26 16:32:08 020
2020-08-08 15:15:22 021
2020-08-19 00:49:10 022
2020-08-24 07:02:09 023
2020-08-30 05:56:33 024
2020-09-04 11:35:58 025
2020-09-09 17:27:07 026
2020-09-15 06:25:11 027
2020-09-20 10:01:14 028
2020-09-25 22:07:49 029
2020-10-02 07:08:22 030
2020-10-07 17:55:57 031
2020-10-14 16:25:19 032
2020-10-18 20:05:40 033
2020-10-25 15:34:22 034
2020-10-31 22:55:26 035
2020-11-04 06:29:28 036
2020-11-12 06:33:47 037
2020-11-17 09:21:26 038
2020-11-24 13:47:28 039
2020-11-30 07:44:02 040 1
2020-12-07 20:20:00 041 2 2
2020-12-13 18:26:56 042 3 1
2020-12-20 00:33:11 043 2
2020-12-25 22:27:11 044 1
2021-01-01 07:57:34 045 1 3
2021-01-07 18:08:33 046 3 2
2021-01-15 02:41:00 047 7 1
2021-01-22 20:25:40 048 5 1
2021-01-28 10:53:04 049 2 5
2021-02-03 14:32:54 050 5 3
2021-02-09 03:20:45 051 5 6
2021-02-16 14:14:47 052 8 3
2021-02-22 01:22:20 053 9 1
2021-02-28 10:29:30 054 1 4
2021-03-06 18:23:14 055 7 6
2021-03-12 10:16:29 056 7 2
2021-03-17 08:30:15 057 6 1
2021-03-23 06:08:46 058 6 6
2021-03-29 22:39:10 059 10 5
2021-04-05 05:01:38 060 9 1
2021-04-10 21:09:07 061 7 3
2021-04-16 23:20:59 062 10 6
2021-04-22 07:50:06 063 11 8
2021-04-28 23:02:38 064 13 6
2021-05-04 04:45:55 065 9 4
2021-05-09 14:11:18 066 8 10
2021-05-16 14:55:41 067 15 8
2021-05-23 15:02:08 068 9 4
2021-05-26 06:43:43 069 11 9
2021-05-29 18:38:55 070 17 10
2021-06-03 15:46:15 071 23 8
2021-06-11 23:13:21 072 15 8
2021-06-15 11:54:58 073 17 8
2021-06-22 00:30:34 074 18 4
2021-06-27 11:56:43 075 12 12
2021-07-02 15:06:05 076 28 10
2021-07-08 20:49:12 077 43 10
2021-07-14 07:30:06 078 107 62
2021-07-18 14:21:26 079 903 304
2021-07-20 23:37:16 080 3312 1446
2021-07-23 21:00:51 081 6701 1612
2021-07-27 02:27:09 082 7183 1866
2021-07-29 02:04:50 083 6633 1607
2021-07-30 14:32:48 084 5009 1116
2021-08-03 02:15:23 085 3098 614
2021-08-05 08:06:31 086 1470 271
2021-08-08 05:34:18 087 607 108
2021-08-11 02:09:16 088 215

[Aug 13, 2021 5:17:37 PM]
Mike.Gibson
Ace Cruncher
England
Joined: Aug 23, 2007
Post Count: 12398
Status: Offline
Re: Work Available

Thank you, Kevin.

As 080 is the latest generation labelled 'priority', I will base this response on that.

There have been 33,894 units validated in generations up to and including 080 in the last 3 days, out of 51,226 returned.

There are now 44,222 units remaining to be crunched in those generations out of a total of 200,054 up to generation 087 (22%).
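
The shares implied by the figures quoted in this post can be checked with a few lines of arithmetic. This is only a sketch using the four counts above; the helper name `share` is an illustrative assumption.

```python
def share(part, whole):
    """Fraction of `whole` represented by `part`."""
    return part / whole


remaining, total = 44_222, 200_054    # units left vs. all units up to generation 087
validated, returned = 33_894, 51_226  # last 3 days, generations up to and including 080

print(f"remaining share: {share(remaining, total):.1%}")
print(f"validated share of returned: {share(validated, returned):.1%}")
```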

The stragglers are catching up, but the total is moving up.

However, those generation 001s are still stuck.

Mike
----------------------------------------
[Edit 1 times, last edit by Mike.Gibson at Aug 13, 2021 7:21:55 PM]
[Aug 13, 2021 7:21:11 PM]
adriverhoef
Master Cruncher
The Netherlands
Joined: Apr 3, 2009
Post Count: 2167
Status: Offline
Re: Work Available

Mike, you posted:
"1 spare on a multicore machine is not a queue. It just tides you over from when one finishes to when the next is downloaded."
Agreed.
"Having a fifth unit on an eight thread machine which crunches 4 ARP at a time means that the spare only has about 6 hours to wait."
It depends on the duration of tasks. If tasks last 24 hours on average, then you are right. If they last 16 hours, a fifth one would be waiting 4 hours. It's a simple formula: duration per ARP1 task / number of running ARP1 tasks. So, if you have 4 tasks and they last 12 hours on average, then the fifth one would have to wait only 12 / 4 = 3 hours.
"Larger machines still only need 1 spare and the wait time for the spare on a 24 thread machine would be down to about 20 minutes."
Your assumption was that each task would be running for 24 hours (four running tasks, a six-hour wait for the fifth one), so the thirteenth one (the spare one on a 24 thread machine) would be waiting 24 hours (duration per ARP1 task) / 12 (number of running ARP1 tasks) = 2 hours. I don't know how you can end up with a wait time of only 20 minutes on a 24 thread machine. It would mean, according to the formula duration per ARP1 task / number of running ARP1 tasks, that the duration per task / 12 = 20 minutes, so a task would only last 4 hours.
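
The formula above can be written as a small sketch. It assumes the running tasks finish at evenly staggered times, so the spare waits on average one task duration divided by the number of running tasks; the function names are illustrative.

```python
def spare_wait_hours(task_hours, running_tasks):
    """Average wait for the spare: duration per task / number of running tasks."""
    if running_tasks <= 0:
        raise ValueError("running_tasks must be positive")
    return task_hours / running_tasks


def implied_task_hours(wait_minutes, running_tasks):
    """Invert the formula: task duration implied by a given wait time."""
    return wait_minutes * running_tasks / 60


# The cases discussed above: (duration per task in hours, running tasks).
for task_hours, running in [(24, 4), (16, 4), (12, 4), (24, 12)]:
    wait = spare_wait_hours(task_hours, running)
    print(f"{task_hours}h tasks, {running} running: spare waits {wait:g}h")

# A 20-minute wait with 12 running tasks implies this task duration in hours.
print(implied_task_hours(20, 12))
```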
[Aug 14, 2021 10:10:23 AM]
Mike.Gibson
Ace Cruncher
England
Joined: Aug 23, 2007
Post Count: 12398
Status: Offline
Re: Work Available

I did not assume that all units would take 24 hours, nor did I say that. I suggested that an 8-thread machine would take about 24 hours but assumed that a 24-thread machine would be quicker - more like 8 hours. However, I should have said 40 minutes rather than 20. I accidentally divided by 24 instead of 12.

These times are based on comments made in these forums.

Mike
[Aug 14, 2021 6:03:02 PM]