World Community Grid Forums
Thread Status: Active · Total posts in this thread: 3268
Mike.Gibson
Ace Cruncher · England · Joined: Aug 23, 2007 · Post Count: 12398 · Status: Offline
leloft,

Your cache would be managed much better by setting project limits to 1 unit more than each of your app_config.xml settings. You will not run out of any of them that way and you will not have any backlog: just 1 spare for each project to tide you over while the next is downloading.

I presume that your 23 cores are to allow for OPN GPU.

Mike
---
leloft
Cruncher · Joined: Jun 8, 2017 · Post Count: 23 · Status: Offline
> Your cache would be managed much better by setting project limits to 1 unit more than each of your app_config.xml settings. You will not run out of any of them that way and you will not have any backlog: just 1 spare for each project to tide you over while the next is downloading.

Thanks for the feedback. The use of profiles and app_config.xml alone has not helped with the backlog. The overloaded cache came about from a huge discrepancy between estimated and actual run times: the cache was loaded with 48 ARP units with estimated times of between 13 and 26 hours that were taking more than 70, and 48 OPN units estimated at between 2 and 6 hours that took up to 22 hours each. The 115 units were supposed to be 3 days' work, with deadlines of about 6 days per unit.

Restricting the units to 12/24 in app_config.xml has resulted in estimated times decreasing faster than the time remaining to deadline as the units are processed. At 9am today a unit had an estimated time (54h) equal to its deadline (54h); 6 hours later the estimate had been reduced to 12 hours before the deadline, a net gain of 6 hours. This saved the unit from being aborted: BOINC was processing one unit with an estimate/deadline of 22/40 while the 54/54 unit above was waiting. I have had to use boinccmd --task sequentially to suspend and/or resume all 24 tasks to get the units running (and then I re-read cc_config and had to do it all over again!). My idea of a priority task doesn't seem to be the same as BOINC's. I'd be very interested to know how BOINC decides that a unit is high priority, but if it cannot tell that a unit needs to be started before its estimated time equals its deadline, that's surely a bug.

> I presume that your 23 cores are to allow for OPN GPU.

Not intentionally. It's got an old Quadro (K2000) card with nouveau drivers (Nvidia drivers have always caused me problems) and I'd like to have a go at GPU crunching just to see whether the reality matches the hype. But as I cannot afford to lose graphics capability, I haven't plucked up the courage to risk it. The 12/6/5 in app_config.xml was so that the machine had 24 cores available to process 23 units, in the hope that it might speed things up a bit.

Many thanks
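For readers following along, a 12/6/5 split like the one described can be expressed with per-app `max_concurrent` entries in app_config.xml. This is only a sketch: the short names `arp1`, `opn1` and `mcm1` are illustrative placeholders, and the real app names for your host should be taken from client_state.xml.

```xml
<app_config>
   <!-- Illustrative app names; check client_state.xml for the real short names -->
   <app>
      <name>arp1</name>
      <max_concurrent>12</max_concurrent>  <!-- at most 12 ARP tasks running -->
   </app>
   <app>
      <name>opn1</name>
      <max_concurrent>6</max_concurrent>   <!-- at most 6 OPN tasks running -->
   </app>
   <app>
      <name>mcm1</name>
      <max_concurrent>5</max_concurrent>   <!-- at most 5 MCM tasks running -->
   </app>
</app_config>
```

After editing the file, the client needs to re-read its configuration (Options → Read config files, or `boinccmd --read_cc_config`) for the limits to take effect.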
---
Mike.Gibson
Ace Cruncher · England · Joined: Aug 23, 2007 · Post Count: 12398 · Status: Offline
The use of the Project Limits was not meant to help with your existing backlog. It was meant to prevent it from recurring.

The cache settings are susceptible to fluctuations in crunching times, but the Project Limits are not. If you set the Project Limits to 1 unit more for each project than you have in app_config.xml, then you will only ever have 1 spare for each project.

I doubt that you will see any difference in speed by cutting to 23, and you then lay yourself open to possible shortages in a specific project. I would set app_config.xml to 25, as long as you restrict ARP to 12.

Mike
---
Dayle Diamond
Senior Cruncher · Joined: Jan 31, 2013 · Post Count: 452 · Status: Offline
We are trying to return these units as quickly as possible - why are we encouraging keeping any spare units?
If there's a shortage of ARP work because every task is currently being crunched, that's a success. If there's a shortage of ARP work because the tasks are sitting around in queues, not crunching, that's a setback. |
---
maeax
Advanced Cruncher · Joined: May 2, 2007 · Post Count: 142 · Status: Offline
> We are trying to return these units as quickly as possible - why are we encouraging keeping any spare units? If there's a shortage of ARP work because every task is currently being crunched, that's a success. If there's a shortage of ARP work because the tasks are sitting around in queues, not crunching, that's a setback.

Yes, using app_config.xml to define a high number of ARP units sitting in wait status is not the best solution. BOINC has no problem getting you work for the default 0.5 days, mixed from all WCG projects.

AMD Ryzen Threadripper PRO 3995WX 64 cores / AMD Radeon (TM) Pro W6600. OS: Win11 Pro
---
Mike.Gibson
Ace Cruncher · England · Joined: Aug 23, 2007 · Post Count: 12398 · Status: Offline
Dayle,

1 spare on a multicore machine is not a queue. It just tides you over from when one finishes to when the next is downloaded. However, sometimes it takes a bit longer to get one, so the spare keeps you crunching fully.

Having a fifth unit on an eight-thread machine which crunches 4 ARP at a time means that the spare only has about 6 hours to wait. Larger machines still only need 1 spare, and the wait time for the spare on a 24-thread machine would be down to about 20 minutes.

It is the much larger queues that are the problem.

Mike

[Edited 1 time, last edit by Mike.Gibson at Aug 13, 2021 2:24:15 PM]
---
knreed
Former World Community Grid Tech · Joined: Nov 8, 2004 · Post Count: 4504 · Status: Offline
Latest stats:

Average Generation: 82.4
Pace (average time to complete a generation): 4.1 days (7-day average)

[table: first_indexed / generation / num_units_currently_on_generation / num_units_completed_last_day — data not preserved]
---
Mike.Gibson
Ace Cruncher · England · Joined: Aug 23, 2007 · Post Count: 12398 · Status: Offline
Thank you, Kevin.

As 080 is the latest generation labelled 'priority', I will base this response on that.

There have been 33,894 units validated in generations up to and including 080 in the last 3 days, out of 51,226 returned. There are now 44,222 units remaining to be crunched in those generations, out of a total of 200,054 up to generation 087 (22%).

The stragglers are catching up, but the total is moving up. However, those generation 001s are still stuck.

Mike

[Edited 1 time, last edit by Mike.Gibson at Aug 13, 2021 7:21:55 PM]
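The 22% figure quoted above is simple to check with shell integer arithmetic, using the counts given in the post:

```shell
#!/bin/sh
# Share of units still to crunch in the priority generations:
# 44,222 remaining out of 200,054 total (figures from the post above).
remaining=44222
total=200054

# Integer percentage; multiplying before dividing keeps the precision.
echo $(( remaining * 100 / total ))
```

The truncated integer result matches the 22% in the post (the exact value is about 22.1%).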
---
adriverhoef
Master Cruncher · The Netherlands · Joined: Apr 3, 2009 · Post Count: 2167 · Status: Offline
Mike, you posted:

> 1 spare on a multicore machine is not a queue. It just tides you over from when one finishes to when the next is downloaded.

Agreed.

> Having a fifth unit on an eight thread machine which crunches 4 ARP at a time means that the spare only has about 6 hours to wait.

It depends on the duration of the tasks. If tasks last 24 hours on average, then you are right. If they last 16 hours, a fifth one would be waiting 4 hours. It's a simple formula: duration per ARP1 task / number of running ARP1 tasks. So, if you have 4 running tasks that last 12 hours on average, the fifth one would have to wait only 12 / 4 = 3 hours.

> Larger machines still only need 1 spare and the wait time for the spare on a 24 thread machine would be down to about 20 minutes.

Your assumption was that each task runs for 24 hours (four running tasks, a six-hour wait for the fifth one), so the thirteenth one (the spare on a 24-thread machine crunching 12 ARP at a time) would be waiting 24 hours / 12 running tasks = 2 hours. I don't know how you arrive at a wait time of only 20 minutes on a 24-thread machine.
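The formula above (spare wait ≈ average task duration / number of concurrently running tasks) can be sanity-checked with shell arithmetic. The durations are the averages debated in this thread, not measurements:

```shell
#!/bin/sh
# Spare-unit wait time = average task duration / number of tasks crunching at once.
# Works in whole minutes to stay within POSIX integer arithmetic.

wait_minutes() {
    duration_min=$1   # average ARP task duration, in minutes
    running=$2        # ARP tasks running concurrently
    echo $(( duration_min / running ))
}

wait_minutes $(( 24 * 60 )) 4    # 8-thread box, 24 h tasks  -> 360 min (6 hours)
wait_minutes $(( 12 * 60 )) 4    # 12 h tasks                -> 180 min (3 hours)
wait_minutes $(( 24 * 60 )) 12   # 24-thread box, 24 h tasks -> 120 min (2 hours)
wait_minutes $((  8 * 60 )) 12   # 8 h tasks on 12 threads   -> 40 min
```

The last line corresponds to the 8-hour figure Mike gives in his follow-up below: 8 hours across 12 running tasks is 40 minutes, not 20.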
---
Mike.Gibson
Ace Cruncher · England · Joined: Aug 23, 2007 · Post Count: 12398 · Status: Offline
I did not assume that all units would take 24 hours, or say that. I suggested that an 8-thread machine would take about 24 hours, but assumed that a 24-thread machine would be quicker: more like 8 hours. However, I should have said 40 minutes rather than 20; I accidentally divided by 24 instead of 12.

These times are based on comments made in these forums.

Mike