World Community Grid Forums
Thread Status: Active | Total posts in this thread: 3317
Former Member
Cruncher | Joined: May 22, 2018 | Post Count: 0 | Status: Offline
OK. All 200 cores have been assigned to ARP1. However, I completely forgot about the existing error in the BOINC client (7.16.11) where it ignores the work queue setting at times. Looked at it this morning and had 990 WUs assigned to the machines. I set them to no new tasks for now and will have to babysit these things this week. Will get through most of them but there may be a few (< 1%) that miss the deadline.
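For reference, a minimal sketch of doing the "no new tasks" step from the command line, assuming the stock boinccmd tool that ships with the BOINC client; the project URL shown is an assumption, so use whatever boinccmd --get_project_status reports for World Community Grid:

    # Stop fetching new WCG work on this host; queued tasks keep running.
    boinccmd --project https://www.worldcommunitygrid.org/ nomorework

    # Re-enable work fetch once the backlog has been crunched down.
    boinccmd --project https://www.worldcommunitygrid.org/ allowmorework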
----------------------------------------
Mike.Gibson
Ace Cruncher | England | Joined: Aug 23, 2007 | Post Count: 12436 | Status: Offline
entity
Don't forget that they now have 8 days rather than the 7 shown in BOINC Manager (or 4.5 rather than 3.5). They are working on the deadline shown in Result Status. Check after 4 days, and delete some only if there are more left than they have already crunched.

Mike
----------------------------------------
paulch2
Cruncher | Joined: Aug 6, 2020 | Post Count: 25 | Status: Offline
I guess it takes a while for new machines to be trusted to run stragglers.
The ones I added back on 14th June have only just started seeing earlier gens, while the older, slower machines I'm running have been getting them quite often.
----------------------------------------
Stiwi
Advanced Cruncher | Joined: May 19, 2012 | Post Count: 75 | Status: Offline
> OK. All 200 cores have been assigned to ARP1. However, I completely forgot about the existing error in the BOINC client (7.16.11) where it ignores the work queue setting at times. Looked at it this morning and had 990 WUs assigned to the machines. I set them to no new tasks for now and will have to babysit these things this week. Will get through most of them but there may be a few (< 1%) that miss the deadline.

If you are running ARP on all cores you probably won't have enough L3 cache, which will increase your runtime. If I remember correctly, on my 3900X the runtime doubled when I ran ARP on 24 threads; 12 seems fine.
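For anyone who wants to cap ARP without babysitting it, a minimal app_config.xml sketch along these lines should do it. The file goes in the World Community Grid project folder inside the BOINC data directory; the app name "arp1" is an assumption based on the ARP1_* result names, so check client_state.xml for the exact name on your install:

    <!-- Cap concurrent Africa Rainfall Project tasks so they do not
         fight over L3 cache; other WCG apps fill the remaining cores.
         The app name "arp1" is a guess - verify it in client_state.xml. -->
    <app_config>
        <app>
            <name>arp1</name>
            <max_concurrent>12</max_concurrent>
        </app>
    </app_config>

After saving it, Options > Read config files in BOINC Manager should pick it up without restarting the client.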
----------------------------------------
Former Member
Cruncher | Joined: May 22, 2018 | Post Count: 0 | Status: Offline
You are correct, the runtime does double, but the amount of work done per day is essentially the same. The EPYC server seems to handle 128 simultaneous tasks reasonably well, averaging 24 to 26 hours per WU. If I ran 64, they would probably run in 12 to 15 hours - the same number of WUs per day, though. The biggest impact on that server is the time spent handling hardware interrupts. The machines with consumer-grade chips are a little different: the L3 cache conflict is considerably more noticeable on those machines.
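To put rough numbers on "same amount of work per day", a quick sketch using only the figures in this post (the 64-task runtime is an estimate, not a measurement):

    # Rough WUs/day for the two EPYC configurations described above.
    # Figures from the post: ~25 h/WU at 128 concurrent tasks, and an
    # estimated ~13.5 h/WU at 64 concurrent tasks.
    for concurrent, hours_per_wu in [(128, 25.0), (64, 13.5)]:
        wus_per_day = concurrent * 24 / hours_per_wu
        print(f"{concurrent} tasks at {hours_per_wu} h/WU is about {wus_per_day:.0f} WUs/day")

Both work out to roughly 110 to 125 WUs per day, which matches the observation that total throughput barely changes.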
----------------------------------------
Mike.Gibson
Ace Cruncher | England | Joined: Aug 23, 2007 | Post Count: 12436 | Status: Offline
It is better to use half the threads on ARP and the rest on other projects; that achieves the highest throughput. OPN, MCM & HST are worthwhile projects.

Mike
----------------------------------------
phytell
Cruncher | Joined: Sep 8, 2014 | Post Count: 37 | Status: Offline
@Entity:
I've been wondering how well one of those 128-thread monsters would handle ARP - thanks for posting your runtimes! If you don't mind sharing, what are you using as storage? (I can only imagine that many units would slaughter SSDs.)
----------------------------------------
Former Member
Cruncher | Joined: May 22, 2018 | Post Count: 0 | Status: Offline
> entity
>
> That might be so at the moment, but they are on 8-day deadlines whereas the earlier ones are on 4.5-day deadlines, so should reappear more frequently. The earlier ones also have a higher priority, so should be turned around faster. There are 5302 pre-077 out there compared with 30307 at 077-081, but they have many more generations to get through (35227 compared with 86167), which redresses the balance to some extent.
>
> A generation is 35609 units, but a generation can be completed in 4 days at 18000 results per day. The earliest stragglers have 80 generations to get through, so are likely to take 3 months even if they get through 1 generation per day. That is most unlikely, as both copies have to validate before moving on.
>
> Your extra machines will help the project to finish sooner, but might cause a local heatwave!
>
> Mike

Mike, I'm reluctant to completely buy into your scenario. It might be true if everything were taking the maximum amount of time to complete (8 and 4.5 days). However, I'm getting copious amounts of current-generation WUs and turning them around in less than a day. If a machine gets a high-priority WU and sits on it for 4 days, I have already turned around 800 current-generation units by the time that one comes back. I don't know how Kevin has the server configured for ARP1 priority work. In other words, what is the definition of a reliable machine for ARP1?

Question for Kevin: Would it be worthwhile to redefine reliable machines for ARP1 as those that return work in less than 2 days? At any given time there probably aren't that many high-priority WUs available, since they have to run consecutively. I can make all 208 cores available for priority work only, if that helps the backlog, and they will all come back in less than 2 days.

Another question for Kevin: What is the average return time for the high-priority work? Is it considerably less than the 4.5 days?
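A back-of-envelope sketch of the timeline being debated, using only the figures quoted above (80 generations left for the earliest stragglers, and a generation can only advance once the slower of the two copies has validated):

    # Rough completion estimates for the earliest stragglers.
    # 80 remaining generations is the figure quoted above; the days per
    # generation is set by the slower of the two wingmen on each unit.
    generations_left = 80
    for days_per_generation in (0.5, 1, 2, 4):
        total_days = generations_left * days_per_generation
        print(f"{days_per_generation} days/gen: about {total_days:.0f} days ({total_days / 30:.1f} months)")

At 1 generation per day that is roughly the 3 months mentioned above; fast hosts only shorten it if both copies of each unit happen to land on fast hosts.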
----------------------------------------
Crystal Pellet
Veteran Cruncher | Joined: May 21, 2008 | Post Count: 1323 | Status: Offline
Received: ARP1_0002240_081_1 26 Jul 14:40:43 UTC
----------------------------------------
Mike.Gibson
Ace Cruncher | England | Joined: Aug 23, 2007 | Post Count: 12436 | Status: Offline
entity
If I express it differently, it might be clearer.

Firstly, your 'new' machines may have to establish their reliability. I don't know how many results they will have to return to do that, but it will be several, so a few days.

Secondly, there are only 5302 pre-077 units whereas a full generation is 35609, so there are many more new units by comparison with the stragglers. However, when a new-generation unit has been crunched, it has to wait for the next generation to be created, whereas the stragglers are automatically moved on to the next generation when they have been crunched.

These factors mean that your 'new' machines should start to get an increasing number of stragglers. My current machine is mostly getting 077 & 078 because they are over half the available units.

Mike