Thread Status: Active
Total posts in this thread: 3317
Posts: 3317   Pages: 332   [ Previous Page | 115 116 117 118 119 120 121 122 123 124 | Next Page ]
This topic has been viewed 3308985 times and has 3316 replies
Former Member
Cruncher
Joined: May 22, 2018
Post Count: 0
Status: Offline
Re: Work Available

OK. All 200 cores have been assigned to ARP1. However, I completely forgot about the existing error in the BOINC client (7.16.11) where it ignores the work queue setting at times. Looked at it this morning and had 990 WUs assigned to the machines. I set them to no new tasks for now and will have to babysit these things this week. Will get through most of them but there may be a few (< 1%) that miss the deadline.
[Jul 25, 2021 12:42:34 PM]
Mike.Gibson
Ace Cruncher
England
Joined: Aug 23, 2007
Post Count: 12436
Status: Offline
Re: Work Available

entity

Don't forget that they now have 8 days rather than the 7 shown in BOINC Manager (or 4.5 rather than 3.5). They are working to the deadline shown in Result Status. Check after 4 days, and delete some only if there are more left than they have already crunched.

Mike
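
A rough sketch of that mid-deadline check, for anyone who wants to script it. The function name and the sample counts are hypothetical, and it assumes a host will crunch about as many tasks in the second half of the deadline window as it did in the first:

```python
def tasks_to_abort(completed: int, remaining: int) -> int:
    """Halfway-point rule of thumb: abort only the excess over what
    has already been crunched (assumes a steady crunching rate)."""
    # If fewer tasks remain than were completed, nothing needs aborting.
    return max(0, remaining - completed)


# Hypothetical host, checked 4 days into an 8-day deadline.
print(tasks_to_abort(completed=40, remaining=55))  # 15
print(tasks_to_abort(completed=40, remaining=30))  # 0
```

This only gives a count; which tasks to abort is still a manual choice in BOINC Manager.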
[Jul 25, 2021 1:37:50 PM]
paulch2
Cruncher
Joined: Aug 6, 2020
Post Count: 25
Status: Offline
Re: Work Available

I guess it takes a while for new machines to be trusted to run stragglers.
The ones I added back on 14th June have only just started seeing earlier generations, while the older, slower machines I'm running have been getting them quite often.
[Jul 25, 2021 8:57:38 PM]
Stiwi
Advanced Cruncher
Joined: May 19, 2012
Post Count: 75
Status: Offline
Re: Work Available

OK. All 200 cores have been assigned to ARP1. However, I completely forgot about the existing error in the BOINC client (7.16.11) where it ignores the work queue setting at times. Looked at it this morning and had 990 WUs assigned to the machines. I set them to no new tasks for now and will have to babysit these things this week. Will get through most of them but there may be a few (< 1%) that miss the deadline.


If you are running ARP on all cores, you probably won't have enough L3 cache, which will increase your runtime. If I remember correctly, on my 3900X the runtime doubled when I ran ARP on 24 threads. 12 seems fine.
[Jul 25, 2021 9:44:22 PM]
Former Member
Cruncher
Joined: May 22, 2018
Post Count: 0
Status: Offline
Re: Work Available

You are correct, the runtime does double but the amount of work done per day is essentially the same. The EPYC server seems to handle 128 simultaneously reasonably well. On average 24 to 26 hours per WU. If I ran 64, they would probably run in 12 to 15 hours. Same number of WUs per day though. The biggest impact on that server is the time spent handling hardware interrupts. The machines with the consumer grade chips are a little different. The L3 cache conflict is considerably more noticeable on those machines.
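
To put numbers on the "same work per day" point, here is a small sketch. The 25-hour and 12.5-hour figures are assumed midpoints of the ranges above, not measurements, and the function name is made up for illustration:

```python
def wus_per_day(concurrent_tasks: int, hours_per_wu: float) -> float:
    """Steady-state throughput: each slot finishes 24 / hours_per_wu WUs a day."""
    return concurrent_tasks * 24.0 / hours_per_wu


# Assumed midpoints: 128 tasks at ~25 h each vs 64 tasks at ~12.5 h each.
all_threads = wus_per_day(128, 25.0)
half_threads = wus_per_day(64, 12.5)

print(f"128 concurrent: {all_threads:.2f} WUs/day")
print(f" 64 concurrent: {half_threads:.2f} WUs/day")
```

When the runtime doubles exactly as the task count doubles, the two figures come out identical; any gain from running fewer tasks only shows up if the per-task runtime drops by more than half.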
[Jul 25, 2021 11:25:11 PM]
Mike.Gibson
Ace Cruncher
England
Joined: Aug 23, 2007
Post Count: 12436
Status: Offline
Re: Work Available

What is better is to use half the threads on ARP and the rest on other projects. That achieves the highest throughput.

OPN, MCM & HST are worthwhile projects.

Mike
[Jul 26, 2021 1:10:53 AM]
phytell
Cruncher
Joined: Sep 8, 2014
Post Count: 37
Status: Offline
Re: Work Available

@Entity:
I've been wondering how well one of those 128 thread monsters would handle ARP - thanks for posting your runtimes!
If you don't mind sharing, what are you using as storage (I can only imagine that many units would slaughter SSDs)?
[Jul 26, 2021 12:54:43 PM]
Former Member
Cruncher
Joined: May 22, 2018
Post Count: 0
Status: Offline
Re: Work Available

entity

That might be so at the moment, but they are on 8-day deadlines whereas the earlier ones are on 4.5-day deadlines so should reappear more frequently. The earlier ones also have a higher priority so should be turned around faster.

There are 5302 pre-077 out there compared with 30307 077-081 but they have many more generations to get through (35227 compared with 86167) which redresses the balance to some extent.

A generation is 35609, but a generation can be completed in 4 days at 18000 results per day. The earliest stragglers have 80 generations to get through so are likely to take 3 months even if they get through 1 generation per day. That is most unlikely as both copies have to validate before moving on.

Your extra machines will help the project to finish sooner, but might cause a local heatwave!

Mike
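
The arithmetic in the quoted post can be reproduced roughly as follows. This is only a sketch: it assumes the "4 days" figure comes from two copies of each unit being returned at 18000 results per day, and the constant names are invented for illustration:

```python
UNITS_PER_GENERATION = 35609   # size of a full generation (from the quoted post)
COPIES_PER_UNIT = 2            # assumption: "both copies have to validate"
RESULTS_PER_DAY = 18000        # return rate quoted in the post
STRAGGLER_GENERATIONS = 80     # generations the earliest stragglers still need

# Days for the whole grid to return one full generation.
days_per_generation = UNITS_PER_GENERATION * COPIES_PER_UNIT / RESULTS_PER_DAY

# Best case for a straggler: one generation per day, run back to back.
best_case_days = STRAGGLER_GENERATIONS * 1

print(f"One full generation: about {days_per_generation:.1f} days")
print(f"Earliest stragglers: at least {best_case_days} days "
      f"(about {best_case_days / 30:.1f} months)")
```

Under these assumptions a generation takes just under 4 days, and the earliest stragglers need close to 3 months even in the best case.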

Mike,
I'm reluctant to completely buy into your scenario. It may be true if everything was taking the maximum amount of time to complete (8 and 4.5 days). However, I'm getting copious amounts of current generation WUs and turning them around in less than a day. If a machine gets a high priority WU and sits on it for 4 days, I have already turned around 800 of the current generation by the time that one unit comes back. I don't know how Kevin has the server configured for ARP1 priority work. In other words, what is the definition of a reliable machine for ARP1? Question for Kevin: Would it be worthwhile to redefine reliable machines for ARP1 as those that return work in less than 2 days? At any given time there probably aren't that many high priority WUs available since they have to run consecutively. I can make all 208 cores available for only priority work if that helps the backlog, and they will all come back in less than 2 days. Another question for Kevin: What is the average return time for the high priority work? Is it considerably less than the 4.5 days?
[Jul 26, 2021 1:16:47 PM]
Crystal Pellet
Veteran Cruncher
Joined: May 21, 2008
Post Count: 1323
Status: Offline
Re: Work Available

Received: ARP1_0002240_081_1 26 Jul 14:40:43 UTC
[Jul 26, 2021 3:01:59 PM]
Mike.Gibson
Ace Cruncher
England
Joined: Aug 23, 2007
Post Count: 12436
Status: Offline
Re: Work Available

entity

If I express it differently it might be clearer.

Firstly, your 'new' machines may have to establish their reliability. I don't know how many results they will have to return to do that, but it will be several, so allow a few days.

Secondly, there were only 5302 pre-077 units, whereas a full generation is 35609, so there are many more new units than stragglers. However, when a new-generation unit has been crunched, it has to wait for the next generation to be created, whereas the stragglers are automatically moved on to the next generation when they have been crunched.

These factors mean that your 'new' machines should start to get an increasing number of stragglers. My current machine is mostly getting 077 & 078 because they are over half the available units.

Mike
[Jul 26, 2021 7:47:35 PM]