World Community Grid Forums
Thread Status: Active | Total posts in this thread: 164
Former Member
Cruncher | Joined: May 22, 2018 | Post Count: 0 | Status: Offline
> Yeah, ARP1 really should have an active core cap. We're here to progress science, not slow it down, which is the consequence... a task sitting in queue for at least 30 hours... a task holding up the next step 30 hours. I feel this. I've got plenty of Zen+/Zen2 cores over here, ready to crank on this problem. I learned my lesson about oversubscribing machines with MIP1 (what up, cache thrashing), and I was part of the ARP beta, so I feel pretty comfortable with my settings and I know how long these WUs will run on my hardware. It's frustrating to be excited about a new, super important project, and then discover that you're not going to be able to help very much because all the WUs went to people who happened to be awake before you were on day 1.

I agree 100%. All the hype associated with this project, and then it is so restricted that one can barely participate. It looks like another HSTB project.

The memory requirements aren't that much different from FAHB or MIP1. The client's network bandwidth information is available to the server, so why not send more work to the clients that have the bandwidth to handle it? I was going to dedicate 64 EPYC cores and 128 GB of memory to this project, but it doesn't look like I will be able to get more than about 5 work units.

ClimatePrediction doesn't have this type of restriction, and its work units use the same amount of memory or more and have transfer files of almost 100 MB. What's the difference? If the file transfers cause issues for members, let the members restrict the number of WUs to fit their situation instead of a blanket restriction that penalizes everybody. I don't remember a restriction during beta testing...
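For anyone who would rather self-limit than wait for a server-side cap, BOINC's app_config.xml supports a per-application concurrency limit. A minimal sketch, assuming the ARP application's short name is "arp1" (verify the exact name in client_state.xml); it goes in the World Community Grid project directory and is picked up with "Read config files" in the BOINC Manager:

```xml
<!-- Minimal app_config.xml sketch. Assumption: the ARP short name is "arp1";
     check client_state.xml on your host for the real name. -->
<app_config>
    <app>
        <name>arp1</name>
        <max_concurrent>4</max_concurrent> <!-- run at most 4 ARP tasks at once -->
    </app>
</app_config>
```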
Former Member
Cruncher | Joined: May 22, 2018 | Post Count: 0 | Status: Offline
> WCG needs to fix that if they want this project to run quickly. Server abort should occur if ARP1 WU cannot be returned in 48 hours.

A task that has started is left alone to finish, even if it's overdue. If there were some kind of trickle signal back to the server, at least the server would not send out a wasted extra copy. At one point the client was smart enough to recognize "this will not finish in time" and cancel the task even before the deadline, but there are several capabilities that WCG has chosen not to employ.

I'm surprised the "trickle" method was not set up for this, like at CPDN: every checkpoint is uploaded, so if the task then crashes, another client can pick up and finish the remaining steps, and the one who did the initial steps still gets credit for their piece of the time. A trickle also serves to let the project know "it's alive and being worked on." Maybe the limited scale did not justify that effort, only the raw, blunt, easy-to-maintain model: crash at 99%, bye bye.
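Purely to illustrate the trickle idea described above, here is a conceptual sketch; it is not the actual BOINC, CPDN, or WCG code, and every function name in it is made up:

```python
# Conceptual sketch of a trickle-style heartbeat; all names are illustrative.
import time

def advance_one_timestep(state):
    return state + 1                             # stand-in for one model step

def write_checkpoint(state, step):
    print(f"checkpoint written at step {step}")  # stand-in for saving state to disk

def report_trickle(status):
    print(f"trickle-up: {status}")               # stand-in for a tiny status upload

def run_model(steps=100, checkpoint_every=10):
    state = 0
    for step in range(1, steps + 1):
        state = advance_one_timestep(state)
        if step % checkpoint_every == 0:
            write_checkpoint(state, step)
            # A few bytes per checkpoint tell the server the task is alive and how
            # far it has gotten, so it need not issue a wasted duplicate copy.
            report_trickle({"step": step, "cpu_seconds": round(time.process_time(), 1)})

if __name__ == "__main__":
    run_model()
```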
Ingleside
Veteran Cruncher | Norway | Joined: Nov 19, 2005 | Post Count: 974 | Status: Offline
> The client network bandwidth information is available to the server so why not send more work to the clients that have the network bandwidth to handle it?

Transferring small files gives a very low measured bandwidth in BOINC, meaning the actual bandwidth can be 10x-100x the bandwidth reported by BOINC. Basing distribution of work on such unreliable measurements is not a good idea.

"I make so many mistakes. But then just think of all the mistakes I don't make, although I might."
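A rough worked example of Ingleside's point (the numbers are invented for illustration, not WCG measurements): because every transfer pays a roughly fixed latency/setup cost, a tiny file makes a fast link look slow, while a large file comes close to the real rate.

```python
# Illustrative only: why throughput measured from small files understates a link.
def measured_mbps(file_mb, link_mbps, overhead_s=0.5):
    """Effective rate when each transfer pays a fixed setup/latency cost (assumed 0.5 s)."""
    seconds = (file_mb * 8) / link_mbps + overhead_s
    return (file_mb * 8) / seconds

print(measured_mbps(0.1, 100))   # ~1.6 Mbps measured from a 100 KB file on a 100 Mbps link
print(measured_mbps(100, 100))   # ~94 Mbps measured from a 100 MB file on the same link
```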
vaughan-AMD
Cruncher | Australia | Joined: Nov 19, 2004 | Post Count: 25 | Status: Offline
Aurum420: How do you get such a cheap electric bill? Mine is over $5000 a quarter.
Aurum
Master Cruncher | The Great Basin | Joined: Dec 24, 2017 | Post Count: 2386 | Status: Offline
vaughan-AMD, our electric rate is $0.07/kWh. Lucky for me, my wife never sees the utility bill.
l_mckeon
Senior Cruncher | Joined: Oct 20, 2007 | Post Count: 439 | Status: Offline
How much RAM is a single task actually consuming?
Is this another project that runs at vastly different speeds on Windows and Linux? |
hchc
Veteran Cruncher | USA | Joined: Aug 15, 2006 | Post Count: 802 | Status: Offline
I'm getting about this much per task:
- ~800 MB RAM consumption
- ~1-1.5 GB disk space
- ~85-90 MB upload

And yeah, much faster on Linux than Windows for me. My Ivy Bridge (3rd Gen) outperforms my Coffee Lake (8th Gen).
hchc
Veteran Cruncher | USA | Joined: Aug 15, 2006 | Post Count: 802 | Status: Offline
uplinger said:
> Also, we do get machines that are set to run results from other projects, but those hosts have already been limited to 1 result per day... They start with, say, 5 results per day, but once they return errors, it drops down to 1.

What I meant was: since this project is opt-in, the owner of all those enterprise devices deliberately opted into it, knowing that their fleet of devices on 7.2.47 (from 2014) would error out.
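For context, the daily limit uplinger describes behaves roughly like an adaptive quota: errors shrink a host's allowance toward 1 result per day, and valid results grow it back. A simplified sketch of that idea only; this is not WCG's actual scheduler code, and the starting value of 5 and the doubling/halving steps are assumptions:

```python
# Simplified sketch of an adaptive per-host daily quota; not the real scheduler code.
def update_daily_quota(quota, result_ok, cap=5):
    if result_ok:
        return min(cap, quota * 2)   # valid results let the host recover quickly
    return max(1, quota // 2)        # erroring hosts are throttled toward 1/day

quota = 5
for ok in (False, False, True, True):
    quota = update_daily_quota(quota, ok)
    print(quota)                     # prints 2, 1, 2, 4
```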
catchercradle
Advanced Cruncher | Joined: Jan 16, 2009 | Post Count: 128 | Status: Offline
No great surprise there for those of us who crunch for CPDN. A GPU's strength is in doing lots of computations simultaneously, but the nature of weather tasks is that each computation is based on the results of the previous one, so there is little if anything to be gained by running on a GPU. The same logic means they wouldn't scale well as multi-core tasks.
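A toy illustration of that dependency (not the actual model code): every timestep consumes the state produced by the previous one, so the steps themselves cannot be farmed out in parallel.

```python
# Toy illustration of a time-stepping model's sequential dependency; not real model code.
def step(state, dt=0.1):
    # stand-in for one model timestep; a real model solves the physics here
    return [x + dt * x for x in state]

state = [1.0, 2.0, 3.0]   # stand-in for the model's grid values
for n in range(1000):
    # strictly sequential: step n cannot start until step n-1 has produced its state
    state = step(state)
```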
Eric_Kaiser
Veteran Cruncher | Germany (Hessen) | Joined: May 7, 2013 | Post Count: 1047 | Status: Offline
> WCG needs to fix that if they want this project to run quickly. Server abort should occur if ARP1 WU cannot be returned in 48 hours.

I don't second that. I have ARP1 tasks that have now been running for 44 hours and are still not finished. If the server aborts them, the energy and resources already spent are wasted. In the end, I am not going to support projects that waste resources.