Thread Status: Active
Total posts in this thread: 164
Posts: 164   Pages: 17
This topic has been viewed 53227 times and has 163 replies.
Former Member
Cruncher
Joined: May 22, 2018
Post Count: 0
Status: Offline
Re: For Current Volunteers: Advance Information on Our Newest Project

Yeah, ARP1 really should have an active core cap. We're here to advance science, not slow it down, which is the consequence: a task sitting in a queue for at least 30 hours is a task holding up the next step by 30 hours.


I feel this. I've got plenty of Zen+/Zen2 cores over here, ready to crank on this problem. I learned my lesson about oversubscribing machines with MIP1 (what up, cache thrashing), and I was part of the ARP beta, so I feel pretty comfortable with my settings and I know how long these WUs will run on my hardware.

It's frustrating to be excited about a new, super important project, and then discover that you're not going to be able to help very much because all the WUs went to people who happened to be awake before you were on day 1.

I agree 100%. After all the hype associated with this project, it is so restricted that one can barely participate. It looks like another HSTB project. The memory requirements aren't much different from FAHB or MIP1. The client network bandwidth information is available to the server, so why not send more work to the clients that have the network bandwidth to handle it? I was going to dedicate 64 EPYC cores and 128 GB of memory to this project, but it doesn't look like I will be able to get more than about 5 work units. ClimatePrediction doesn't have this type of restriction, and their work units use the same amount of memory or more and have transfer files that are almost 100 MB in size. What's the difference? If the file transfers cause issues for members, let the members restrict the number of WUs to fit their situation instead of imposing a blanket restriction that penalizes everybody. I don't remember a restriction during beta testing....
[Oct 31, 2019 5:29:26 PM]
Former Member
Cruncher
Joined: May 22, 2018
Post Count: 0
Status: Offline
Re: For Current Volunteers: Advance Information on Our Newest Project

WCG needs to fix that if they want this project to run quickly. Server abort should occur if ARP1 WU cannot be returned in 48 hours.

A task started is a task left alone to finish, even if it's overdue. If there were some kind of trickle signal back to the server, at least the server would not send out a wasted extra copy. At one point the client was smart enough to recognize 'will not finish in time' and cancel the task even before the deadline, but there are multiple capabilities that WCG has not chosen to employ.

I'm surprised the 'trickle' method was not set up for this, as it is at CPDN. Every checkpoint is uploaded. If the task then crashes, another client can just pick up and finish the remaining steps, and the one who did the initial steps gets credit for their share of the time. A trickle also serves to let the project know 'it's alive and being worked on'. Maybe the limited scale did not justify that effort, hence the raw, blunt, 'easy to maintain' model. Crash at 99%, bye bye.
[Oct 31, 2019 7:15:11 PM]
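For anyone curious how the checkpoint/trickle idea might work, here is a rough server-side sketch. It assumes, hypothetically, that each uploaded checkpoint reports how many model steps are complete and that credit is split by steps contributed. The names `Workunit`, `trickle` and `resume_point` are made up for illustration; this is not WCG's or CPDN's actual server code.

```python
from dataclasses import dataclass, field


@dataclass
class Workunit:
    """Hypothetical server-side record for a task that reports progress via trickles."""

    total_steps: int
    last_checkpoint: int = 0  # last model step the server has a checkpoint for
    credit: dict[str, int] = field(default_factory=dict)  # host name -> steps contributed

    def trickle(self, host: str, step: int) -> None:
        """Record an uploaded checkpoint; receiving it also means 'still alive'."""
        step = min(step, self.total_steps)
        if step <= self.last_checkpoint:
            return  # stale or duplicate checkpoint, nothing new to credit
        self.credit[host] = self.credit.get(host, 0) + (step - self.last_checkpoint)
        self.last_checkpoint = step

    def resume_point(self) -> int:
        """Step a replacement host would start from if the current host goes silent."""
        return self.last_checkpoint

    @property
    def done(self) -> bool:
        return self.last_checkpoint >= self.total_steps


# host_a uploads two checkpoints and then crashes; host_b resumes and finishes.
wu = Workunit(total_steps=100)
wu.trickle("host_a", 40)
wu.trickle("host_a", 70)
start = wu.resume_point()
wu.trickle("host_b", 100)
print(start, wu.credit, wu.done)
```

In this sketch a crash at 99% only costs the steps since the last uploaded checkpoint, and each host is credited for the steps it actually delivered.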
Ingleside
Veteran Cruncher
Norway
Joined: Nov 19, 2005
Post Count: 974
Status: Offline
Re: For Current Volunteers: Advance Information on Our Newest Project

The client network bandwidth information is available to the server so why not send more work to the clients that have the network bandwidth to handle it?

Transferring small files gives very low measured bandwidth in BOINC, meaning the actual bandwidth can be 10x - 100x the bandwidth reported by BOINC. Basing distribution of work on such unreliable measurements is not a good idea.
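To see why small transfers understate bandwidth, here is a rough model. It assumes each transfer pays a fixed setup cost (connection setup, request round trips) on top of the time spent moving bytes, and that the measured figure is bytes divided by total elapsed time. The 0.5 s overhead and 100 Mbit/s link are made-up numbers for illustration, and this is not how BOINC itself computes its figure.

```python
def measured_throughput(file_bytes: float, link_bytes_per_s: float, overhead_s: float) -> float:
    """Apparent throughput when every transfer pays a fixed setup overhead.

    Assumption: apparent rate = bytes moved / (overhead + bytes / real link speed).
    """
    elapsed = overhead_s + file_bytes / link_bytes_per_s
    return file_bytes / elapsed


LINK = 100e6 / 8  # hypothetical 100 Mbit/s link, in bytes per second
OVERHEAD = 0.5    # hypothetical fixed cost per transfer, in seconds

# 10 kB, 1 MB, and ~90 MB (roughly the ARP1 upload size reported in this thread)
for size in (10e3, 1e6, 90e6):
    rate = measured_throughput(size, LINK, OVERHEAD)
    print(f"{size / 1e6:8.2f} MB -> {rate / LINK:6.1%} of real link speed")
```

With these assumed numbers, a 10 kB file appears to move at well under 1% of the real link speed, while a ~90 MB upload gets close to it, so a host that has only ever transferred small files looks far slower than it really is.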
----------------------------------------


"I make so many mistakes. But then just think of all the mistakes I don't make, although I might."
[Oct 31, 2019 11:43:11 PM]
vaughan-AMD
Cruncher
Australia
Joined: Nov 19, 2004
Post Count: 25
Status: Offline
Re: For Current Volunteers: Advance Information on Our Newest Project

Aurum420: How do you get such a cheap electric bill? Mine is over $5000 a quarter.
----------------------------------------
[Edit 1 times, last edit by vaughan-AMD at Nov 1, 2019 1:15:36 AM]
[Nov 1, 2019 1:14:26 AM]
Aurum
Master Cruncher
The Great Basin
Joined: Dec 24, 2017
Post Count: 2386
Status: Offline
Re: For Current Volunteers: Advance Information on Our Newest Project

vaughan-AMD, Our electric rate is $0.07/kWh. Lucky for me my wife never sees the utility bill.
----------------------------------------

...KRI please cancel all shadow-banning
[Nov 1, 2019 1:19:39 AM]
l_mckeon
Senior Cruncher
Joined: Oct 20, 2007
Post Count: 439
Status: Offline
Re: For Current Volunteers: Advance Information on Our Newest Project

How much RAM is a single task actually consuming?

Is this another project that runs at vastly different speeds on Windows and Linux?
[Nov 1, 2019 1:29:54 AM]
hchc
Veteran Cruncher
USA
Joined: Aug 15, 2006
Post Count: 802
Status: Offline
Re: For Current Volunteers: Advance Information on Our Newest Project

I'm getting about this much per task:

~800 MB RAM consumption
~1-1.5 GB disk space
~85-90 MB upload

And yeah, much faster on Linux than Windows for me. My Ivy Bridge (3rd Gen) outperforms my Coffee Lake (8th Gen).
----------------------------------------
  • i5-7500 (Kaby Lake, 4C/4T) @ 3.4 GHz
  • i5-4590 (Haswell, 4C/4T) @ 3.3 GHz
  • i5-3570 (Ivy Bridge, 4C/4T) @ 3.4 GHz

----------------------------------------
[Edit 1 times, last edit by hchc at Nov 1, 2019 8:28:48 AM]
[Nov 1, 2019 7:11:09 AM]
hchc
Veteran Cruncher
USA
Joined: Aug 15, 2006
Post Count: 802
Status: Offline
Re: For Current Volunteers: Advance Information on Our Newest Project

uplinger said:
Also, we do get machines that are set to run results from other projects but those hosts have already been limited to 1 result per day... They start with say 5 results per day, but once they return errors, it drops down to 1.

What I meant was that since this project is opt-in, the owner of all those enterprise devices deliberately opted into it, knowing that their fleet of devices on 7.2.47 (from 2014) would error out.
[Edit 2 times, last edit by hchc at Nov 1, 2019 7:57:58 AM]
[Nov 1, 2019 7:39:55 AM]
catchercradle
Advanced Cruncher
Joined: Jan 16, 2009
Post Count: 128
Status: Offline
Re: For Current Volunteers: Advance Information on Our Newest Project

No great surprise there for those of us who crunch for CPDN. A GPU's strength is in doing lots of computations simultaneously. The nature of weather tasks is that each computation is based on the results of the previous one, so there is little, if anything, to be gained by running them on a GPU. The same logic means they wouldn't scale well as multi-core tasks.
[Nov 1, 2019 8:47:52 AM]
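A toy sketch of the dependency chain described above, assuming a made-up one-variable "model" that relaxes a temperature toward 15.0 each step. The functions `step`, `run_forecast` and `independent_cells` are hypothetical and have nothing to do with CPDN's real model code; they only contrast work where each step needs the previous result with work where every element is independent.

```python
import math


def step(temp: float, dt: float = 0.1) -> float:
    """One hypothetical time step: nudge the value toward 15.0 (toy model only)."""
    return temp + dt * (15.0 - temp)


def run_forecast(initial: float, n_steps: int) -> list[float]:
    states = [initial]
    for _ in range(n_steps):
        # Step k needs the output of step k-1, so the steps cannot be handed
        # out to many GPU threads at once; they have to run one after another.
        states.append(step(states[-1]))
    return states


def independent_cells(values: list[float]) -> list[float]:
    # Here each result depends only on its own input, so every element could
    # be computed at the same time: the kind of work GPUs are good at.
    return [math.sqrt(v) for v in values]


print(run_forecast(5.0, 3))
print(independent_cells([4.0, 9.0, 16.0]))
```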
Eric_Kaiser
Veteran Cruncher
Germany (Hessen)
Joined: May 7, 2013
Post Count: 1047
Status: Offline
Re: For Current Volunteers: Advance Information on Our Newest Project

WCG needs to fix that if they want this project to run quickly. Server abort should occur if ARP1 WU cannot be returned in 48 hours.

I don't second that. I have ARP1 tasks that have now been running for 44 hours and are still not finished. If the server aborts them, energy and resources are wasted.
In the end, I am not going to support projects that waste resources.
----------------------------------------

[Nov 1, 2019 8:49:39 AM]