Total posts in this thread: 100
This topic has been viewed 18327 times and has 99 replies
Former Member
Cruncher
Joined: May 22, 2018
Post Count: 0
Re: WU Characteristics

I modified the app config file to run only 6 MIP units instead of 24. Runtimes dropped from a little over 5 hours per WU to between 2.5 and 2.8 hours. I was able to get the runtime under 2 hours by continuing to reduce the number of concurrent MIP WUs. I tried the same experiment on an AMD 8-core system (no hyperthreading) and saw the same result, just not to the same extent: runtimes on the AMD dropped from about 3.5 hours to about 2. The more you run concurrently, the worse it gets. On my 32-core system, the runtimes are up over 7 hours per WU. Something is definitely wrong with these WUs.
Efficiency on all work units remained at 99.9%+.
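
For anyone who wants to compare the two setups, here is a rough sketch of the arithmetic in Python. It only counts MIP work units, assumes every running task takes the quoted per-WU runtime, and ignores whatever the freed threads crunch instead; the function name is made up for illustration.

```python
# Rough MIP-only throughput from the runtimes quoted above.
# Assumption: every running task takes the quoted per-WU runtime.

def wus_per_hour(concurrent_tasks: int, hours_per_wu: float) -> float:
    """Steady-state work units finished per hour across all running tasks."""
    return concurrent_tasks / hours_per_wu

# 24 concurrent tasks at "a little over 5 hours" each (5.0 is an upper bound on speed)
before = wus_per_hour(24, 5.0)
# 6 concurrent tasks at 2.8 hours (slow end) and 2.5 hours (fast end)
after_low = wus_per_hour(6, 2.8)
after_high = wus_per_hour(6, 2.5)

print(f"24 concurrent: {before:.2f} MIP WU/h")
print(f" 6 concurrent: {after_low:.2f} to {after_high:.2f} MIP WU/h")
```

Per-WU runtime and MIP output per machine are two different measures, so it helps to look at both before settling on a concurrency limit. The threads taken away from MIP are not idle if they run another project.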
----------------------------------------
[Edit 1 times, last edit by Doneske at Sep 16, 2017 10:54:56 PM]
[Sep 16, 2017 10:51:36 PM]
PowerFactor
Ace Cruncher
Joined: Dec 9, 2016
Post Count: 4029
Re: WU Characteristics

Hey Doneske, what is the name of the config file you mentioned?
----------------------------------------
[Edit 1 times, last edit by thepeacemaker7 at Sep 16, 2017 11:43:38 PM]
[Sep 16, 2017 11:43:10 PM]
Former Member
Cruncher
Joined: May 22, 2018
Post Count: 0
Re: WU Characteristics

app_config.xml
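
A minimal sketch of what such a file could look like, wrapped in a small Python helper so it can be generated for any limit. The `<app_config>`, `<app>`, `<name>` and `<max_concurrent>` elements are standard BOINC; the short app name `mip1` and the helper names are assumptions, so check the app name in your own client before using it.

```python
from pathlib import Path

# Template for a BOINC app_config.xml that caps concurrent tasks of one app.
APP_CONFIG_TEMPLATE = """<app_config>
  <app>
    <name>{app_name}</name>
    <max_concurrent>{max_concurrent}</max_concurrent>
  </app>
</app_config>
"""

def render_app_config(app_name: str, max_concurrent: int) -> str:
    """Return the XML text limiting `app_name` to `max_concurrent` tasks."""
    if max_concurrent < 1:
        raise ValueError("max_concurrent must be at least 1")
    return APP_CONFIG_TEMPLATE.format(
        app_name=app_name, max_concurrent=max_concurrent
    )

def write_app_config(project_dir: Path, app_name: str, max_concurrent: int) -> Path:
    """Write app_config.xml into the given project directory (hypothetical helper)."""
    target = project_dir / "app_config.xml"
    target.write_text(render_app_config(app_name, max_concurrent), encoding="utf-8")
    return target

# "mip1" is an assumed short name for the MIP application; 6 is the limit
# mentioned earlier in the thread.
print(render_app_config("mip1", 6))
```

The file belongs in the project's folder inside the BOINC data directory, and the client has to re-read its config files (or be restarted) before the limit takes effect.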
[Sep 17, 2017 1:46:04 AM]
Sgt.Joe
Ace Cruncher
USA
Joined: Jul 4, 2006
Post Count: 7697
Re: WU Characteristics

2600k - 8 CPU (4 cores hyper-threading) 8M L3 cache - assuming Hyper-threading turned on.
Dual Xeon E5410 - 8 CPU (4+4 - no Hyper-threading available) 24M L2 cache (12+12)

I modified the app config file to run only 6 MIP units instead of 24. Runtimes dropped from a little over 5 hours per WU to between 2.5 and 2.8 hours. I was able to get the runtime under 2 hours by continuing to reduce the number of concurrent MIP WUs. I tried the same experiment on an AMD 8-core system (no hyperthreading) and saw the same result, just not to the same extent: runtimes on the AMD dropped from about 3.5 hours to about 2. The more you run concurrently, the worse it gets. On my 32-core system, the runtimes are up over 7 hours per WU. Something is definitely wrong with these WUs. Efficiency on all work units remained at 99.9%+.

I spoke with a person who knows way more about the innards of computers than I do. I described the situation and they almost immediately told me the difference is probably attributable to the amount of cache available. With the large memory requirements of MIP, a greater amount of cache is necessary to avoid cache misses, which force the CPU to retrieve information from main memory, and that is a lot slower. So it appears CPUs with larger caches are going to be more efficient for the MIP project, especially on multi-CPU systems. So Doneske, I don't think there is any problem with the work units per se, but they are taxing the hardware in a different fashion than we have seen before. If anyone else has a different theory, please chime in.
Cheers
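
To put rough numbers on the cache theory, here is a small Python sketch using the two systems quoted at the top of this post. It assumes the cache is split evenly between running tasks, which real CPUs do not guarantee (on the dual Xeon the cache is not even one shared pool), so treat the result as a ballpark only. The function name is made up for illustration.

```python
# Cache available per concurrently running task, using the quoted figures.
# Assumption: an even split between tasks, which is only a rough approximation.

def cache_mb_per_task(total_cache_mb: float, concurrent_tasks: int) -> float:
    """Total cache divided evenly across the tasks running at the same time."""
    if concurrent_tasks < 1:
        raise ValueError("concurrent_tasks must be at least 1")
    return total_cache_mb / concurrent_tasks

# (total cache in MB, tasks running when every logical CPU is busy)
systems = {
    "2600k": (8, 8),
    "Dual Xeon E5410": (24, 8),
}

for name, (cache_mb, tasks) in systems.items():
    share = cache_mb_per_task(cache_mb, tasks)
    print(f"{name}: {share:.1f} MB of cache per task with {tasks} tasks")

# Halving the number of tasks doubles the share on the same machine.
print(f"2600k with 4 tasks: {cache_mb_per_task(8, 4):.1f} MB per task")
```

If the theory holds, limiting the number of concurrent MIP tasks helps for the same reason a larger cache does: each task gets a bigger share.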
----------------------------------------
Sgt. Joe
*Minnesota Crunchers*
----------------------------------------
[Edit 1 times, last edit by Sgt.Joe at Sep 17, 2017 3:24:08 AM]
[Sep 17, 2017 3:23:02 AM]
Former Member
Cruncher
Joined: May 22, 2018
Post Count: 0
Re: WU Characteristics

I agree that there seems to be an issue "on-chip", and what you describe seems plausible, but I would also submit there is probably something that could be done about it. We have had other projects with large memory requirements that didn't show this same behavior. I'm just wondering if this code was written more for an HPC environment, where WUs would be distributed across nodes instead of across cores and concurrency would be less of an issue. They may feel that changing and testing the code is not worth the performance gain. If 5 to 7 hours per WU vs 1 to 2 hours is sufficient for the science then who am I to complain.
[Sep 17, 2017 1:30:34 PM]
Sgt.Joe
Ace Cruncher
USA
Joined: Jul 4, 2006
Post Count: 7697
Re: WU Characteristics

If 5 to 7 hours per WU vs 1 to 2 hours is sufficient for the science then who am I to complain.

You have every right to complain. They are your machines and your electricity. If there is an issue which can be alleviated by better programming practice, perhaps it should at least be examined. If nothing else, there should at least be a note in the FAQ about the issue and what the best practice for more efficient operation might be.
Cheers
----------------------------------------
Sgt. Joe
*Minnesota Crunchers*
[Sep 17, 2017 2:17:53 PM]
Mumak
Senior Cruncher
Joined: Dec 7, 2012
Post Count: 477
Re: WU Characteristics

I can see the same effect on a Threadripper system: when running only MIP1 units, runtimes were 2-3 times longer than when running them mixed with other projects.

An even odder observation: on a Xeon E3-1275 v5 system, the machine started to hang completely when running only MIP1 units. And this is a specially tuned DELL workstation that had been running 24/7 without any problem in the past. I have now switched it to a mix of MIP1 and SCC1 and it runs stable.
[Sep 17, 2017 2:52:10 PM]
Crystal Pellet
Veteran Cruncher
Joined: May 21, 2008
Post Count: 1323
Re: WU Characteristics

Because of your observations, I had a look too:


I had 30 MIPs running concurrently. I suspended 5 of them, at a point where most tasks were 53.333% done, so 5 FAHV tasks started in their place.
Within 20 minutes the Time Left of the running MIPs went down from 2hr37m to between 1hr30m and 1hr52m.
[Sep 17, 2017 5:45:12 PM]
KerSamson
Master Cruncher
Switzerland
Joined: Jan 29, 2007
Post Count: 1679
Re: WU Characteristics

Thank you all for your feedback, which confirms my initial observations.
Even if a science is really demanding in terms of memory use, it should be possible for the software developers to take such needs and limitations into account (and to reconcile them): that is the art of software development.
I am just a little bit surprised that such observations were not made before the science was launched.
Just for the record: currently I have only one host crunching for MIP, an i7 4770K on Windows 7 Pro x64, and the average duration per WU has increased from about 2 hours at the project launch to 5 hours now, with sometimes about 45 minutes of difference between elapsed time and CPU time.
Cheers,
Yves
[Sep 17, 2017 6:17:09 PM]
der_Day
Cruncher
Joined: Feb 14, 2008
Post Count: 2
Re: WU Characteristics

Can anyone say something about the checkpoints? 1 checkpoint after <10 minutes and the next after another hour?! I crunch on a private computer, so I have to suspend it. Or are there other tricks?

I think the checkpoints are hard coded into the application, so there is probably nothing you can do to change when they occur. (Someone correct me if I am wrong.) You can check all the projects and run just the ones that have checkpoints at regular short intervals.
Cheers

thanks for your answer
[Sep 18, 2017 2:14:36 PM]