World Community Grid Forums
Thread Status: Active | Total posts in this thread: 100
Former Member
Cruncher Joined: May 22, 2018 Post Count: 0 Status: Offline
I modified the app config file to run only 6 MIP units instead of 24. Runtimes dropped from a little over 5 hours per WU to between 2.5 and 2.8 hours. I was able to get the runtime under 2 hours by continuing to reduce the number of concurrent MIP WUs. I tried the same experiment on an AMD 8-core system (no hyperthreading) and saw the same result, just not to the same extent: runtimes on the AMD dropped from about 3.5 hours to about 2. The more WUs you run concurrently, the worse it is. On my 32-core system, the runtimes are up over 7 hours per WU. Something is definitely wrong with these WUs.
----------------------------------------
Efficiency on all work units remained at 99.9+.
[Edit 1 time, last edit by Doneske at Sep 16, 2017 10:54:56 PM]
PowerFactor
Ace Cruncher Joined: Dec 9, 2016 Post Count: 4029 Status: Offline
Hey Doneske, what is the name of the config file you mentioned?
----------------------------------------
[Edit 1 time, last edit by thepeacemaker7 at Sep 16, 2017 11:43:38 PM]
Former Member
Cruncher Joined: May 22, 2018 Post Count: 0 Status: Offline
app_config.xml
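For anyone else wanting to try this: app_config.xml goes in the project's directory under the BOINC data folder (projects/www.worldcommunitygrid.org). A minimal sketch limiting concurrent MIP tasks to 6 — note that the short app name `mip1` is an assumption; check the `<name>` entries in your client_state.xml for the exact value:

```xml
<app_config>
  <app>
    <!-- Short application name as reported in client_state.xml;
         "mip1" is assumed here for the Microbiome Immunity Project -->
    <name>mip1</name>
    <!-- Run at most 6 MIP tasks at once -->
    <max_concurrent>6</max_concurrent>
  </app>
</app_config>
```

After saving the file, use Options > Read config files in the BOINC Manager (Advanced view), or restart the client, for the limit to take effect.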
Sgt.Joe
Ace Cruncher USA Joined: Jul 4, 2006 Post Count: 7697 Status: Offline
> 2600k - 8 CPU (4 cores hyper-threading) 8M L2 cache - assuming Hyper-threading turned on.
> Dual Xeon E5410 - 8 CPU (4+4 - no Hyper-threading available) 24M L2 cache (12+12)

> I modified the app config file to only run 6 MIP units instead of 24. Runtimes dropped from a little over 5 hours per WU to between 2.5 and 2.8 hours. I was able to get the runtime under 2 hours by continuing to reduce the number of concurrent MIP WUs. Tried the same experiment on an AMD 8 core system (no hyperthreading) and saw the same result just not to the same extent. Runtimes on the AMD dropped from about 3.5 hours to about 2. The more you run concurrent the worse it is. On my 32 core system, the runtimes are up over 7 hours per WU. Something is definitely wrong with these WUs. Efficiency on all work units remained at 99.9+.

I spoke with a person who knows far more about the innards of computers than I do. I described the situation, and they almost immediately told me the difference is probably attributable to the amount of cache available. Given the large memory requirements of MIP, a larger cache is needed to avoid cache misses, which force the CPU to retrieve data from main memory and are a lot slower. So it appears CPUs with larger caches are going to be more efficient for the MIP project, especially on multi-CPU systems. So Doneske, I don't think there is any problem with the work units per se, but they are taxing the hardware in a different fashion than we have seen before. If anyone else has a different theory, please chime in. Cheers
Sgt. Joe
----------------------------------------
*Minnesota Crunchers*
[Edit 1 time, last edit by Sgt.Joe at Sep 17, 2017 3:24:08 AM]
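Sgt. Joe's cache theory is easy to sanity-check with back-of-the-envelope arithmetic. A small sketch using the cache sizes quoted above; the per-task figures are just the shared cache divided evenly across concurrent tasks, not measurements:

```python
# Rough arithmetic behind the cache-pressure theory, using the two
# machines described in the thread. Cache sizes come from the post;
# dividing evenly is a simplification, not how real caches partition.
def cache_per_task(total_cache_mb, concurrent_tasks):
    """Approximate cache share available to each task, in MB."""
    return total_cache_mb / concurrent_tasks

# 2600K: 8 MB shared cache, 8 concurrent MIP tasks
print(cache_per_task(8, 8))    # 1.0 MB per task
# Dual Xeon E5410: 24 MB total cache, 8 tasks
print(cache_per_task(24, 8))   # 3.0 MB per task
# Cutting the 2600K to 4 concurrent tasks doubles each task's share
print(cache_per_task(8, 4))    # 2.0 MB per task
```

If each MIP task's hot working set is larger than its share, every extra concurrent task increases the miss rate, which matches the observation that runtimes worsen as concurrency rises even though reported efficiency stays at 99.9+ (the CPU is busy, but busy waiting on memory).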
Former Member
Cruncher Joined: May 22, 2018 Post Count: 0 Status: Offline
I agree that there seems to be an issue "on-chip", and what you describe seems plausible, but I would also submit there is probably something that could be done about it. We have had other projects with large memory requirements that didn't show this same behavior. I'm just wondering if this code was written more for an HPC environment, where WUs would be distributed across nodes instead of across cores and concurrency would be less of an issue. They may feel that changing and testing the code is not worth the performance gain. If 5 to 7 hours per WU vs 1 to 2 hours is sufficient for the science, then who am I to complain.
Sgt.Joe
Ace Cruncher USA Joined: Jul 4, 2006 Post Count: 7697 Status: Offline
> If 5 to 7 hours per WU vs 1 to 2 hours is sufficient for the science then who am I to complain.

You have every right to complain. They are your machines and your electricity. If there is an issue which can be alleviated by better programming practice, perhaps it should at least be examined. If nothing else, at least a note in the FAQs about the issue and what might be best practice for more efficient operation. Cheers
Sgt. Joe
*Minnesota Crunchers*
Mumak
Senior Cruncher Joined: Dec 7, 2012 Post Count: 477 Status: Offline
I can see the same effect on a Threadripper system - when running only MIP1 units, runtimes were 2-3 times longer than when running mixed with other projects.
----------------------------------------
Moreover, an even odder observation: on a Xeon E3-1275 v5 system, the machine started to hang completely when running only MIP1 units. And this is a specially tuned DELL workstation; it ran 24/7 without any problem in the past. I have now switched to a mix of MIP1 and SCC1 and it runs stably.
Crystal Pellet
Veteran Cruncher Joined: May 21, 2008 Post Count: 1323 Status: Offline
Because of your observations, I had a look too:

I had 30 MIP tasks running concurrently. I suspended 5 of them when most tasks were at 53.333% done, and 5 FAHV tasks started in their place. Within 20 minutes, the Time Left of the running MIPs went down from 2hr37m to between 1hr30m and 1hr52m.
KerSamson
Master Cruncher Switzerland Joined: Jan 29, 2007 Post Count: 1679 Status: Offline
Thank you all for your feedback, which confirms my initial observations.
----------------------------------------
Even if a science is really demanding in terms of memory use, it should be possible for the software developers to take such needs and limitations into account (and to reconcile them): that is the art of software development. I am just a little surprised that such observations were not made before the science was launched.

Just for the record: currently I have only one host crunching for MIP - an i7 4770K on Windows 7 Pro x64 - and the average duration per WU has increased from about 2 hours at the project launch to about 5 hours now, sometimes with about 45 minutes' difference between elapsed time and CPU time.

Cheers, Yves
der_Day
Cruncher Joined: Feb 14, 2008 Post Count: 2 Status: Offline
> Can anyone say something about the checkpoints? 1 checkpoint after <10 minutes and the next after another hour?! I crunch on a private computer, so I have to suspend it - or are there other tricks?

> I think the checkpoints are hard-coded into the application, so there is probably nothing you can do to change when they occur. (Someone correct me if I am wrong.) You can check all the projects and run just the ones that have checkpoints at regular short intervals. Cheers

Thanks for your answer.
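Since checkpoint timing is set by the application itself, the usual client-side workaround is to keep suspended tasks in memory so no progress is lost between checkpoints. In BOINC this can be set via global_prefs_override.xml in the BOINC data directory — a sketch, using standard BOINC preference elements (the 60-second value is illustrative; these can also be set in the Manager's computing preferences):

```xml
<global_preferences>
  <!-- Keep suspended tasks in RAM so they resume where they stopped
       instead of rolling back to the last checkpoint -->
  <leave_apps_in_memory>1</leave_apps_in_memory>
  <!-- Allow applications to checkpoint as often as every 60 seconds;
       apps that checkpoint less often (like MIP here) are unaffected -->
  <disk_interval>60</disk_interval>
</global_preferences>
```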