World Community Grid Forums
Thread Status: Active Total posts in this thread: 100
SekeRob
Master Cruncher Joined: Jan 7, 2013 Post Count: 2741 Status: Offline |
Surely if you can plot temps over time, you can do the same with the CPU GHz, can't you? (Mine is locked to 1.6 GHz, and the temp is constant running only MIP1, no different from what it shows for OET1 and ZIKA.)
----------------------------------------[Edit 1 times, last edit by SekeRob* at Sep 10, 2017 9:24:46 AM] |
||
|
Former Member
Cruncher Joined: May 22, 2018 Post Count: 0 Status: Offline |
Since an i7-6700K is a 4-core processor, it would be interesting to see the temps of the other 3 cores. The sensors command on my Linux 16.04 machine with an i7 shows different temperatures for each core. Maybe that one core is lower for some reason. You may not be able to correlate one core's temp with the utilization graph, as I assume that is the utilization average for ALL cores. A more useful plot might be core 0 temp versus core 0 utilization. Maybe that core is being adjusted down by the governor but the others aren't. I have noticed a slight decrease in throughput on my Linux machines, independent of any WCG project, that I think is due to changes they have made to the cpufreq facility. My governor used to be set to Performance but now it is set to Ondemand, which is now the default on Ubuntu. The developers maintain that there shouldn't be any real difference between the two on a machine that is fully utilized, but I'm not so sure.
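For anyone who wants to check the governor and the clock per core, as suggested above, here is a minimal Python sketch. It assumes the standard Linux cpufreq sysfs layout (`scaling_governor` and `scaling_cur_freq` under `/sys/devices/system/cpu/cpuN/cpufreq`); the function name `read_cpufreq` is made up for this illustration, and cores that do not expose cpufreq are simply skipped.

```python
from pathlib import Path


def read_cpufreq(sysfs_root="/sys/devices/system/cpu"):
    """Return {cpu_name: (governor, current_mhz)} for each core exposing cpufreq.

    sysfs_root is a parameter only so the function can be pointed at a test
    directory; on a real host the default path is assumed to be correct.
    """
    result = {}
    for cpufreq_dir in Path(sysfs_root).glob("cpu[0-9]*/cpufreq"):
        try:
            governor = (cpufreq_dir / "scaling_governor").read_text().strip()
            # scaling_cur_freq is reported in kHz
            khz = int((cpufreq_dir / "scaling_cur_freq").read_text().strip())
        except (OSError, ValueError):
            continue  # core offline or file unreadable: skip it
        result[cpufreq_dir.parent.name] = (governor, khz / 1000.0)
    return result


if __name__ == "__main__":
    readings = read_cpufreq()
    # Sort numerically so cpu10 comes after cpu2
    for cpu in sorted(readings, key=lambda name: int(name[3:])):
        governor, mhz = readings[cpu]
        print(f"{cpu}: governor={governor} freq={mhz:.0f} MHz")
```

Logging this once a minute next to the temperature would show whether a temperature drop coincides with one core being clocked down.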
|
||
|
KerSamson
Master Cruncher Switzerland Joined: Jan 29, 2007 Post Count: 1677 Status: Offline |
Hi SekeRob,
unfortunately, I do not plot the CPU frequency. However, a plot of the used memory is available, see below: ![]()
If the CPU regularly has to wait on memory (for example because of recurrent cache misses), the CPU load remains at 100% even while the CPU is waiting (waiting is not the same as idle), but because waiting is less demanding than crunching, the CPU temperature drops.
I don't know how the science software is designed and implemented; nevertheless, my feeling is that something could probably be optimised at the software level. Comparing the CPU temperature with the memory load, my guess is that this is a possible cause of the crazy MIP behaviour on Linux. If it is not MIP, it is a library or kernel problem at the Linux level (Linux 4.4.0-93-generic #116-Ubuntu SMP Fri Aug 11 21:17:51 UTC 2017 x86_64 x86_64 x86_64 GNU/Linux).
Cheers, Yves |
||
|
SekeRob
Master Cruncher Joined: Jan 7, 2013 Post Count: 2741 Status: Offline |
Doneske's observation is interesting, considering that my hottest core (0) is at 62C while at the same time the coolest core (3) is at 55C.
A memory leak would last as long as the task runs, then get cleaned up at the end before the new task starts. Using PSensor I see nothing like a continuous rise... steady, with some down and up as tasks end and start.
I pushed in the 4.13 kernel a few days ago. 16.04 LTS was recently upgraded to 16.04.02, still on the 4.4 kernel. https://www.networkworld.com/article/3221422/...as-a-long-shelf-life.html , but only for server ATM. Mine still runs 16.04.01 but there's a 16.04.02 and seemingly a 16.04.03 due out soon http://www.omgubuntu.co.uk/2017/02/download-ubuntu-16-04-2-lts going to kernel 4.8 [Edit 2 times, last edit by SekeRob* at Sep 10, 2017 4:08:15 PM] |
||
|
Sgt.Joe
Ace Cruncher USA Joined: Jul 4, 2006 Post Count: 7697 Status: Offline |
I switched a Linux system over from running SCC to MIP and noticed no difference in temperatures. Psensor is giving me temps of 37C to 44C on a system which runs 14 to 16 hours per day.
----------------------------------------Cheers
Sgt. Joe
*Minnesota Crunchers* |
||
|
KerSamson
Master Cruncher Switzerland Joined: Jan 29, 2007 Post Count: 1677 Status: Offline |
@Doneske
I just saw your remark concerning the core temperature. Indeed, the 4 cores show the same behaviour (the host was fully devoted to MIP or to OET1). The temperature difference between the cores at 100% CPU load is less than 0.5 °C, i.e. 61.5 to 62 °C.
I monitor 4 of my hosts using LibreNMS with locally running snmp agents: view systemonly included .1.3.6.1.2.1.1
Cheers, Yves |
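As a local cross-check of the per-core temperatures discussed here, independent of snmp, the values can be read straight from the kernel's hwmon interface. The sketch below is only an illustration: it assumes the usual `/sys/class/hwmon/hwmon*/temp*_input` files (millidegrees Celsius) with matching `temp*_label` files, as the coretemp driver typically provides on Intel CPUs; the function name `read_core_temps` is hypothetical.

```python
from pathlib import Path


def read_core_temps(hwmon_root="/sys/class/hwmon"):
    """Return {"chip/label": degrees_celsius} for every labelled temperature input.

    hwmon_root is a parameter only so the function can be pointed at a test
    directory. Sensors without a label file are skipped.
    """
    temps = {}
    for input_file in Path(hwmon_root).glob("hwmon*/temp*_input"):
        chip_dir = input_file.parent
        label_file = chip_dir / input_file.name.replace("_input", "_label")
        try:
            label = label_file.read_text().strip()
            millidegrees = int(input_file.read_text().strip())
        except (OSError, ValueError):
            continue  # unlabelled or unreadable sensor
        try:
            chip = (chip_dir / "name").read_text().strip()
        except OSError:
            chip = chip_dir.name  # fall back to the directory name, e.g. hwmon0
        temps[f"{chip}/{label}"] = millidegrees / 1000.0
    return temps


if __name__ == "__main__":
    for sensor, celsius in sorted(read_core_temps().items()):
        print(f"{sensor}: {celsius:.1f} C")
```

Sampled at the same interval as the utilization, this gives one temperature series per core instead of a single summary value.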
||
|
Former Member
Cruncher Joined: May 22, 2018 Post Count: 0 Status: Offline |
Understood. The part I can't reconcile is the temp of the core going down while utilization stays the same (everything else being equal, such as fan speed, ambient temp, etc.). Was your utilization graph only for core 0 or was it total utilization across all cores? I could understand one core dropping in temp, but would expect its utilization to drop as well while the other cores were not impacted. In that case, maybe MIP is causing some sort of on-chip cache pollution that only affects one core, whereas OET and other apps don't do that. I believe there are tools out there to analyze the cache, but that is not something I would care to undertake. If you have multiple machines with different processor models, do those other machines suffer the same malady? Maybe it is only related to a particular processor model or family. I had an E5560 that would not run correctly after Ubuntu 15.04. With 16 FAH jobs running, some cores would run at 60 percent, a few others at 40 and the rest at 100. The E5550 was OK, and the E56xx series was also OK. Only the E5560 was affected. It wouldn't run properly on 15.04, 15.10, or 16.04. The problem went away with 16.10. I suspect it was firmware for that model that was dropped in 15.04 and corrected in 16.10. I think I was able to tie it back to a specific ABI level of the 4.8 kernel.
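To separate core 0 utilization from the all-core total, as asked above, the per-core counters in `/proc/stat` can be sampled twice and differenced. This is a rough sketch under the usual field order (user, nice, system, idle, iowait, irq, softirq, steal); the helper names are invented for the example. Note that this accounting counts a core stalled on memory as busy, so by itself it cannot confirm or rule out the cache theory.

```python
import time


def parse_proc_stat(text):
    """Map 'cpuN' -> (busy_jiffies, total_jiffies) from the content of /proc/stat."""
    counters = {}
    for line in text.splitlines():
        parts = line.split()
        # Keep per-core lines ("cpu0", "cpu1", ...); skip the aggregate "cpu" line.
        if not parts or not parts[0].startswith("cpu") or parts[0] == "cpu":
            continue
        # user nice system idle iowait irq softirq steal (guest is already in user)
        values = [int(v) for v in parts[1:9]]
        idle = values[3] + (values[4] if len(values) > 4 else 0)
        total = sum(values)
        counters[parts[0]] = (total - idle, total)
    return counters


def per_core_utilization(before, after):
    """Percent busy per core between two parse_proc_stat() snapshots."""
    utilization = {}
    for cpu, (busy_after, total_after) in after.items():
        if cpu not in before:
            continue
        busy_before, total_before = before[cpu]
        elapsed = total_after - total_before
        if elapsed > 0:
            utilization[cpu] = 100.0 * (busy_after - busy_before) / elapsed
        else:
            utilization[cpu] = 0.0
    return utilization


if __name__ == "__main__":
    with open("/proc/stat") as f:
        first = parse_proc_stat(f.read())
    time.sleep(1.0)
    with open("/proc/stat") as f:
        second = parse_proc_stat(f.read())
    for cpu, percent in sorted(per_core_utilization(first, second).items()):
        print(f"{cpu}: {percent:.1f}%")
```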
----------------------------------------[Edit 2 times, last edit by Doneske at Sep 12, 2017 12:44:41 AM] |
||
|
KerSamson
Master Cruncher Switzerland Joined: Jan 29, 2007 Post Count: 1677 Status: Offline |
Hi Doneske,
here is the temperature summary figure: ![]()
I mentioned my guess in a previous post: if the CPU regularly has to wait on memory (for example because of recurrent cache misses), the CPU load remains at 100% even while the CPU is waiting (waiting is not the same as idle), but because waiting is less demanding than crunching, the CPU temperature drops.
I am currently rarely on site for business reasons, which somewhat impedes investigations on other hosts. However, the impacted host is not an "exotic" configuration: i7 6700K, Asus Z170-K (with up-to-date firmware), 16GB RAM, up-to-date Ubuntu Mate 16.04 x64. On the other hand, the only Windows 7 host I have (i7 4770K, Win7 Pro x64) seems to behave as expected. However, Win7 does not report the CPU temperature over snmp and I do not have any historical records of it.
Cheers, Yves |
||
|
SekeRob
Master Cruncher Joined: Jan 7, 2013 Post Count: 2741 Status: Offline |
In the second link I posted above, on updating 16.04 LTS to 16.04.02, there's a command that also pushes hardware support improvements:
sudo apt install --install-recommends xserver-xorg-hwe-16.04
I ran it because ever since 16.04 my screen would cycle, once the GUI loaded, to power off / power on after the designated delay time to lock the screen. Remember, my kernel has been stepped up over time to 4.13 now. After running the command, the problem is gone, and to my surprise it installed another kernel, 4.10, i.e. a fully updated 16.04.02 uses 4.10. After boot I chose 4.13 again, and the screen problem stayed gone. edit: spell daemon [Edit 2 times, last edit by SekeRob* at Sep 12, 2017 7:50:55 AM] |
||
|
KerSamson
Master Cruncher Switzerland Joined: Jan 29, 2007 Post Count: 1677 Status: Offline |
Hi SekeRob,
indeed, I was surprised by the kernel versions. I performed two fresh installations of Ubuntu Mate 16.04, at the end of March and in the middle of April. The first machine remained on kernel 4.4.x, whereas the second machine received kernel 4.10.x. Even with regular updates, the first machine sticks to 4.4.x, and it is the machine I mentioned in this discussion.
I assumed that the reason for Ubuntu not updating the kernel was maybe related to some i7 bugs. For this reason, I have not forced the kernel update until now. But I probably have to reconsider this point.
Cheers, Yves |
||
|