Thread Status: Active
Total posts in this thread: 100
This topic has been viewed 18321 times and has 99 replies
SekeRob
Master Cruncher
Joined: Jan 7, 2013
Post Count: 2741
Status: Offline
Re: WU Characteristics - Linux - To investigate

Surely if you can plot temps over time, you can do the same with the CPU frequency (GHz), can't you? (Mine is locked to 1.6 GHz, and the temp is constant running MIP1 only, no different from what it shows for OET1 and ZIKA.)
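
For anyone who wants to log the clock speed next to the temperatures, here is a minimal Python sketch. It assumes the kernel exposes the usual cpufreq files at /sys/devices/system/cpu/cpuN/cpufreq/scaling_cur_freq (values in kHz). The function names, the sample count and the interval are my own hypothetical choices, not part of BOINC or any monitoring tool mentioned in this thread.

```python
import time
from pathlib import Path


def read_core_freqs_ghz(cpu_root="/sys/devices/system/cpu"):
    """Return {core_number: current frequency in GHz} from the cpufreq sysfs files."""
    freqs = {}
    # "cpu[0-9]*" matches cpu0, cpu1, ... but not sibling dirs such as "cpufreq" or "cpuidle"
    for f in Path(cpu_root).glob("cpu[0-9]*/cpufreq/scaling_cur_freq"):
        core = int(f.parent.parent.name[3:])  # "cpu3" -> 3
        khz = int(f.read_text().strip())      # the kernel reports kHz
        freqs[core] = khz / 1e6               # kHz -> GHz
    return dict(sorted(freqs.items()))


def sample_freqs(count=5, interval_s=1.0, cpu_root="/sys/devices/system/cpu"):
    """Collect `count` samples of (unix_timestamp, per-core GHz), `interval_s` seconds apart."""
    samples = []
    for i in range(count):
        samples.append((time.time(), read_core_freqs_ghz(cpu_root)))
        if i < count - 1:
            time.sleep(interval_s)
    return samples
```

Something like sample_freqs(count=60, interval_s=5), written out to a CSV file, would give a series that can be laid over the temperature plot.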
----------------------------------------
[Edit 1 times, last edit by SekeRob* at Sep 10, 2017 9:24:46 AM]
[Sep 10, 2017 9:24:14 AM]
Former Member
Cruncher
Joined: May 22, 2018
Post Count: 0
Status: Offline
Re: WU Characteristics - Linux - To investigate

Since an i7-6700K is a 4-core processor, it would be interesting to see the temps of the other 3 cores. The sensors command on my Ubuntu 16.04 machine with an i7 shows a different temperature for each core. Maybe that one core is lower for some reason. You may not be able to correlate one core's temp with the utilization graph, as I assume that shows the average utilization across ALL cores. A more useful plot might be core 0 temp versus core 0 utilization. Maybe that core is being clocked down by the governor but the others aren't. I have noticed a slight decrease in throughput on my Linux machines, independent of any WCG project, that I think is due to changes made to the cpufreq facility. My governor used to be set to Performance, but now it is set to Ondemand, which is now the default on Ubuntu. The developers maintain that there shouldn't be any real difference between the two on a machine that is fully utilized, but I'm not so sure.
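
To get core 0 utilization on its own rather than the all-core average, one option is to take two snapshots of /proc/stat and compare the per-core tick counters. The following is only a sketch under that assumption; the function names are hypothetical. It counts idle and iowait as "not busy" and everything else as busy.

```python
def parse_proc_stat(text):
    """Return {core: (busy_ticks, total_ticks)} for every 'cpuN' line of /proc/stat."""
    out = {}
    for line in text.splitlines():
        parts = line.split()
        # keep "cpu0", "cpu1", ...; skip the aggregate "cpu" line and unrelated lines
        if not parts or not parts[0].startswith("cpu") or not parts[0][3:].isdigit():
            continue
        # user nice system idle iowait irq softirq steal
        # (the guest columns that follow are already included in user/nice)
        fields = [int(x) for x in parts[1:9]]
        idle = fields[3] + (fields[4] if len(fields) > 4 else 0)
        total = sum(fields)
        out[int(parts[0][3:])] = (total - idle, total)
    return out


def utilization_percent(before, after):
    """Per-core busy percentage between two parse_proc_stat() snapshots."""
    result = {}
    for core, (busy1, total1) in after.items():
        busy0, total0 = before.get(core, (0, 0))
        dt = total1 - total0
        result[core] = 100.0 * (busy1 - busy0) / dt if dt > 0 else 0.0
    return result
```

Usage would be along the lines of reading /proc/stat, sleeping a few seconds, reading it again, and passing both parsed snapshots to utilization_percent(). Note that a core stalled on memory still counts as busy in these counters, so 100% here does not prove the core is doing useful work.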
[Sep 10, 2017 3:15:19 PM]
KerSamson
Master Cruncher
Switzerland
Joined: Jan 29, 2007
Post Count: 1677
Status: Offline
Re: WU Characteristics - Linux - To investigate

Hi SekeRob,
unfortunately, I do not plot the CPU frequency. However, a plot of the memory usage is available, see below:

If the CPU regularly has to wait on memory (for example because of recurrent cache misses), the CPU load stays at 100% even while the CPU is waiting (waiting is not the same as idle). But because waiting is less demanding than crunching, the CPU temperature drops.
I don't know how the science software is designed and implemented; nevertheless, my feeling is that something could probably be optimised at the software level.
Comparing the CPU temperature with the memory load, my guess is that a possible cause of the crazy MIP behaviour on Linux lies there.
If it is not MIP, it is a library or kernel problem at the Linux level (Linux 4.4.0-93-generic #116-Ubuntu SMP Fri Aug 11 21:17:51 UTC 2017 x86_64 x86_64 x86_64 GNU/Linux).
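
One way this guess might be checked is by comparing instructions per cycle (IPC) during a hot phase and a cool phase: a core that is mostly waiting on memory retires far fewer instructions per cycle while still showing 100% load. The counters could be collected with a tool such as perf stat (cycles and instructions events). The sketch below only does the arithmetic; the function names are hypothetical and the numbers are made up for illustration, not measured on this host.

```python
def ipc(instructions, cycles):
    """Instructions retired per CPU cycle."""
    if cycles <= 0:
        raise ValueError("cycles must be positive")
    return instructions / cycles


def ipc_drop_percent(ipc_hot, ipc_cool):
    """Relative IPC drop from the hot phase to the cool phase, in percent."""
    return 100.0 * (ipc_hot - ipc_cool) / ipc_hot


# Made-up counter values for illustration only (same cycle count in both phases):
hot = ipc(instructions=30_000_000_000, cycles=15_000_000_000)   # 2.0 instructions per cycle
cool = ipc(instructions=9_000_000_000, cycles=15_000_000_000)   # 0.6 instructions per cycle
print(f"IPC hot={hot:.2f} cool={cool:.2f} drop={ipc_drop_percent(hot, cool):.0f}%")
```

If the real counters showed a drop of that kind whenever the temperature dips, it would support the memory-wait explanation; if IPC stays flat, the cause is more likely elsewhere.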
Cheers,
Yves
----------------------------------------
[Sep 10, 2017 3:33:58 PM]
SekeRob
Master Cruncher
Joined: Jan 7, 2013
Post Count: 2741
Status: Offline
Re: WU Characteristics - Linux - To investigate

Doneske's observation is interesting, considering that my hottest core (0) is at 62C while at the same time the coolest core (3) is at 55C.

A memory leak would last as long as the task runs, followed by cleanup at the end and a new task. Using Psensor, I see nothing like a continuous rise: usage is steady, with some dips and rises as tasks end and start.

I installed the 4.13 kernel a few days ago. 16.04 LTS was recently upgraded to 16.04.02, still on the 4.4 kernel. https://www.networkworld.com/article/3221422/...as-a-long-shelf-life.html , but only for server ATM. Mine still runs 16.04.01, but there's a 16.04.02 and seemingly a 16.04.03 due out soon http://www.omgubuntu.co.uk/2017/02/download-ubuntu-16-04-2-lts going to kernel 4.8
----------------------------------------
[Edit 2 times, last edit by SekeRob* at Sep 10, 2017 4:08:15 PM]
[Sep 10, 2017 4:04:21 PM]
Sgt.Joe
Ace Cruncher
USA
Joined: Jul 4, 2006
Post Count: 7697
Status: Offline
Re: WU Characteristics - Linux - To investigate

I switched a Linux system over from running SCC to MIP and noticed no difference in temperatures. Psensor is giving me temps of 37C to 44C on a system which runs 14 to 16 hours per day.
Cheers
----------------------------------------
Sgt. Joe
*Minnesota Crunchers*
[Sep 11, 2017 2:56:59 AM]
KerSamson
Master Cruncher
Switzerland
Joined: Jan 29, 2007
Post Count: 1677
Status: Offline
Re: WU Characteristics - Linux - To investigate

@Doneske
I've just seen your remark concerning the core temperature.
Indeed, all 4 cores show the same behaviour (the host was fully devoted to MIP or to OET1). The temperature difference between the cores at 100% CPU load is less than 0.5 °C, i.e. 61.5 to 62 °C.
I monitor 4 of my hosts using LibreNMS with locally running SNMP agents:
view systemonly included .1.3.6.1.2.1.1
view systemonly included .1.3.6.1.2.1.25.1
view systemonly included .1.3.6.1.2.1.1.5.0
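
Independently of SNMP, the per-core spread quoted above could also be checked locally. This is a minimal Python sketch, assuming an Intel CPU with the coretemp driver loaded and its files exposed directly under /sys/class/hwmon/hwmonN/ (temp*_label containing "Core N", temp*_input in millidegrees Celsius). It also assumes a single-socket machine, since core numbers repeat across sockets. The function names are hypothetical.

```python
from pathlib import Path


def read_core_temps_c(hwmon_root="/sys/class/hwmon"):
    """Return {core_number: temperature in °C} from the coretemp hwmon device."""
    temps = {}
    for hw in Path(hwmon_root).glob("hwmon*"):
        name_file = hw / "name"
        if not name_file.is_file() or name_file.read_text().strip() != "coretemp":
            continue  # some other sensor chip (e.g. acpitz)
        for label_file in hw.glob("temp*_label"):
            label = label_file.read_text().strip()
            if not label.startswith("Core "):
                continue  # skip the whole-package sensor
            input_file = label_file.with_name(label_file.name.replace("_label", "_input"))
            millideg = int(input_file.read_text().strip())
            temps[int(label.split()[1])] = millideg / 1000.0
    return dict(sorted(temps.items()))


def spread_c(temps):
    """Difference between the hottest and the coolest core, in °C."""
    return max(temps.values()) - min(temps.values())
```

Calling spread_c(read_core_temps_c()) at intervals would show whether all cores drop together or only one of them does.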

Cheers,
Yves
----------------------------------------
[Sep 11, 2017 11:36:12 PM]
Former Member
Cruncher
Joined: May 22, 2018
Post Count: 0
Status: Offline
Re: WU Characteristics - Linux - To investigate

Understood. The part I can't reconcile is the temp of the core going down while utilization stays the same (everything else being equal, such as fan speed, ambient temp, etc.). Was your utilization graph only for core 0, or was it total utilization across all cores? I could understand one core dropping in temp, but I would expect its utilization to drop as well while the other cores were not impacted. In that case, maybe MIP is causing some sort of on-chip cache pollution that only affects one core, whereas OET and other apps don't do that. I believe there are tools out there to analyze the cache, but that is not something I would care to undertake. If you have multiple machines with different processor models, do those other machines suffer from the same malady? Maybe it is only related to a particular processor model or family. I had an E5560 that would not run correctly after Ubuntu 15.04. With 16 FAH jobs running, some cores would run at 60 percent, a few others at 40 and the rest at 100. The E5550 was OK, and the E56xx series was also OK. Only the E5560 was affected. It wouldn't run correctly on 15.04, 15.10 or 16.04. The problem went away with 16.10. I suspect it was firmware for that model that was dropped in 15.04 and corrected in 16.10. I think I was able to tie it back to a specific ABI level of the 4.8 kernel.
----------------------------------------
[Edit 2 times, last edit by Doneske at Sep 12, 2017 12:44:41 AM]
[Sep 12, 2017 12:41:40 AM]
KerSamson
Master Cruncher
Switzerland
Joined: Jan 29, 2007
Post Count: 1677
Status: Offline
Re: WU Characteristics - Linux - To investigate

Hi Doneske,
here is the temperature summary figure:

I mentioned my guess in a previous post:
If the CPU regularly has to wait on memory (for example because of recurrent cache misses), the CPU load stays at 100% even while the CPU is waiting (waiting is not the same as idle). But because waiting is less demanding than crunching, the CPU temperature drops.

I am currently rarely on site for business reasons, which somewhat impedes investigation on other hosts. However, the impacted host is not an "exotic" configuration: i7 6700K, Asus Z170-K (with up-to-date firmware), 16GB RAM, up-to-date Ubuntu Mate 16.04 x64.
On the other hand, the only Windows 7 host I have (i7 4770K, Win7 Pro x64) seems to behave as expected. However, Win7 does not report the CPU temperature over SNMP and I do not have any historical records of it.
Cheers,
Yves
----------------------------------------
[Sep 12, 2017 7:10:38 AM]
SekeRob
Master Cruncher
Joined: Jan 7, 2013
Post Count: 2741
Status: Offline
Re: WU Characteristics - Linux - To investigate

In the second link I posted above, on updating 16.04 LTS to 16.04.02, there's a command that also pulls in hardware support improvements:

sudo apt install --install-recommends xserver-xorg-hwe-16.04

I ran it because, ever since 16.04, my screen would cycle power off / power on when the GUI loaded, after the delay time configured for locking the screen. Remember, my kernel has been stepped up over time, now to 4.13. After running the command the problem was gone, and to my surprise it installed another kernel, 4.10, i.e. a fully updated 16.04.02 uses 4.10. After reboot I chose 4.13 again, and the screen problem stayed away.

edit: spell daemon
----------------------------------------
[Edit 2 times, last edit by SekeRob* at Sep 12, 2017 7:50:55 AM]
[Sep 12, 2017 7:48:47 AM]
KerSamson
Master Cruncher
Switzerland
Joined: Jan 29, 2007
Post Count: 1677
Status: Offline
Re: WU Characteristics - Linux - To investigate

Hi SekeRob,
indeed, I was surprised by the kernel versions. I performed two fresh installations of Ubuntu Mate 16.04, one at the end of March and one in the middle of April.
The first machine remained on kernel 4.4.x, whereas the second machine received kernel 4.10.x.
Even with regular updates, the first machine sticks to 4.4.x, and it is the machine I mentioned in this discussion. I assumed that the reason Ubuntu did not update the kernel was maybe related to some i7 bugs. For this reason, I have not forced the kernel update until now. But I probably have to reconsider this point.
Cheers,
Yves
----------------------------------------
[Sep 12, 2017 11:34:54 AM]