World Community Grid - View Thread

World Community Grid Forums

Category: Completed Research

Forum: Microbiome Immunity Project

Thread: WU Characteristics

Thread Status: Active
Total posts in this thread: 100

This topic has been viewed 18321 times and has 99 replies

SekeRob
Master Cruncher
Joined: Jan 7, 2013
Post Count: 2741
Status: Offline


Re: WU Characteristics - Linux - To investigate

Surely if you can plot temps over time, you can do the same with the CPU GHz, can't you? (Mine is locked to 1.6 GHz, and the temp is constant when running MIP1 only, no different from what it shows for OET1 and ZIKA.)
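
As a rough illustration of the idea of logging clock speed alongside temperature, here is a minimal Python sketch. It assumes the host exposes the standard Linux cpufreq sysfs files (`scaling_cur_freq`, reported in kHz); the function names `read_core_frequencies` and `sample_frequencies` are made up for this example and are not part of any monitoring tool mentioned in the thread.

```python
from pathlib import Path
import time


def read_core_frequencies(cpu_root="/sys/devices/system/cpu"):
    """Return {core_name: MHz} read from each core's cpufreq/scaling_cur_freq."""
    freqs = {}
    # "cpu[0-9]*" matches cpu0, cpu1, ... but not sibling dirs such as cpufreq/cpuidle.
    for freq_file in Path(cpu_root).glob("cpu[0-9]*/cpufreq/scaling_cur_freq"):
        core = freq_file.parent.parent.name       # e.g. "cpu0"
        khz = int(freq_file.read_text().strip())  # sysfs reports kHz
        freqs[core] = khz / 1000.0
    return freqs


def sample_frequencies(samples=5, interval_s=1.0, cpu_root="/sys/devices/system/cpu"):
    """Collect (unix_timestamp, {core: MHz}) tuples, one per sample."""
    log = []
    for i in range(samples):
        log.append((time.time(), read_core_frequencies(cpu_root)))
        if i < samples - 1:
            time.sleep(interval_s)
    return log
```

Calling `sample_frequencies(samples=60, interval_s=5)` would give five minutes of per-core readings that could be plotted next to the temperature curve. On a machine without cpufreq support the dictionaries simply come back empty.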

----------------------------------------
[Edit 1 times, last edit by SekeRob* at Sep 10, 2017 9:24:46 AM]

[Sep 10, 2017 9:24:14 AM]

Former Member
Cruncher
Joined: May 22, 2018
Post Count: 0
Status: Offline


Re: WU Characteristics - Linux - To investigate

Since an i7-6700K is a 4-core processor, it would be interesting to see the temps of the other 3 cores. The sensors command on my Ubuntu 16.04 machine with an i7 shows a different temperature for each core. Maybe that one core runs cooler for some reason. You may not be able to correlate one core's temp with the utilization graph, as I assume that graph is the utilization average for ALL cores. A more useful plot might be core 0 temp versus core 0 utilization. Maybe that core is being clocked down by the governor but the others aren't. I have noticed a slight decrease in throughput on my Linux machines, independent of any WCG project, that I think is due to changes made to the cpufreq facility. My governor used to be set to Performance, but now it is set to Ondemand, which is now the default on Ubuntu. The developers maintain that there shouldn't be any real difference between the two on a machine that is fully utilized, but I'm not so sure.
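
To make the "core 0 temp versus core 0 utilization" comparison concrete, here is a hedged Python sketch that derives per-core utilization from two snapshots of `/proc/stat`. The helper names `parse_proc_stat` and `per_core_utilization` are invented for this example; the field layout (user, nice, system, idle, iowait, irq, softirq, steal) is the standard one on Linux.

```python
def parse_proc_stat(text):
    """Return {cpu_name: (busy_jiffies, total_jiffies)} for the per-core lines."""
    counters = {}
    for line in text.splitlines():
        fields = line.split()
        # Keep "cpu0", "cpu1", ... and skip the aggregate "cpu" line and other rows.
        if not fields or not fields[0].startswith("cpu") or fields[0] == "cpu":
            continue
        values = [int(v) for v in fields[1:]]
        idle = values[3] + (values[4] if len(values) > 4 else 0)  # idle + iowait
        total = sum(values[:8])  # guest fields are already counted in user/nice
        counters[fields[0]] = (total - idle, total)
    return counters


def per_core_utilization(before_text, after_text):
    """Percent busy per core between two /proc/stat snapshots."""
    before = parse_proc_stat(before_text)
    after = parse_proc_stat(after_text)
    result = {}
    for cpu, (busy_after, total_after) in after.items():
        busy_before, total_before = before.get(cpu, (0, 0))
        elapsed = total_after - total_before
        result[cpu] = 100.0 * (busy_after - busy_before) / elapsed if elapsed > 0 else 0.0
    return result
```

In practice one would read `/proc/stat` twice a few seconds apart and pass both texts in. Note that time a core spends stalled on memory still counts as busy here, so this figure alone cannot separate crunching from waiting.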

[Sep 10, 2017 3:15:19 PM]

KerSamson
Master Cruncher
Switzerland
Joined: Jan 29, 2007
Post Count: 1677
Status: Offline

Re: WU Characteristics - Linux - To investigate

Hi SekeRob,
unfortunately, I do not plot the CPU frequency. However, the plot of the used memory is available, see below:

If the CPU regularly has to wait on memory (for example because of recurrent cache misses), the CPU load stays at 100% even while the CPU is waiting (waiting is not the same as idle). Because waiting is less demanding than crunching, though, the CPU temperature drops.
I don't know how the science software is designed and implemented; nevertheless, my feeling is that something could probably be optimised at the software level.
Comparing the CPU temperature with the memory load, my guess is that this is a possible cause of the crazy MIP behaviour on Linux.
If it is not MIP, it is a library or kernel problem on the Linux side (Linux 4.4.0-93-generic #116-Ubuntu SMP Fri Aug 11 21:17:51 UTC 2017 x86_64 x86_64 x86_64 GNU/Linux).
Cheers,
Yves

----------------------------------------

Décrypthon team progress - KerSamson's contribution

[Sep 10, 2017 3:33:58 PM]

SekeRob
Master Cruncher
Joined: Jan 7, 2013
Post Count: 2741
Status: Offline


Re: WU Characteristics - Linux - To investigate

Doneske's observation is interesting, considering that my hottest core (0) is at 62C while at the same time the coolest core (3) is at 55C.

A memory leak would last as long as the task runs, with cleanup at the end and then a new task. Using PSensor, I see nothing like a continuous rise... steady, with some down and up as tasks end and start.
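
The "continuous rise" check can also be done numerically rather than by eye. The Python sketch below is only an illustration: `rss_kib` reads the `VmRSS` line from `/proc/<pid>/status`, and `growth_per_sample` fits a least-squares line through a series of readings. Both function names are invented here, and what slope counts as a leak is left to the reader.

```python
def rss_kib(pid, proc_root="/proc"):
    """Read VmRSS (resident set size, in KiB) from /proc/<pid>/status."""
    with open(f"{proc_root}/{pid}/status") as status:
        for line in status:
            if line.startswith("VmRSS:"):
                return int(line.split()[1])
    return None  # kernel threads have no VmRSS line


def growth_per_sample(samples):
    """Least-squares slope of the readings against their sample index."""
    n = len(samples)
    if n < 2:
        return 0.0
    mean_x = (n - 1) / 2
    mean_y = sum(samples) / n
    numerator = sum((i - mean_x) * (y - mean_y) for i, y in enumerate(samples))
    denominator = sum((i - mean_x) ** 2 for i in range(n))
    return numerator / denominator
```

A slope that stays clearly positive over the whole lifetime of one task would point toward a leak, while a series that is flat with steps at task boundaries matches what PSensor shows here.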

I pushed in the 4.13 kernel a few days ago. 16.04 LTS was recently upgraded to 16.04.02, still with the 4.4 kernel. https://www.networkworld.com/article/3221422/...as-a-long-shelf-life.html , but only for server ATM. Mine still runs 16.04.01, but there's a 16.04.02 and seemingly a 16.04.03 due out soon http://www.omgubuntu.co.uk/2017/02/download-ubuntu-16-04-2-lts going to kernel 4.8

----------------------------------------
[Edit 2 times, last edit by SekeRob* at Sep 10, 2017 4:08:15 PM]

[Sep 10, 2017 4:04:21 PM]

Sgt.Joe
Ace Cruncher
USA
Joined: Jul 4, 2006
Post Count: 7697
Status: Offline


Re: WU Characteristics - Linux - To investigate

I switched a Linux system over from running SCC to MIP and noticed no difference in temperatures. Psensor is giving me temps of 37C to 44C on a system which runs 14 to 16 hours per day.
Cheers

----------------------------------------

Sgt. Joe
*Minnesota Crunchers*

[Sep 11, 2017 2:56:59 AM]

KerSamson
Master Cruncher
Switzerland
Joined: Jan 29, 2007
Post Count: 1677
Status: Offline


Re: WU Characteristics - Linux - To investigate

@Doneske
I've just seen your remark concerning the core temperature.
Indeed, all 4 cores show the same behaviour (the host was fully devoted to MIP or to OET1). The temperature difference between the cores at 100% CPU load is less than 0.5 °C, i.e. 61.5 to 62 °C.
I monitor 4 of my hosts using LibreNMS with locally running SNMP agents:

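# system group of SNMPv2-MIB (sysDescr, sysUpTime, sysName, ...)
view systemonly included .1.3.6.1.2.1.1
# hrSystem group of HOST-RESOURCES-MIB
view systemonly included .1.3.6.1.2.1.25.1
# sysName.0 (already covered by the system group above)
view systemonly included .1.3.6.1.2.1.1.5.0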

Cheers,
Yves
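
For anyone wanting to check the per-core temperature spread without SNMP, a minimal Python sketch follows. It assumes an Intel host where the coretemp driver exposes `tempN_label` / `tempN_input` pairs under `/sys/class/hwmon`, with labels of the form "Core 0" and values in millidegrees Celsius; other drivers label their sensors differently. The function names are invented for this example.

```python
from pathlib import Path


def read_core_temperatures(hwmon_root="/sys/class/hwmon"):
    """Return {label: degrees_C} for every sensor whose label starts with 'Core '."""
    temps = {}
    for label_file in Path(hwmon_root).glob("hwmon*/temp*_label"):
        label = label_file.read_text().strip()
        if not label.startswith("Core "):
            continue  # skip package-level and non-CPU sensors
        input_file = label_file.with_name(label_file.name.replace("_label", "_input"))
        temps[label] = int(input_file.read_text().strip()) / 1000.0  # millidegrees
    return temps


def core_spread(temps):
    """Difference between the hottest and coolest core, in degrees C."""
    return max(temps.values()) - min(temps.values()) if temps else 0.0
```

A spread of about 0.5 °C at full load, as reported above, would show up as `core_spread(read_core_temperatures())` returning roughly 0.5.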

[Sep 11, 2017 11:36:12 PM]

Former Member
Cruncher
Joined: May 22, 2018
Post Count: 0
Status: Offline


Re: WU Characteristics - Linux - To investigate

Understood. The part I can't reconcile is the temp of the core going down and utilization staying the same (everything else being equal, such as fan speed, ambient temp etc). Was your utilization graph only for core 0, or was it total utilization across all cores? I could understand one core dropping in temp but would expect its utilization to also drop, while the other cores were not impacted. In that case, maybe MIP is causing some sort of on-chip cache pollution that only affects one core, whereas OET and other apps don't do that. I believe there are tools out there to analyze the cache, but that is not something I would care to undertake.

If you have multiple machines with different processor models, do those other machines suffer the same malady? Maybe it is only related to a particular processor model or family. I had an E5560 that would not run correctly after Ubuntu 15.04. With 16 FAH jobs running, some cores would run at 60 percent, a few others at 40 and the rest at 100. The E5550 was OK, and the E56xx series was also OK. Only the E5560 was affected. It wouldn't run correctly on 15.04, 15.10 or 16.04. The problem went away with 16.10. I suspect it was firmware for that model that was dropped in 15.04 and corrected in 16.10. I think I was able to tie it back to a specific ABI level of the 4.8 kernel.

----------------------------------------
[Edit 2 times, last edit by Doneske at Sep 12, 2017 12:44:41 AM]

[Sep 12, 2017 12:41:40 AM]

KerSamson
Master Cruncher
Switzerland
Joined: Jan 29, 2007
Post Count: 1677
Status: Offline


Re: WU Characteristics - Linux - To investigate

Hi Doneske,
here is the temperature summary figure:

I mentioned in a previous post my guess:

I am currently rarely on site for business reasons, which somewhat impedes investigation on other hosts. However, the impacted host is not an "exotic" configuration: i7 6700K, Asus Z170-K (with up-to-date firmware), 16GB RAM, up-to-date Ubuntu Mate 16.04 x64.
On the other hand, the only Windows 7 host I have - i7 4770K, Win7 Pro x64 - seems to behave as expected. However, Win7 does not report the CPU temperature over SNMP, and I do not have any historical records of it.
Cheers,
Yves

[Sep 12, 2017 7:10:38 AM]

SekeRob
Master Cruncher
Joined: Jan 7, 2013
Post Count: 2741
Status: Offline


Re: WU Characteristics - Linux - To investigate

In the second link I posted above on updating 16.04 LTS to 16.04.02, there's a command that also pushes hardware support improvements:

sudo apt install --install-recommends xserver-xorg-hwe-16.04

Ran it because ever since 16.04 my screen would cycle, when the GUI loaded, through power off / power on after the delay set for locking the screen. Remember, my kernel has been stepped up over time to 4.13 now. After running the command, the problem is gone, and to my surprise it installed another kernel, 4.10, i.e. a fully updated 16.04.02 uses 4.10. After boot I chose 4.13 again, and the screen problem stayed gone.

edit: spell daemon

----------------------------------------
[Edit 2 times, last edit by SekeRob* at Sep 12, 2017 7:50:55 AM]

[Sep 12, 2017 7:48:47 AM]

KerSamson
Master Cruncher
Switzerland
Joined: Jan 29, 2007
Post Count: 1677
Status: Offline


Re: WU Characteristics - Linux - To investigate

Hi SekeRob,
indeed I was surprised regarding the kernel versions. I performed two fresh installations of Ubuntu Mate 16.04, at the end of March and in mid-April.
The first machine remains on kernel 4.4.x, whereas the second machine received kernel 4.10.x.
Even with regular updates, the first machine stays on 4.4.x, and it is the machine I mentioned in this discussion. I assumed that the reason for Ubuntu not updating the kernel was maybe related to some i7 bugs. For this reason, I have not forced the kernel update until now. But I probably have to reconsider this point.
Cheers,
Yves
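
When comparing hosts that sit on different kernel series (4.4.x, 4.10.x, 4.13), it helps to compare versions numerically, because a plain string comparison would sort "4.10" before "4.4". The following Python sketch is only an illustration; `kernel_version` is a made-up helper, and it relies on nothing beyond the release string that `uname -r` (or Python's `platform.release()`) reports.

```python
import platform
import re


def kernel_version(release=None):
    """Parse a release string such as '4.4.0-93-generic' into (major, minor, patch)."""
    if release is None:
        release = platform.release()  # same string that `uname -r` prints
    match = re.match(r"(\d+)\.(\d+)(?:\.(\d+))?", release)
    if match is None:
        raise ValueError(f"unrecognised kernel release: {release!r}")
    # A missing patch component (e.g. "4.13") is treated as 0.
    return tuple(int(part) if part else 0 for part in match.groups())
```

For example, `kernel_version("4.4.0-93-generic") < kernel_version("4.10.0-33-generic")` evaluates to `True`, which is the ordering one would expect; the second release string is a hypothetical value used only to show the comparison.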

[Sep 12, 2017 11:34:54 AM]
