World Community Grid Forums
Thread Status: Locked | Total posts in this thread: 210
Former Member
Cruncher | Joined: May 22, 2018 | Post Count: 0 | Status: Offline
Hi zombie67,

"problem with the application"

Say rather that the cache is under-utilized, so that we are measuring the speed of accesses to main memory. More cores mean more contention between them for the memory bus. It looks as though it is a bad idea to have more than one HCC work unit running on the same memory bus.

Lawrence
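Lawrence's cache argument can be sanity-checked with simple arithmetic: once the combined hot data of all concurrently running work units exceeds the shared cache, every extra task turns cache hits into main-memory traffic. A minimal sketch, where the cache and working-set sizes are illustrative assumptions, not measured HCC figures:

```python
# Back-of-envelope check: does the combined hot data of N concurrent
# work units still fit in the shared cache? All sizes below are
# illustrative guesses, not measured HCC numbers.

def fits_in_cache(working_set_kb: int, tasks: int, cache_kb: int) -> bool:
    """True if every task's hot data can stay cache-resident."""
    return working_set_kb * tasks <= cache_kb

CACHE_KB = 4096  # hypothetical 4 MB shared L2 (Core2-class)
WU_KB = 3072     # hypothetical ~3 MB of hot data per work unit

print(fits_in_cache(WU_KB, 1, CACHE_KB))  # True: one task fits
print(fits_in_cache(WU_KB, 2, CACHE_KB))  # False: two tasks spill to RAM
```

Once the check fails, each task is effectively benchmarking main-memory latency, which matches the observation that a single HCC unit per memory bus behaves best.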
zombie67 [MM]
Senior Cruncher, USA | Joined: May 26, 2006 | Post Count: 228 | Status: Offline
"Say rather that the cache is under-utilized so that we are measuring the speed of accesses to main memory. More cores means more contention between them for the memory bus. It looks as though it is a bad idea to have more than 1 HCC work unit running on the same memory bus. Lawrence"

Interesting. BURP has a similar problem. To address it, they limited the number of tasks sent to a machine at any given time. Perhaps something similar could be done with HCC? Limit it to (say) one at any given time? The rest of the cores could then be occupied by other sub-projects.

[Edit 1 times, last edit by zombie67 at Nov 27, 2007 8:11:17 PM]
BobCat13
Senior Cruncher | Joined: Oct 29, 2005 | Post Count: 295 | Status: Offline
"Interesting. BURP has a similar problem. To address it, they limited the number of tasks sent to a machine at any given time. Perhaps something similar could be done with HCC? Limit it to (say) one at any given time? The rest of the cores could then be occupied by other sub-projects."

I don't think that will help. Over the weekend on my AMD X2 6000+, my WCG queue was HCC-only. I tried running it with Sudoku, SIMAP, and Spinhenge, one project at a time, so WCG was only using one core. The excessive page faults were still happening no matter which project it was teamed with. I saw several HCC tasks go over 1 billion PFs before completing. That amount of PFs caused a 20-25% increase in crunching time. Other HCC tasks totaled less than 50 million PFs to completion and finished over an hour faster than the high-PF tasks.

My first thought was that the L2 cache on the Core2 is much larger than on the AMD CPUs, so maybe some array fits into the Core2 L2 but not into the AMD L2. But that wouldn't explain the Xeon problems.
Movieman
Veteran Cruncher | Joined: Sep 9, 2006 | Post Count: 1042 | Status: Offline
"Hi Movieman, I think that what is discussed in the Page Fault thread is the cause. Not sure, but I think if you stick to doing HCC exclusively the averages will work their way up. A very steady benchmark is what promotes how the algorithm handles claims and quorum. Regarding your line 'It's just not logical that a machine that's used for many things will bench more consistent than one that is limited to a very defined area of work': I know what you are saying, consistently too low (see next paragraph)! I know from observation that the timing of the benchmark can cause huge sways in the values, particularly disc writing, so one night I got up and forced one... it is now always run at night on the 24/7 machine (I have to get up again after an upgrade). And this from the very secret Italian room: set BOINC.exe to the second-highest priority. The program does very little until it runs that once-per-5-days test, and will then run it with elevated attention. I repeat myself: I think it is BS that the bench is impacted by runtime events. The credits are computed on CPU seconds and not wall-clock... I think that suits your thinking. End of the off-topic."

Hi Sek:

It's frustrating. I'm seeing my top machine here, which is running HCC, take a 6,000-point-a-day drop, from 25K+ to 19K. That's just not right, so I've pulled it from the HCC project, put it back on strictly FAAH, and will ask that others with these 8-core machines do the same. I'm not going to sit back calmly and have the machine put in a claim for 140 points on a 7-hour WU and then be awarded half of that. Sorry, but that's pure BS in my opinion. Look for yourself.
This is on an 8-core clover running right now at 3083 MHz and doing nothing but WCG:

Result Name                    Status  Sent Time            Time Due / Return Time  CPU Time (h)  Claimed / Granted
X0000039140823200410152303_0   Valid   11/25/2007 07:48:45  11/25/2007 22:33:35     3.62          57.5 / 57.5
X0000039140823200410152303_1   Valid   11/25/2007 07:48:40  11/28/2007 05:09:34     7.87          155.5 / 57.5  <- me
X0000039140873200409241129_0   Valid   11/25/2007 06:49:24  11/26/2007 04:51:10     6.49          71.7 / 71.7
X0000039140873200409241129_1   Valid   11/25/2007 06:48:25  11/28/2007 05:09:34     7.20          142.2 / 71.7  <- me
X0000039140498200409241136_0   Valid   11/25/2007 06:33:32  11/25/2007 15:35:07     5.36          68.4 / 68.4
X0000039140498200409241136_1   Valid   11/25/2007 06:31:33  11/28/2007 05:09:34     6.91          136.6 / 68.4  <- me
X0000055211331200508301311_2   Valid   11/25/2007 11:08:26  11/26/2007 03:34:35     4.20          64.5 / 64.5
X0000055211331200508301311_0   Error   11/25/2007 04:58:47  11/25/2007 11:06:47     0.00          0.0 / 0.0     <- error
X0000055211331200508301311_1   Valid   11/25/2007 04:58:11  11/27/2007 21:05:59     7.36          106.0 / 64.5  <- me
X0000055071040200508232324_1   Valid   11/25/2007 03:50:04  11/27/2007 21:05:59     8.28          119.4 / 61.9  <- me
X0000055071040200508232324_0   Valid   11/25/2007 03:49:13  11/25/2007 15:34:09     3.69          61.9 / 61.9
X0000055070181200508232338_0   Valid   11/25/2007 03:23:12  11/25/2007 19:49:47     4.57          59.6 / 59.6
X0000055070181200508232338_1   Valid   11/25/2007 03:22:04  11/27/2007 19:52:20     7.87          128.0 / 59.6  <- me

I'm looking into the future and wondering what will happen when we start seeing dual Nehalems late in 2008, with 16 cores that also use an advanced form of hyperthreading and will run 32 WUs at a time. I think the answer for me is to just sit this one out and continue working on the AIDS project, even though I'd prefer to work on the cancer project.

Edit: Some more info for you. Here are the last 7 days on the 2 clover machines. The first is my "slower" machine, which is at 3000 MHz right now.
Uses Win2K3 32-bit and BOINC 5.10.20 32-bit. This machine has been doing nothing but FAAH WUs:

Statistics
Date        Total Run Time (y:d:h:m:s)  Points Generated  Results Returned
11/27/2007  0:007:16:02:17              23,485            45
11/26/2007  0:007:12:42:34              23,538            46
11/25/2007  0:007:02:35:18              21,319            54
11/24/2007  0:008:09:49:36              25,462            47
11/23/2007  0:008:08:36:47              25,148            47
11/22/2007  0:007:11:36:32              22,753            40
11/21/2007  0:008:15:15:18              26,113            46

Now the faster machine, with Win2K3 64-bit and BOINC 5.10.20 64-bit. This machine does a mix of FAAH and HCC, and runs 83 MHz faster.

Statistics
Date        Total Run Time (y:d:h:m:s)  Points Generated  Results Returned
11/27/2007  0:007:18:00:35              15,964            28
11/26/2007  0:008:19:19:09              19,732            36
11/25/2007  0:007:21:44:10              19,604            39
11/24/2007  0:009:05:55:49              22,313            40
11/23/2007  0:006:17:06:08              20,510            42
11/22/2007  0:007:09:42:06              22,058            45
11/21/2007  0:008:01:48:19              24,123            46

To add to the numbers for you: last March we did an analysis of this last machine. For the month of March it had a 99.38% uptime and averaged 25,238 points a day doing strictly FAAH WUs. These kinds of numbers are hard to interpret but do show trends. The trend I was seeing told me it was time to cut loose of HCC till it is "fixed".

[Edit 3 times, last edit by Movieman at Nov 28, 2007 6:03:50 AM]
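The claimed/granted columns in the result table above show a consistent pattern: in a two-result quorum, both valid results end up with the lower of the two claims. A minimal sketch of that observed behaviour only; the real BOINC/WCG validator logic is more involved than this:

```python
# Simplified sketch of the credit pattern visible in the result table:
# with a quorum of two valid results, both hosts are granted the
# *lower* of the two claims. This mimics the observed behaviour; it is
# not WCG's actual validator code.

def granted_credit(claims: list) -> float:
    """Grant every valid result the smallest claim in the quorum."""
    return min(claims)

# First pair from the table: fast host claimed 57.5, slow host 155.5.
print(granted_credit([57.5, 155.5]))  # both granted 57.5
```

This is exactly why a host with inflated run times (and hence inflated claims) loses out: its quorum partner's lower claim sets the grant for both.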
Sekerob
Ace Cruncher | Joined: Jul 24, 2005 | Post Count: 20043 | Status: Offline
Can you check the HCC run times for the mixed machine? Some wrote that one or two HCC tasks running concurrently do not generate the high PF and Delta. The runtime/claims and credits might be better.

The C2D went berserk yesterday: 1.3 billion PFs and a Delta of 200-225 thousand almost constantly over the first half, subsiding towards the later half. Points were fine, as the quorum partner also had a high run time. The Q6600 continues soundly with 4 HCC tasks producing zero Delta.

Don't feel bad about this. If a machine is not able to perform efficiently on a particular task, make the hop. My P4 is highly inefficient on HPF2 (yeah, I know), so it's not doing those, and the C2D might come off HCC if this keeps repeating (it was the first time I had this on that box). In any case, it has been highlighted in more than a few places, so I'm looking forward to a solution, or at least a recommendation for finding the right machines for this. I hope the Linux bug-finding sessions bring something to the surface at the same time. If that problem is solved, the productivity will substantially increase.

WCG
----------------------------------------
Please help to make the Forums an enjoyable experience for All!
[Edit 1 times, last edit by Sekerob at Nov 28, 2007 10:58:05 AM]
Former Member
Cruncher | Joined: May 22, 2018 | Post Count: 0 | Status: Offline
Here is an interesting result for this conversation:

Workunit Status
Project Name: Help Conquer Cancer
Created: 11/24/2007 14:18:27
Name: X0000052451057200507211308
Minimum Quorum: 2
Initial Replication: 2

Result Name                    Status  Sent Time            Time Due / Return Time  CPU Time (h)  Claimed / Granted
X0000052451057200507211308_1   Valid   11/24/2007 20:43:34  11/25/2007 05:25:34     5.39          8.3 / 8.3
X0000052451057200507211308_0   Valid   11/24/2007 20:42:28  11/28/2007 14:34:36     7.49          85.4 / 8.3

I normally get just over 10 points per hour of work, and this one was just over 1 point per hour. Just thought it was strange enough to point out.

[Edit 1 times, last edit by Former Member at Nov 28, 2007 3:02:06 PM]
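The points-per-hour figures quoted above are easy to verify from the table. A quick check (the 56.6-point result is taken from the same poster's later listing as an example of a "normal" unit):

```python
# Checking the points-per-hour arithmetic from the result tables.

def points_per_hour(granted: float, cpu_hours: float) -> float:
    """Granted BOINC credit divided by CPU time."""
    return granted / cpu_hours

# The anomalous result: 8.3 points granted for 7.49 CPU hours.
print(round(points_per_hour(8.3, 7.49), 2))   # 1.11 points/hour

# A typical result ("just over 10 per hour"): 56.6 points in 5.48 hours.
print(round(points_per_hour(56.6, 5.48), 2))  # 10.33 points/hour
```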
Former Member
Cruncher | Joined: May 22, 2018 | Post Count: 0 | Status: Offline
Here is a listing of all of my valid results on 2 computers:

Result Name                    Device Name  Status  Sent Time            Time Due / Return Time  CPU Time (h)  Claimed / Granted
X0000052451094200507211307_0   Laptop       Valid   11/24/2007 20:44:00  11/28/2007 14:34:46     5.48          56.6 / 56.6
X0000052451060200507211308_0   Laptop       Valid   11/24/2007 20:44:00  11/28/2007 14:34:46     5.66          58.5 / 59.2
X0000052451059200507211308_0   Laptop       Valid   11/24/2007 20:44:00  11/28/2007 14:34:46     5.78          59.7 / 59.7
X0000052451085200507211308_1   Laptop       Valid   11/24/2007 20:43:59  11/28/2007 14:34:46     5.60          57.9 / 57.7
X0000052451057200507211308_0   Dan-PC       Valid   11/24/2007 20:42:28  11/28/2007 14:34:36     7.49          85.4 / 8.3
X0000052451031200507211308_0   Dan-PC       Valid   11/24/2007 20:42:28  11/28/2007 14:34:36     8.16          93.1 / 99.6
X0000052451029200507211308_0   Dan-PC       Valid   11/24/2007 20:42:28  11/28/2007 03:50:56     7.63          86.8 / 90.9
X0000052451028200507211308_0   Dan-PC       Valid   11/24/2007 20:42:28  11/27/2007 23:48:04     7.55          85.4 / 85.4
X0000052451027200507211308_0   Dan-PC       Valid   11/24/2007 20:42:28  11/27/2007 19:15:10     8.60          97.9 / 87.0
X0000052450804200507211312_1   Laptop       Valid   11/24/2007 20:33:31  11/27/2007 19:24:44     5.77          59.4 / 59.4
X0000052120670200507262053_0   Laptop       Valid   11/24/2007 19:40:46  11/27/2007 17:34:28     5.47          56.3 / 58.9
Movieman
Veteran Cruncher | Joined: Sep 9, 2006 | Post Count: 1042 | Status: Offline
"Can you check the HCC run times for the mix machine? Some wrote that 1 or 2 HCC concurrently do not generate the high PF and Delta. The runtime/claims and Credits might be better. The C2D went berserk yesterday: 1.3 billion PFs and 200-225 thousand Delta almost constantly over the first half, but subsiding towards the later half. Points were fine as the quorum partner also had a high run time. The Q6600 continues soundly with 4 HCC producing zero Delta. Don't feel bad about this. If a machine is not able to perform efficiently on a particular task, make the hop."

Runtime is in red (my results are the rows marked "<- me"):

Result Name                    Status  Sent Time            Time Due / Return Time  CPU Time (h)  Claimed / Granted
X0000039140823200410152303_0   Valid   11/25/2007 07:48:45  11/25/2007 22:33:35     3.62          57.5 / 57.5
X0000039140823200410152303_1   Valid   11/25/2007 07:48:40  11/28/2007 05:09:34     7.87          155.5 / 57.5  <- me
X0000039140873200409241129_0   Valid   11/25/2007 06:49:24  11/26/2007 04:51:10     6.49          71.7 / 71.7
X0000039140873200409241129_1   Valid   11/25/2007 06:48:25  11/28/2007 05:09:34     7.20          142.2 / 71.7  <- me
X0000039140498200409241136_0   Valid   11/25/2007 06:33:32  11/25/2007 15:35:07     5.36          68.4 / 68.4
X0000039140498200409241136_1   Valid   11/25/2007 06:31:33  11/28/2007 05:09:34     6.91          136.6 / 68.4  <- me
X0000055211331200508301311_2   Valid   11/25/2007 11:08:26  11/26/2007 03:34:35     4.20          64.5 / 64.5
X0000055211331200508301311_0   Error   11/25/2007 04:58:47  11/25/2007 11:06:47     0.00          0.0 / 0.0     <- error
X0000055211331200508301311_1   Valid   11/25/2007 04:58:11  11/27/2007 21:05:59     7.36          106.0 / 64.5  <- me
X0000055071040200508232324_1   Valid   11/25/2007 03:50:04  11/27/2007 21:05:59     8.28          119.4 / 61.9  <- me
X0000055071040200508232324_0   Valid   11/25/2007 03:49:13  11/25/2007 15:34:09     3.69          61.9 / 61.9
X0000055070181200508232338_0   Valid   11/25/2007 03:23:12  11/25/2007 19:49:47     4.57          59.6 / 59.6
X0000055070181200508232338_1   Valid   11/25/2007 03:22:04  11/27/2007 19:52:20     7.87          128.0 / 59.6  <- me
JmBoullier
Former Community Advisor, Normandy - France | Joined: Jan 26, 2007 | Post Count: 3715 | Status: Offline
If it can help the techs who work on this problem, I have asked for a new HCC WU to confirm my memories of my initial test.

The page-faults problem is not (or at least not only) caused by some kind of weird competition between several tasks running on multi-core machines. Making sure my HCC WU is the only thing doing anything meaningful on my P4 HT machine, I am still seeing an average of 40,000 page faults per second. And this number does not change if I start another CPU-intensive process. Since I have enough RAM (1.5 GB) there is no corresponding swapping activity on my disk, but this is certainly not free at the CPU level.

Good luck to the techs.
Jean.
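For anyone wanting to reproduce this kind of measurement, here is a minimal sketch of how a process can read its own page-fault counters on a Unix-like system via getrusage(). (The machines in this thread run Windows, where Task Manager's "Page Faults" column reports the analogous counter.)

```python
# Minimal sketch: read this process's own page-fault counters on a
# Unix-like system. ru_minflt counts "soft" faults served from RAM,
# ru_majflt counts faults that required disk I/O.
import resource

def fault_counts():
    """Return (minor, major) page-fault totals for this process."""
    usage = resource.getrusage(resource.RUSAGE_SELF)
    return usage.ru_minflt, usage.ru_majflt

before = fault_counts()
buf = bytearray(50 * 1024 * 1024)  # zero-fill ~50 MB, touching every page
after = fault_counts()
print("minor faults during allocation:", after[0] - before[0])
```

As Jean observed, a flood of minor faults causes no disk activity at all when RAM is plentiful, yet each fault still costs a trap into the kernel, so it is "certainly not free at the CPU level".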
Former Member
Cruncher | Joined: May 22, 2018 | Post Count: 0 | Status: Offline
"Here is an interesting result for this conversation: ... X0000052451057200507211308_1 Valid, CPU Time 5.39 h, Claimed/Granted 8.3 / 8.3; X0000052451057200507211308_0 Valid, CPU Time 7.49 h, Claimed/Granted 85.4 / 8.3. I normally get just over 10 per hour of work and this one was just over 1 point per hour."

This result is way out of line: 8.3 points for 7.5 hours of work is just not acceptable. It would seem that the other machine is claiming abnormally low, yet is being taken as the more consistent machine when allocating the credit.