World Community Grid Forums
Thread Status: Locked | Total posts in this thread: 210
KerSamson
Master Cruncher | Switzerland | Joined: Jan 29, 2007 | Post Count: 1679 | Status: Offline
Thank you for the advice!

I had to turn off the privacy configuration of my firewall (cookie-related rules) in order to see "pencil&paper". Does the thread title now better reflect the subject of this discussion?
Movieman
Veteran Cruncher | Joined: Sep 9, 2006 | Post Count: 1042 | Status: Offline
Unfortunately your link has restricted rights to view, so we are unable to see it. However, if you knew about this on Nov 2, then why did you make an initial reply to this thread saying there isn't a problem, when there clearly is?

I didn't link the credit complaints to the pagefault issue immediately. The pagefault issue had already been reported in private, and also discussed at length in another public thread. It was hard to miss. I said it wasn't a problem because the credit system was, in fact, working perfectly; the initial anomalies were well within the normal variance for unit runtimes. I'm afraid people have cried "wolf" so frequently over credit claims that I wait until there is reliable evidence before taking them seriously.

So, a timeline: I spotted it on 2nd November. brent1023 spotted it on the 14th of November, BuHHunyx on the 23rd. Movieman was complaining of an unknown HCC problem on the 26th, but it was Sekerob who linked the issues, also on the 26th. So now you know. We all like credits. Please don't read an inference into it where none exists.

The link I gave was to a private forum, so I dug through the archives and found this post from the 3rd: http://www.worldcommunitygrid.org/forums/wcg/viewthread?thread=16902#135613 - there I explained my methodology and my observations, and noted that I had passed the matter over to the techs. Credit also to tekennelly, who first discovered the pagefault issue - not in HCC, but in HPF2, where the problem is less pronounced.

Yes, I posted here on the 26th. I wrote WCG directly about it on the 13th and received a reply from "kevin" on the 15th. I did that intentionally, so as to bring this to people's attention quietly and not get into a credit discussion, as those generally go downhill fast. We'd been discussing this issue at Xs for at least a week prior to the 15th, before I decided to write WCG. I wanted to make sure I wasn't the only one seeing this.
We have approximately 30 8-core Clovertown machines at Xs that generally do around 700,000-750,000 points per day, so you can understand why this was a major concern to us. Anyway, I just wanted to clear that up. Let's get back to work, and good luck to the techs working on the Linux issue.
Former Member
Cruncher | Joined: May 22, 2018 | Post Count: 0 | Status: Offline
I started crunching years ago with UD - Phase I, hoping to "fight Cancer".

I was really looking forward to another cancer project here at WCG. On the other hand, I hate having my machines waste their time, taking way too long on HCC units. I have two dual-Clovertown rigs, and it was sad to see how long an HCC WU took on them. So until I hear from more than a few people that the HCC issues are over and it is "all clear", I'm done with them. Back to FAAH and Dengue for me; I changed my project profile and excluded HCC.
JmBoullier
Former Community Advisor | Normandy - France | Joined: Jan 26, 2007 | Post Count: 3715 | Status: Offline
Hello Yves!
"Does the thread title now better reflect the subject of this discussion?"

It is certainly far better than the initial one, and probably closer to the truth, but the reference to multi-core PCs might put the techs on a false track. As reported in my post of Nov 29, 2007 12:46:40 AM, this problem is not limited to multi-core machines, as far as I can observe. And to make sure my HT processor was not behaving like a multi-core machine, I also ensured that the HCC WU was the only task doing meaningful work while I watched.

Don't feel obliged to change your thread title again, since the problem does seem more visible on multi-core machines, but I wanted to make sure the techs will know it is not limited to them when they can move on to this problem.

Cheers. Jean.
KerSamson
Master Cruncher | Switzerland | Joined: Jan 29, 2007 | Post Count: 1679 | Status: Offline
Hi Jean,
you're right; I will try to find a better wording, in order not to put people on the wrong track.

I hope everything is going well for you; news from the Decrypthon team is scarce at the moment...

Have a nice week-end,
Former Member
Cruncher | Joined: May 22, 2018 | Post Count: 0 | Status: Offline
I think that to clarify things further, you should edit the title one more time, as it still says "Some concerns regarding HCC".

You should make it clear that this has nothing whatsoever to do with the Hamilton Caving Club (NZ), or with Michigan's Hubcap Collectors Club, and especially nothing to do with meat promotion or production in Wales.
Former Member
Cruncher | Joined: May 22, 2018 | Post Count: 0 | Status: Offline
I suppose I'll dip my toe into the water again. Brrr. . . it's chilly!
A page fault occurs when a program touches memory whose page is not currently mapped into physical RAM. This slows things down because the kernel must suspend the program, bring the page in and update the page tables while the core waits. Any application with a lot of page faults will run more slowly than one with only a few.

But there is a second possible performance problem. Multiple cores can 'queue up' a series of page faults, so that each core has to wait until its own page fault gets serviced. This is called memory contention. If a number of cores are running applications with a high number of page faults, performance will drop even further because of this contention.

How can this inefficiency be cured? The normal way is to run a preprocessing step over the data arrays and produce a new array that clusters data together in the order the program will access it. Sometimes this is possible; unfortunately, sometimes it is not. It all depends on the algorithm. Even when it is possible, it produces unreadable data structures. That need not be a problem, but when developing a new program that has to be changed rapidly to match research needs, it almost always is.

[A personal reminiscence. A generation ago I spotted a neat 15-25 line section in an image-processing assembler routine that I could optimize to speed the program up by 10%-15%. Even with paperwork, the change only cost me 2 or 3 days, and we were running the program constantly on a number of computers, so I considered it time well spent. I actually congratulated myself about this. (sob..) A little more than 2 years later, new computers changed the cache organization, and I suddenly realized that my change was bound to cause problems down the road if the cache changed even more. After thinking it over for an hour, I eliminated the change. Programming to fit a specific cache design is a very dangerous practice that has to be viewed with great suspicion.]

So what is my estimate of the situation?
I don't think it makes sense to reprogram the application for this. The project scientists should be concentrating on the results, and overworking the programmers to change the application to produce better results. "Faster" should be ignored at this stage.

But how should individual members of World Community Grid feel about this? The high page-fault count is simply an artifact of the algorithm. It will slow down the flops per second, but that does not matter as such. The CPU time spent in the kernel loading new pages will show up as reduced credit, but on a single core the points impact should not be substantial. Memory contention matters much more, so 4- and 8-core machines running more than one HCC work unit will show a much greater drop in points. The WCG scheduler will still get all the HCC work units sent out, so a member can exclude HCC on these multi-core machines without slowing down progress on HCC, and can then run other projects, such as FAAH and DDT, that would otherwise have to run on the single-core computers which handle HCC with the greatest efficiency.

An unrelated note: some days ago a work unit awarded only 8.3 points was posted in this thread. This was immediately reported to the WCG staff. I don't know what went wrong there, and there are a number of more urgent issues, but it is an error unrelated to the main problem being addressed in this thread.

Lawrence
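[Editorial aside: the locality fix Lawrence describes - a preprocessing pass that gathers data into a new array clustered in the order the program will access it - can be sketched in C. This is purely an illustration, not code from HCC; all function names and data here are hypothetical.]

```c
#include <stdlib.h>

/* Hypothetical sketch of the preprocessing step described above: instead of
 * chasing an index array through a large table on every pass (scattered
 * accesses, poor locality, many faults), gather the values once into a
 * compact array laid out in exactly the order the inner loop visits them. */

/* Scattered access: each iteration may touch a different page of 'table'. */
double sum_indirect(const double *table, const int *order, int n) {
    double s = 0.0;
    for (int i = 0; i < n; i++)
        s += table[order[i]];
    return s;
}

/* One-time preprocessing pass: pay the scattered-access cost only once. */
double *pack_in_access_order(const double *table, const int *order, int n) {
    double *packed = malloc(n * sizeof *packed);
    for (int i = 0; i < n; i++)
        packed[i] = table[order[i]];
    return packed;
}

/* Sequential access: cache- and page-friendly on every subsequent pass. */
double sum_packed(const double *packed, int n) {
    double s = 0.0;
    for (int i = 0; i < n; i++)
        s += packed[i];
    return s;
}
```

As the post notes, the packed copy duplicates the data and obscures the structure the researchers actually think in, which is why this kind of optimization is often deferred while the science code is still changing.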
twilyth
Master Cruncher | US | Joined: Mar 30, 2007 | Post Count: 2130 | Status: Offline
Thanks LH - good enough for me.
I just turned off HCC for my profile, and I'll share your post with our team.
KerSamson
Master Cruncher | Switzerland | Joined: Jan 29, 2007 | Post Count: 1679 | Status: Offline
Many thanks, Lawrence, for these clear explanations.

Indeed, after the problem was identified, I think many of us deselected the HCC project on our multi-core machines. Still, this particular behavior is curious, especially because it does not occur to this degree with other projects. I agree with you about the danger of software optimizations made to fit specific hardware constraints.

I wish you much success in your continued support of WCG. Cheers,
JmBoullier
Former Community Advisor Normandy - France Joined: Jan 26, 2007 Post Count: 3715 Status: Offline Project Badges: ![]() ![]() ![]() ![]() ![]() ![]() ![]() ![]() ![]() ![]() ![]() ![]() ![]() ![]() ![]() ![]() ![]() ![]() ![]() ![]() ![]() ![]() ![]() ![]() ![]() ![]() ![]() ![]() ![]() ![]() |
Lawrence,
Thank you for your explanations. I agree with you that designing programs to fit a particular hardware or system internal design is generally not a good choice, especially in a multi-platform environment. However, some program design choices are inefficient in all environments, and I think that when there are fewer emergencies to deal with, it would be good for a senior programmer to make sure there is no such problem somewhere in the program, particularly in the parts of the code at the heart of the many loops such programs probably contain. This is not saying that the programmers of HCC are not good; it is only saying that everybody can miss something sometime.

Cheers. Jean.