Thread Status: Locked
Total posts in this thread: 210
Posts: 210   Pages: 21   [ Previous Page | 4 5 6 7 8 9 10 11 12 13 | Next Page ]
This topic has been viewed 18892 times and has 209 replies
Former Member
Cruncher
Joined: May 22, 2018
Post Count: 0
Status: Offline
Re: [RENAMED] Some concerns regarding the HCC project (page fault and poor performance, in particular (but not only) by multi-core hosts; #cores>2)

boinc.exe and the application communicate using shared memory

It's just that the amount and frequency of communication between boinc.exe and the applications wouldn't cause that many system calls. The cause is somewhere inside the application itself.
[Dec 14, 2007 2:01:47 PM]
Former Member
Cruncher
Joined: May 22, 2018
Post Count: 0
Status: Offline
Re: [RENAMED] Some concerns regarding the HCC project (page fault and poor performance, in particular (but not only) by multi-core hosts; #cores>2)

Highwire, thank you for your post.

However, I'm fairly confident the soft page faults are the cause of the kernel time (particularly since I profiled it).

Your points are all valid, but a properly designed BOINC application shouldn't suffer from the shared memory issue, at least, and I'm pretty confident the WCG techs would have spotted any locking problems.

The excessive faulting alone is enough to explain the kernel time. But what causes the faulting? So far, the techs are guessing it's just the memory access pattern caused by the algorithm. That's kind of vague.

However, this quote from greggm on MSDN gives some clues:
Soft faults are relatively inexpensive. These occur when a process tries to access the virtual address, and the operating system can satisfy the request without reading in the page from disk. This can happen with pages that the program wants to be zero (called ‘demand zero’ pages), when a page is written to for the first time (‘copy on write’ pages) or if the page is already in memory somewhere else. The last situation occurs if a file is memory mapped into multiple processes or into multiple locations of the same process, and one of the other file references has already caused the data to be in physical memory.

[Dec 14, 2007 2:06:29 PM]
Highwire
Cruncher
Joined: Aug 18, 2006
Post Count: 39
Status: Offline
Re: [RENAMED] Some concerns regarding the HCC project (page fault and poor performance, in particular (but not only) by multi-core hosts; #cores>2)


The excessive faulting alone is enough to explain the kernel time. But what causes the faulting? So far, the techs are guessing it's just the memory access pattern caused by the algorithm. That's kind of vague.

However, this quote from greggm on MSDN gives some clues:
Soft faults are relatively inexpensive. These occur when a process tries to access the virtual address, and the operating system can satisfy the request without reading in the page from disk. This can happen with pages that the program wants to be zero (called ‘demand zero’ pages), when a page is written to for the first time (‘copy on write’ pages) or if the page is already in memory somewhere else. The last situation occurs if a file is memory mapped into multiple processes or into multiple locations of the same process, and one of the other file references has already caused the data to be in physical memory.


I'd agree that if it's not actually going to disk it'll be *relatively* less bad, but the sheer quantity of them is having an effect. I looked at the app at one point and it was reporting something like 8 minutes of kernel time out of 70 minutes total; that's an awful lot of wasted time in the kernel. It's not that big a memory hog either. I just can't see how any 'normal' memory usage and code would do anything like this when there is so much system memory free. I've got apps here using more memory than HCC and they don't do it. And other crunching tasks don't do it. The only other time I see that much kernel usage is with real disk activity, though I've noticed 3D games doing it, which I put down to the weird 3D accelerator/bus activity etc. Obviously HCC won't be doing that!

If you aren't explicitly calling Open/CreateFileMapping, do you have a DLL there with #pragma data_seg on some commonly used area, or any other 'weird' attributes on the data? I would avoid that, as I *believe* it creates an implicit shared memory area. If you aren't touching BOINC shared memory often, I'm thinking it might be something like that. If there IS a shared memory thing somewhere, it might explain the paging and the kernel time; and since Windows will surely have to keep the shared area coherent across cores, it might also explain the poor performance on multi-core CPUs. So it might explain everything, which is why I'm flogging this particular horse :)
----------------------------------------
[Edit 1 times, last edit by Highwire at Dec 14, 2007 7:14:59 PM]
[Dec 14, 2007 7:03:51 PM]
Sekerob
Ace Cruncher
Joined: Jul 24, 2005
Post Count: 20043
Status: Offline
Re: [RENAMED] Some concerns regarding the HCC project (page fault and poor performance, in particular (but not only) by multi-core hosts; #cores>2)

Following the HCC beta on Linux: assuming those are the full jobs, graphics and all, they averaged 3.38 hours per job over a total of 122, less than half of the 7.80-hour average on Windows (the Mac component is small). Looking forward to seeing if anything comes out of that.
----------------------------------------
WCG Global & Research > Make Proposal Help: Start Here!
Please help to make the Forums an enjoyable experience for All!
[Dec 14, 2007 7:30:01 PM]
Highwire
Cruncher
Joined: Aug 18, 2006
Post Count: 39
Status: Offline
Re: [RENAMED] Some concerns regarding the HCC project (page fault and poor performance, in particular (but not only) by multi-core hosts; #cores>2)

I'm wondering ...

I was just messing with an app I have the source for that page faults a bit, and noticed as I stepped through that it has little page-fault blips when I allocate a not-that-big chunk of memory on the heap.

I did a test with a stupid little loop (VC++), in a RELEASE build of a program, a la:
for( int n = 0; n < 1000000; n++)
{
int * p = new int[rand() + 1]; // Page fault city
//int * p = new int[100000]; // Faults if larger, this value ok.
//int * p = new int[RAND_MAX]; // pretty much no faults.
delete [] p;
}
Of course this is a stupid loop, and of course the CPU maxes out, but the page faults can go absolutely loopy: 80% kernel usage. This code, while stupid, isn't technically 'wrong' in that it doesn't leak; the program sits at a constant memory usage. I'd have thought it might realloc from the freed heap space, but it only does this at lower constant values, which alloc and free with little effort; at higher or random values it's hammering the kernel. I'm guessing the random size means there is heap fragmentation or something, even though it could fit in the same space as the static version. If anything else is touching the heap I wouldn't rely on a 'realloc' anyway. I'm also wondering how a multi-core system handles memory allocs, as they will end up in main memory at some point, so it must be synchronised?

Perhaps it's something like this. Perhaps somewhere in the algorithm you have an on-paper 'correct', non-leaky allocation and deallocation that could be moved outside some loop, made a suitable size, and reused. It's an optimisation you might make anyway, but the above may be something to have a good look for!

Let me know if any of this solves things. I've got me teeth into this now :)
PS: RAND_MAX will be 32767, not THAT much memory these days (×4, of course, for the byte-pedantic!).

ADDENDUM
I tried using old C straight malloc in a loop to see what happened, results were:
// Loads CPU but no page faults
void * p = malloc( RAND_MAX);
void * p2 = malloc( RAND_MAX);

// Loads CPU but no page faults
void * p = malloc( rand());

// Following pair 350,000 PF Delta, 75-80% kernel
void * p = malloc( rand());
void * p2 = malloc( RAND_MAX);
// (Changing p2 to rand() makes 300k PF, ~70% less kernel)

free( p); // Each test had the relevant free()s
free( p2);
----------------------------------------
[Edit 3 times, last edit by Highwire at Dec 14, 2007 11:17:21 PM]
[Dec 14, 2007 8:55:57 PM]
Former Member
Cruncher
Joined: May 22, 2018
Post Count: 0
Status: Offline
Re: [RENAMED] Some concerns regarding the HCC project (page fault and poor performance, in particular (but not only) by multi-core hosts; #cores>2)

Highwire is asking excellent questions. It wouldn't take much for someone with access to the project code and a performance profiler to identify the loop where so much time is being spent in kernel activity. Then he or she could evaluate the difficulty of making changes. Considering the number of high-powered PCs available to do more for the HCC project, I would think this activity could be given a higher priority.
[Dec 14, 2007 10:23:03 PM]
Former Member
Cruncher
Joined: May 22, 2018
Post Count: 0
Status: Offline
Re: [RENAMED] Some concerns regarding the HCC project (page fault and poor performance, in particular (but not only) by multi-core hosts; #cores>2)

Hi steveleg,
I would think this activity could be given a higher priority.
HCC has been the highest priority since November. It's just that this particular problem is only one of several. Right now we are getting HCC up on Linux, which will give us a platform to compare with Windows.

I like Highwire's approach because he is looking at possibilities that can be cured. My own thinking tends toward difficulties that are not so amenable, but I want to stay alert to more hopeful possibilities.

It isn't necessary to try to alert the techs to this. They have been looking at it for weeks.

Lawrence
[Dec 15, 2007 4:07:51 AM]
zombie67 [MM]
Senior Cruncher
USA
Joined: May 26, 2006
Post Count: 228
Status: Offline
Re: [RENAMED] Some concerns regarding the HCC project (page fault and poor performance, in particular (but not only) by multi-core hosts; #cores>2)

Right now we are getting HCC up on Linux, which will give us a platform to compare with Windows.


Another platform to compare to windows... Don't forget about OSX.
[Dec 15, 2007 5:00:01 AM]
Movieman
Veteran Cruncher
Joined: Sep 9, 2006
Post Count: 1042
Status: Offline
Re: [RENAMED] Some concerns regarding the HCC project (page fault and poor performance, in particular (but not only) by multi-core hosts; #cores>2)

Hi steveleg,
I would think this activity could be given a higher priority.
HCC has been the highest priority since November. It's just that this particular problem is only one of several. Right now we are getting HCC up on Linux, which will give us a platform to compare with Windows.

I like Highwire's approach because he is looking at possibilities that can be cured. My own thinking tends toward difficulties that are not so amenable, but I want to stay alert to more hopeful possibilities.

It isn't necessary to try to alert the techs to this. They have been looking at it for weeks.

Lawrence

We're with you Lawrence.
That tends to get lost at times but we're all after the same thing.
Our best to the techs and they will get to this when they can.
Reading some of the above, and only comprehending some of it, I wish I'd gotten more interested in programming than the hardware end. :D
[Dec 15, 2007 8:16:54 PM]
Former Member
Cruncher
Joined: May 22, 2018
Post Count: 0
Status: Offline
Re: [RENAMED] Some concerns regarding the HCC project (page fault and poor performance, in particular (but not only) by multi-core hosts; #cores>2)

Half an hour ago today, my HCC task was restarted at somewhere around 95% (it was after a reboot, so it was no longer in memory). At first I noticed that it wasn't page faulting at all (well, a few dozen PFs per minute; not exactly zero, but...)

The "busy" phase (with ~50,000 PF/s) began just 2 minutes later. Either the app was just rereading and storing checkpointed data first, or something later triggered the behavior. As noted previously, it was not an enlarged memory footprint: now at 96.9%, both the WorkingSet and PrivateBytes are still around 42 MB, still with 3/4 GB of RAM free.
[Dec 18, 2007 2:59:20 PM]