World Community Grid Forums
Thread Status: Locked | Total posts in this thread: 210
Former Member | Cruncher | Joined: May 22, 2018 | Post Count: 0
> boinc.exe and the application communicate using shared memory

It's just that the amount and frequency of communication between boinc.exe and the application would not cause that many system calls. The problem is somewhere inside the application itself.
Former Member | Cruncher | Joined: May 22, 2018 | Post Count: 0
Highwire, thank you for your post.
However, I'm fairly confident the soft page faults are the cause of the kernel time, particularly since I profiled it. Your points are all valid, but a properly designed BOINC application shouldn't suffer from the shared memory issue at least, and I'm pretty confident the WCG techs would have spotted any locking problems.

The excessive faulting alone is enough to explain the kernel time. But what causes the faulting? So far, the techs are guessing it's just the memory access pattern caused by the algorithm. That's kind of vague. However, this quote from greggm on MSDN gives some clues:

> Soft faults are relatively inexpensive. These occur when a process tries to access the virtual address, and the operating system can satisfy the request without reading in the page from disk. This can happen with pages that the program wants to be zero (called 'demand zero' pages), when a page is written to for the first time ('copy on write' pages) or if the page is already in memory somewhere else. The last situation occurs if a file is memory mapped into multiple processes or into multiple locations of the same process, and one of the other file references has already caused the data to be in physical memory.
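(For anyone who wants to reproduce a measurement like this: below is a minimal, illustrative sketch of one way to read a process's kernel vs. user CPU time on Windows using the documented Win32 calls OpenProcess and GetProcessTimes. It is not the HCC code and not necessarily how the profiling above was done; the command-line PID handling is just for the example.)

```cpp
// Minimal check: how much of a process's CPU time is kernel vs. user time.
// Pass the PID of the science app (as shown in Task Manager) on the command line.
#include <windows.h>
#include <cstdio>
#include <cstdlib>

static double FiletimeToSeconds(const FILETIME &ft)
{
    ULARGE_INTEGER u;
    u.LowPart  = ft.dwLowDateTime;
    u.HighPart = ft.dwHighDateTime;
    return u.QuadPart / 1.0e7;               // FILETIME ticks are 100 ns
}

int main(int argc, char **argv)
{
    DWORD pid = (argc > 1) ? atoi(argv[1]) : 0;
    HANDLE h = OpenProcess(PROCESS_QUERY_INFORMATION, FALSE, pid);
    if (!h) return 1;

    FILETIME creation, exitTime, kernel, user;
    if (GetProcessTimes(h, &creation, &exitTime, &kernel, &user))
    {
        double k = FiletimeToSeconds(kernel);
        double u = FiletimeToSeconds(user);
        printf("kernel: %.1f s  user: %.1f s  kernel share: %.1f%%\n",
               k, u, (k + u > 0.0) ? 100.0 * k / (k + u) : 0.0);
    }
    CloseHandle(h);
    return 0;
}
```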
Highwire | Cruncher | Joined: Aug 18, 2006 | Post Count: 39
> The excessive faulting alone is enough to explain the kernel time. But what causes the faulting? So far, the techs are guessing it's just the memory access pattern caused by the algorithm. That's kind of vague. However, this quote from greggm on MSDN gives some clues:
>
> > Soft faults are relatively inexpensive. These occur when a process tries to access the virtual address, and the operating system can satisfy the request without reading in the page from disk. This can happen with pages that the program wants to be zero (called 'demand zero' pages), when a page is written to for the first time ('copy on write' pages) or if the page is already in memory somewhere else. The last situation occurs if a file is memory mapped into multiple processes or into multiple locations of the same process, and one of the other file references has already caused the data to be in physical memory.

I'd agree that if it's not actually going to disk it'll be *relatively* less bad, but the sheer quantity of them is having an effect. I looked at the app at one point and it was reporting something like 8 minutes in kernel time out of 70 minutes total; that's an awful lot of wasted time in the kernel. It's not that big a memory hog either. I just can't see how any 'normal' memory usage and code would do anything like this when there is so much system memory free. I've got apps here using more memory than HCC and they don't do it, and other crunching tasks don't do it. The only other time I see that much kernel usage is with real disk activity, though I've noticed 3D games doing it, which I put down to the 3D accelerator/bus activity etc. Obviously HCC won't be doing that!

If you aren't explicitly calling OpenFileMapping/CreateFileMapping, do you have a DLL there with #pragma data_seg on some commonly used area, or any other 'weird' attributes on the data? I would avoid that, as I *believe* it creates an implicit shared memory area. If you aren't looking at BOINC shared memory often, I'm thinking it might be something like that.

If there IS a shared memory thing somewhere, it might explain the paging and the kernel time, and, since Windows will surely have to keep the shared pages consistent across cores, it might also explain the poor performance on multi-core CPUs. So it might explain everything, which is why I'm flogging this particular horse :)

[Edit 1 times, last edit by Highwire at Dec 14, 2007 7:14:59 PM]
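To make the #pragma data_seg concern concrete, here is the classic MSVC idiom for an implicitly shared data segment in a DLL. The file, variable, and function names are invented for illustration; nothing here is claimed to exist in the HCC application.

```cpp
// shared_counter.cpp -- hypothetical code built into a DLL, NOT from HCC.
// Classic MSVC idiom for an implicitly shared data segment: every process
// that loads this DLL sees the same physical pages behind these variables.
#include <windows.h>

#pragma data_seg(".shared")
volatile LONG g_sharedCounter = 0;   // must be initialised, or it won't land in .shared
#pragma data_seg()

// Tell the linker the section is Read/Write/Shared.
#pragma comment(linker, "/SECTION:.shared,RWS")

extern "C" __declspec(dllexport) LONG BumpSharedCounter()
{
    // Every process touching this goes through the same memory-mapped,
    // shared pages -- the sort of implicit sharing Highwire is asking about.
    return InterlockedIncrement(&g_sharedCounter);
}
```

If a pattern like this (or an equivalent /SECTION linker switch) turned up in the app or any DLL it loads, that would be the implicit shared memory area described above.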
Sekerob | Ace Cruncher | Joined: Jul 24, 2005 | Post Count: 20043
Following the HCC beta on Linux: assuming these are the full jobs, graphics and all, they took 3.38 hours per job across a total of 122 jobs, less than half of the 7.80-hour average on Windows (the Mac component is small). Looking forward to seeing whether anything comes out of that.
Highwire | Cruncher | Joined: Aug 18, 2006 | Post Count: 39
I'm wondering ...

I was just messing with an app I have source for that page faults a bit, and noticed as I stepped through that it has little page fault blips when I allocate a not-that-big chunk of memory on the heap. I did a test of a silly little loop (VC++), in a RELEASE build, a la:

```cpp
for (int n = 0; n < 1000000; n++)
{
    int *p = new int[rand() + 1];   // Page fault city
    //int *p = new int[100000];     // Faults if larger; this value is OK.
    //int *p = new int[RAND_MAX];   // Pretty much no faults.
    delete [] p;
}
```

Of course this is a stupid loop, and of course the CPU maxes out, but the page faults can go absolutely loopy: 80% kernel usage. This code, while stupid, isn't technically 'wrong' in that it doesn't leak; the program sits at a constant memory usage. I'd have thought it might reallocate from the freed heap space, but it only does that at lower constant sizes, where it allocates and frees with little effort; at higher or random sizes it's hammering the kernel. I'm guessing the random size causes some heap fragmentation or similar, even though each block could fit in the same space as the fixed-size version. If anything else is touching the heap I wouldn't rely on a 'realloc' anyway. I'm also wondering how a multi-core system handles memory allocations, since they end up in main memory at some point and so must be synchronised? Perhaps it's something like this.

Perhaps somewhere in the algorithm you have an on-paper 'correct', non-leaky allocation and deallocation that could be moved outside some loop, made a suitable size, and reused. It's an optimisation you might make anyway, but the above may be something to have a good look for! Let me know if any of this solves things. I've got my teeth into this now :)

PS: RAND_MAX will be 32767, not THAT much memory these days (x4 of course, for the byte pedantic!).

ADDENDUM: I tried using plain old C malloc in a loop to see what happened. The results were:

```cpp
// Loads the CPU but no page faults
void *p  = malloc( RAND_MAX );
void *p2 = malloc( RAND_MAX );

// Loads the CPU but no page faults
void *p  = malloc( rand() );

// The following pair: 350,000 PF delta, 75-80% kernel
void *p  = malloc( rand() );
void *p2 = malloc( RAND_MAX );
// (Changing p2 to rand() gives ~300k PF, ~70% less kernel)

free( p );    // Each test had the relevant free()s
free( p2 );
```

[Edit 3 times, last edit by Highwire at Dec 14, 2007 11:17:21 PM]
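To make the "move the allocation outside the loop" suggestion concrete, here is a small illustrative sketch: the "before" shape allocates on every iteration (as in the test loop above), while the "after" shape sizes one buffer to the worst case and reuses it. The function names and the per-item work are placeholders, not anything from the actual HCC code.

```cpp
#include <vector>
#include <cstdlib>

// Hypothetical "before": correct and non-leaking, but it hits the heap on
// every iteration, which is exactly what the malloc/new tests above hammer
// the kernel with.
void processAllItemsSlow(int itemCount)
{
    for (int n = 0; n < itemCount; n++)
    {
        int *scratch = new int[rand() + 1];   // fresh heap block every pass
        // ... do the per-item work in scratch ...
        delete[] scratch;
    }
}

// Hypothetical "after": one allocation, sized to the worst case, reused.
void processAllItemsReuse(int itemCount)
{
    std::vector<int> scratch;
    scratch.reserve(RAND_MAX + 1);            // worst case from the loop above

    for (int n = 0; n < itemCount; n++)
    {
        scratch.resize(rand() + 1);           // stays within capacity: no reallocation
        // ... do the per-item work in scratch ...
    }
}
```

A buffer kept outside the loop (a std::vector reserved once, or a plain array sized to the known worst case) keeps the hot path away from the heap entirely, which is the point of the suggestion above.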
Former Member | Cruncher | Joined: May 22, 2018 | Post Count: 0
Highwire is asking excellent questions. It wouldn't take much for someone with access to the project code and a performance profiler to identify the loop where so much time is being spent in kernel activity. Then he or she could evaluate the difficulty of making changes. Considering the number of high-powered PCs available to do more for the HCC project, I would think this activity could be given a higher priority.
Former Member | Cruncher | Joined: May 22, 2018 | Post Count: 0
Hi steveleg,

> I would think this activity could be given a higher priority.

HCC has been the highest priority since November. It's just that this particular problem is only one of several. Right now we are getting HCC up on Linux, which will give us a platform to compare with Windows.

I like Highwire's approach because he is looking at possibilities that can be cured. My own thinking tends toward difficulties that are not so amenable, but I want to stay alert to more hopeful possibilities.

It isn't necessary to try to alert the techs to this. They have been looking at it for weeks.

Lawrence
zombie67 [MM] | Senior Cruncher | USA | Joined: May 26, 2006 | Post Count: 228
> Right now we are getting HCC up on Linux, which will give us a platform to compare with Windows.

Another platform to compare to Windows... Don't forget about OSX.
Movieman | Veteran Cruncher | Joined: Sep 9, 2006 | Post Count: 1042
> Hi steveleg,
>
> > I would think this activity could be given a higher priority.
>
> HCC has been the highest priority since November. It's just that this particular problem is only one of several. Right now we are getting HCC up on Linux, which will give us a platform to compare with Windows. I like Highwire's approach because he is looking at possibilities that can be cured. My own thinking tends toward difficulties that are not so amenable, but I want to stay alert to more hopeful possibilities. It isn't necessary to try to alert the techs to this. They have been looking at it for weeks.
>
> Lawrence

We're with you, Lawrence. That tends to get lost at times, but we're all after the same thing. Our best to the techs; they will get to this when they can.

Reading some of the above, and only comprehending some of it, I wish I'd gotten more interested in programming than the hardware end..
Former Member | Cruncher | Joined: May 22, 2018 | Post Count: 0
Half an hour ago today my HCC task was restarted at somewhere around 95% (it was after a reboot, so it was no longer in memory). At first I noticed that it was not page faulting at all (well, a few dozen PFs per minute, roughly).

The "busy" phase (with ~50,000 PF/s) began just 2 minutes later. Either the app was just rereading and storing the checkpointed data first, or something later triggered the behaviour. As noted previously, it was not an enlarged memory footprint: now at 96.9%, both the WorkingSet and PrivateBytes are still around 42 MB, still with 3/4 GB of RAM free.