World Community Grid Forums
Thread Status: Locked. Total posts in this thread: 210
Former Member
Cruncher Joined: May 22, 2018 Post Count: 0 Status: Offline
Hello PepoS,

I had better explain that a page fault starts in the cache on the CPU chip: a new page has to be loaded from main memory into the chip cache, and at that point the size of main memory is irrelevant. A second problem occurs if the requested page is not in memory and has to be loaded from disk. That is much slower, but it is not (I think) the problem HCC has; the size of main memory would only matter if this second problem occurred a lot. HCC can generate so many page faults that the CPU is slowed down by the memory bus, which cannot keep the cache filled.

Lawrence
Movieman
Veteran Cruncher Joined: Sep 9, 2006 Post Count: 1042 Status: Offline
...a page fault starts in the cache on the CPU chip. [...] HCC can generate so many page faults that the CPU is slowed down by the memory bus, which cannot keep the cache filled.

Could this be why the 8-core Clovertown rigs are taking such a huge hit? FSB speeds are 1400 maximum with the DDR2-667 FB-DIMMs, while a lot of the quads at XS run FSB speeds up to 1800, and just about all are at 1600 using DDR2-800.
Former Member
Cruncher Joined: May 22, 2018 Post Count: 0 Status: Offline
Lawrence, I'm aware of both these sources of page faults. But I thought that a CPU cache miss should not produce interrupts processed with kernel calls if the required blocks are already available in the process's mapped RAM. Still, the amount of time I observed being spent in kernel calls was 25%; in my opinion, these are not just core cache misses.

Anyway, I got the silly idea to take a look at a few call-stack snapshots of HCC task X0000054330932200508190032_1_0, to see whether there would be any interesting, repeating pattern. Take a look yourself (the snapshots were taken at approx. 20-second intervals; nothing exact, just take, copy and paste):

```
ntkrnlpa.exe!KiDispatchInterrupt+0xa7
wcg_hcc1_img_5.15_windows_intelx86+0x12bb94
wcg_hcc1_img_5.15_windows_intelx86+0x19c5c
wcg_hcc1_img_5.15_windows_intelx86+0x366fa
wcg_hcc1_img_5.15_windows_intelx86+0x52f97
wcg_hcc1_img_5.15_windows_intelx86+0x52784
wcg_hcc1_img_5.15_windows_intelx86+0x104cfe
---
ntkrnlpa.exe!KiDispatchInterrupt+0xa7
ntkrnlpa.exe!KiThreadStartup+0x16
NDIS.sys!ndisWorkerThread
wcg_hcc1_img_5.15_windows_intelx86+0x1b1ce
wcg_hcc1_img_5.15_windows_intelx86+0x19c5c
wcg_hcc1_img_5.15_windows_intelx86+0x366fa
wcg_hcc1_img_5.15_windows_intelx86+0x52f97
wcg_hcc1_img_5.15_windows_intelx86+0x52784
wcg_hcc1_img_5.15_windows_intelx86+0x104cfe
---
etc.
```

As you can see, the last five call-stack addresses repeat, so I'll omit the lowest (or highest) four of them in the following snapshots:

```
ntkrnlpa.exe!KiDispatchInterrupt+0xa7
ntkrnlpa.exe!KiThreadStartup+0x16
NDIS.sys!ndisWorkerThread
wcg_hcc1_img_5.15_windows_intelx86+0x1b1ce
wcg_hcc1_img_5.15_windows_intelx86+0x19c5c
...
ntkrnlpa.exe!KiDispatchInterrupt+0xa7
ntkrnlpa.exe!MmAccessFault+0x11ae
wcg_hcc1_img_5.15_windows_intelx86+0x12bea7
wcg_hcc1_img_5.15_windows_intelx86+0x19c5c
...
ntkrnlpa.exe!KiDispatchInterrupt+0xa7
ntkrnlpa.exe!MmAccessFault+0x11ae
wcg_hcc1_img_5.15_windows_intelx86+0x12bea7
wcg_hcc1_img_5.15_windows_intelx86+0x19c5c
...
ntkrnlpa.exe!KiDispatchInterrupt+0xa7
ntkrnlpa.exe!MmAccessFault+0x11ae
wcg_hcc1_img_5.15_windows_intelx86+0x12bea7
wcg_hcc1_img_5.15_windows_intelx86+0x19c5c
...
ntkrnlpa.exe!KiDispatchInterrupt+0xa7
ntkrnlpa.exe!MmAccessFault+0x11ae
wcg_hcc1_img_5.15_windows_intelx86+0x12bea7
wcg_hcc1_img_5.15_windows_intelx86+0x19c5c
...
ntkrnlpa.exe!KiDispatchInterrupt+0xa7
ntkrnlpa.exe!MmAccessFault+0x11ae
wcg_hcc1_img_5.15_windows_intelx86+0x12bea7
wcg_hcc1_img_5.15_windows_intelx86+0x19c5c
...
ntkrnlpa.exe!KiDispatchInterrupt+0xa7
ntkrnlpa.exe!MmAccessFault+0x11ae
wcg_hcc1_img_5.15_windows_intelx86+0x12bea7
wcg_hcc1_img_5.15_windows_intelx86+0x19c5c
...
```

I suppose the devs (and Didactylos too, as he was profiling the app) know this already, but it was interesting for me to see all these (predicted) kernel calls processing the memory access issues. Let's see (after the Linux version is finished, and if the devs get the time and the OK to take a look) whether we will someday find out what to do about it.
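PepoS spotted the MmAccessFault repetition by eye; the same check can be automated by tallying which routine sits directly under KiDispatchInterrupt across the sampled stacks. A minimal sketch: the snapshot list below is hand-copied from the samples above, reduced to their top two frames, not live profiler output.

```python
# Aggregate sampled call stacks by the frame below the interrupt dispatcher
# to find the dominant kernel path. Sample data copied from the thread.
from collections import Counter

APP = "wcg_hcc1_img_5.15_windows_intelx86"
snapshots = [
    ["ntkrnlpa.exe!KiDispatchInterrupt+0xa7", APP + "+0x12bb94"],
    ["ntkrnlpa.exe!KiDispatchInterrupt+0xa7", "ntkrnlpa.exe!KiThreadStartup+0x16"],
    ["ntkrnlpa.exe!KiDispatchInterrupt+0xa7", "ntkrnlpa.exe!KiThreadStartup+0x16"],
] + 6 * [
    ["ntkrnlpa.exe!KiDispatchInterrupt+0xa7", "ntkrnlpa.exe!MmAccessFault+0x11ae"],
]

# Tally the routine directly below the interrupt dispatcher in each sample.
hot = Counter(stack[1] for stack in snapshots)
print(hot.most_common())
```

MmAccessFault is the Windows memory-manager fault handler, so its dominance in the samples fits the roughly 25% kernel time observed above.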
Former Member
Cruncher Joined: May 22, 2018 Post Count: 0 Status: Offline
...a page fault starts in the cache on the CPU chip. A new page has to be loaded from main memory into the chip cache. [...] HCC can generate so many page faults that the CPU is slowed down by the memory bus, which cannot keep the cache filled.

Could this be why the 8 core clovers are taking such a huge hit? FSB speeds are 1400 maximum with the DDR2-667 FB-DIMMs...

If the memory accesses (caused by the algorithm used) are "nicely" spread across the allocated memory, then yes, it can easily overload any current CPU with an FSB. But still, in my opinion, these core cache misses should not cause kernel interrupts if the data is available in RAM, unless some memory protection, or something I don't know about, causes additional kernel calls with every few memory-block misses.

[Edit 1 times, last edit by Former Member at Dec 18, 2007 4:01:12 PM]
Former Member
Cruncher Joined: May 22, 2018 Post Count: 0 Status: Offline
...these core cache misses should not cause kernel interrupts, if the data is available in RAM. [...]

This is exactly what I'm thinking. When Windows reports a page fault, as far as I'm aware it's a main memory page fault and has nothing to do with the CPU cache.
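The distinction Questar draws (page faults serviced entirely from RAM, with no CPU-cache involvement and no disk I/O) can be observed directly. A minimal sketch for a Unix system using Python's `resource` module; the 64 MB size is an arbitrary choice, and on Windows the analogous per-process counter would be its page fault count rather than `ru_minflt`.

```python
# Soft ("minor") page faults: the kernel's fault handler runs even though no
# disk I/O occurs, because freshly allocated pages are only mapped into the
# process on first touch. Unix-only; illustrative, not the HCC code.
import resource

def minor_faults():
    return resource.getrusage(resource.RUSAGE_SELF).ru_minflt

before = minor_faults()
buf = bytearray(64 * 1024 * 1024)    # 64 MB, zero-filled
for i in range(0, len(buf), 4096):   # write to every 4 KB page
    buf[i] = 1
delta = minor_faults() - before

print(f"minor page faults from touching 64 MB: {delta}")
```

Each of those faults is a kernel entry with no disk access at all, which is consistent with heavy MmAccessFault time appearing alongside only occasional drive activity.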
Former Member
Cruncher Joined: May 22, 2018 Post Count: 0 Status: Offline
Hello PepoS and Questar,

I think you are right about the vocabulary; I need to read up on this or I will just confuse people with my idiosyncratic usage. Right now I am running HCC on a single-core Windows XP system, using 521 MB for the application and another 122 MB for the OS and overhead. I am accessing the drive only once every 3 to 5 seconds, but I am throwing up a large number of cache misses, which is slowing down my computing. I am spending a noticeable amount of time in the kernel, but I am not sure why. If I had multiple cores, I would expect a large amount of memory contention, but I am guessing at this point. Without the source, I am too lazy to try to really map out the performance.

Lawrence
Former Member
Cruncher Joined: May 22, 2018 Post Count: 0 Status: Offline
I think Highwire has the best explanation so far. It agrees with what I've read, with my test results, and with what knreed has said. Also, he has some experimental results to back it up.

The only remaining question is whether it is practical to refactor the application to avoid this memory allocation pattern. Obviously it can be done, but depending on the code it may be trivial or it may be very, very non-trivial. I expect WCG will do what they have done in the past: improve the worst areas to the best of their ability in the time available.
123bob
Cruncher Joined: May 1, 2007 Post Count: 42 Status: Offline
Some may already know this, but.....
I've been looking for a workaround, and I may have found one. The data below (I hope it formats right on the forum...) shows the same machine before and after the move; I'm not sure if it's the Vista or the BOINC that stabilized this thing. Page fault counts went from billions to thousands with this move!! Look at how consistent the WUs run. I've made this move on four machines, and all four show the same stabilization.

Regards, Bob

Win Server 2003 32-bit, BOINC 5.10.13 (all results on device BOBS-FARM-04, status Valid):

```
Result Name                    Sent Time         Return Time       CPU Time (h)  Claimed / Granted BOINC Credit
X0000053701414200507181229_1   12/15/2007 11:47  12/17/2007  1:14  3.21           69.0 /  67.2
X0000053700504200507181245_0   12/15/2007 11:16  12/17/2007  1:14  3.1            66.5 /  73.7
X0000053700476200507181246_1   12/15/2007 11:15  12/17/2007  4:03  5.74          122.9 /  74.5
X0000053700736200507180920_1   12/15/2007 10:27  12/17/2007  1:14  5.3           113.9 /  69.0
ll117_00044_2                  12/15/2007 10:26  12/16/2007 23:52  4.21           90.2 /  83.7
X0000053691317200508152322_1   12/15/2007  9:42  12/16/2007 23:52  5.17          110.8 /  57.8
ll116_00160_4                  12/15/2007  9:19  12/16/2007 23:52  4.11           88.0 /  83.3
X0000053691167200507181207_1   12/15/2007  8:40  12/16/2007 23:52  5.03          107.9 /  73.3
X0000053691233200507180843_1   12/15/2007  7:35  12/16/2007 16:08  4.55           97.3 / 103.8
X0000053690140200507180901_1   12/15/2007  6:49  12/16/2007 23:52  6.11          131.0 / 131.0
X0000053341031200507130919_1   12/14/2007 18:07  12/16/2007  1:58  5.09          109.2 /  68.3
X0000053201106200507120844_0   12/14/2007 12:59  12/16/2007  1:58  3.46           74.2 /  77.4
X0000053200888200507120849_0   12/14/2007 12:49  12/16/2007  1:58  3.83           82.2 /  75.1
X0000053140693200507111427_0   12/14/2007 11:30  12/15/2007 22:25  5.13          110.4 /  78.3
X0000052980196200507080905_1   12/14/2007  9:31  12/15/2007 22:25  5.36          115.5 /  85.3
```

Vista Ultimate 64-bit, BOINC 5.10.28 (all results on device bobs-farm-04, status Valid):

```
Result Name                    Sent Time         Return Time       CPU Time (h)  Claimed / Granted BOINC Credit
X0000055740864200508171137_1   12/19/2007  5:38  12/20/2007 16:29  2.69           68.1 / 76.8
X0000055740640200508171139_1   12/19/2007  5:29  12/20/2007 16:29  2.65           67.1 / 59.4
X0000055740408200508171144_1   12/19/2007  5:14  12/20/2007 16:29  2.7            68.4 / 66.3
X0000055520867200509022121_1   12/19/2007  3:01  12/20/2007 16:29  2.71           68.8 / 72.0
X0000055520791200509022122_0   12/19/2007  3:00  12/20/2007 16:29  2.65           67.2 / 60.5
X0000055521380200508191534_1   12/19/2007  1:13  12/20/2007 16:29  2.7            68.4 / 76.6
X0000055520127200508120832_0   12/18/2007 23:07  12/20/2007  8:20  2.74           69.4 / 78.8
X0000055520004200508120834_0   12/18/2007 23:05  12/20/2007  7:19  2.65           67.1 / 71.2
X0000055511496200509022044_0   12/18/2007 23:04  12/20/2007  7:19  2.66           67.3 / 67.4
X0000055511365200509022047_1   12/18/2007 23:02  12/20/2007  6:02  2.77           68.7 / 68.7
X0000055511363200509022047_0   12/18/2007 23:02  12/20/2007  6:02  2.67           66.1 / 68.3
X0000055510854200508262138_0   12/18/2007 18:52  12/20/2007  2:39  2.66           65.9 / 66.0
X0000055510719200508262140_1   12/18/2007 18:42  12/20/2007  1:08  2.7            66.8 / 74.1
X0000055510644200508262141_1   12/18/2007 18:40  12/20/2007  0:51  2.61           64.6 / 64.5
```
Former Member
Cruncher Joined: May 22, 2018 Post Count: 0 Status: Offline
Thanks 123bob,

This shows that the problem is a solvable one. I don't have it (much) on my one-core machine, so I have been just guessing about why it hits other people so hard. It looks as though it will be some sort of problem like the one Highwire suggested. The techs are aware of this problem (and several others), but I won't start nagging them until after New Year's Day.

Lawrence
KerSamson
Master Cruncher Switzerland Joined: Jan 29, 2007 Post Count: 1677 Status: Offline
Hello Everybody,

Christmas time has arrived for all of us, including the techs who support the grid computing projects around the clock and around the year; to them especially, I wish a great Christmas time!

Because I was very busy for business reasons (incl. traveling) during the last weeks, I was only able to follow the thread's development, and the results of the various investigations, from time to time. Personally, I had the feeling that the community made a lot of progress, even if the root cause is still not properly identified. I hope that everybody, especially people devoting several hosts to the grid computing projects, will monitor their host performance more accurately than in the past. Because of this "dramatic" performance problem, I reallocated the various WCG projects with a little bit more understanding. Indeed, today I reach a similar performance to two months ago, although I had to retire one of my best hosts (T7200, 2 GHz, 2 GB RAM)!

In order to complete the different projects within a reasonable time and with reasonable energy consumption (environmental protection should also be an issue), we have to become better at managing the computation resources. Such performance review or monitoring must be performed accurately when new projects are introduced. I wonder whether some crunchers should volunteer to report any "side effects" during the first weeks or months of a project. In addition to the WCG and BOINCstats reports, I think that at least some members should cross-compare platform/host performance against projects, in order to identify critical configurations and to optimize computation efficiency by providing recommendations.

This is my current thinking today. I would enjoy it if some of you could submit ideas for becoming better and working in a more efficient way. I would like to share ideas about this and to collaborate on elaborating utilities or tools that help to cross-compare platforms and projects and to monitor performance in a time-efficient manner. Maybe such discussions should occur in a separate thread, because this impacts every project and not HCC only!

Again, merry Christmas to everybody!

Cheers,