World Community Grid Forums
Thread Status: Locked | Total posts in this thread: 210
123bob
Cruncher | Joined: May 1, 2007 | Post Count: 42
Not sure I'm posting right... I don't post much here...

I'm not replying to MM's post, merely trying to add more info to the topic (even though MM is a teammate, and I support his views). I crunch a "little" for WCG, ranked #80... just waiting for Skulltrail... or maybe Harpers??? I have been trying to troubleshoot some of the HCC underscoring issues and can report the following observations, in the hope that they may give the project scientists some clues. I'm scratching my head over my results, but bear with me...

My two Vista Ultimate 64-bit machines are totally clean in reported vs. claimed. They take on the order of 2-3 hours to crunch an HCC unit, and the page fault count in perf mgr looks normal compared to other projects. (One of these machines is heavily overclocked; the other is stock.)

A Windows Server 2003 32-bit machine, overclocked, is severely affected. It returns about 25-40% "underscored" units. The real problem is that the page faults number in the billions... I returned this machine to stock clocks and it helped only a little; I still get "underscores" on it. I have many other machines that fall into this category. My main rig, "el-machino-2", was almost unusable for other tasks because of the page fault problem. That is what led me into this...

I have a single quad, also a member of the "123clan", that is also perfect. It is my daughter's quad ("123sarah"), named "el-machino-1". It is stock, running XP Pro 32-bit, and so far runs right...????

There is no doubt that there is a problem, and I offer to help find it in any way I can. Let me know what tests I can run. The "123clan" runs 12 quads, one old faithful P4 Northwood, and a T7200 Core 2 Duo laptop. The message here is that we are not slackers, but we will also not sit still when we KNOW a problem exists. We owe that to the science here... Those of you in charge can look into my account, "123bob", and gather info. Let me know if I can provide more, or if you want me to change configs to help test this situation.

What I do find unacceptable is the notion that excess page faults are OK... Sorry, they are not, since the substantial investment I have made so far in CPU power is not being used to its greatest efficiency... Cancer is my #1 enemy...

Regards, Bob
twilyth
Master Cruncher | US | Joined: Mar 30, 2007 | Post Count: 2130
Great post, 123bob. Thanks for taking the time to research the problem. A lot more HCC WUs could get done a lot sooner if WCG could at least tell us what is causing the page faults specific to HCC, since I personally have no such problem on DDDT, FAAH or HPF. Then we could probably come up with a workaround until the cause can be addressed.
123bob
Cruncher | Joined: May 1, 2007 | Post Count: 42
Thanks, Twilyth. I agree that many more WUs could be completed if this issue were taken care of. I am only running 5 quads on this project because of this problem. The entire stable of 12-plus quads would be put on this task if it were not for this issue, as would the Yorkfields and the future Nehalem 8-core farm... HELLO? Anyone listening?
And yes, I realize that the Linux and Mac users would like to be brought on board... I was one of them, and can be again at any time, once you show me those issues are fixed...
Bob
Former Member
Cruncher | Joined: May 22, 2018 | Post Count: 0
Hi 123bob,
knreed is looking into this. For the moment, the best workaround is to move badly affected systems to other projects.
Lawrence
Former Member
Cruncher | Joined: May 22, 2018 | Post Count: 0
123bob, nobody is saying excess page faults are okay. However, they may be unavoidable.
We will know more when the techs have time to study the problem.
Sekerob
Ace Cruncher | Joined: Jul 24, 2005 | Post Count: 20043
One observation: of all the systems here, the service install on the Vista HP (32-bit) shows by far the lowest PF/PF Delta, so I wonder whether it's simply better at managing certain aspects of the flows. Yesterday the Vista quad at stock set a record of 3.89 hours.
[Edited 1 time, last edit by Sekerob at Dec 13, 2007 8:37:28 AM]
123bob
Cruncher | Joined: May 1, 2007 | Post Count: 42
Good to know it is being looked at. I guess I'm just a little frustrated with this problem because cancer is why I crunch. This sure is a weird problem to troubleshoot; the data I get conflicts in some cases... I've already done as you suggest and put my worst machines on other tasks.
I'll be standing by to hear what is found.
Regards, Bob
Highwire
Cruncher | Joined: Aug 18, 2006 | Post Count: 39
I've been doing a bit of thinking about this. The process is spending over 10% of its life in the kernel on my single-core machines. I'd like to humbly offer a couple of bits of advice for the developers, as I'm a Windows developer myself who deals with such things, and do my bit for the cause (!). You may all know this and I really don't want to offend anyone, but I feel I have to mention it. Many experienced developers don't know these things. Could you pass this on to the developers?

1. If you are calling anything that opens shared memory in anything remotely like a tight loop, don't, or be careful. I've worked on software that used shared memory like that, and it page faulted for 'no' reason, much like this. It looked innocent, but once debugged, that was the cause. My spider sense is tingling... :) You might comment out a line or two and get a eureka moment. I'd only use shared memory if I needed to talk to another *separate* application once in a while; I'd be wary of using it for any sort of internal use, as it's just not required. I've got these kernel-time issues on single-core machines. If you are (and this is a guess, of course) using shared memory on a multi-core machine, I wonder whether this leads to a lot of inefficiency copying/locking data across cores. I'm also wondering if you are *accidentally* using something like shared memory through DLL data segments or similar. It's a thought.

2. If you are using mutexes or other such 'system object' calls to sync threads or lock resources that don't go *outside* the HCC process, look at critical sections or some other mechanism that DOES NOT force a switch into the kernel. A mutex is much slower, as it goes into the kernel. Put a mutex lock/unlock in a loop and time it, then do the same with a CS. You might be shocked!

I'm intrigued, from my own technical viewpoint, as to why a piece of software running 'normal' code would be doing funny kernel stuff like this; I can't see why it should. Maybe the above is wrong, but this is all a bit weird, and I'm trying to help if I can :) As I say, I'm trying not to tread on toes, but I do a lot of this stuff, so I thought I'd chuck in my tuppence. It's been floating round my head and needed to be said.
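Highwire's suggestion to put a lock/unlock in a loop and time it can be tried without any Windows-specific code. The sketch below is an assumption-laden stand-in rather than the CreateMutex vs. CRITICAL_SECTION comparison itself: a hypothetical `PipeTokenLock`, which makes a system call on every acquire and release, plays the part of the kernel mutex, and Python's `threading.Lock`, which normally stays in user mode when nobody else holds it, plays the part of the critical section. It reports both wall time and the kernel-mode CPU time from `os.times().system`.

```python
import os
import threading
import time


class PipeTokenLock:
    """Hypothetical kernel-mediated lock built from an OS pipe holding one token byte.

    acquire() reads the token (a read system call) and release() writes it back
    (a write system call), so every lock/unlock pair enters the kernel, roughly
    like waiting on and releasing a Windows mutex object.
    """

    def __init__(self):
        self._r, self._w = os.pipe()
        os.write(self._w, b"x")  # start unlocked: exactly one token in the pipe

    def acquire(self):
        os.read(self._r, 1)  # blocks until the token is available

    def release(self):
        os.write(self._w, b"x")  # put the token back

    def close(self):
        os.close(self._r)
        os.close(self._w)


def time_lock_loop(acquire, release, iterations=100_000):
    """Time `iterations` uncontended lock/unlock pairs on a single thread.

    Returns (wall_seconds, kernel_cpu_seconds), where the second value is the
    process's system (kernel-mode) CPU time accumulated during the loop.
    """
    counter = 0
    wall0, kernel0 = time.perf_counter(), os.times().system
    for _ in range(iterations):
        acquire()
        counter += 1  # stand-in for the work the lock protects
        release()
    wall1, kernel1 = time.perf_counter(), os.times().system
    assert counter == iterations
    return wall1 - wall0, kernel1 - kernel0


if __name__ == "__main__":
    n = 100_000
    user_lock = threading.Lock()  # usually no system call when uncontended
    kernel_lock = PipeTokenLock()
    try:
        wall_u, kern_u = time_lock_loop(user_lock.acquire, user_lock.release, n)
        wall_k, kern_k = time_lock_loop(kernel_lock.acquire, kernel_lock.release, n)
    finally:
        kernel_lock.close()
    print(f"threading.Lock : {wall_u:.3f} s wall, {kern_u:.3f} s kernel CPU")
    print(f"pipe token lock: {wall_k:.3f} s wall, {kern_k:.3f} s kernel CPU")
```

Interpreter overhead inflates both numbers, so only the gap between them is meaningful. On most systems the pipe-based lock should be markedly slower and should account for nearly all of the kernel time, which is the pattern Highwire describes; a native C benchmark of the actual Windows objects would be the real test.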
twilyth
Master Cruncher | US | Joined: Mar 30, 2007 | Post Count: 2130
I only understood some of what you've said, but want to thank you for your input.
I believe LH said that BOINC and the science apps communicate via memory pointers. I guess that would imply shared memory, yes or no? For non-developers, what is the significance of high kernel time? Would it imply above-average memory access and therefore a greater chance of page faults?
Former Member
Cruncher | Joined: May 22, 2018 | Post Count: 0
Hi twilyth & Highwire,
Yes, boinc.exe and the application communicate using shared memory, which has caused some [other] problems before. I consider Highwire's post very interesting, so I will point it out to knreed.
Thanks, Lawrence
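Highwire's first point, about opening shared memory inside a tight loop, together with Lawrence's confirmation that boinc.exe and the application talk through shared memory, can be illustrated with a small experiment. This is a sketch under stated assumptions, not a description of how BOINC or the HCC application actually use shared memory (the thread doesn't say): a temporary file mapped with Python's `mmap` module stands in for a named shared-memory section, and the 1 MiB size and the function names are hypothetical. Each fresh mapping costs open, map, unmap and close system calls, and its pages typically have to be faulted in again on first touch, which is the kind of 'no reason' page faulting Highwire describes.

```python
import mmap
import os
import tempfile
import time

REGION_SIZE = 1 << 20  # hypothetical 1 MiB "shared" region


def make_backing_file(size=REGION_SIZE):
    """Create a zero-filled temporary file standing in for a named shared-memory section."""
    fd, path = tempfile.mkstemp()
    try:
        os.ftruncate(fd, size)
    finally:
        os.close(fd)
    return path


def sum_remapping_each_time(path, iterations):
    """Anti-pattern: open and map the region again on every iteration."""
    total = 0
    for i in range(iterations):
        with open(path, "r+b") as f, mmap.mmap(f.fileno(), REGION_SIZE) as view:
            # a fresh view typically has to fault this page in on first touch
            total += view[i % REGION_SIZE]
    return total


def sum_mapping_once(path, iterations):
    """Map the region once and reuse the same view inside the loop."""
    total = 0
    with open(path, "r+b") as f, mmap.mmap(f.fileno(), REGION_SIZE) as view:
        for i in range(iterations):
            total += view[i % REGION_SIZE]
    return total


if __name__ == "__main__":
    n = 20_000
    path = make_backing_file()
    try:
        for fn in (sum_remapping_each_time, sum_mapping_once):
            start = time.perf_counter()
            fn(path, n)
            print(f"{fn.__name__}: {time.perf_counter() - start:.3f} s")
    finally:
        os.remove(path)
```

Mapping once and reusing the view avoids that repeated cost. If a real application did re-attach for every exchange, the analogous change would be to attach once at startup and keep the view for the life of the process.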