Thread Status: Locked
Total posts in this thread: 210
This topic has been viewed 20161 times and has 209 replies
123bob
Cruncher
Joined: May 1, 2007
Post Count: 42
Status: Offline
Project Badges:
Re: Some concerns regarding the granted points

Not sure I'm posting right...I don't post much here....
I'm not replying to MM's post, merely trying to tag more info to the topic...(Even though MM is a teammate, and I support his views....)

I crunch a "little" for WCG, #80 ranked...just waiting for skulltrail...or maybe Harpers???

I have been trying to t-shoot some of the HCC underscoring issues and can report the following observations, in the hope that they may provide some clues to the scientists on the project. I'm scratching my head with my results, but bear with me....

My 2 Vista Ultimate 64-bit machines are totally clean in the reported vs claimed. They take on the order of 2-3 hours to crunch an HCC unit. Page fault count in perf mgr looks normal compared to other projects. (One of these machines is heavily overclocked. The other is stock.)

A Win Server 2003 32-bit machine, overclocked, is severely affected. It returns about 25-40% "underscored" units. The real problem is that the page faults number into the billions.... I returned this machine to stock clocks and it only helped a little. I still get "underscores" on it. I had many other machines that fall in this category. My main rig, "el-machino-2", was almost unusable for other tasks due to the page fault problem. This is what led me into the problem....

I have a single quad, also a member of the "123clan" that is also perfect. It is my Daughter's quad, "123sarah" name of "el-machino-1". It is stock, running XP pro 32 bit, and so far runs right......????

There is no doubt that there is a problem, and I offer to help find it in any way I can. Let me know what tests I can do. The "123clan" runs 12 quads, one old faithful P4 northy, and a T7200 Lappy duo. The message here is we are not slackers, but will also not sit still when we KNOW a problem exists. We owe that to the science here...

You folks that are in charge, can look into my account and gather info. "123bob". Let me know if I can provide more, or if you want me to change configs to help test this situation.

What I do find unacceptable is the notion that excess page faults are OK.... Sorry, they are not, since the substantial investment I have made, thus far, in CPU power is not being utilized to its greatest efficiency...

Cancer is my #1 enemy......

Regards,
Bob
[Dec 13, 2007 6:42:20 AM]
twilyth
Master Cruncher
US
Joined: Mar 30, 2007
Post Count: 2130
Status: Offline
Project Badges:
Re: Some concerns regarding the granted points

Great post 123bob. Thanks for taking the time out to research the problem. A lot more HCC wu's could get done a lot sooner if WCG could at least tell us what is causing the page faults specific to HCC - since I personally have no such problem on DDDT, FAAH or HPF. Then we could probably come up with a workaround until the cause could be addressed.
[Dec 13, 2007 6:57:35 AM]
123bob
Cruncher
Joined: May 1, 2007
Post Count: 42
Status: Offline
Project Badges:
Re: Some concerns regarding the granted points

Thanks Twilyth. I agree that many more WUs could be completed if this issue was taken care of. I am only running 5 quads due to this problem. The entire stable of 12 quads plus, would be put on this task, if it were not for this issue, and the yorkies, and future Nehalem 8 core farm.......... HELLO? Anyone listening?

And yes, I realize that the linux users and mac users would like to be brought on-board......I was and can be one of them again, at any time, when you show me those issues are fixed......

Bob
[Dec 13, 2007 7:17:18 AM]
Former Member
Cruncher
Joined: May 22, 2018
Post Count: 0
Status: Offline
Re: Some concerns regarding the granted points

Hi 123bob,
knreed is looking into this. For the moment the best workaround is to put badly affected systems onto other projects.

Lawrence
[Dec 13, 2007 7:45:07 AM]
Former Member
Cruncher
Joined: May 22, 2018
Post Count: 0
Status: Offline
Re: Some concerns regarding the granted points

123bob, nobody is saying excess page faults are okay. However, they may be unavoidable.

We will know more when the techs have time to study the problem.
[Dec 13, 2007 8:05:38 AM]
Sekerob
Ace Cruncher
Joined: Jul 24, 2005
Post Count: 20043
Status: Offline
Re: Some concerns regarding the granted points

One observation: Of all the systems here, the service install on the Vista HP (32bit) shows by far the least PF/PF Delta, so wonder if it's just plain better at certain aspects of managing the flows. Yesterday 3.89 hours as a record for the Vista-Quad at stock.
----------------------------------------
WCG Global & Research > Make Proposal Help: Start Here!
Please help to make the Forums an enjoyable experience for All!
----------------------------------------
[Edit 1 times, last edit by Sekerob at Dec 13, 2007 8:37:28 AM]
[Dec 13, 2007 8:36:36 AM]
123bob
Cruncher
Joined: May 1, 2007
Post Count: 42
Status: Offline
Project Badges:
Re: Some concerns regarding the granted points

Good to know it is being looked at. I guess I'm just a little frustrated with this problem cause cancer is why I crunch. This sure is a weird problem to try to troubleshoot. The data I get conflicts in some cases.... I've already done as you guys suggest and have put my worst machines on other tasks.
I'll be standing by to hear what is found out.

Regards,
Bob
[Dec 13, 2007 3:24:09 PM]
Highwire
Cruncher
Joined: Aug 18, 2006
Post Count: 39
Status: Offline
Project Badges:
Re: [RENAMED] Some concerns regarding the HCC project (page fault and poor performance, in particular (but not only) by multi-core hosts; #cores>2)

I've been doing a bit of thinking about this. The process is spending over 10% of its life in the kernel on my single core machines. If I could humbly offer up a couple bits of advice for the developers, as I'm a Windows developer myself who deals with such things, and do my bit for the cause (!). You may all know this and I really don't want to offend anyone, but I feel I have to mention it. Many experienced developers don't know these things, could you pass this on to the developers?

1. If you are calling anything that is opening up shared memory in anything remotely like a tight loop, don't, or be careful. I've worked on software that used shared memory like that, and it page faulted for 'no' reason, much like this. It looked innocent but that was the cause, once debugged. My spider sense is tingling .. :) You might comment out a line or two and get a eureka moment. I'd only use shared memory if I need to speak to another *separate* application once in a while. I'd be wary of using it for any sort of internal use; it's just not required.

I've got these kernel time issues in single core machines. If you are, and this is a guess of course, using shared memory in a multi core machine, I'm wondering if this would lead to a lot of inefficiencies copying / locking data across cores..? I'm wondering if you are -accidentally- using ~ shared memory by using DLL data segments or something. It's a thought.

2. If you are using mutexes or other such 'system object' calls to sync threads or lock resources that don't go *outside* the HCC process, look at critical sections or some other mechanism that DOES NOT make the kernel switch. A mutex is much slower, as it goes into the kernel. Stick a mutex lock / unlock in a loop and time it, then do the same with a CS. You might be shocked!
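The timing experiment suggested in point 2 could be sketched roughly as below. This is only an illustration, not anything taken from the HCC or BOINC code: the function names (`time_lock_loop`, `time_win32_mutex`, `time_critical_section`, `time_std_mutex`) are made up for this example, the loop is single-threaded and uncontended, and the `std::mutex` variant is included only so the sketch still builds on non-Windows compilers.

```cpp
#include <cassert>
#include <chrono>
#include <mutex>
#ifdef _WIN32
#include <windows.h>
#endif

// Time `iterations` uncontended lock/unlock pairs; returns elapsed microseconds.
// (Hypothetical helper, named for this example only.)
template <typename Lock, typename Unlock>
long long time_lock_loop(int iterations, Lock lock, Unlock unlock) {
    assert(iterations >= 0);
    auto start = std::chrono::steady_clock::now();
    for (int i = 0; i < iterations; ++i) {
        lock();
        unlock();
    }
    auto stop = std::chrono::steady_clock::now();
    return std::chrono::duration_cast<std::chrono::microseconds>(stop - start).count();
}

#ifdef _WIN32
// Kernel mutex object: acquiring it goes through WaitForSingleObject.
long long time_win32_mutex(int iterations) {
    HANDLE m = CreateMutexW(nullptr, FALSE, nullptr);
    if (!m) return -1;  // could not create the mutex
    long long us = time_lock_loop(iterations,
        [&] { WaitForSingleObject(m, INFINITE); },
        [&] { ReleaseMutex(m); });
    CloseHandle(m);
    return us;
}

// Critical section: process-local lock object.
long long time_critical_section(int iterations) {
    CRITICAL_SECTION cs;
    InitializeCriticalSection(&cs);
    long long us = time_lock_loop(iterations,
        [&] { EnterCriticalSection(&cs); },
        [&] { LeaveCriticalSection(&cs); });
    DeleteCriticalSection(&cs);
    return us;
}
#endif

// Portable variant so the sketch compiles everywhere.
long long time_std_mutex(int iterations) {
    std::mutex m;
    return time_lock_loop(iterations, [&] { m.lock(); }, [&] { m.unlock(); });
}
```

On a Windows box one would call `time_win32_mutex(1000000)` and `time_critical_section(1000000)` and compare the two numbers. The absolute figures depend on the machine and OS version, so treat them as a relative comparison only.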

I'm intrigued from my own technical viewpoint as to why a piece of software running 'normal' code would be doing funny kernel stuff like this, I can't see why it should. Maybe the above is wrong but this is all a bit weird - trying to help if I can :)
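The shared-memory concern in point 1 can be illustrated with a small experiment. The sketch below is a POSIX analogue (Linux/macOS, using `mmap` and `getrusage`), not Windows code and not the HCC application. On Windows the corresponding calls would be `CreateFileMapping` / `MapViewOfFile`. It assumes that a freshly mapped page faults on first touch, and the helper names are hypothetical. One function maps, touches and unmaps a shared region on every iteration; the other maps once and reuses the view.

```cpp
#include <sys/mman.h>
#include <sys/resource.h>
#include <cassert>
#include <cstddef>

// Minor (soft) page faults charged to this process so far.
static long minor_faults() {
    rusage ru{};
    getrusage(RUSAGE_SELF, &ru);
    return ru.ru_minflt;
}

// Map an anonymous shared region; returns nullptr on failure.
static char* map_shared(std::size_t bytes) {
    assert(bytes > 0);
    void* p = mmap(nullptr, bytes, PROT_READ | PROT_WRITE,
                   MAP_SHARED | MAP_ANONYMOUS, -1, 0);
    return p == MAP_FAILED ? nullptr : static_cast<char*>(p);
}

// Anti-pattern: open, touch and close the mapping inside the loop.
// Returns the number of minor faults observed, or -1 if mapping failed.
long faults_remap_each_time(int iterations, std::size_t bytes) {
    long before = minor_faults();
    for (int i = 0; i < iterations; ++i) {
        char* p = map_shared(bytes);
        if (!p) return -1;
        *static_cast<volatile char*>(p) = 1;  // first touch of a new mapping
        munmap(p, bytes);
    }
    return minor_faults() - before;
}

// Preferred: map once, reuse the same view for every iteration.
long faults_map_once(int iterations, std::size_t bytes) {
    long before = minor_faults();
    char* p = map_shared(bytes);
    if (!p) return -1;
    for (int i = 0; i < iterations; ++i) {
        *static_cast<volatile char*>(p) = static_cast<char>(i);
    }
    munmap(p, bytes);
    return minor_faults() - before;
}
```

Comparing `faults_remap_each_time(10000, 4096)` with `faults_map_once(10000, 4096)` shows how much of the fault count comes from the repeated mapping rather than from the work itself. Whether anything like this is actually happening inside HCC is a guess, as the post says.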

As I say trying not to tread on toes but I do a lot of this stuff so thought I'd chuck in my tuppence - it's floating round my head and needed said
[Dec 13, 2007 5:24:41 PM]
twilyth
Master Cruncher
US
Joined: Mar 30, 2007
Post Count: 2130
Status: Offline
Project Badges:
Re: [RENAMED] Some concerns regarding the HCC project (page fault and poor performance, in particular (but not only) by multi-core hosts; #cores>2)

I only understood some of what you've said, but want to thank you for your input.

I believe LH said that boinc and the science apps communicate via memory pointers - I guess that would imply shared memory - yes - no?

For non-developers, what is the significance of high kernel time? Would that imply above average memory access and therefore a greater chance of page faults?
[Dec 13, 2007 5:49:17 PM]
Former Member
Cruncher
Joined: May 22, 2018
Post Count: 0
Status: Offline
Re: [RENAMED] Some concerns regarding the HCC project (page fault and poor performance, in particular (but not only) by multi-core hosts; #cores>2)

Hi twilyth & Highwire,
Yes, boinc.exe and the application communicate using shared memory, which has caused some [other] problems before. I consider Highwire's post to be very interesting, so I will point it out to knreed.

Thanks,
Lawrence
[Dec 14, 2007 1:07:02 PM]