World Community Grid Forums
Thread Status: Locked | Total posts in this thread: 210
KerSamson
Master Cruncher | Switzerland | Joined: Jan 29, 2007 | Post Count: 1679 | Status: Offline
Thank you for the advice!

I had to turn off the privacy configuration of my firewall (cookie-related rules) in order to see "pencil&paper". Does the thread title now better reflect the subject of this discussion?
Movieman
Veteran Cruncher | Joined: Sep 9, 2006 | Post Count: 1042 | Status: Offline
Unfortunately your link has restricted rights to view, so we are unable to see it. However, if you knew about this on Nov 2, then why did you make an initial reply to this thread saying there isn't a problem, when there clearly is?

I didn't link the credit complaints to the pagefault issue immediately. The pagefault issue had already been reported in private, and also discussed at length in another public thread. It was hard to miss. I said it wasn't a problem because the credit system was, in fact, working perfectly; the initial anomalies were well within the normal variance for unit runtimes. I'm afraid people have cried "wolf" so frequently over credit claims that I wait until there is reliable evidence before taking them seriously.

So, a timeline: I spotted it on 2nd November. brent1023 spotted it on the 14th of November, BuHHunyx on the 23rd. Movieman was complaining of an unknown HCC problem on the 26th, but it was Sekerob who linked the issues, also on the 26th. So now you know. We all like credits. Please don't read an inference into it where none exists.

The link I gave was to a private forum, so I dug through the archives and found this post from the 3rd: http://www.worldcommunitygrid.org/forums/wcg/viewthread?thread=16902#135613 - there I explained my methodology and my observations, and noted that I had passed the matter over to the techs. Credit also to tekennelly, who first discovered the pagefault issue - not in HCC, but in HPF2, where the problem is less pronounced.

Yes, I posted here on the 26th. I wrote WCG directly about it on the 13th and received a reply from "kevin" on the 15th. I did that intentionally, so as to bring this to people's attention quietly and not get into a credit discussion, as those generally go downhill fast. We'd been discussing this issue at Xs for at least a week prior to the 15th, before I decided to write WCG. I wanted to make sure I wasn't the only one seeing this.
We have approximately 30 8-core Clovertown machines at Xs that generally do around 700,000-750,000 points per day, so you can understand why this was a major concern to us. Anyway, I just wanted to clear that up. Let's get back to work, and good luck to the techs working on the Linux issue.
Former Member
Cruncher | Joined: May 22, 2018 | Post Count: 0 | Status: Offline
I started crunching years ago with UD - Phase I, hoping to "fight Cancer".

I was really looking forward to another cancer project here at WCG. On the other hand, I hate having my machines waste their time, taking way too long on HCC units. I have two dual-Clovertown rigs, and it was sad to see how long an HCC WU took on them. So until I hear from more than a few people that the HCC issues are over and it is "all clear", I'm done with them. Back to FAAH and Dengue for me; I changed my project profile and excluded HCC.
JmBoullier
Former Community Advisor | Normandy - France | Joined: Jan 26, 2007 | Post Count: 3715 | Status: Offline
Hello Yves!
"Does the thread title now better reflect the subject of this discussion?"

It is certainly far better than the initial one, and probably closer to the truth, but the reference to multi-core PCs might put the techs on a false track. As reported in my post of Nov 29, 2007 12:46:40 AM, this problem is not limited to multi-core machines, as far as I can observe. And to make sure my HT processor was not behaving like a multi-core machine, I also ensured that the HCC WU was the only task doing meaningful work while I watched.

Don't feel obliged to change your thread title again, since the problem does seem more visible on multi-core machines, but I wanted to make sure the techs will know it is not limited to them when they can move on to this problem.

Cheers. Jean.
KerSamson
Master Cruncher | Switzerland | Joined: Jan 29, 2007 | Post Count: 1679 | Status: Offline
Hi Jean,
you're right; I will try to find a better wording, in order not to put people on the wrong track.

I hope everything is going well for you; news from the Decrypthon team is scarce at the moment...

Have a nice week-end,
Former Member
Cruncher | Joined: May 22, 2018 | Post Count: 0 | Status: Offline
I think that to clarify things further, you should edit the title one more time, as it still says "Some concerns regarding HCC".

You should make it clear that this has nothing whatsoever to do with the Hamilton Caving Club (NZ), or with Michigan's Hubcap Collectors Club, and especially nothing to do with meat promotion or production in Wales.
Former Member
Cruncher | Joined: May 22, 2018 | Post Count: 0 | Status: Offline
I suppose I'll dip my toe into the water again. Brrr. . . it's chilly!
A page fault occurs when a program touches memory whose page is not currently mapped into physical RAM. This slows things down because the kernel must suspend the program, bring the page in and update the page tables while the core waits. Any application with a lot of page faults will run more slowly than one with only a few.

But there is a second possible performance problem. Multiple cores can 'queue up' a series of page faults, so that each core has to wait until its own page fault gets serviced. This is called memory contention. If a number of cores are running applications with a high number of page faults, performance will drop even further because of this contention.

How can this inefficiency be cured? The normal way is to run a preprocessing step over the data arrays and produce a new array that clusters data together in the order the program will access it. Sometimes this is possible; unfortunately, sometimes it is not. It all depends on the algorithm. Even when it is possible, it produces unreadable data structures. That need not be a problem, but when developing a new program that has to be changed rapidly to match research needs, it almost always is.

[A personal reminiscence. A generation ago I spotted a neat 15-25 line section in an image-processing assembler routine that I could optimize to speed the program up by 10%-15%. Even with paperwork, the change only cost me 2 or 3 days, and we were running the program constantly on a number of computers, so I considered it time well spent. I actually congratulated myself about this. (sob..) A little more than 2 years later, new computers changed the cache organization, and I suddenly realized that my change was bound to cause problems down the road if the cache changed even more. After thinking it over for an hour, I eliminated the change. Programming to fit a specific cache design is a very dangerous practice that has to be viewed with great suspicion.]

So what is my estimate of the situation?
I don't think it makes sense to reprogram the application for this. The project scientists should be concentrating on the results, and overworking the programmers to change the application to produce better results. "Faster" should be ignored at this stage.

But how should individual members of World Community Grid feel about this? The high page-fault count is simply an artifact of the algorithm. It will slow down the flops per second, but that does not matter as such. The CPU time spent in the kernel loading new pages will show up as reduced credit, but on a single core the points impact should not be substantial. Memory contention matters much more, so 4- and 8-core machines running more than one HCC work unit will show a much greater drop in points. The WCG scheduler will still get all the HCC work units sent out, so a member can exclude HCC on these multi-core machines without slowing down progress on HCC, and can then run other projects, such as FAAH and DDT, that would otherwise have to run on the single-core computers which handle HCC with the greatest efficiency.

An unrelated note: some days ago a work unit awarded only 8.3 points was posted in this thread. This was immediately reported to the WCG staff. I don't know what went wrong there, and there are a number of more urgent issues, but it is an error unrelated to the main problem being addressed in this thread.

Lawrence
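[Editorial aside: the locality fix Lawrence describes - a preprocessing pass that gathers data into a new array clustered in the order the program will access it - can be sketched in C. This is purely an illustration, not code from HCC; all function names and data here are hypothetical.]

```c
#include <stdlib.h>

/* Hypothetical sketch of the preprocessing step described above: instead of
 * chasing an index array through a large table on every pass (scattered
 * accesses, poor locality, many faults), gather the values once into a
 * compact array laid out in exactly the order the inner loop visits them. */

/* Scattered access: each iteration may touch a different page of 'table'. */
double sum_indirect(const double *table, const int *order, int n) {
    double s = 0.0;
    for (int i = 0; i < n; i++)
        s += table[order[i]];
    return s;
}

/* One-time preprocessing pass: pay the scattered-access cost only once. */
double *pack_in_access_order(const double *table, const int *order, int n) {
    double *packed = malloc(n * sizeof *packed);
    for (int i = 0; i < n; i++)
        packed[i] = table[order[i]];
    return packed;
}

/* Sequential access: cache- and page-friendly on every subsequent pass. */
double sum_packed(const double *packed, int n) {
    double s = 0.0;
    for (int i = 0; i < n; i++)
        s += packed[i];
    return s;
}
```

As the post notes, the packed copy duplicates the data and obscures the structure the researchers actually think in, which is why this kind of optimization is often deferred while the science code is still changing.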
twilyth
Master Cruncher | US | Joined: Mar 30, 2007 | Post Count: 2130 | Status: Offline
Thanks LH - good enough for me.
I just turned off HCC for my profile, and I'll share your post with our team.
KerSamson
Master Cruncher | Switzerland | Joined: Jan 29, 2007 | Post Count: 1679 | Status: Offline
Many thanks, Lawrence, for these clear explanations.

Indeed, after the problem was identified, I think many of us deselected the HCC project on our multi-core machines. Still, this particular behavior is curious, especially because it does not occur to this degree with other projects. I agree with you about the danger of software optimizations made to fit specific hardware constraints.

I wish you much success in your continued support of WCG. Cheers,
JmBoullier
Former Community Advisor Normandy - France Joined: Jan 26, 2007 Post Count: 3715 Status: Offline Project Badges: ![]() ![]() ![]() ![]() ![]() ![]() ![]() ![]() ![]() ![]() ![]() ![]() ![]() ![]() ![]() ![]() ![]() ![]() ![]() ![]() ![]() ![]() ![]() ![]() ![]() ![]() ![]() ![]() ![]() ![]() |
Lawrence,
Thank you for your explanations. I agree with you that designing programs to fit a particular hardware or system internal design is generally not a good choice, especially in a multi-platform environment. However, some program design choices are inefficient in all environments, and I think that when there are fewer emergencies to deal with, it would be good for a senior programmer to make sure there is no such problem somewhere in the program, particularly in the parts of the code at the heart of the many loops such programs probably contain. This is not saying that the programmers of HCC are not good; it is only saying that everybody can miss something sometime.

Cheers. Jean.