World Community Grid Forums
Thread Status: Locked | Total posts in this thread: 210
Former Member
Cruncher | Joined: May 22, 2018 | Post Count: 0 | Status: Offline
Hi zombie67,

"problem with the application"

Say rather that the cache is under-utilized, so that we are measuring the speed of accesses to main memory. More cores mean more contention between them for the memory bus. It looks as though it is a bad idea to have more than one HCC work unit running on the same memory bus.

Lawrence
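Lawrence's cache argument can be sanity-checked with simple arithmetic: once the combined hot data of all concurrently running work units exceeds the shared cache, every extra task turns cache hits into main-memory traffic. A minimal sketch, where the cache and working-set sizes are illustrative assumptions, not measured HCC figures:

```python
# Back-of-envelope check: does the combined hot data of N concurrent
# work units still fit in the shared cache? All sizes below are
# illustrative guesses, not measured HCC numbers.

def fits_in_cache(working_set_kb: int, tasks: int, cache_kb: int) -> bool:
    """True if every task's hot data can stay cache-resident."""
    return working_set_kb * tasks <= cache_kb

CACHE_KB = 4096  # hypothetical 4 MB shared L2 (Core2-class)
WU_KB = 3072     # hypothetical ~3 MB of hot data per work unit

print(fits_in_cache(WU_KB, 1, CACHE_KB))  # True: one task fits
print(fits_in_cache(WU_KB, 2, CACHE_KB))  # False: two tasks spill to RAM
```

Once the check fails, each task is effectively benchmarking main-memory latency, which matches the observation that a single HCC unit per memory bus behaves best.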
zombie67 [MM]
Senior Cruncher, USA | Joined: May 26, 2006 | Post Count: 228 | Status: Offline
"Say rather that the cache is under-utilized so that we are measuring the speed of accesses to main memory. More cores means more contention between them for the memory bus. It looks as though it is a bad idea to have more than 1 HCC work unit running on the same memory bus. Lawrence"

Interesting. BURP has a similar problem. To address it, they limited the number of tasks sent to a machine at any given time. Perhaps something similar could be done with HCC? Limit it to (say) one at any given time? The rest of the cores could then be occupied by other sub-projects.

[Edit 1 times, last edit by zombie67 at Nov 27, 2007 8:11:17 PM]
BobCat13
Senior Cruncher | Joined: Oct 29, 2005 | Post Count: 295 | Status: Offline
"Interesting. BURP has a similar problem. To address it, they limited the number of tasks sent to a machine at any given time. Perhaps something similar could be done with HCC? Limit it to (say) one at any given time? The rest of the cores could then be occupied by other sub-projects."

I don't think that will help. Over the weekend on my AMD X2 6000+, my WCG queue was HCC-only. I tried running it with Sudoku, SIMAP, and Spinhenge, one project at a time, so WCG was only using one core. The excessive page faults were still happening no matter which project it was teamed with. I saw several HCC tasks go over 1 billion PFs before completing. That amount of PFs caused a 20-25% increase in crunching time. Other HCC tasks totaled less than 50 million PFs to completion and finished over an hour faster than the high-PF tasks.

My first thought was that the L2 cache on the Core2 is much larger than on the AMD CPUs, so maybe some array fits into the Core2 L2 but not into the AMD L2. But that wouldn't explain the Xeon problems.
Movieman
Veteran Cruncher | Joined: Sep 9, 2006 | Post Count: 1042 | Status: Offline
"Hi Movieman, I think that what is discussed in the Page Fault thread is the cause. Not sure, but I think if you stick to doing HCC exclusively the averages will work their way up. A very steady benchmark is what promotes how the algorithm handles claims and quorum. Regarding your line 'It's just not logical that a machine that's used for many things will bench more consistent than one that is limited to a very defined area of work': I know what you are saying, consistently too low (see next paragraph)! I know from observation that the timing of the benchmark can cause huge sways in the values, particularly disc writing, so one night I got up and forced one... it is now always run at night on the 24/7 machine (I have to get up again after an upgrade). And this from the very secret Italian room: set BOINC.exe to the second-highest priority. The program does very little until it runs that once-per-5-days test, and will then run it with elevated attention. I repeat myself: I think it is BS that the bench is impacted by runtime events. The credits are computed on CPU seconds and not wall-clock... I think that suits your thinking. End of the off-topic."

Hi Sek:

It's frustrating. I'm seeing my top machine here, which is running HCC, take a 6,000-point-a-day drop, from 25K+ to 19K. That's just not right, so I've pulled it from the HCC project, put it back on strictly FAAH, and will ask that others with these 8-core machines do the same. I'm not going to sit back calmly and have the machine put in a claim for 140 points on a 7-hour WU and then be awarded half of that. Sorry, but that's pure BS in my opinion. Look for yourself.
This is on an 8-core clover running right now at 3083 MHz and doing nothing but WCG:

Result Name                    Status  Sent Time            Time Due / Return Time  CPU Time (h)  Claimed / Granted
X0000039140823200410152303_0   Valid   11/25/2007 07:48:45  11/25/2007 22:33:35     3.62          57.5 / 57.5
X0000039140823200410152303_1   Valid   11/25/2007 07:48:40  11/28/2007 05:09:34     7.87          155.5 / 57.5  <- me
X0000039140873200409241129_0   Valid   11/25/2007 06:49:24  11/26/2007 04:51:10     6.49          71.7 / 71.7
X0000039140873200409241129_1   Valid   11/25/2007 06:48:25  11/28/2007 05:09:34     7.20          142.2 / 71.7  <- me
X0000039140498200409241136_0   Valid   11/25/2007 06:33:32  11/25/2007 15:35:07     5.36          68.4 / 68.4
X0000039140498200409241136_1   Valid   11/25/2007 06:31:33  11/28/2007 05:09:34     6.91          136.6 / 68.4  <- me
X0000055211331200508301311_2   Valid   11/25/2007 11:08:26  11/26/2007 03:34:35     4.20          64.5 / 64.5
X0000055211331200508301311_0   Error   11/25/2007 04:58:47  11/25/2007 11:06:47     0.00          0.0 / 0.0     <- error
X0000055211331200508301311_1   Valid   11/25/2007 04:58:11  11/27/2007 21:05:59     7.36          106.0 / 64.5  <- me
X0000055071040200508232324_1   Valid   11/25/2007 03:50:04  11/27/2007 21:05:59     8.28          119.4 / 61.9  <- me
X0000055071040200508232324_0   Valid   11/25/2007 03:49:13  11/25/2007 15:34:09     3.69          61.9 / 61.9
X0000055070181200508232338_0   Valid   11/25/2007 03:23:12  11/25/2007 19:49:47     4.57          59.6 / 59.6
X0000055070181200508232338_1   Valid   11/25/2007 03:22:04  11/27/2007 19:52:20     7.87          128.0 / 59.6  <- me

I'm looking into the future and wondering what will happen when we start seeing dual Nehalems late in 2008, with 16 cores that also use an advanced form of hyperthreading and will run 32 WUs at a time. I think the answer for me is to just sit this one out and continue working on the AIDS project, even though I'd prefer to work on the cancer project.

Edit: Some more info for you. Here are the last 7 days on the 2 clover machines. The first is my "slower" machine, which is at 3000 MHz right now.
Uses Win2K3 32-bit and BOINC 5.10.20 32-bit. This machine has been doing nothing but FAAH WUs:

Statistics
Date        Total Run Time (y:d:h:m:s)  Points Generated  Results Returned
11/27/2007  0:007:16:02:17              23,485            45
11/26/2007  0:007:12:42:34              23,538            46
11/25/2007  0:007:02:35:18              21,319            54
11/24/2007  0:008:09:49:36              25,462            47
11/23/2007  0:008:08:36:47              25,148            47
11/22/2007  0:007:11:36:32              22,753            40
11/21/2007  0:008:15:15:18              26,113            46

Now the faster machine, with Win2K3 64-bit and BOINC 5.10.20 64-bit. This machine does a mix of FAAH and HCC, and runs 83 MHz faster.

Statistics
Date        Total Run Time (y:d:h:m:s)  Points Generated  Results Returned
11/27/2007  0:007:18:00:35              15,964            28
11/26/2007  0:008:19:19:09              19,732            36
11/25/2007  0:007:21:44:10              19,604            39
11/24/2007  0:009:05:55:49              22,313            40
11/23/2007  0:006:17:06:08              20,510            42
11/22/2007  0:007:09:42:06              22,058            45
11/21/2007  0:008:01:48:19              24,123            46

To add to the numbers for you: last March we did an analysis of this last machine. For the month of March it had a 99.38% uptime and averaged 25,238 points a day doing strictly FAAH WUs. These kinds of numbers are hard to interpret but do show trends. The trend I was seeing told me it was time to cut loose of HCC till it is "fixed".

[Edit 3 times, last edit by Movieman at Nov 28, 2007 6:03:50 AM]
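The claimed/granted columns in the result table above show a consistent pattern: in a two-result quorum, both valid results end up with the lower of the two claims. A minimal sketch of that observed behaviour only; the real BOINC/WCG validator logic is more involved than this:

```python
# Simplified sketch of the credit pattern visible in the result table:
# with a quorum of two valid results, both hosts are granted the
# *lower* of the two claims. This mimics the observed behaviour; it is
# not WCG's actual validator code.

def granted_credit(claims: list) -> float:
    """Grant every valid result the smallest claim in the quorum."""
    return min(claims)

# First pair from the table: fast host claimed 57.5, slow host 155.5.
print(granted_credit([57.5, 155.5]))  # both granted 57.5
```

This is exactly why a host with inflated run times (and hence inflated claims) loses out: its quorum partner's lower claim sets the grant for both.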
Sekerob
Ace Cruncher | Joined: Jul 24, 2005 | Post Count: 20043 | Status: Offline
Can you check the HCC run times for the mixed machine? Some wrote that one or two HCC tasks running concurrently do not generate the high PF and Delta. The runtime/claims and credits might be better.

The C2D went berserk yesterday: 1.3 billion PFs and a Delta of 200-225 thousand almost constantly over the first half, subsiding towards the later half. Points were fine, as the quorum partner also had a high run time. The Q6600 continues soundly with 4 HCC tasks producing zero Delta.

Don't feel bad about this. If a machine is not able to perform efficiently on a particular task, make the hop. My P4 is highly inefficient on HPF2 (yeah, I know), so it's not doing those, and the C2D might come off HCC if this keeps repeating (it was the first time I had this on that box). In any case, it has been highlighted in more than a few places, so I'm looking forward to a solution, or at least a recommendation for finding the right machines for this. I hope the Linux bug-finding sessions bring something to the surface at the same time. If that problem is solved, the productivity will substantially increase.

WCG
----------------------------------------
Please help to make the Forums an enjoyable experience for All!
[Edit 1 times, last edit by Sekerob at Nov 28, 2007 10:58:05 AM]
Former Member
Cruncher | Joined: May 22, 2018 | Post Count: 0 | Status: Offline
Here is an interesting result for this conversation:

Workunit Status
Project Name: Help Conquer Cancer
Created: 11/24/2007 14:18:27
Name: X0000052451057200507211308
Minimum Quorum: 2
Initial Replication: 2

Result Name                    Status  Sent Time            Time Due / Return Time  CPU Time (h)  Claimed / Granted
X0000052451057200507211308_1   Valid   11/24/2007 20:43:34  11/25/2007 05:25:34     5.39          8.3 / 8.3
X0000052451057200507211308_0   Valid   11/24/2007 20:42:28  11/28/2007 14:34:36     7.49          85.4 / 8.3

I normally get just over 10 points per hour of work, and this one was just over 1 point per hour. Just thought it was strange enough to point out.

[Edit 1 times, last edit by Former Member at Nov 28, 2007 3:02:06 PM]
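The points-per-hour figures quoted above are easy to verify from the table. A quick check (the 56.6-point result is taken from the same poster's later listing as an example of a "normal" unit):

```python
# Checking the points-per-hour arithmetic from the result tables.

def points_per_hour(granted: float, cpu_hours: float) -> float:
    """Granted BOINC credit divided by CPU time."""
    return granted / cpu_hours

# The anomalous result: 8.3 points granted for 7.49 CPU hours.
print(round(points_per_hour(8.3, 7.49), 2))   # 1.11 points/hour

# A typical result ("just over 10 per hour"): 56.6 points in 5.48 hours.
print(round(points_per_hour(56.6, 5.48), 2))  # 10.33 points/hour
```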
Former Member
Cruncher | Joined: May 22, 2018 | Post Count: 0 | Status: Offline
Here is a listing of all of my valid results on 2 computers:

Result Name                    Device Name  Status  Sent Time            Time Due / Return Time  CPU Time (h)  Claimed / Granted
X0000052451094200507211307_0   Laptop       Valid   11/24/2007 20:44:00  11/28/2007 14:34:46     5.48          56.6 / 56.6
X0000052451060200507211308_0   Laptop       Valid   11/24/2007 20:44:00  11/28/2007 14:34:46     5.66          58.5 / 59.2
X0000052451059200507211308_0   Laptop       Valid   11/24/2007 20:44:00  11/28/2007 14:34:46     5.78          59.7 / 59.7
X0000052451085200507211308_1   Laptop       Valid   11/24/2007 20:43:59  11/28/2007 14:34:46     5.60          57.9 / 57.7
X0000052451057200507211308_0   Dan-PC       Valid   11/24/2007 20:42:28  11/28/2007 14:34:36     7.49          85.4 / 8.3
X0000052451031200507211308_0   Dan-PC       Valid   11/24/2007 20:42:28  11/28/2007 14:34:36     8.16          93.1 / 99.6
X0000052451029200507211308_0   Dan-PC       Valid   11/24/2007 20:42:28  11/28/2007 03:50:56     7.63          86.8 / 90.9
X0000052451028200507211308_0   Dan-PC       Valid   11/24/2007 20:42:28  11/27/2007 23:48:04     7.55          85.4 / 85.4
X0000052451027200507211308_0   Dan-PC       Valid   11/24/2007 20:42:28  11/27/2007 19:15:10     8.60          97.9 / 87.0
X0000052450804200507211312_1   Laptop       Valid   11/24/2007 20:33:31  11/27/2007 19:24:44     5.77          59.4 / 59.4
X0000052120670200507262053_0   Laptop       Valid   11/24/2007 19:40:46  11/27/2007 17:34:28     5.47          56.3 / 58.9
Movieman
Veteran Cruncher | Joined: Sep 9, 2006 | Post Count: 1042 | Status: Offline
"Can you check the HCC run times for the mix machine? Some wrote that 1 or 2 HCC concurrently do not generate the high PF and Delta. The runtime/claims and Credits might be better. The C2D went berserk yesterday: 1.3 billion PFs and 200-225 thousand Delta almost constantly over the first half, but subsiding towards the later half. Points were fine as the quorum partner also had a high run time. The Q6600 continues soundly with 4 HCC producing zero Delta. Don't feel bad about this. If a machine is not able to perform efficiently on a particular task, make the hop."

Runtime is in red (my results are the rows marked "<- me"):

Result Name                    Status  Sent Time            Time Due / Return Time  CPU Time (h)  Claimed / Granted
X0000039140823200410152303_0   Valid   11/25/2007 07:48:45  11/25/2007 22:33:35     3.62          57.5 / 57.5
X0000039140823200410152303_1   Valid   11/25/2007 07:48:40  11/28/2007 05:09:34     7.87          155.5 / 57.5  <- me
X0000039140873200409241129_0   Valid   11/25/2007 06:49:24  11/26/2007 04:51:10     6.49          71.7 / 71.7
X0000039140873200409241129_1   Valid   11/25/2007 06:48:25  11/28/2007 05:09:34     7.20          142.2 / 71.7  <- me
X0000039140498200409241136_0   Valid   11/25/2007 06:33:32  11/25/2007 15:35:07     5.36          68.4 / 68.4
X0000039140498200409241136_1   Valid   11/25/2007 06:31:33  11/28/2007 05:09:34     6.91          136.6 / 68.4  <- me
X0000055211331200508301311_2   Valid   11/25/2007 11:08:26  11/26/2007 03:34:35     4.20          64.5 / 64.5
X0000055211331200508301311_0   Error   11/25/2007 04:58:47  11/25/2007 11:06:47     0.00          0.0 / 0.0     <- error
X0000055211331200508301311_1   Valid   11/25/2007 04:58:11  11/27/2007 21:05:59     7.36          106.0 / 64.5  <- me
X0000055071040200508232324_1   Valid   11/25/2007 03:50:04  11/27/2007 21:05:59     8.28          119.4 / 61.9  <- me
X0000055071040200508232324_0   Valid   11/25/2007 03:49:13  11/25/2007 15:34:09     3.69          61.9 / 61.9
X0000055070181200508232338_0   Valid   11/25/2007 03:23:12  11/25/2007 19:49:47     4.57          59.6 / 59.6
X0000055070181200508232338_1   Valid   11/25/2007 03:22:04  11/27/2007 19:52:20     7.87          128.0 / 59.6  <- me
JmBoullier
Former Community Advisor, Normandy - France | Joined: Jan 26, 2007 | Post Count: 3715 | Status: Offline
If it can help the techs who work on this problem, I have asked for a new HCC WU to confirm my memories of my initial test.

The page-faults problem is not (or at least not only) caused by some kind of weird competition between several tasks running on multi-core machines. Making sure my HCC WU is the only thing doing anything meaningful on my P4 HT machine, I am still seeing an average of 40,000 page faults per second. And this number does not change if I start another CPU-intensive process. Since I have enough RAM (1.5 GB) there is no corresponding swapping activity on my disk, but this is certainly not free at the CPU level.

Good luck to the techs.
Jean.
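For anyone wanting to reproduce this kind of measurement, here is a minimal sketch of how a process can read its own page-fault counters on a Unix-like system via getrusage(). (The machines in this thread run Windows, where Task Manager's "Page Faults" column reports the analogous counter.)

```python
# Minimal sketch: read this process's own page-fault counters on a
# Unix-like system. ru_minflt counts "soft" faults served from RAM,
# ru_majflt counts faults that required disk I/O.
import resource

def fault_counts():
    """Return (minor, major) page-fault totals for this process."""
    usage = resource.getrusage(resource.RUSAGE_SELF)
    return usage.ru_minflt, usage.ru_majflt

before = fault_counts()
buf = bytearray(50 * 1024 * 1024)  # zero-fill ~50 MB, touching every page
after = fault_counts()
print("minor faults during allocation:", after[0] - before[0])
```

As Jean observed, a flood of minor faults causes no disk activity at all when RAM is plentiful, yet each fault still costs a trap into the kernel, so it is "certainly not free at the CPU level".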
Former Member
Cruncher | Joined: May 22, 2018 | Post Count: 0 | Status: Offline
"Here is an interesting result for this conversation: ... X0000052451057200507211308_1 Valid, CPU Time 5.39 h, Claimed/Granted 8.3 / 8.3; X0000052451057200507211308_0 Valid, CPU Time 7.49 h, Claimed/Granted 85.4 / 8.3. I normally get just over 10 per hour of work and this one was just over 1 point per hour."

This result is way out of line: 8.3 points for 7.5 hours of work is just not acceptable. It would seem that the other machine is claiming abnormally low, yet is being taken as the more consistent machine when allocating the credit.