World Community Grid Forums
Category: Completed Research | Forum: The Clean Energy Project - Phase 2 | Thread: Errors - Low efficiency
Thread Status: Active | Total posts in this thread: 15
Jim1348
Veteran Cruncher, USA. Joined: Jul 13, 2009. Post Count: 1066
The errors are all similar and related to the famous "No heartbeat for 30 seconds" error, which is probably the sign of an overloaded cruncher. I used to see it when running all 4 cores of a quad-core (2.5 GHz), but haven't seen it since I started using a Ramdisk. A good SSD will also work, but with all cores running CEP2 you could shorten its lifetime. I think I was getting about 70 GB/day of writes per core, sometimes more. I went to the Ramdisk mainly to save the SSD. [Edited once; last edit by Jim1348 at Feb 2, 2012 9:20:06 PM]
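To put the SSD-wear concern in numbers, here is a rough sketch that turns the figure of about 70 GB/day of writes per core on a quad-core into total daily writes, then divides a drive's endurance rating by that rate. The 72 TBW rating is a hypothetical placeholder and does not come from this thread; substitute the TBW value from your own drive's datasheet.

```python
# Rough SSD endurance estimate for a host running CEP2 on every core.
# ASSUMED_TBW is a hypothetical endurance rating, not a value from the thread.
ASSUMED_TBW = 72.0  # terabytes written over the drive's rated life


def daily_writes_gb(cores: int, gb_per_core_per_day: float) -> float:
    """Total host writes per day, in GB."""
    return cores * gb_per_core_per_day


def days_to_endurance(tbw: float, writes_gb_per_day: float) -> float:
    """Days until the rated TBW is used up (decimal units: 1 TB = 1000 GB)."""
    return tbw * 1000.0 / writes_gb_per_day


if __name__ == "__main__":
    per_day = daily_writes_gb(cores=4, gb_per_core_per_day=70.0)
    print(f"{per_day:.0f} GB/day")
    print(f"{days_to_endurance(ASSUMED_TBW, per_day):.0f} days to rated TBW")
```

At 280 GB/day, that hypothetical 72 TBW drive would reach its rated endurance in roughly 257 days. This is why moving the BOINC data directory to a Ramdisk is attractive for this workload.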
sk..
Master Cruncher. Joined: Mar 22, 2007. Post Count: 2324
Increasing the checkpoint interval should reduce overall I/O.
Assuming CEP2 is FP-intensive, it might help to run some integer-intensive tasks (HCC) alongside it. A good (PCI/PCIe x1) network card would reduce the CPU's involvement in data transfer. If you have 21 rigs all hooked up to a router, say via 3 switches, there could be a network bandwidth bottleneck; upload >rates< tend to be much lower than download. [Edited once; last edit by skgiven at Feb 2, 2012 11:47:49 PM]
Former Member
Cruncher. Joined: May 22, 2018. Post Count: 0
There are only 16 checkpoints in a job and some are hours apart [and they can't be controlled by BOINC through the WtD (write-to-disk) setting]. The download is a few hundred KB; the uploads for CEP2 now exceed 33 MB each... it must be late... time for
--//--
sk..
Master Cruncher. Joined: Mar 22, 2007. Post Count: 2324
I/O reduction would not increase CEP2 performance directly, but since you are now running other tasks, reducing I/O should improve overall performance (via those other tasks) and might help stability (preventing heartbeat 'flutters'). It also saves the drives and reduces noise.
Unfortunately this is not configurable online, so lots of chair hopping is required. I meant that upload bandwidth/transfer rates tend to be much lower than download rates (>rates< inserted for clarity). If 21 systems average 10 threads each, and each thread could turn over 3 tasks per day, the upload requirement would be about 20 GB per day! That might be a problem in itself. It would be ~66% of my maximum theoretical upload per day, and I think someone (Bolt) would throttle me by the time I reached 2 GB. To Boston I'm getting ~2.3 Mbps upload, so it would be more like 80.5% of my theoretical maximum (which would require continuous uploads). So what's the bandwidth from the Alps to Cambridge (Boston) like?
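The upload estimate above can be checked with a few lines of arithmetic. This sketch assumes that the ~33 MB per CEP2 upload mentioned earlier in the thread means megabytes (the post wrote "Mb"). It also uses decimal units (1 GB = 1000 MB). The function names are illustrative only.

```python
# Back-of-envelope check of daily CEP2 upload volume against a home uplink.
# Assumes ~33 MB (megabytes) per uploaded result and decimal units throughout.
MB_PER_GB = 1000
SECONDS_PER_DAY = 86_400


def daily_upload_gb(systems: int, threads_per_system: int,
                    tasks_per_thread_per_day: float,
                    upload_mb_per_task: float) -> float:
    """Total result data uploaded per day, in GB."""
    tasks_per_day = systems * threads_per_system * tasks_per_thread_per_day
    return tasks_per_day * upload_mb_per_task / MB_PER_GB


def uplink_capacity_gb_per_day(upload_mbps: float) -> float:
    """Data a link could upload per day if saturated continuously, in GB."""
    megabytes_per_second = upload_mbps / 8  # 8 bits per byte
    return megabytes_per_second * SECONDS_PER_DAY / MB_PER_GB


if __name__ == "__main__":
    need = daily_upload_gb(21, 10, 3, 33.0)
    capacity = uplink_capacity_gb_per_day(2.3)
    print(f"needed:      {need:.1f} GB/day")
    print(f"capacity:    {capacity:.1f} GB/day")
    print(f"utilization: {need / capacity:.0%}")
```

With exactly 33 MB per result, the requirement comes to about 20.8 GB/day. A saturated 2.3 Mbps uplink can move about 24.8 GB/day, so utilization is roughly 84%. The 80.5% figure in the post comes from using the rounded 20 GB.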
Former Member
Cruncher. Joined: May 22, 2018. Post Count: 0
Given that the issue is confined to 4 specific systems out of 21, there is more than likely still something different about them, as previously suggested. For Hypernova to consider: run long-running tasks with small files on the side, such as FAAH [I don't think fpops/iops makes one iota of difference; I haven't seen any when running 4 CEP2 and 4 GFAM/DSFL on the side on my octo-threaded machine], preferably with a low cache. Then set up a profile for the 4 problem devices and choose a number of CEP2 tasks that he finds to be stable on them.