Thread Status: Active
Total posts in this thread: 15
This topic has been viewed 1100 times and has 14 replies.
Jim1348
Veteran Cruncher
USA
Joined: Jul 13, 2009
Post Count: 1066
Re: Errors - Low efficiency

The errors are all similar and related to the famous "No heartbeat for 30 seconds" message, which is probably the sign of an overloaded cruncher.

I used to see that when running all 4 cores of a quad-core (2.5 GHz), but I haven't seen it since I started using a RAM disk.

A good SSD will also work, but running CEP2 on all cores could shorten its lifetime: I think I was getting about 70 GB/day of writes per core, sometimes more. I moved to the RAM disk mainly to save the SSD.
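For scale, the per-core write figure can be turned into a daily total and a rough drive-life estimate. This is a minimal Python sketch assuming the ~70 GB/day per core holds on all 4 cores; the 72 TBW endurance rating is a hypothetical number chosen only for illustration, so substitute the rating from your own drive's datasheet.

```python
# Rough SSD wear estimate from the per-core CEP2 write rate quoted above.
# ASSUMPTION: the 72 TBW endurance rating used below is hypothetical;
# replace it with the terabytes-written rating of your own drive.

def daily_writes_gb(writes_per_core_gb: float, cores: int) -> float:
    """Total host writes per day when every core runs CEP2."""
    return writes_per_core_gb * cores


def days_until_rated_endurance(ssd_endurance_tbw: float, gb_per_day: float) -> float:
    """Days until the drive's rated terabytes-written (TBW) figure is reached."""
    return ssd_endurance_tbw * 1000 / gb_per_day  # decimal units: 1 TB = 1000 GB


if __name__ == "__main__":
    per_day = daily_writes_gb(writes_per_core_gb=70, cores=4)
    print(f"{per_day:.0f} GB/day")  # 280 GB/day
    days = days_until_rated_endurance(ssd_endurance_tbw=72, gb_per_day=per_day)
    print(f"~{days:.0f} days to reach a hypothetical 72 TBW rating")  # ~257 days
```

At roughly 280 GB/day, even a drive with several times that hypothetical rating would reach its rated endurance within a few years, which is why a RAM disk is attractive for this workload.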
----------------------------------------
[Edit 1 times, last edit by Jim1348 at Feb 2, 2012 9:20:06 PM]
[Feb 2, 2012 9:16:50 PM]
sk..
Master Cruncher
Joined: Mar 22, 2007
Post Count: 2324
Re: Errors - Low efficiency

Increasing the checkpoint time should reduce overall I/O.
Assuming CEP2 is floating-point intensive, it might help to run some integer-intensive tasks (HCC) alongside it.

A good (PCI/PCIe x1) network card would reduce the CPU's involvement in data transfer.

If you have 21 rigs all hooked up to a router, say via 3 switches, there could be a network bandwidth bottleneck; upload >rates< tend to be much lower than download.
----------------------------------------
[Edit 1 times, last edit by skgiven at Feb 2, 2012 11:47:49 PM]
[Feb 2, 2012 11:32:48 PM]
Former Member
Cruncher
Joined: May 22, 2018
Post Count: 0
Re: Errors - Low efficiency

? There are only 16 checkpoints in a job, and some are hours apart [and they can't be controlled by BOINC through the WtD (write to disk) setting]. The download is a few hundred KB; the uploads now exceed 33 MB each for CEP2... it must be late... time for sleep

--//--
[Feb 2, 2012 11:41:03 PM]
sk..
Master Cruncher
Joined: Mar 22, 2007
Post Count: 2324
Re: Errors - Low efficiency

Reducing I/O would not increase CEP2 performance directly, but since other tasks are now running as well, reducing I/O should improve 'overall' performance (via those other tasks) and might help stability (preventing heartbeat 'flutters'). It also saves the drives and reduces noise.
Unfortunately this is not configurable online, so lots of chair hopping is required.

I meant that upload bandwidth/transfer rates tend to be much lower than download rates (>rates< inserted for clarity).

If 21 systems average 10 threads each and each thread could turn over 3 tasks per day, the upload requirement would be ~20 GB per day! That might be a problem in itself. It would be ~66% of my maximum theoretical upload per day, and I think someone (Bolt) would throttle me by the time I reached 2 GB. To Boston I'm getting ~2.3 Mbps upload, so that would be more like 80.5% of my theoretical maximum (which would require continuous uploads).
So what's the bandwidth from the Alps to Cambridge (Boston) like?
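The upload arithmetic can be checked with a short Python sketch. The inputs are the figures from this thread (21 systems, ~10 threads each, ~3 tasks per thread per day, ~33 MB uploaded per CEP2 task, ~2.3 Mbps upload to Boston); decimal units (1 GB = 1000 MB) and a perfectly continuous upload are simplifying assumptions.

```python
# Back-of-envelope check of the daily CEP2 upload requirement versus link capacity.
# ASSUMPTIONS: decimal units (1 GB = 1000 MB) and an upload link that runs
# flat out for 24 hours; real-world throughput will be lower.

SECONDS_PER_DAY = 86_400


def daily_upload_gb(systems: int, threads: int, tasks_per_thread: float, mb_per_task: float) -> float:
    """Total result data to upload per day across all systems, in GB."""
    tasks = systems * threads * tasks_per_thread
    return tasks * mb_per_task / 1000


def link_capacity_gb_per_day(mbps: float) -> float:
    """Maximum data an upload link can move in 24 h of continuous use, in GB."""
    return mbps * 1e6 * SECONDS_PER_DAY / 8 / 1e9  # bits/s -> bytes/day -> GB/day


if __name__ == "__main__":
    need = daily_upload_gb(systems=21, threads=10, tasks_per_thread=3, mb_per_task=33)
    cap = link_capacity_gb_per_day(2.3)
    print(f"needed:   {need:.2f} GB/day")   # 20.79 GB/day
    print(f"capacity: {cap:.2f} GB/day")    # 24.84 GB/day
    print(f"utilisation: {need / cap:.1%}")  # 83.7%
```

Rounding the demand down to 20 GB, as the post does, gives 20 / 24.84 ≈ 80.5%; the unrounded 630 tasks × 33 MB ≈ 20.79 GB pushes it to about 84%.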
[Feb 3, 2012 12:40:48 AM]
Former Member
Cruncher
Joined: May 22, 2018
Post Count: 0
Re: Errors - Low efficiency

Given that the issue is limited to 4 specific systems out of 21, there is more than likely still something different about them, as previously suggested. Hypernova could consider running long tasks with small files on the side, such as FAAH [I don't think the floating-point/integer mix makes one iota of difference; I haven't seen any when running 4 CEP2 and 4 GFAM/DSFL on the side on my 8-thread machine], preferably with a low cache. He could then set up a profile for the 4 problem devices and choose a number of concurrent CEP2 tasks that he finds to be stable on them.

--//--
[Feb 3, 2012 10:32:34 AM]