Total posts in this thread: 29
This topic has been viewed 1467 times and has 28 replies.
Former Member
Cruncher
Joined: May 22, 2018
Post Count: 0
Re: "Computation error" with linux kernel > 3.6 ?

Someone posted a fix on the WCG forums (which I could not get to work for myself on Ubuntu) that forced 127.0.0.1 traffic to go the roundabout way to stop those crashes; the hosts file was involved. It's inherently a BOINC core client issue, where the networking part of BOINC goes berserk when it can't find the router, and it is mostly Wi-Fi related. No issue with Ethernet.
[Nov 28, 2012 6:18:19 PM]
Sgt.Joe
Ace Cruncher
USA
Joined: Jul 4, 2006
Post Count: 7219

> Someone posted a fix on the WCG forums (which I could not get to work for myself on Ubuntu) that forced 127.0.0.1 traffic to go the roundabout way to stop those crashes; the hosts file was involved. It's inherently a BOINC core client issue, where the networking part of BOINC goes berserk when it can't find the router, and it is mostly Wi-Fi related. No issue with Ethernet.

In my experience it is definitely a Wi-Fi issue. I have never had it on Ethernet-connected devices.
Cheers
----------------------------------------
Sgt. Joe
*Minnesota Crunchers*
[Nov 29, 2012 1:03:43 AM]
Sgt.Joe
Ace Cruncher
USA
Joined: Jul 4, 2006
Post Count: 7219

> Another thing that can cause the signal 11 error is BOINC attempting to access the Internet while the connection is down. Any running WUs of certain kinds will fail when that happens.


I may have found the problem, but I will need to test it. I have 7 machines with 26 cores, all going through a range extender to my wireless (802.11g) router. I did not have any trouble until I added the last machine, so I think I may be overloading the connection from time to time, trying to cram too much data down too small a pipe. I could upgrade to 802.11n wireless, or I could figure out a way for the machines to take turns, so to speak. For the time being I am manually running the network transfers on one machine (an 8-core) once or twice a day. So far so good, but I will be looking for a more elegant solution.
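One way to picture "taking turns" is to split the day into non-overlapping network windows, one per machine. The sketch below is a hypothetical helper, not part of BOINC; the equal slot sizes and the idle gap between windows are assumptions chosen only for illustration.

```python
# Hypothetical helper (not part of BOINC): split one day into equal,
# non-overlapping network windows so several hosts "take turns" on a shared link.
from typing import List, Tuple


def stagger_windows(n_machines: int, gap_hours: float = 0.0) -> List[Tuple[float, float]]:
    """Return one (start_hour, end_hour) pair per machine.

    The 24-hour day is divided into equal slots; `gap_hours` is left idle at
    the end of each slot so that consecutive windows never touch.
    """
    if n_machines < 1:
        raise ValueError("need at least one machine")
    slot = 24.0 / n_machines
    if not 0.0 <= gap_hours < slot:
        raise ValueError("gap must be non-negative and shorter than a slot")
    windows = []
    for i in range(n_machines):
        start = i * slot
        end = start + slot - gap_hours
        windows.append((round(start, 2), round(end, 2)))
    return windows


if __name__ == "__main__":
    # 7 hosts behind one range extender, with a 15-minute idle gap between windows
    for host, (start, end) in enumerate(stagger_windows(7, gap_hours=0.25), start=1):
        print(f"host {host}: network allowed from {start:.2f} h to {end:.2f} h")
```

Equal slots are the simplest choice; in practice a faster host with more cores might deserve a longer window.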
Cheers
----------------------------------------
Sgt. Joe
*Minnesota Crunchers*
[Dec 3, 2012 1:52:03 AM]
Former Member
Cruncher
Joined: May 22, 2018
Post Count: 0

We're talking Linux/Ubuntu, of course. Never have I seen this on Windows.

The workaround, if you think it's congestive load, is to set offset connect intervals. The Ubuntu instance gets a 30-minute window 1 hour before EOD (end of day) [on v7 this also automatically forces reporting, clearing the Ready to Report (RtR) tasks **]. Set the other machines to scheduled networking in different time segments.

** As of the next version after 7.0.39, the maximum number of RtR tasks that can build up before they are cleared will be set to 64. This is with, for example, the short-task GPU crunchers in mind: someone running 12 concurrent tasks on a 7950 could reach that within the hour. As a server-side benefit, it reduces the size of each scheduler request.
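A quick back-of-envelope check of the "within the hour" remark, assuming the 12 slots are staggered so completed tasks arrive at a steady rate and nothing is reported in between. The task lengths are illustrative assumptions, not measured WCG values.

```python
# Back-of-envelope check of the "within the hour" remark about a 64-task RtR cap.
# Task lengths below are assumptions, not measured values.
RTR_CAP = 64           # cap quoted for client versions after 7.0.39
CONCURRENT_TASKS = 12  # e.g. 12 tasks running at once on one 7950


def minutes_to_reach_cap(task_minutes: float,
                         cap: int = RTR_CAP,
                         concurrent: int = CONCURRENT_TASKS) -> float:
    """Average time for `cap` finished tasks to accumulate, assuming each of
    the `concurrent` slots finishes one task every `task_minutes` and the
    slots are staggered, so completions arrive at a steady rate."""
    return cap * task_minutes / concurrent


def longest_task_minutes(window_minutes: float,
                         cap: int = RTR_CAP,
                         concurrent: int = CONCURRENT_TASKS) -> float:
    """Longest task length that still fills the cap within `window_minutes`."""
    return window_minutes * concurrent / cap


if __name__ == "__main__":
    print(f"cap reached within the hour if tasks take <= {longest_task_minutes(60):.2f} min")
    for task_minutes in (5, 10, 15):
        print(f"{task_minutes:>2} min tasks: cap after ~{minutes_to_reach_cap(task_minutes):.0f} min")
```

Under these assumptions, tasks of about 11 minutes or shorter (60 × 12 / 64 = 11.25) would fill the 64-task cap within an hour.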
[Dec 3, 2012 9:36:15 AM]
Sgt.Joe
Ace Cruncher
USA
Joined: Jul 4, 2006
Post Count: 7219

> We're talking Linux/Ubuntu, of course. Never have I seen this on Windows.

Yes, Linux Mint 12. Thanks.
Cheers
----------------------------------------
Sgt. Joe
*Minnesota Crunchers*
[Dec 3, 2012 11:34:23 AM]
Sgt.Joe
Ace Cruncher
USA
Joined: Jul 4, 2006
Post Count: 7219

> We're talking Linux/Ubuntu, of course. Never have I seen this on Windows.
>
> The workaround, if you think it's congestive load, is to set offset connect intervals. The Ubuntu instance gets a 30-minute window 1 hour before EOD (end of day) [on v7 this also automatically forces reporting, clearing the Ready to Report (RtR) tasks **]. Set the other machines to scheduled networking in different time segments.
>
> ** As of the next version after 7.0.39, the maximum number of RtR tasks that can build up before they are cleared will be set to 64. This is with, for example, the short-task GPU crunchers in mind: someone running 12 concurrent tasks on a 7950 could reach that within the hour. As a server-side benefit, it reduces the size of each scheduler request.

I set the two 8-core machines to separate connection windows, one from 00:01 to 11:00 and the other from 13:00 to 23:00, and gave each one a 1-day queue. So far this seems to have solved the problem, so I am guessing it was congestion related. The problem never affected any of the other Linux systems, but they are all quite a bit slower. It would be a nice feature if it were possible to set more than one window per machine, but this works, so I should probably be satisfied.
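The same two windows could, in principle, be written into a per-host override file instead of being set by hand in the Manager. This is a minimal sketch assuming the client reads `net_start_hour` and `net_end_hour` (fractional hours) from `global_prefs_override.xml` in its data directory (typically `/var/lib/boinc-client` with the Debian/Ubuntu package) and reloads it via `boinccmd --read_global_prefs_override`; check these names against your client version. The helper functions are hypothetical, and the overlap check ignores windows that cross midnight.

```python
# Hypothetical generator (assumption: BOINC reads <net_start_hour>/<net_end_hour>
# from global_prefs_override.xml) for one daily network window per host.
import xml.etree.ElementTree as ET
from typing import Tuple


def hhmm_to_hours(hhmm: str) -> float:
    """Convert "HH:MM" to fractional hours, e.g. "13:30" -> 13.5."""
    hours, minutes = hhmm.split(":")
    return int(hours) + int(minutes) / 60


def net_window_override(start: str, end: str) -> str:
    """Build a global_prefs_override.xml body limiting network use to start..end."""
    root = ET.Element("global_preferences")
    ET.SubElement(root, "net_start_hour").text = f"{hhmm_to_hours(start):.4f}"
    ET.SubElement(root, "net_end_hour").text = f"{hhmm_to_hours(end):.4f}"
    return ET.tostring(root, encoding="unicode")


def windows_overlap(a: Tuple[str, str], b: Tuple[str, str]) -> bool:
    """True if two same-day windows overlap (windows crossing midnight are not handled)."""
    a_start, a_end = (hhmm_to_hours(t) for t in a)
    b_start, b_end = (hhmm_to_hours(t) for t in b)
    return a_start < b_end and b_start < a_end


if __name__ == "__main__":
    host_a = ("00:01", "11:00")  # first 8-core machine
    host_b = ("13:00", "23:00")  # second 8-core machine
    assert not windows_overlap(host_a, host_b)
    print(net_window_override(*host_a))
    print(net_window_override(*host_b))
```

The two-hour gap between 11:00 and 13:00 leaves the link free for the slower Linux boxes, which matches the intent of offsetting the connect intervals.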
Cheers
----------------------------------------
Sgt. Joe
*Minnesota Crunchers*
----------------------------------------
[Edit 1 times, last edit by Sgt.Joe at Dec 4, 2012 7:47:56 PM]
[Dec 4, 2012 7:45:02 PM]
B2I
Senior Cruncher
usa
Joined: Jan 23, 2011
Post Count: 232

I had an issue like this on my Ubuntu 12.04 machine. I know this is not a pleasing cure, but when I mixed in a lot of different WUs so that the computer was only running 2 or 3 CEP WUs at a time, all the "computational errors" stopped.
----------------------------------------

[Dec 5, 2012 2:59:19 AM]
Sgt.Joe
Ace Cruncher
USA
Joined: Jul 4, 2006
Post Count: 7219

> I had an issue like this on my Ubuntu 12.04 machine. I know this is not a pleasing cure, but when I mixed in a lot of different WUs so that the computer was only running 2 or 3 CEP WUs at a time, all the "computational errors" stopped.

I have one of the octo-core machines running 2 CEP2 units with HCC on the remaining cores, and the other one running FAAH and HFCC. In my case I do not believe the issue was CEP2; it was overloading my wireless connection. The issue did not start until I added the second octo-core machine to the mix. I am glad you were able to cure your problem. Evidently there are a number of things that can cause it.
Cheers
----------------------------------------
Sgt. Joe
*Minnesota Crunchers*
[Dec 5, 2012 3:09:30 AM]
litimetal
Cruncher
Joined: Jan 9, 2011
Post Count: 5

I got a similar problem on the Clean Water project (I'm using Fedora):
https://secure.worldcommunitygrid.org/forums/...ead_thread,34533_lastpage
----------------------------------------
[Jan 9, 2013 1:07:30 AM]