World Community Grid Forums
Thread Status: Active Total posts in this thread: 10
litimetal
Cruncher Joined: Jan 9, 2011 Post Count: 5 Status: Offline |
I'm using Fedora 16 (32-bit) with an AMD A6-3670 and BOINC 6.10.58.
Please check this problem, thanks.

<core_client_version>6.10.58</core_client_version>
<![CDATA[
<message>
process got signal 11
</message>
<stderr_txt>
Commandline = ../../projects/www.worldcommunitygrid.org/wcg_c4cw_lmps_6.40_i686-pc-linux-gnu -screen none -in in.wcg.acc -var wcgsteps1 10000 -var wcgsteps2 10000 -var loop 0 -var restart 0 -var rinterval 100 -var ifile in.wcg.acc -var wcgseed 3919893
[18:14:08] Percent complete = 0.499975
[18:15:06] Percent complete = 0.999950
[18:16:05] Percent complete = 1.499925
[18:17:02] Percent complete = 1.999900
[18:18:01] Percent complete = 2.499875
[18:18:58] Percent complete = 2.999850
[18:19:56] Percent complete = 3.499825
[18:20:57] Percent complete = 3.999800
[18:21:55] Percent complete = 4.499775
[18:22:53] Percent complete = 4.999750
[18:23:51] Percent complete = 5.499725
[18:24:49] Percent complete = 5.999700
[18:25:47] Percent complete = 6.499675
[18:26:46] Percent complete = 6.999650
[18:27:44] Percent complete = 7.499625
[18:28:42] Percent complete = 7.999600
[18:29:40] Percent complete = 8.499575
[18:30:38] Percent complete = 8.999550
[18:31:35] Percent complete = 9.499525
[18:32:34] Percent complete = 9.999500
[18:33:31] Percent complete = 10.499475
[18:34:30] Percent complete = 10.999450
[18:35:29] Percent complete = 11.499425
</stderr_txt>
]]>

<core_client_version>6.10.58</core_client_version>
<![CDATA[
<message>
process got signal 11
</message>
<stderr_txt>
Commandline = ../../projects/www.worldcommunitygrid.org/wcg_c4cw_lmps_6.40_i686-pc-linux-gnu -screen none -in in.wcg.acc -var wcgsteps1 10000 -var wcgsteps2 10000 -var loop 0 -var restart 0 -var rinterval 100 -var ifile in.wcg.acc -var wcgseed 3934175
[18:04:20] Percent complete = 0.499975
[18:05:18] Percent complete = 0.999950
[18:06:14] Percent complete = 1.499925
[18:07:12] Percent complete = 1.999900
[18:08:09] Percent complete = 2.499875
[18:09:07] Percent complete = 2.999850
[18:10:04] Percent complete = 3.499825
[18:11:01] Percent complete = 3.999800
[18:11:58] Percent complete = 4.499775
[18:12:55] Percent complete = 4.999750
[18:13:53] Percent complete = 5.499725
[18:14:51] Percent complete = 5.999700
[18:15:49] Percent complete = 6.499675
[18:16:47] Percent complete = 6.999650
[18:17:45] Percent complete = 7.499625
[18:18:42] Percent complete = 7.999600
[18:19:40] Percent complete = 8.499575
[18:20:38] Percent complete = 8.999550
[18:21:36] Percent complete = 9.499525
[18:22:34] Percent complete = 9.999500
[18:23:31] Percent complete = 10.499475
[18:24:28] Percent complete = 10.999450
[18:25:26] Percent complete = 11.499425
[18:26:24] Percent complete = 11.999400
[18:27:22] Percent complete = 12.499375
[18:28:19] Percent complete = 12.999350
[18:29:17] Percent complete = 13.499325
[18:30:14] Percent complete = 13.999300
[18:31:12] Percent complete = 14.499275
[18:32:11] Percent complete = 14.999250
[18:33:09] Percent complete = 15.499225
[18:34:06] Percent complete = 15.999200
[18:35:05] Percent complete = 16.499175
</stderr_txt>
]]> |
||
|
Sgt.Joe
Ace Cruncher USA Joined: Jul 4, 2006 Post Count: 7655 Status: Offline |
"process got signal 11"

Your computer is too busy. What else do you have running? What OS are you using?

Cheers
Sgt. Joe
*Minnesota Crunchers* |
||
|
Former Member
Cruncher Joined: May 22, 2018 Post Count: 0 Status: Offline |
Hi,
Signal 11 (SIGSEGV, a segmentation fault) is generally a "system overload" indicator with BOINC. Set "Leave application in memory when suspended" in the client prefs and also set "While processor usage is less than" to e.g. 25% [default is 50% for WCG]. Then, in the BOINC Manager Activity menu, set the client to run based on preferences.

What happens is that any time your system is too busy, tasks start timing out because communication is interrupted for longer than 30 seconds [the science app fails to talk to the BOINC core client in time]. Often this goes together with the message log at times also showing tasks being restarted at the last checkpoint or "zero status" messages.

Remedy: Make BOINC pause whenever you're doing heavy stuff such as sudo apt-get update / install. The <exclusive_app> parameter can be set for these heavy-duty apps, so BOINC will pause completely as long as they're loaded (games too if you want). |
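For readers who want to try the <exclusive_app> suggestion, here is a minimal sketch that generates the text of a cc_config.xml file with one <exclusive_app> entry per program. The helper name `build_cc_config` and the program names `yum` and `gcc` are hypothetical examples, not something from this thread, and whether a given client version (such as 6.10.58) honors the option should be checked against the BOINC client configuration documentation.

```python
import xml.etree.ElementTree as ET


def build_cc_config(exclusive_apps):
    """Return cc_config.xml text with one <exclusive_app> per executable name.

    Hypothetical helper for illustration; the executable names are assumptions.
    """
    root = ET.Element("cc_config")
    options = ET.SubElement(root, "options")
    for name in exclusive_apps:
        ET.SubElement(options, "exclusive_app").text = name
    # ET.indent only exists on Python 3.9+; skip pretty-printing on older versions.
    if hasattr(ET, "indent"):
        ET.indent(root)
    return ET.tostring(root, encoding="unicode")


# Example names only: replace them with the heavy programs on your own machine.
config_text = build_cc_config(["yum", "gcc"])
print(config_text)
```

The printed text would be saved as cc_config.xml in the BOINC data directory, after which the client has to re-read its config file or be restarted.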
||
|
KWSN - A Shrubbery
Master Cruncher Joined: Jan 8, 2006 Post Count: 1585 Status: Offline |
Worth mentioning that this is also a common failure when attempting to run wireless networking with Xnix. Sekerob had a detailed analysis of potential fixes, but I just gave up on trying. Wired only for all my Ubuntu machines.
----------------------------------------
Distributed computing volunteer since September 27, 2000 |
||
|
litimetal
Cruncher Joined: Jan 9, 2011 Post Count: 5 Status: Offline |
Quote: "Worth mentioning that this is also a common failure when attempting to run wireless networking with Xnix. Sekerob had a detailed analysis of potential fixes, but I just gave up on trying. Wired only for all my Ubuntu machines."

Well, I think this is the real reason. I'm using wireless, and I have to admit that wicd (similar to NetworkManager) is not so stable... But it's quite funny that I'm fetching Help Conquer Cancer and ALL the jobs finished successfully. |
||
|
litimetal
Cruncher Joined: Jan 9, 2011 Post Count: 5 Status: Offline |
Thanks, I will take your suggestions.

But I have a 4-core CPU, and I have set BOINC to use only 3 cores. Isn't that enough to leave computing resources for other tasks (no games, no compiler)? |
||
|
Sgt.Joe
Ace Cruncher USA Joined: Jul 4, 2006 Post Count: 7655 Status: Offline |
Quote: "Worth mentioning that this is also a common failure when attempting to run wireless networking with Xnix. Sekerob had a detailed analysis of potential fixes, but I just gave up on trying. Wired only for all my Ubuntu machines."

Quote: "Well, I think this is the real reason. I'm using wireless, and I have to admit that wicd (similar to NetworkManager) is not so stable... But it's quite funny that I'm fetching Help Conquer Cancer and ALL the jobs finished successfully."

I run several machines using Linux which I need to connect wirelessly. Whenever any system is unable to establish and keep a good connection while it is trying to upload or download results or tasks, I also get these "signal 11" computation errors. I have been able to minimize, but not eliminate, these errors by setting the preferences on all the machines so that only one machine at a time is trying to use the wireless. I thought maybe I had a congestion problem with too many machines battling for a piece of the action at once, but even with only one machine at a time using the wireless connection I still get this problem occasionally. Curiously, it only occurs with Linux Mint 11 and 12; it has never happened with one system running Linux Mint 7 (which I never upgraded because it was so stable).

I don't know why HCC would be unaffected, except to say it uses integer processing and not any floating-point processing. Its uploads and downloads are also quite small, so they are not communicating for very long. Quick hitters, so to speak.

Good luck

Cheers
Sgt. Joe
*Minnesota Crunchers* |
||
|
Former Member
Cruncher Joined: May 22, 2018 Post Count: 0 Status: Offline |
One [partial] fix for the WiFi issue [almost forgot about this, since Ubu 12.10 does not exhibit the problem for me and my D-Link 802.11n dongle] is to execute the following command in a terminal:

    sudo iwconfig wlan0 power off

This prevents the WiFi from going to sleep [WiFi running at max signal]. The interface could also be wlan1 or wlan2 instead of wlan0; iwconfig will throw a warning if wlan0 can't be found.

For a long time I ran with the client offline and a 1.2-day cache, allowing a 30-minute scheduled connection in BOINC [you can configure that in the local prefs for each day of the week]. At least that meant 23.5 hours of assurance that no task would fail because of this [which was really BOINC going bonkers over the localhost IP... a load issue too].

Nowadays I just use whatever network indicator the Ubu installer puts on the toolbar, and don't install anything extra such as WICD.

[Edit 1 times, last edit by Former Member at Jan 9, 2013 8:46:43 AM] |
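To check whether the iwconfig power setting actually took effect, one option is to look for the "Power Management" field that iwconfig prints for a wireless interface. The sketch below only parses text; the function name `power_management_state` and the sample output are assumptions made up for illustration, and real iwconfig output varies by driver (some drivers do not report the field at all).

```python
import re

# Matches e.g. "Power Management:off" in iwconfig output.
_POWER_RE = re.compile(r"Power Management\s*[:=]\s*(on|off)", re.IGNORECASE)


def power_management_state(iwconfig_output):
    """Return 'on', 'off', or None when the field is not reported."""
    match = _POWER_RE.search(iwconfig_output)
    return match.group(1).lower() if match else None


# Illustrative sample only, not captured from any machine in this thread.
SAMPLE = """wlan0     IEEE 802.11bgn  ESSID:"example"
          Bit Rate=54 Mb/s   Tx-Power=20 dBm
          Power Management:on
"""

print(power_management_state(SAMPLE))
```

In practice the text would come from running `iwconfig wlan0` (for example via `subprocess.run(["iwconfig", "wlan0"], capture_output=True, text=True)`), and a result of 'on' would suggest the power-off command needs to be run again, for instance after a reboot.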
||
|
litimetal
Cruncher Joined: Jan 9, 2011 Post Count: 5 Status: Offline |
Thanks.

I still wonder why an unstable network could result in a computing error. Did BOINC get wrong data about the task? Anyway, I've decided to focus on HCC until I have a chance to buy a new wireless card and router.

Best wishes.

P.S. I'm sorry if my words are confusing.... I have to admit I'm not very good at English. If you can't understand my words, please tell me, and I'll try another way to express my ideas. |
||
|
Former Member
Cruncher Joined: May 22, 2018 Post Count: 0 Status: Offline |
No worries... speaking/writing 5 tongues, I learned long ago to interpret most of what might get "Lost in Translation" (my favorite non-Groundhog Bill Murray movie).

Anyway, the iwconfig command I proposed costs nothing, and neither does the short daily automated or scheduled connection; then run a mix of HCC/C4CW [the latter science is bound to finish its target in about 3 months on known work]. You can decide to spend money later if it does not resolve the problem for the majority of C4CW jobs [or all].

It's been a problem on Linux... used to be on Windows too... It shows up whenever the connection to the router, or even the internet, fails, when the real cause is too many interrupts going to the localhost IP from the traffic over the network interface card [NIC]. BOINC and the science app insist on talking to each other at least once every 30 seconds, but if too much is going on, these handshakes might get lost, and then the job restarts or fails completely. It's been with the BOINC devs [how to replace this fail-prone system] for years... the ticket has been open since about the middle of last decade. So we sigh, give the workarounds and move on once more.

zhu ni you ge mei hao de yi tian (but that came from google ;o) |
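The 30-second handshake described above can be pictured as a simple heartbeat watchdog. The sketch below is not BOINC's actual code: the class `HeartbeatMonitor`, the `FakeClock` helper, and the exact timeout handling are assumptions used only to show how a delayed heartbeat turns into a "lost contact" decision.

```python
import time


class HeartbeatMonitor:
    """Tracks when the other side was last heard from (illustrative only)."""

    def __init__(self, timeout_s=30.0, clock=time.monotonic):
        self.timeout_s = timeout_s
        self._clock = clock
        self._last_seen = clock()

    def beat(self):
        """Record that a heartbeat message just arrived."""
        self._last_seen = self._clock()

    def timed_out(self):
        """True once more than timeout_s has passed since the last heartbeat."""
        return self._clock() - self._last_seen > self.timeout_s


class FakeClock:
    """Manually advanced clock so the example runs instantly."""

    def __init__(self):
        self.now = 0.0

    def __call__(self):
        return self.now

    def advance(self, seconds):
        self.now += seconds


clock = FakeClock()
monitor = HeartbeatMonitor(timeout_s=30.0, clock=clock)

clock.advance(10)
monitor.beat()                        # heartbeat arrives on time
clock.advance(25)
ok_after_25 = monitor.timed_out()     # 25 s of silence: still within the limit
clock.advance(10)
lost_after_35 = monitor.timed_out()   # 35 s of silence: contact counts as lost
print(ok_after_25, lost_after_35)
```

Under this picture, anything that stalls the heartbeat for longer than the timeout (heavy load, or trouble on the local network stack) looks the same to the monitor as a dead partner, which matches the restarts and failures described in the post.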
||