World Community Grid Forums
Thread Status: Active Total posts in this thread: 21
Author |
|
Mamajuanauk
Master Cruncher United Kingdom Joined: Dec 15, 2012 Post Count: 1900 Status: Offline |
Here is a list of the WUs that errored. Running on Ubuntu Server 12.04.
E224185_ 235_ I.64.C50H26N6O8.00395927.2.set1d06_ 3-- | MegaCruncher | Error | 31/07/14 20:14:45 | 01/08/14 15:40:12 | 7.29 / 7.73 | 147.9 / 0.0
Mamajuanauk is the Name! Crunching is the Game!
[Edit 1 times, last edit by Mamajuanauk at Aug 1, 2014 4:10:28 PM] |
||
|
captainjack
Advanced Cruncher Joined: Apr 14, 2008 Post Count: 144 Status: Offline |
Welcome to the club. The Signal 11 error has been around for a while on Linux systems. It happens when BOINC gets lost for a bit then thinks something is wrong with the system and starts cancelling jobs. Sometimes it happens when the CPU gets really busy with something else and BOINC doesn't get any attention. The recommended solution for that problem is to go into Device Settings and set the option for "Suspend work if CPU usage is above" to 35%.
I have also seen it happen when BOINC gets lost on the Internet trying to talk to the host service (WCG in this case). I used to see these when there were DHCP problems on my ISP. Not much we could do about that one. The other thing you might try is to upgrade to the latest version of Ubuntu. I'm running 14.04 now and haven't seen a Signal 11 error in quite a while. Hope that helps. |
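For anyone who prefers a local file to the web settings, the 35% threshold above could be sketched as follows. This assumes the web option corresponds to the BOINC client's `suspend_cpu_usage` preference and that the override goes in `global_prefs_override.xml` in the BOINC data directory; the helper name `build_override` is made up for this example.

```python
import xml.etree.ElementTree as ET


def build_override(cpu_threshold_percent: float) -> str:
    """Return XML text for a minimal global_prefs_override.xml (hypothetical helper)."""
    if not 0 <= cpu_threshold_percent <= 100:
        raise ValueError("threshold must be between 0 and 100")
    root = ET.Element("global_preferences")
    # Assumption: suspend_cpu_usage is the client-side name for
    # "Suspend work if CPU usage is above".
    ET.SubElement(root, "suspend_cpu_usage").text = f"{cpu_threshold_percent:g}"
    return ET.tostring(root, encoding="unicode")


if __name__ == "__main__":
    # Print the fragment for the 35% value suggested in the post.
    print(build_override(35))
```

The script only prints the XML; where the file lives and how the client picks it up depends on the installation, so check that before copying anything over.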
||
|
Mamajuanauk
Master Cruncher United Kingdom Joined: Dec 15, 2012 Post Count: 1900 Status: Offline |
Many thanks Jack. The system only runs WCG, nothing else. I will try upgrading though...
Mamajuanauk is the Name! Crunching is the Game!
||
|
Former Member
Cruncher Joined: May 22, 2018 Post Count: 0 Status: Offline |
Weren't you here before, with too many CEP2 tasks running and trying to start concurrently? Storage access is the bottleneck, and it is much less pronounced on Windows. Once again this highlights the need for staggered starting and resuming, a trac development ticket raised by keithing reed over a year ago.
|
||
|
Mamajuanauk
Master Cruncher United Kingdom Joined: Dec 15, 2012 Post Count: 1900 Status: Offline |
Yep, I was there some time ago; however, things have been running smoothly for ages. Strange that this problem repeats just when some new libraries have hit the grid...
Mamajuanauk is the Name! Crunching is the Game!
||
|
Former Member
Cruncher Joined: May 22, 2018 Post Count: 0 Status: Offline |
To add, the tasks at this time are rather short, so the other weakness, the NIC/Wi-Fi/zipping load, comes into play as well. We are measuring a 2-hour average now, when last week we were doing close to 6 hours per task. There's that critical job #2 and the very taxing setups just before it to top it off. The bell tolls, and no one at WCG is hearing it; it's that second monkey of four from the Far East. |
||
|
Former Member
Cruncher Joined: May 22, 2018 Post Count: 0 Status: Offline |
With regard to the length of time that the jobs run, the jobs basically scale with the square of the number of electrons (i.e. doubling the size will result in a job that takes 4 times as long). Yes, the initial jobs in this library are short, but they will get longer very quickly, and we are looking to place the majority of jobs we send out in a range which is 'grid friendly'. The reason for the smaller number of jobs per work unit is to allow us to crunch larger, more exciting molecules - which, after all, is the name of the game!
Your Harvard CEP Team |
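As a rough illustration of the quadratic scaling described above, here is a small Python sketch. The function name `estimate_runtime_hours` and the reference figures (a 100-electron job taking 2 hours) are invented for the example and are not CEP2 measurements.

```python
def estimate_runtime_hours(ref_hours: float, ref_electrons: int, electrons: int) -> float:
    """Scale a reference runtime by the square of the electron-count ratio."""
    if ref_electrons <= 0 or electrons <= 0:
        raise ValueError("electron counts must be positive")
    return ref_hours * (electrons / ref_electrons) ** 2


if __name__ == "__main__":
    # Hypothetical reference point: 100 electrons -> 2 hours.
    for n in (100, 200, 400):
        print(n, estimate_runtime_hours(2.0, 100, n))
```

Doubling the electron count from 100 to 200 takes the estimate from 2 to 8 hours, which matches the "4 times as long" rule of thumb in the post.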
||
|
armstrdj
Former World Community Grid Tech Joined: Oct 21, 2004 Post Count: 695 Status: Offline |
We are aware of the issues and working on a solution. Please monitor this thread for updates:
https://secure.worldcommunitygrid.org/forums/wcg/viewthread_thread,37056
Thanks, armstrdj |
||
|
uplinger
Former World Community Grid Tech Joined: May 23, 2005 Post Count: 3952 Status: Offline |
I am working on setting all results that came back with this error for recheck. Please be patient as this could take some time to clean up. I will post when we believe all work units have been rechecked with the fixed up validator.
Thanks, -Uplinger |
||
|
Former Member
Cruncher Joined: May 22, 2018 Post Count: 0 Status: Offline |
We would like to give huge props to the great guys at IBM for sorting this out so quickly - you guys are amazing!
Your Harvard CEP Team |
||
|