World Community Grid Forums
Thread Status: Active Total posts in this thread: 11
Composer
Cruncher Joined: May 28, 2014 Post Count: 29 Status: Offline |
I'm seeing a lot of WUs that are throwing computation errors. On my computer it happens within seconds of the task starting, and judging by the stderr logs the same is happening to others as well.
Result Log
Result Name: E236522_ 982_ S.212.C20H10N6O3S1.HODSJZLVGMNSMW-UHFFFAOYSA-N.13_ s1_ 14_ 2--
<core_client_version>7.2.47</core_client_version>
<![CDATA[
<message>
(unknown error) - exit code 195 (0xc3)
</message>
<stderr_txt>
INFO: No state to restore. Start from the beginning.
[11:27:09] Number of jobs = 8
[11:27:09] Starting job 0,CPU time has been restored to 0.000000.
Application exited with RC = 0x1
[11:27:10] Finished Job #0
11:27:10 (16708): called boinc_finish
</stderr_txt>
]]>
Anybody know what exit code 195 indicates, or what it means when the application exits with RC = 0x1? It is not doing this on all of them, I currently have 8 that are crunching happily with no signs of any problems. Is this at all related to the recent issue with people not getting work units? |
||
|
Former Member
Cruncher Joined: May 22, 2018 Post Count: 0 Status: Offline |
I had the same problem a few years ago (and also saw my share of them recently -- just before the workunit well dried up). I got reassurance from a fellow cruncher here:
https://secure.worldcommunitygrid.org/forums/...ead,33490_offset,0#385626 and here: https://secure.worldcommunitygrid.org/forums/...ead,34196_offset,0#399663 |
||
|
Sgt.Joe
Ace Cruncher USA Joined: Jul 4, 2006 Post Count: 7668 Status: Offline |
Still happening:
Result Log
Result Name: E236523_ 381_ S.216.C18F2H8N6O1S2.NXTRQRZZGYDYOX-UHFFFAOYSA-N.21_ s1_ 14_ 1--
<core_client_version>7.2.7</core_client_version>
<![CDATA[
<message>
process exited with code 195 (0xc3, -61)
</message>
<stderr_txt>
INFO: No state to restore. Start from the beginning.
[17:20:30] Number of jobs = 8
[17:20:30] Starting job 0,CPU time has been restored to 0.000000.
[17:20:33] Starting new Job
[17:20:33] Qink name = fldman
[17:20:34] Qink name = gesman
Error reading in TMP file 53/0 (1479200): No such file or directory
Application exited with RC = 0x100
[17:20:34] Finished Job #0
17:20:35 (4159): called boinc_finish
Cheers
Sgt. Joe
*Minnesota Crunchers* |
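A note on the numbers in the two logs so far: the sketch below only shows how the different spellings of the codes relate arithmetically. It assumes that the client prints the exit code as an unsigned byte, in hex, and sometimes as a signed byte, and that an RC of 0x100 is a raw POSIX-style wait status with the child's exit code in its high byte. Neither assumption is confirmed anywhere in this thread, and the function names are made up for the example.

```python
def as_signed_byte(code: int) -> int:
    """Reinterpret an exit code in 0..255 as a signed 8-bit value."""
    if not 0 <= code <= 255:
        raise ValueError("exit code must fit in one byte")
    return code - 256 if code >= 128 else code


def exit_code_from_wait_status(status: int) -> int:
    """Assumption: status is a POSIX-style wait status for a normal exit,
    where the exit code sits in bits 8..15."""
    return (status >> 8) & 0xFF


if __name__ == "__main__":
    print(hex(195), as_signed_byte(195))       # 0xc3 -61
    print(exit_code_from_wait_status(0x100))   # 1
```

Under those assumptions, 195, 0xc3 and -61 are the same value, and RC = 0x1 in the first log and RC = 0x100 in the second would both mean the science application itself returned 1. That still says nothing about why it returned 1.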
||
|
Former Member
Cruncher Joined: May 22, 2018 Post Count: 0 Status: Offline |
About the "Error reading in TMP file 53/0 (1479200): No such file or directory" message: as I recently said in another thread, I have seen it appear when crunching too many WUs (or too many large WUs) simultaneously on a RAM disk, so that the available RAM became insufficient to satisfy the request. Maybe this is your case too.
|
||
|
Sgt.Joe
Ace Cruncher USA Joined: Jul 4, 2006 Post Count: 7668 Status: Offline |
I don't think so. This is a dual Xeon E5410 with 8 GB of RAM. It used to crunch up to three CEP2 units simultaneously while also doing 5 units for other projects, mainly MCM1, without a problem. This is the first and only CEP2 unit I have had for about 3 to 4 weeks. I am thinking defective work unit, as two other machines also choked on this unit with errors in the same fashion in about the same time.
Cheers
Sgt. Joe
*Minnesota Crunchers* |
||
|
SekeRob
Master Cruncher Joined: Jan 7, 2013 Post Count: 2741 Status: Offline |
I think defective units are the only thing left in circulation for CEP2.
|
||
|
Former Member
Cruncher Joined: May 22, 2018 Post Count: 0 Status: Offline |
It sounds convincing (especially if that kind of error didn't show up in the past under the same conditions); however, I still see some similarities between your case and mine: when I experienced the above error I was crunching 7 WUs at a time on an 8 GB RAM disk, so the load on my CPU was similar to yours. If none of the other crunchers got the "Error reading in TMP file" message but all of them got the "process exited with code 195 (0xc3, -61)" message, then it may be that your WU failed (error code 195) because it was defective, but only you received the other message (error reading in TMP file) because of scarce memory. |
||
|
Sgt.Joe
Ace Cruncher USA Joined: Jul 4, 2006 Post Count: 7668 Status: Offline |
Three out of the four errors got the "Error reading in TMP file 53/0 (1479200): No such file or directory" entry. All got the "process exited with code 195" entry. The fifth unit was "server aborted." The one that did not get the TMP error was Windows. Mine was Linux and the other two errors were Darwin, all with the TMP error. I am sticking with a problem with the work unit. Thanks for the input, your thoughts were appreciated.
Cheers
Edit: spelling
Sgt. Joe
*Minnesota Crunchers*
[Edit 2 times, last edit by Sgt.Joe at Apr 30, 2016 6:35:32 PM] |
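The comparison in the post above (which results carry the TMP message and which carry only the exit code) can be done mechanically when there are many result logs to go through. This is a rough sketch, assuming the log text has been copied from the Result Log page into a string. `summarize`, `EXIT_RE` and `TMP_MARKER` are names invented for this example, and the patterns only cover the two message spellings quoted in this thread.

```python
import re

# Matches both "exit code 195" and "process exited with code 195".
EXIT_RE = re.compile(r"exit(?:ed with)? code (\d+)")
TMP_MARKER = "Error reading in TMP file"


def summarize(log_text: str) -> dict:
    """Report the client exit code (if found) and whether the TMP error appears."""
    match = EXIT_RE.search(log_text)
    return {
        "exit_code": int(match.group(1)) if match else None,
        "tmp_error": TMP_MARKER in log_text,
    }


# Shortened samples built from lines quoted earlier in the thread.
linux_log = (
    "process exited with code 195 (0xc3, -61) "
    "Error reading in TMP file 53/0 (1479200): No such file or directory "
    "Application exited with RC = 0x100"
)
windows_log = (
    "(unknown error) - exit code 195 (0xc3) "
    "Application exited with RC = 0x1"
)

if __name__ == "__main__":
    for name, text in (("linux", linux_log), ("windows", windows_log)):
        print(name, summarize(text))
```

Run over every result of one work unit, this gives the same kind of tally as above: the exit code on every failed result, and the TMP message on only some of them.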
||
|
Former Member
Cruncher Joined: May 22, 2018 Post Count: 0 Status: Offline |
I was on Linux when I encountered the error too. Hence I guess it's a Linux-related problem which occurs with faulty WUs, contrary to what I thought until now. Thanks for taking me to the (likely) right answer.
|
||
|
RTorpey
Advanced Cruncher Joined: Aug 24, 2005 Post Count: 67 Status: Offline |
Just had another Windows job exit with:
Result Log
Result Name: E236523_ 493_ S.222.C21H13N5S3.MGWOEAQCIPWOBG-UHFFFAOYSA-N.22_ s1_ 14_ 0--
<core_client_version>7.6.22</core_client_version>
<![CDATA[
<message>
(unknown error) - exit code 195 (0xc3)
</message>
<stderr_txt>
INFO: No state to restore. Start from the beginning.
[07:51:57] Number of jobs = 8
[07:51:57] Starting job 0,CPU time has been restored to 0.000000.
Application exited with RC = 0x1
[07:52:43] Finished Job #0
07:52:43 (10572): called boinc_finish
</stderr_txt>
]]> |
||