World Community Grid Forums
Thread Status: Active Total posts in this thread: 114
SekeRob
Master Cruncher Joined: Jan 7, 2013 Post Count: 2741 Status: Offline |
You're writing 500GB, so maybe rub the eyes ;>)
|
||
|
Crystal Pellet
Veteran Cruncher Joined: May 21, 2008 Post Count: 1322 Status: Offline |
As discussed before, "Maximum disk usage exceeded" does not mean you are short of disk space; it means the task has used more than the 2GB allowed in BOINC's slot directory.
You should find an error message in BOINC Manager's event log. |
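To check how close a running task is to that limit, one can total the files in its slot directory. The snippet below is only a minimal sketch: the 2GB default comes from the post above, the function names are made up for illustration, and BOINC's own accounting may differ from a plain sum of file sizes.

```python
import os

GIB = 1024 ** 3  # bytes in one gibibyte


def dir_size_bytes(root):
    """Sum the sizes of all regular files under root (symlinks are skipped)."""
    total = 0
    for dirpath, _dirnames, filenames in os.walk(root):
        for name in filenames:
            path = os.path.join(dirpath, name)
            if os.path.islink(path):
                continue
            try:
                total += os.path.getsize(path)
            except OSError:
                # A running task may delete scratch files while we walk the tree.
                pass
    return total


def exceeds_limit(root, limit_bytes=2 * GIB):
    """True if the directory tree under root is larger than limit_bytes."""
    return dir_size_bytes(root) > limit_bytes
```

Pass the path of the slot directory the task is running in; where that directory lives depends on the BOINC installation, so no path is assumed here.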
||
|
Crystal Pellet
Veteran Cruncher Joined: May 21, 2008 Post Count: 1322 Status: Offline |
Several CEP2 Betas are treated as errors with 18 hours run time -> "Killing job because cpu time limit has been exceeded"
Other error results seem to have a normal outcome, which makes me think "something wrong with the validator?"

Result Name: BETA_ E236295_ 371_ S.320.C40H28N2O3Si1.YGRYAHPWFARKLU-UHFFFAOYSA-N.9_ s1_ 14_ 0--

<core_client_version>7.6.22</core_client_version>
<![CDATA[
<stderr_txt>
INFO: No state to restore. Start from the beginning.
[12:02:46] Number of jobs = 5
[12:02:46] Starting job 0,CPU time has been restored to 0.000000.
[19:14:46] Finished Job #0
[19:14:46] Starting job 1,CPU time has been restored to 24379.687500.
[20:25:18] Finished Job #1
[20:25:18] Starting job 2,CPU time has been restored to 28329.093750.
[20:40:18] Finished Job #2
[20:40:18] Starting job 3,CPU time has been restored to 29182.156250.
Application exited with RC = 0x1
[00:21:52] Finished Job #3
[00:21:52] Starting job 4,CPU time has been restored to 41280.109375.
[00:21:52] Skipping Job #4
00:21:54 (97636): called boinc_finish |
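To see where the CPU time went in a log like this, the "restored to" values can be differenced. This is a rough sketch that assumes each "Starting job N,CPU time has been restored to X" line reports the cumulative CPU seconds at the moment job N starts; the thread does not confirm that reading, and the function name is hypothetical.

```python
import re

# Matches e.g. "Starting job 1,CPU time has been restored to 24379.687500."
START_RE = re.compile(
    r"Starting job (\d+),CPU time has been restored to (\d+\.\d+)"
)


def cpu_seconds_per_job(log_text):
    """Map job number -> CPU seconds spent, from consecutive 'restored to' marks.

    The last job in the log has no following mark, so it is left out.
    """
    marks = [(int(job), float(cpu)) for job, cpu in START_RE.findall(log_text)]
    return {
        job: round(next_cpu - cpu, 6)
        for (job, cpu), (_next_job, next_cpu) in zip(marks, marks[1:])
    }
```

Under that assumption, the log above gives job 3 (the one that exited with RC = 0x1) a little under 12,100 CPU seconds, against 24,379.6875 for job 0.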
||
|
SekeRob
Master Cruncher Joined: Jan 7, 2013 Post Count: 2741 Status: Offline |
2 of 8 completed were rated error. On review see they never made it to the first checkpoint, several restarts [due heartbeat lost], then killed at 18:00:01, no credit. This is the same in production.
|
||
|
uplinger
Former World Community Grid Tech Joined: May 23, 2005 Post Count: 3952 Status: Offline |
We are going to be running the remaining 2000 work units with an increased disk allowance per result of 2.5GB. This will match what is used on production cep2 results. Previously this was set to 2GB and we are going to see if increasing it to 2.5 will resolve the disk usage errors.
As for the 195 errors, we are doing some stand alone testing on these results to formulate a plan on eliminating them going forward. Thanks, -Uplinger |
||
|
SekeRob
Master Cruncher Joined: Jan 7, 2013 Post Count: 2741 Status: Offline |
We are going to be running the remaining 2000 work units with an increased disk allowance per result of 2.5GB. This will match what is used on production cep2 results. Previously this was set to 2GB and we are going to see if increasing it to 2.5 will resolve the disk usage errors. As for the 195 errors, we are doing some stand alone testing on these results to formulate a plan on eliminating them going forward. Thanks, -Uplinger

Per https://www.worldcommunitygrid.org/help/viewTopic.do?shortName=minimumreq it's 2GB or 2,048MB, but I think a good while back a post was made [moi? **] which highlighted that the disk_bound setting said 2.5GB for CEP2, so pretty please, a correction. It's never too late, as they say.

edit: ** No, it was Crystal Pellet who highlighted this only 1.5 years ago: https://secure.worldcommunitygrid.org/forums/wcg/viewpostinthread?post=465887

[Edit 2 times, last edit by SekeRob* at Mar 1, 2016 6:43:31 PM] |
||
|
Crystal Pellet
Veteran Cruncher Joined: May 21, 2008 Post Count: 1322 Status: Offline |
2 of 8 completed were rated error. On review see they never made it to the first checkpoint, several restarts [due heartbeat lost], then killed at 18:00:01, no credit. This is the same in production.

Not these 2:

[03:52:46] Starting job 3,CPU time has been restored to 64614.943396.
Killing job because cpu time has been exceeded. Subjob start time = 810298837, Subjob current time = 1089440990
[03:55:53] Finished Job #3
03:56:11 (3076): called boinc_finish

[10:46:11] Finished Job #4
[10:46:11] Starting job 5,CPU time has been restored to 63973.234375.
Killing job because cpu time has been exceeded. Subjob start time = -2147483648, Subjob current time = 1089420455
[11:10:11] Finished Job #5
11:10:26 (98004): called boinc_finish |
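One way to read these two excerpts is to compare the restored CPU time with the 18-hour limit mentioned earlier in the thread. The sketch below assumes the limit is 18 × 3600 = 64,800 cumulative CPU seconds for the whole work unit, which the posts suggest but do not state outright; the helper names are hypothetical.

```python
LIMIT_SECONDS = 18 * 3600  # 18 hours, as quoted in the thread (assumed to be CPU seconds)


def headroom(restored_cpu_seconds, limit=LIMIT_SECONDS):
    """CPU seconds left before the assumed limit when a job starts."""
    return limit - restored_cpu_seconds


def wall_seconds(start, end):
    """Wall-clock seconds between two HH:MM:SS stamps, allowing one midnight wrap."""
    def to_seconds(stamp):
        hours, minutes, seconds = (int(part) for part in stamp.split(":"))
        return hours * 3600 + minutes * 60 + seconds

    return (to_seconds(end) - to_seconds(start)) % (24 * 3600)
```

For the first excerpt, `headroom(64614.943396)` is about 185 seconds and `wall_seconds("03:52:46", "03:55:53")` is 187, so job 3 started with only about three minutes of budget left. For the second, job 5 started with roughly 827 seconds of headroom and was killed 1,440 wall-clock seconds later.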
||
|
nanoprobe
Master Cruncher Classified Joined: Aug 29, 2008 Post Count: 2998 Status: Offline |
Why has this task gone from PV to error?

Result Log
Result Name: BETA_ E236293_ 252_ S.320.C41H29N3O1S1.DDFHVWITZPPDEG-UHFFFAOYSA-N.3_ s1_ 14_ 1--

<core_client_version>7.6.9</core_client_version>
<![CDATA[
<stderr_txt>
INFO: No state to restore. Start from the beginning.
[13:30:43] Number of jobs = 5
[13:30:43] Starting job 0,CPU time has been restored to 0.000000.
[19:14:22] Finished Job #0
[19:14:22] Starting job 1,CPU time has been restored to 19139.247887.
[19:33:18] Finished Job #1
[19:33:18] Starting job 2,CPU time has been restored to 20266.199111.
[19:52:36] Finished Job #2
[19:52:36] Starting job 3,CPU time has been restored to 21408.438433.
Application exited with RC = 0x1
[23:32:19] Finished Job #3
[23:32:19] Starting job 4,CPU time has been restored to 34476.096199.
[23:32:19] Skipping Job #4
23:32:27 (4492): called boinc_finish
</stderr_txt>
]]>

When this task went from PV to valid?

Result Log
Result Name: BETA_ E236293_ 923_ S.314.C35F1H23N8S1.FDSYCGGWQAJNNT-UHFFFAOYSA-N.11_ s1_ 14_ 2--

<core_client_version>7.4.36</core_client_version>
<![CDATA[
<stderr_txt>
INFO: No state to restore. Start from the beginning.
[16:19:45] Number of jobs = 5
[16:19:45] Starting job 0,CPU time has been restored to 0.000000.
[02:34:28] Finished Job #0
[02:34:28] Starting job 1,CPU time has been restored to 36757.236422.
[03:57:29] Finished Job #1
[03:57:29] Starting job 2,CPU time has been restored to 41731.047505.
[04:23:54] Finished Job #2
[04:23:54] Starting job 3,CPU time has been restored to 43308.794819.
Application exited with RC = 0x1
[09:27:30] Finished Job #3
[09:27:30] Starting job 4,CPU time has been restored to 61507.200674.
[09:27:30] Skipping Job #4
09:27:33 (5156): called boinc_finish
</stderr_txt>
]]>

Both result logs look exactly the same to me except for the time stamps. Did I miss something?
In 1969 I took an oath to defend and protect the U S Constitution against all enemies, both foreign and Domestic. There was no expiration date.
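The "same except for the time stamps" comparison can be made mechanical by masking the run-specific values before comparing. This is only an illustrative sketch: the placeholder tokens and the `normalize` helper are invented here, and it is meant for the stderr body only, not the client version line.

```python
import re


def normalize(log_text):
    """Replace clock times, process ids and decimal numbers with placeholders."""
    text = re.sub(r"\[?\d{2}:\d{2}:\d{2}\]?", "<time>", log_text)  # [19:14:22] or 23:32:27
    text = re.sub(r"\(\d+\)", "(<pid>)", text)                      # (4492)
    text = re.sub(r"\d+\.\d+", "<num>", text)                       # 19139.247887
    return " ".join(text.split())                                   # collapse whitespace


def same_shape(log_a, log_b):
    """True if two logs differ only in the masked values."""
    return normalize(log_a) == normalize(log_b)
```

If `same_shape` returns True for two stderr logs, whatever separates an error from a valid result is not visible in the log text itself.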
||
|
uplinger
Former World Community Grid Tech Joined: May 23, 2005 Post Count: 3952 Status: Offline |
Nano,
The work unit in question went from PV to error because the nuclear energy returned is outside of the accepted threshold given to us by the researchers. The other question is whether this should be considered invalid instead. Probably; I can take a look into that. Thanks, -Uplinger |
||
|
nanoprobe
Master Cruncher Classified Joined: Aug 29, 2008 Post Count: 2998 Status: Offline |
Nano, The work unit you have in question going from PV to error is because the Nuclear energy returned is outside of the accepted threshold given to us by the researchers. The other question is, should this be considered invalid instead. Probably, I can take a look into that. Thanks, -Uplinger

Thanks for the explanation Keith. I'm assuming that there is no way to discern that issue from the results log. I'll have another question about points awarded later. Too busy with honey-dos ATM.
||
|