Thread Status: Active
Total posts in this thread: 114
This topic has been viewed 13545 times and has 113 replies
SekeRob
Master Cruncher
Joined: Jan 7, 2013
Post Count: 2741
Status: Offline
Re: Clean Energy Project - Phase 2 Beta Feb 24, 2016 [ Issues Thread ]

You're writing 500GB, so maybe rub the eyes ;>)
[Mar 1, 2016 12:00:25 PM]
Crystal Pellet
Veteran Cruncher
Joined: May 21, 2008
Post Count: 1322
Status: Offline
Re: Clean Energy Project - Phase 2 Beta Feb 24, 2016 [ Issues Thread ]

As discussed before, "Maximum disk usage exceeded" does not mean you are short of disk space; it means the task has used more than its 2GB allowance in BOINC's slot directory.
You should find an error message in BOINC Manager's event log.
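
To make the distinction concrete, here is a minimal Python sketch that sums the files under one slot directory and compares the total with a per-task allowance. The 2GB figure is the one mentioned in this thread; the function names and the example path are hypothetical, and this is only an illustration, not how the BOINC client itself performs the check.

```python
import os

GIB = 1024 ** 3  # bytes in one GiB

def dir_size_bytes(path):
    """Sum the sizes of all regular (non-symlink) files under path."""
    total = 0
    for root, _dirs, files in os.walk(path):
        for name in files:
            full = os.path.join(root, name)
            if os.path.isfile(full) and not os.path.islink(full):
                total += os.path.getsize(full)
    return total

def exceeds_allowance(path, limit_bytes=2 * GIB):
    """True if the directory uses more than the (assumed) per-task allowance."""
    return dir_size_bytes(path) > limit_bytes

# Example with a hypothetical slot path; adjust to your own BOINC data directory:
# print(exceeds_allowance("/var/lib/boinc-client/slots/0"))
```

The point is that the comparison is against the task's own allowance, so it can fail even when the drive has hundreds of GB free.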
[Mar 1, 2016 12:01:07 PM]
Crystal Pellet
Veteran Cruncher
Joined: May 21, 2008
Post Count: 1322
Status: Offline
Re: Clean Energy Project - Phase 2 Beta Feb 24, 2016 [ Issues Thread ]

Several CEP2 Beta tasks are marked as errors at 18 hours of run time -> "Killing job because cpu time limit has been exceeded"

Other error results seem to have a normal outcome, which makes me wonder: "something wrong with the validator?"

Result Name: BETA_ E236295_ 371_ S.320.C40H28N2O3Si1.YGRYAHPWFARKLU-UHFFFAOYSA-N.9_ s1_ 14_ 0--

<core_client_version>7.6.22</core_client_version>
<![CDATA[
<stderr_txt>
INFO: No state to restore. Start from the beginning.
[12:02:46] Number of jobs = 5
[12:02:46] Starting job 0,CPU time has been restored to 0.000000.
[19:14:46] Finished Job #0
[19:14:46] Starting job 1,CPU time has been restored to 24379.687500.
[20:25:18] Finished Job #1
[20:25:18] Starting job 2,CPU time has been restored to 28329.093750.
[20:40:18] Finished Job #2
[20:40:18] Starting job 3,CPU time has been restored to 29182.156250.
Application exited with RC = 0x1
[00:21:52] Finished Job #3
[00:21:52] Starting job 4,CPU time has been restored to 41280.109375.
[00:21:52] Skipping Job #4
00:21:54 (97636): called boinc_finish
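
For anyone who wants to compare such logs, a small Python sketch follows. It assumes that the "CPU time has been restored to" value on each "Starting job" line is the cumulative CPU time of the task so far, so the difference between consecutive lines is the CPU time of the job in between. That reading is an assumption based on the log format shown here, and the function name is made up for this example.

```python
import re

# Matches lines such as:
# [19:14:46] Starting job 1,CPU time has been restored to 24379.687500.
START_RE = re.compile(
    r"\[(\d\d:\d\d:\d\d)\] Starting job (\d+),"
    r"CPU time has been restored to (\d+\.\d+)"
)

def cpu_seconds_per_job(stderr_text):
    """Return {job_number: cpu_seconds} for every job that is followed by
    another 'Starting job' line (the last job has no end marker)."""
    starts = [(int(job), float(cpu))
              for _stamp, job, cpu in START_RE.findall(stderr_text)]
    return {job: next_cpu - cpu
            for (job, cpu), (_next_job, next_cpu) in zip(starts, starts[1:])}
```

Applied to the log above, this attributes about 24380 CPU seconds to job 0 and about 12098 to job 3, the job that exited with RC = 0x1.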
[Mar 1, 2016 5:06:35 PM]
SekeRob
Master Cruncher
Joined: Jan 7, 2013
Post Count: 2741
Status: Offline
Re: Clean Energy Project - Phase 2 Beta Feb 24, 2016 [ Issues Thread ]

2 of the 8 completed were rated as errors. On review, I see they never made it to the first checkpoint: several restarts [due to lost heartbeat], then killed at 18:00:01, no credit. This is the same in production.
[Mar 1, 2016 5:19:41 PM]
uplinger
Former World Community Grid Tech
Joined: May 23, 2005
Post Count: 3952
Status: Offline
Re: Clean Energy Project - Phase 2 Beta Feb 24, 2016 [ Issues Thread ]

We are going to run the remaining 2000 work units with an increased disk allowance of 2.5GB per result. This will match what is used on production CEP2 results. Previously this was set to 2GB, and we are going to see whether increasing it to 2.5GB resolves the disk usage errors.

As for the 195 errors, we are doing some standalone testing on these results to formulate a plan for eliminating them going forward.

Thanks,
-Uplinger
[Mar 1, 2016 6:13:26 PM]
SekeRob
Master Cruncher
Joined: Jan 7, 2013
Post Count: 2741
Status: Offline
Re: Clean Energy Project - Phase 2 Beta Feb 24, 2016 [ Issues Thread ]

We are going to be running the remaining 2000 work units with an increased disk allowance per result of 2.5GB. This will match what is used on production cep2 results. Previously this was set to 2GB and we are going to see if increasing it to 2.5 will resolve the disk usage errors.

As for the 195 errors, we are doing some stand alone testing on these results to formulate a plan on eliminating them going forward.

Thanks,
-Uplinger


Per https://www.worldcommunitygrid.org/help/viewTopic.do?shortName=minimumreq it's 2GB or 2,048MB, but I think a good while back there was a post [moi? **] which highlighted that the disk_bound setting said 2.5GB for CEP2, so pretty please, a correction. It's never too late, as they say :P

edit: ** No, it was Crystal Pellet who highlighted this only 1.5 years ago: https://secure.worldcommunitygrid.org/forums/wcg/viewpostinthread?post=465887
----------------------------------------
[Edit 2 times, last edit by SekeRob* at Mar 1, 2016 6:43:31 PM]
[Mar 1, 2016 6:35:51 PM]
Crystal Pellet
Veteran Cruncher
Joined: May 21, 2008
Post Count: 1322
Status: Offline
Re: Clean Energy Project - Phase 2 Beta Feb 24, 2016 [ Issues Thread ]

2 of 8 completed were rated error. On review see they never made it to the first checkpoint, several restarts [due heartbeat lost], then killed at 18:00:01, no credit. This is the same in production.

Not these 2:

[03:52:46] Starting job 3,CPU time has been restored to 64614.943396.
Killing job because cpu time has been exceeded. Subjob start time = 810298837, Subjob current time = 1089440990
[03:55:53] Finished Job #3
03:56:11 (3076): called boinc_finish


[10:46:11] Finished Job #4
[10:46:11] Starting job 5,CPU time has been restored to 63973.234375.
Killing job because cpu time has been exceeded. Subjob start time = -2147483648, Subjob current time = 1089420455
[11:10:11] Finished Job #5
11:10:26 (98004): called boinc_finish
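
A quick way to sanity-check these two against the limit is sketched below in Python. It assumes the limit is 18 hours of CPU time (64800 s), as suggested by the "18 hours" and "killed at 18:00:01" remarks earlier in the thread, and that the restored value is cumulative CPU time; both are assumptions, and the helper names are invented for this example.

```python
CPU_LIMIT_S = 18 * 3600  # assumed limit: 18 hours = 64800 CPU seconds

def remaining_budget(restored_cpu_s, limit_s=CPU_LIMIT_S):
    """CPU seconds left before the assumed limit when a job starts."""
    return limit_s - restored_cpu_s

def hms_to_seconds(hms):
    """Convert 'HH:MM:SS' to seconds since midnight."""
    h, m, s = (int(part) for part in hms.split(":"))
    return h * 3600 + m * 60 + s

def wall_elapsed(start_hms, end_hms):
    """Wall-clock seconds between two log stamps, allowing one midnight wrap."""
    return (hms_to_seconds(end_hms) - hms_to_seconds(start_hms)) % 86400

# First log: job 3 started with 64614.943396 CPU seconds already used.
print(remaining_budget(64614.943396), wall_elapsed("03:52:46", "03:55:53"))
# Second log: job 5 started with 63973.234375 CPU seconds already used.
print(remaining_budget(63973.234375), wall_elapsed("10:46:11", "11:10:11"))
```

Under those assumptions the first task had roughly 185 CPU seconds left and its last job ended 187 wall-clock seconds later; the second had roughly 827 CPU seconds left and its last job ended 1440 wall-clock seconds later. Wall time and CPU time are not the same thing, so this is only a rough consistency check.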
[Mar 1, 2016 6:45:58 PM]
nanoprobe
Master Cruncher
Classified
Joined: Aug 29, 2008
Post Count: 2998
Status: Offline
Re: Clean Energy Project - Phase 2 Beta Feb 24, 2016 [ Issues Thread ]

Why has this task gone from PV to error?

Result Log

Result Name: BETA_ E236293_ 252_ S.320.C41H29N3O1S1.DDFHVWITZPPDEG-UHFFFAOYSA-N.3_ s1_ 14_ 1--
<core_client_version>7.6.9</core_client_version>
<![CDATA[
<stderr_txt>
INFO: No state to restore. Start from the beginning.
[13:30:43] Number of jobs = 5
[13:30:43] Starting job 0,CPU time has been restored to 0.000000.
[19:14:22] Finished Job #0
[19:14:22] Starting job 1,CPU time has been restored to 19139.247887.
[19:33:18] Finished Job #1
[19:33:18] Starting job 2,CPU time has been restored to 20266.199111.
[19:52:36] Finished Job #2
[19:52:36] Starting job 3,CPU time has been restored to 21408.438433.
Application exited with RC = 0x1
[23:32:19] Finished Job #3
[23:32:19] Starting job 4,CPU time has been restored to 34476.096199.
[23:32:19] Skipping Job #4
23:32:27 (4492): called boinc_finish

</stderr_txt>
]]>

And why did this task go from PV to valid?

Result Log

Result Name: BETA_ E236293_ 923_ S.314.C35F1H23N8S1.FDSYCGGWQAJNNT-UHFFFAOYSA-N.11_ s1_ 14_ 2--
<core_client_version>7.4.36</core_client_version>
<![CDATA[
<stderr_txt>
INFO: No state to restore. Start from the beginning.
[16:19:45] Number of jobs = 5
[16:19:45] Starting job 0,CPU time has been restored to 0.000000.
[02:34:28] Finished Job #0
[02:34:28] Starting job 1,CPU time has been restored to 36757.236422.
[03:57:29] Finished Job #1
[03:57:29] Starting job 2,CPU time has been restored to 41731.047505.
[04:23:54] Finished Job #2
[04:23:54] Starting job 3,CPU time has been restored to 43308.794819.
Application exited with RC = 0x1
[09:27:30] Finished Job #3
[09:27:30] Starting job 4,CPU time has been restored to 61507.200674.
[09:27:30] Skipping Job #4
09:27:33 (5156): called boinc_finish

</stderr_txt>
]]>

Both result logs look exactly the same to me except for the time stamps. Did I miss something?
----------------------------------------
In 1969 I took an oath to defend and protect the U S Constitution against all enemies, both foreign and Domestic. There was no expiration date.


[Mar 1, 2016 6:59:53 PM]
uplinger
Former World Community Grid Tech
Joined: May 23, 2005
Post Count: 3952
Status: Offline
Re: Clean Energy Project - Phase 2 Beta Feb 24, 2016 [ Issues Thread ]

Nano,

The work unit in question went from PV to error because the nuclear energy it returned is outside the accepted threshold given to us by the researchers.

The other question is whether this should be considered invalid instead. Probably; I can take a look into that.

Thanks,
-Uplinger
[Mar 1, 2016 8:29:14 PM]
nanoprobe
Master Cruncher
Classified
Joined: Aug 29, 2008
Post Count: 2998
Status: Offline
Re: Clean Energy Project - Phase 2 Beta Feb 24, 2016 [ Issues Thread ]

Nano,

The work unit you have in question going from PV to error is because the Nuclear energy returned is outside of the accepted threshold given to us by the researchers.

The other question is, should this be considered invalid instead. Probably, I can take a look into that.

Thanks,
-Uplinger

Thanks for the explanation, Keith. I'm assuming that there is no way to discern that issue from the result log.

I'll have another question about points awarded later. Too busy with honey-dos ATM. :D
[Mar 1, 2016 8:48:12 PM]