World Community Grid Forums
Thread Status: Active Total posts in this thread: 16
Former Member
Cruncher Joined: May 22, 2018 Post Count: 0 Status: Offline
My curiosity was piqued when I saw that a wingman had finished, in only 0.21 hours, a job that took my i7-980X 3.42 hours (piqued, as in "Hey... I want that CPU!"). My log was... boringly normal (it finished all jobs through #12, exited that job with RC = 0xc0000005, then skipped the rest). My wingman's log, however, has a lot of "Quit requested" entries and then errors, etc.:
Result Name: E204170_960_C.30.C25H14O2S2Si.00354979.0.set1d06_1--
<core_client_version>6.10.58</core_client_version>
<![CDATA[
<stderr_txt>
INFO: No state to restore. Start from the beginning.
[09:29:24] Number of jobs = 16
[09:29:24] Starting job 0,CPU time has been restored to 0.000000.
[09:32:57] Finished Job #0
[09:32:57] Starting job 1,CPU time has been restored to 187.013999.
[09:43:02] Finished Job #1
[09:43:02] Starting job 2,CPU time has been restored to 759.662470.
[12:16:27] Number of jobs = 16
[12:16:27] Starting job 2,CPU time has been restored to 759.662470.
Quit requested: Exiting
[12:36:30] Number of jobs = 16
[12:36:30] Starting job 2,CPU time has been restored to 759.662470.
Quit requested: Exiting
[12:40:11] Number of jobs = 16
[12:40:11] Starting job 2,CPU time has been restored to 759.662470.
[14:29:09] Number of jobs = 16
[14:29:09] Starting job 2,CPU time has been restored to 759.662470.
Quit requested: Exiting
[15:04:42] Number of jobs = 16
[15:04:42] Starting job 2,CPU time has been restored to 759.662470.
Quit requested: Exiting
[09:09:16] Number of jobs = 16
[09:09:16] Starting job 2,CPU time has been restored to 759.662470.
Quit requested: Exiting
[09:43:09] Number of jobs = 16
[09:43:09] Starting job 2,CPU time has been restored to 759.662470.
[09:50:40] Number of jobs = 16
[09:50:40] Starting job 2,CPU time has been restored to 759.662470.
Quit requested: Exiting
[10:21:52] Number of jobs = 16
[10:21:52] Starting job 2,CPU time has been restored to 759.662470.
Quit requested: Exiting
[10:22:12] Number of jobs = 16
[10:22:12] Starting job 2,CPU time has been restored to 759.662470.
Quit requested: Exiting
[10:22:32] Number of jobs = 16
[10:22:32] Starting job 2,CPU time has been restored to 759.662470.
Quit requested: Exiting
[10:23:13] Number of jobs = 16
[10:23:13] Starting job 2,CPU time has been restored to 759.662470.
[10:57:02] Number of jobs = 16
[10:57:02] Starting job 2,CPU time has been restored to 759.662470.
[14:04:50] Number of jobs = 16
[14:04:50] Starting job 2,CPU time has been restored to 759.662470.
Quit requested: Exiting
[14:05:02] Number of jobs = 16
[14:05:02] Starting job 2,CPU time has been restored to 759.662470.
[09:51:54] Number of jobs = 16
[09:51:54] Starting job 2,CPU time has been restored to 759.662470.
Quit requested: Exiting
[10:19:15] Number of jobs = 16
[10:19:15] Starting job 2,CPU time has been restored to 759.662470.
Application exited with RC = 0xc000013a
[10:56:49] Number of jobs = 16
[10:56:49] Starting job 2,CPU time has been restored to 759.662470.
[11:34:25] Number of jobs = 16
[11:34:25] Starting job 2,CPU time has been restored to 759.662470.
Application exited with RC = 0xc000013a
[13:48:24] Number of jobs = 16
[13:48:24] Starting job 2,CPU time has been restored to 759.662470.
Quit requested: Exiting
[13:49:15] Number of jobs = 16
[13:49:15] Starting job 2,CPU time has been restored to 759.662470.
Quit requested: Exiting
[13:59:16] Number of jobs = 16
[13:59:16] Starting job 2,CPU time has been restored to 759.662470.
Quit requested: Exiting
[15:05:25] Number of jobs = 16
[15:05:25] Starting job 2,CPU time has been restored to 759.662470.
[11:22:37] Number of jobs = 16
[11:22:37] Starting job 2,CPU time has been restored to 759.662470.
[12:12:19] Number of jobs = 16
[12:12:19] Starting job 2,CPU time has been restored to 759.662470.
Quit requested: Exiting
[12:13:40] Number of jobs = 16
[12:13:40] Starting job 2,CPU time has been restored to 759.662470.
[13:45:32] Number of jobs = 16
[13:45:32] Starting job 2,CPU time has been restored to 759.662470.
Quit requested: Exiting
[14:31:22] Number of jobs = 16
[14:31:22] Starting job 2,CPU time has been restored to 759.662470.
[10:03:51] Number of jobs = 16
[10:03:51] Starting job 2,CPU time has been restored to 759.662470.
Quit requested: Exiting
[10:04:41] Number of jobs = 16
[10:04:41] Starting job 2,CPU time has been restored to 759.662470.
Quit requested: Exiting
[10:19:17] Number of jobs = 16
[10:19:17] Starting job 2,CPU time has been restored to 759.662470.
Quit requested: Exiting
[10:23:07] Number of jobs = 16
[10:23:07] Starting job 2,CPU time has been restored to 759.662470.
Quit requested: Exiting
[10:23:37] Number of jobs = 16
[10:23:37] Starting job 2,CPU time has been restored to 759.662470.
Quit requested: Exiting
[10:24:37] Number of jobs = 16
[10:24:37] Starting job 2,CPU time has been restored to 759.662470.
Quit requested: Exiting
[10:25:57] Number of jobs = 16
[10:25:57] Starting job 2,CPU time has been restored to 759.662470.
Quit requested: Exiting
[10:28:58] Number of jobs = 16
[10:28:58] Starting job 2,CPU time has been restored to 759.662470.
Quit requested: Exiting
[10:29:58] Number of jobs = 16
[10:29:58] Starting job 2,CPU time has been restored to 759.662470.
Quit requested: Exiting
[10:32:38] Number of jobs = 16
[10:32:38] Starting job 2,CPU time has been restored to 759.662470.
Quit requested: Exiting
[10:32:58] Number of jobs = 16
[10:32:58] Starting job 2,CPU time has been restored to 759.662470.
Quit requested: Exiting
[10:34:59] Number of jobs = 16
[10:34:59] Starting job 2,CPU time has been restored to 759.662470.
Quit requested: Exiting
[10:36:29] Number of jobs = 16
[10:36:29] Starting job 2,CPU time has been restored to 759.662470.
Quit requested: Exiting
[10:37:19] Number of jobs = 16
[10:37:19] Starting job 2,CPU time has been restored to 759.662470.
Quit requested: Exiting
[10:37:49] Number of jobs = 16
[10:37:49] Starting job 2,CPU time has been restored to 759.662470.
Quit requested: Exiting
[10:38:50] Number of jobs = 16
[10:38:50] Starting job 2,CPU time has been restored to 759.662470.
Quit requested: Exiting
[10:40:00] Number of jobs = 16
[10:40:00] Starting job 2,CPU time has been restored to 759.662470.
[12:41:10] Number of jobs = 16
[12:41:10] Starting job 2,CPU time has been restored to 759.662470.
Application exited with RC = 0xc000013a
[13:40:42] Number of jobs = 16
[13:40:42] Starting job 2,CPU time has been restored to 759.662470.
Quit requested: Exiting
[14:23:39] Number of jobs = 16
[14:23:39] Starting job 2,CPU time has been restored to 759.662470.
Quit requested: Exiting
[14:23:59] Number of jobs = 16
[14:23:59] Starting job 2,CPU time has been restored to 759.662470.
Quit requested: Exiting
[14:24:20] Number of jobs = 16
[14:24:20] Starting job 2,CPU time has been restored to 759.662470.
Quit requested: Exiting
[14:25:20] Number of jobs = 16
[14:25:20] Starting job 2,CPU time has been restored to 759.662470.
Quit requested: Exiting
[14:25:40] Number of jobs = 16
[14:25:40] Starting job 2,CPU time has been restored to 759.662470.
[10:59:55] Number of jobs = 16
[10:59:55] Starting job 2,CPU time has been restored to 759.662470.
Quit requested: Exiting
[11:03:20] Number of jobs = 16
[11:03:20] Starting job 2,CPU time has been restored to 759.662470.
Quit requested: Exiting
[11:03:54] Number of jobs = 16
[11:03:54] Starting job 2,CPU time has been restored to 759.662470.
[11:40:28] Number of jobs = 16
[11:40:28] Starting job 2,CPU time has been restored to 759.662470.
Quit requested: Exiting
[12:13:55] Number of jobs = 16
[12:13:55] Starting job 2,CPU time has been restored to 759.662470.
Application exited with RC = 0xc000013a
[12:18:31] Finished Job #2
[12:18:31] Starting job 3,CPU time has been restored to 995.551582.
[12:18:31] Skipping Job #3
[12:18:31] Starting job 4,CPU time has been restored to 995.551582.
[12:18:31] Skipping Job #4
[12:18:31] Starting job 5,CPU time has been restored to 995.551582.
[12:18:31] Skipping Job #5
[12:18:31] Starting job 6,CPU time has been restored to 995.551582.
[12:18:31] Skipping Job #6
[12:18:31] Starting job 7,CPU time has been restored to 995.551582.
[12:18:31] Skipping Job #7
[12:18:31] Starting job 8,CPU time has been restored to 995.551582.
[12:18:31] Skipping Job #8
[12:18:31] Starting job 9,CPU time has been restored to 995.551582.
[12:18:31] Skipping Job #9
[12:18:31] Starting job 10,CPU time has been restored to 995.551582.
[12:18:31] Skipping Job #10
[12:18:31] Starting job 11,CPU time has been restored to 995.551582.
[12:18:32] Skipping Job #11
[12:18:32] Starting job 12,CPU time has been restored to 995.551582.
[12:18:32] Skipping Job #12
[12:18:32] Starting job 13,CPU time has been restored to 995.551582.
[12:18:32] Skipping Job #13
[12:18:32] Starting job 14,CPU time has been restored to 995.551582.
[12:18:32] Skipping Job #14
[12:18:32] Starting job 15,CPU time has been restored to 995.551582.
[12:18:32] Skipping Job #15
[07:59:22] Number of jobs = 16
07:59:41 (3392): called boinc_finish
</stderr_txt>
]]>

What does the "Quit requested" indicate (that is, where does the signal originate)? Does this result log actually indicate "normal" completion, as in "valid", even though only jobs #0 and #1 reflect normal completion, followed by a stutter on job #2 and then nothing but errors and skipped jobs? Again, it's the anomaly I'm curious about. (A.k.a. "Huh... never saw that before.") Forgive me my curiosity should you find it irksome!

[Edit 1 times, last edit by Former Member at Dec 11, 2011 7:44:37 AM]
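Both return codes that appear in these logs are standard Windows NTSTATUS values: 0xC0000005 is STATUS_ACCESS_VIOLATION, and 0xC000013A is STATUS_CONTROL_C_EXIT (the application terminated as a result of a CTRL+C). A minimal Python sketch for pulling every "Application exited with RC = ..." value out of a stderr_txt dump and naming it; the lookup table deliberately covers only these two codes, and the regular expression assumes the exact wording shown in the logs in this thread. It names the codes but does not say what triggered them.

```python
import re

# Only the two NTSTATUS codes seen in this thread; not a complete table.
KNOWN_STATUS = {
    0xC0000005: "STATUS_ACCESS_VIOLATION",
    0xC000013A: "STATUS_CONTROL_C_EXIT",
}

# Assumes the wording used in the CEP2 stderr logs quoted in this thread.
RC_PATTERN = re.compile(r"Application exited with RC = (0x[0-9a-fA-F]+)")


def decode_exit_codes(stderr_txt: str) -> list[tuple[int, str]]:
    """Return (code, name) for every 'Application exited with RC = ...' entry."""
    decoded = []
    for match in RC_PATTERN.finditer(stderr_txt):
        code = int(match.group(1), 16)
        decoded.append((code, KNOWN_STATUS.get(code, "unknown")))
    return decoded


if __name__ == "__main__":
    excerpt = (
        "[11:34:25] Starting job 2,CPU time has been restored to 759.662470. "
        "Application exited with RC = 0xc000013a "
        "[13:48:24] Number of jobs = 16"
    )
    for code, name in decode_exit_codes(excerpt):
        print(f"0x{code:08X} -> {name}")
```

Codes outside the table come back as "unknown" rather than raising, so the function can be run over any result log without first checking which codes it contains.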
||
|
Former Member
Cruncher Joined: May 22, 2018 Post Count: 0 Status: Offline
Looks to me like a dud task that would not compute/resolve, so it skipped through all the steps. Hence, this is a valid situation. But does your log look the same? Did it at least compute Job #2, where your wingman fast-forwarded?
For the next installment of irksome curiosity, use search ;o) --//--
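The "skipped through all the steps" reading can be checked against the wingman's log by tallying its "Finished Job #N" and "Skipping Job #N" entries. A small Python sketch, assuming the log wording shown above; it records only what the log says happened to each job, not why the application decided to skip.

```python
import re

# Matches the per-job outcome entries, e.g. "Finished Job #2" or "Skipping Job #3".
OUTCOME = re.compile(r"(Finished|Skipping) Job #(\d+)")


def job_outcomes(stderr_txt: str) -> dict[int, str]:
    """Map each job number to its logged outcome ('Finished' or 'Skipping')."""
    outcomes: dict[int, str] = {}
    for kind, job in OUTCOME.findall(stderr_txt):
        outcomes[int(job)] = kind  # a later entry for the same job overwrites an earlier one
    return outcomes


if __name__ == "__main__":
    excerpt = (
        "[12:13:55] Starting job 2,CPU time has been restored to 759.662470. "
        "Application exited with RC = 0xc000013a "
        "[12:18:31] Finished Job #2 "
        "[12:18:31] Starting job 3,CPU time has been restored to 995.551582. "
        "[12:18:31] Skipping Job #3 "
        "[12:18:31] Starting job 4,CPU time has been restored to 995.551582. "
        "[12:18:31] Skipping Job #4"
    )
    outcomes = job_outcomes(excerpt)
    finished = sorted(j for j, kind in outcomes.items() if kind == "Finished")
    skipped = sorted(j for j, kind in outcomes.items() if kind == "Skipping")
    print("finished:", finished, "skipped:", skipped)
```

Run over the wingman's full log, this should report jobs #0 to #2 as finished and #3 to #15 as skipped.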
||
|
kffitzgerald
Senior Cruncher USA Joined: Jan 29, 2011 Post Count: 222 Status: Offline
No such thing as {irksome} curiosity - we are literally DONATING billions of hours of our computer time, power, etc. Getting just a {wee} bit tired of seeing {irksome} nasty replies to concerns posted by the average user.
Ken
||
|
Former Member
Cruncher Joined: May 22, 2018 Post Count: 0 Status: Offline
Sorry you misunderstood what was intended, kffitzgerald. *We* includes *me*; I'm donating too! The querier's last line was:
"Forgive me my curiosity should you find it irksome!" So, the next time he felt his curiosity might irk, he could use search. The answer was provided, on topic, and a supplemental question to the OP was asked, right? --//--
||
|
Former Member
Cruncher Joined: May 22, 2018 Post Count: 0 Status: Offline
Hello ibsteve2u,
In some ways the error log you posted looks normal for a CEP2 work unit that is erroring out, but I have never understood it; I have simply learned to accept it. The overall structure is reasonable. The work unit has 16 tasks. The 3rd task (job #2) repeatedly errors out. Finally, the error log shows it going through the remaining 13 tasks in 1 second and exiting. Looking at the errors listed for the 3rd task, I see 3 types of error lines: 'Quit requested: Exiting', 'Application exited with RC = 0xc000013a', and a 3rd type with no printed line at all: the application just restarts and prints a new start time without giving the type of error. In any case, I think that only an application programmer with access to the source code could explain what is happening. My own guess is that something overwrote some of the data locations, so the process went kablooey.
Lawrence
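Lawrence's three kinds of ending can be counted mechanically. A Python sketch, assuming the log wording in this thread: it splits the log at each "Starting job N," entry and labels how each attempt at one job ended. The category names are illustrative; an attempt that reaches "Finished Job #N" counts as finished even if an RC line precedes it (as happens on the final attempt at job #2 in the wingman's log), and an attempt with no message before the next restart goes into the third, silent category. It counts the pattern; it does not explain it.

```python
import re
from collections import Counter

# Each attempt at a job begins with e.g. "Starting job 2,CPU time has been restored to ...".
START = re.compile(r"Starting job (\d+),")


def classify_attempts(stderr_txt: str, job: int) -> Counter:
    """Count how each attempt at `job` ended, using the text up to the next 'Starting job'."""
    starts = list(START.finditer(stderr_txt))
    counts: Counter = Counter()
    for i, match in enumerate(starts):
        if int(match.group(1)) != job:
            continue
        end = starts[i + 1].start() if i + 1 < len(starts) else len(stderr_txt)
        segment = stderr_txt[match.end():end]
        if re.search(rf"Finished Job #{job}\b", segment):
            counts["finished"] += 1
        elif re.search(rf"Skipping Job #{job}\b", segment):
            counts["skipped"] += 1
        elif "Quit requested: Exiting" in segment:
            counts["quit requested"] += 1
        elif "Application exited with RC" in segment:
            counts["exited with RC"] += 1
        else:
            counts["silent restart"] += 1  # no message before the next restart
    return counts


if __name__ == "__main__":
    excerpt = (
        "[12:40:11] Starting job 2,CPU time has been restored to 759.662470. "
        "[14:29:09] Number of jobs = 16 "
        "[14:29:09] Starting job 2,CPU time has been restored to 759.662470. "
        "Quit requested: Exiting "
        "[15:04:42] Number of jobs = 16"
    )
    print(classify_attempts(excerpt, 2))
```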
||
|
Former Member
Cruncher Joined: May 22, 2018 Post Count: 0 Status: Offline
> Looks to me like a dud task that would not compute/resolve, so it skipped through all the steps. Hence, this is a valid situation. But does your log look the same? Did it at least compute Job #2, where your wingman fast-forwarded? For the next installment of irksome curiosity, use search ;o) --//--

I searched... not much in the way of explanation. My log was normal, as I indicated earlier:

Result Name: E204170_960_C.30.C25H14O2S2Si.00354979.0.set1d06_2--
<core_client_version>6.10.58</core_client_version>
<![CDATA[
<stderr_txt>
INFO: No state to restore. Start from the beginning.
[00:27:43] Number of jobs = 16
[00:27:43] Starting job 0,CPU time has been restored to 0.000000.
[00:29:35] Finished Job #0
[00:29:35] Starting job 1,CPU time has been restored to 109.387901.
[00:35:31] Finished Job #1
[00:35:31] Starting job 2,CPU time has been restored to 461.684959.
[02:17:53] Finished Job #2
[02:17:53] Starting job 3,CPU time has been restored to 6553.305608.
[02:24:23] Finished Job #3
[02:24:23] Starting job 4,CPU time has been restored to 6938.284876.
[02:29:22] Finished Job #4
[02:29:22] Starting job 5,CPU time has been restored to 7235.997184.
[02:34:37] Finished Job #5
[02:34:37] Starting job 6,CPU time has been restored to 7548.513988.
[02:39:39] Finished Job #6
[02:39:39] Starting job 7,CPU time has been restored to 7848.659912.
[02:46:05] Finished Job #7
[02:46:05] Starting job 8,CPU time has been restored to 8232.734374.
[02:51:04] Finished Job #8
[02:51:04] Starting job 9,CPU time has been restored to 8530.649483.
[02:56:39] Finished Job #9
[02:56:39] Starting job 10,CPU time has been restored to 8863.399616.
[03:07:18] Finished Job #10
[03:07:18] Starting job 11,CPU time has been restored to 9499.025691.
[03:14:00] Finished Job #11
[03:14:00] Starting job 12,CPU time has been restored to 9898.731453.
Application exited with RC = 0xc0000005
[03:54:12] Finished Job #12
[03:54:12] Starting job 13,CPU time has been restored to 12302.472861.
[03:54:12] Skipping Job #13
[03:54:12] Starting job 14,CPU time has been restored to 12302.472861.
[03:54:12] Skipping Job #14
[03:54:12] Starting job 15,CPU time has been restored to 12302.472861.
[03:54:12] Skipping Job #15
03:54:20 (3912): called boinc_finish
</stderr_txt>
]]>
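The "CPU time has been restored to ..." figures appear to be cumulative CPU seconds at the start of each job, so the difference between consecutive job starts gives the CPU time spent on each job. That reading is an assumption, but it is consistent with this log: the last value, 12302.47 s, is about 3.42 hours, the figure quoted at the top of the thread, and job #2 alone accounts for 6553.305608 - 461.684959 = 6091.620649 s (about 1.69 h). A Python sketch:

```python
import re

# Assumes entries like "Starting job 2,CPU time has been restored to 461.684959."
RESTORED = re.compile(r"Starting job (\d+),CPU time has been restored to (\d+\.\d+)")


def cpu_seconds_per_job(stderr_txt: str) -> dict[int, float]:
    """CPU seconds per job, as the difference between consecutive job-start values.

    Restarts of the same job repeat the same value, so only the first one is kept.
    The last job seen has no following start, so it gets no entry.
    """
    start_value: dict[int, float] = {}
    for job, seconds in RESTORED.findall(stderr_txt):
        start_value.setdefault(int(job), float(seconds))
    jobs = sorted(start_value)
    return {j: start_value[nxt] - start_value[j] for j, nxt in zip(jobs, jobs[1:])}


if __name__ == "__main__":
    excerpt = (
        "[00:27:43] Starting job 0,CPU time has been restored to 0.000000. "
        "[00:29:35] Finished Job #0 "
        "[00:29:35] Starting job 1,CPU time has been restored to 109.387901. "
        "[00:35:31] Finished Job #1 "
        "[00:35:31] Starting job 2,CPU time has been restored to 461.684959. "
        "[02:17:53] Finished Job #2 "
        "[02:17:53] Starting job 3,CPU time has been restored to 6553.305608."
    )
    for job, secs in cpu_seconds_per_job(excerpt).items():
        print(f"job #{job}: {secs:.1f} s ({secs / 3600:.2f} h)")
```

Run against the wingman's log, the same function gives about 236 s for job #2, because every one of its restarts resumed from the same checkpointed 759.662470 s.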
||
|
Former Member
Cruncher Joined: May 22, 2018 Post Count: 0 Status: Offline
The tolerances for CEP2 are rather exceptional: pretty much anything is evaluated as valid [for credit of time], and then the furthest-advanced task is taken. This being a Zero Redundancy distribution science, the additional question is whether both tasks were assigned simultaneously or yours was sent out afterwards. The _2 suggests it was the 3rd copy, sent as a repair job. The distribution details and issue times will surely confirm.
--//--
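The "_2 suggests the 3rd copy" reasoning reads the trailing number of a result name as a zero-based copy index (_0, _1, _2). A small Python sketch of that reading; the function name is hypothetical, and it assumes the spaces after underscores are only a forum display artifact and that the trailing "--" is how result names appear in this thread.

```python
import re

# Trailing copy suffix such as "_2" or "_2--" (the "--" is how names appear in this thread).
SUFFIX = re.compile(r"_(\d+)(?:--)?$")


def copy_number(result_name: str) -> int:
    """Return the 1-based copy number for a result name ending in '_N' (N zero-based)."""
    compact = result_name.replace(" ", "")  # the forum display inserts spaces after underscores
    match = SUFFIX.search(compact)
    if match is None:
        raise ValueError(f"no copy suffix in {result_name!r}")
    return int(match.group(1)) + 1


if __name__ == "__main__":
    print(copy_number("E204170_ 960_ C.30.C25H14O2S2Si.00354979.0.set1d06_ 2--"))
```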
||
|
Former Member
Cruncher Joined: May 22, 2018 Post Count: 0 Status: Offline
Hence my post, I should add: one log's errors/jobs completed/time spent doesn't appear to validate or invalidate the other in any way I can detect.
Guess I'll check out a copy of the source from the SVN repository should my curiosity overwhelm me in the future.
||
|
Former Member
Cruncher Joined: May 22, 2018 Post Count: 0 Status: Offline
> The tolerances for CEP2 are rather exceptional: pretty much anything is evaluated as valid [for credit of time], and then the furthest-advanced task is taken. This being a Zero Redundancy distribution science, the additional question is whether both tasks were assigned simultaneously or yours was sent out afterwards. The _2 suggests it was the 3rd copy, sent as a repair job. The distribution details and issue times will surely confirm. --//--

Given that the first two tasks were sent out simultaneously for all practical purposes, with an indicated minimum quorum of two (2):

Project Name: The Clean Energy Project - Phase 2
Created: 11/27/2011 03:17:41
Name: E204170_960_C.30.C25H14O2S2Si.00354979.0.set1d06
Minimum Quorum: 2
Replication: 2

| Result Name | App Version Number | Status | Sent Time | Time Due / Return Time | CPU Time (hours) | Claimed / Granted BOINC Credit |
|---|---|---|---|---|---|---|
| E204170_960_C.30.C25H14O2S2Si.00354979.0.set1d06_2-- | 640 | Valid | 12/8/11 21:39:00 | 12/9/11 08:57:44 | 3.42 | 91.6 / 53.1 |
| E204170_960_C.30.C25H14O2S2Si.00354979.0.set1d06_1-- | 640 | Valid | 11/28/11 22:29:29 | 12/8/11 21:03:38 | 0.21 | 3.4 / 12.3 |
| E204170_960_C.30.C25H14O2S2Si.00354979.0.set1d06_0-- | 640 | Error | 11/28/11 21:18:21 | 12/8/11 22:00:52 | 0.00 | 0.0 / 0.0 |

And the initial (_0) task's result was an error, with a log that might be described as "terse":

Result Name: E204170_960_C.30.C25H14O2S2Si.00354979.0.set1d06_0--
<core_client_version>6.10.58</core_client_version>

I concluded this was one of the occasional jobs where redundancy is required to provide a check on the non-redundant jobs; hence, I was struck by the lack of points of agreement between... everybody. But no matter: even when I don't get an answer of the sort I expect, I still learn something.
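The "additional question" quoted above (were the copies issued together, or was the _2 copy sent out afterwards?) comes down to date arithmetic on the Sent Time column. A minimal Python sketch using only the standard datetime module; the timestamps are copied from the table above, and the month/day/two-digit-year format is an assumption based on how they are displayed.

```python
from datetime import datetime

# Month/day/two-digit-year, as the timestamps appear in the table above (assumed).
FMT = "%m/%d/%y %H:%M:%S"

sent = {
    "_0": datetime.strptime("11/28/11 21:18:21", FMT),
    "_1": datetime.strptime("11/28/11 22:29:29", FMT),
    "_2": datetime.strptime("12/8/11 21:39:00", FMT),
}
returned_1 = datetime.strptime("12/8/11 21:03:38", FMT)  # return time of the _1 copy

if __name__ == "__main__":
    print("_0 sent -> _1 sent:", sent["_1"] - sent["_0"])
    print("_1 sent -> _2 sent:", sent["_2"] - sent["_1"])
    print("_1 returned -> _2 sent:", sent["_2"] - returned_1)
```

By this arithmetic, _0 and _1 went out about 71 minutes apart, while _2 was issued almost ten days later, about 35 minutes after _1 was returned.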
||
|
Former Member
Cruncher Joined: May 22, 2018 Post Count: 0 Status: Offline
An error result, plus another that did not successfully get past the first 3 jobs, is the reason your copy went out. Jobs 1 and 2 (#0 and #1) are more or less setup steps. Job 3 (#2) is the critical one that takes hours for most participants. Possibly the device that got the _0 copy is a known unreliable device; in any case, it was deemed fit up front to put a verifying unit into circulation.
--//-- |
||
|