World Community Grid Forums
Thread Status: Active Total posts in this thread: 6
Author |
|
SekeRob
Master Cruncher Joined: Jan 7, 2013 Post Count: 2741 Status: Offline |
The story is more than puzzling. The task was received on the 12th at 3:52:53AM and immediately started by the client (yes, these jobs like to elbow their way to the front of the 4-way project queue). Last night the job got server aborted, code 202, at 3:42AM my time per the event log (system time is internet synchronized every so many hours), yet the Results page indicates 3:39:56AM?

FAH2_000269_avx387477-ls_000033_0003_024_1-- 3113135 Valid 9/12/16 03:52:53 9/13/16 03:39:56 21.60 / 0.00 310.8 / 272.0 < Mine
FAH2_000269_avx387477-ls_000033_0003_024_0-- Microsoft Windows 7 x64 Edition, Service Pack 1, (06.01.7601.00) - No Reply 9/8/16 09:46:57 9/12/16 03:51:38 0.00 0.0 / 0.0

More of the puzzle is the alarming "unrecoverable error": the client logs code 202 (server aborted) and, just prior and right after a trickle, says the task is no longer usable, yet the result gets a Valid rating on the Results page. (That implies the last trickle was bad or the trickle validator had other thoughts on the progress.) Then the server logs 21.6 hours of CPU time but no elapsed time, though the client log indicates 23:08:11:

World Community Grid 7.14 fahb FAH2_000269_avx387477-ls_000033_0003_024_1 23:08:11 (21:36:13) 9/13/2016 5:42:50 AM 9/13/2016 5:46:50 AM 93,38 Aborted (202)

Event Log:
1714 World Community Grid 9/13/2016 5:41:48 AM [sched_op] Starting scheduler request
1715 World Community Grid 9/13/2016 5:41:48 AM [trickle] read trickle file projects/www.worldcommunitygrid.org/trickle_up_FAH2_000269_avx387477-ls_000033_0003_024_1_1473738106.xml
1716 World Community Grid 9/13/2016 5:41:48 AM Sending scheduler request: To send trickle-up message.
1717 World Community Grid 9/13/2016 5:41:48 AM Not requesting tasks: don't need (job cache full)
1718 World Community Grid 9/13/2016 5:41:48 AM [sched_op] CPU work request: 0.00 seconds; 0.00 devices
1719 World Community Grid 9/13/2016 5:41:53 AM Scheduler request completed
1720 World Community Grid 9/13/2016 5:41:53 AM [sched_op] Server version 701
1721 World Community Grid 9/13/2016 5:41:53 AM Result FAH2_000269_avx387477-ls_000033_0003_024_1 is no longer usable
1722 World Community Grid 9/13/2016 5:41:53 AM Project requested delay of 121 seconds
1723 World Community Grid 9/13/2016 5:41:53 AM [sched_op] Deferring communication for 00:01:11
1724 World Community Grid 9/13/2016 5:41:53 AM [sched_op] Reason: Unrecoverable error for task FAH2_000269_avx387477-ls_000033_0003_024_1
1725 World Community Grid 9/13/2016 5:41:53 AM [sched_op] Deferring communication for 00:02:01
1726 World Community Grid 9/13/2016 5:41:53 AM [sched_op] Reason: requested by project
1727 World Community Grid 9/13/2016 5:41:54 AM Finished upload of FAH2_000269_avx387477-ls_000033_0003_024_1_r13989270_8
1728 World Community Grid 9/13/2016 5:41:54 AM Started upload of FAH2_000269_avx387477-ls_000033_0003_024_1_r13989270_18
1729 World Community Grid 9/13/2016 5:41:55 AM Computation for task FAH2_000269_avx387477-ls_000033_0003_024_1 finished
1731 World Community Grid 9/13/2016 5:41:59 AM Finished upload of FAH2_000269_avx387477-ls_000033_0003_024_1_r13989270_18
1741 World Community Grid 9/13/2016 5:43:59 AM [sched_op] handle_scheduler_reply(): got ack for task FAH2_000269_avx387477-ls_000033_0003_024_1

Looking in the message log, it's drawing a complete blank but for the header:

Result Log
Result Name: FAH2_000269_avx387477-ls_000033_0003_024_1--

What's going on with all this? There's zero indication why the job was aborted just into its 26th hour on the client. What's up with the time difference? In fact, I can't remember having seen any result throw up so many questions. Am I in La La Land?

The original, BTW, got a No Reply well before the 4 day deadline... it could have been crunching off-line, though we all know the disastrous effect that has on getting anything on this project through the validator in good order. Are the good trickle parts of invalids used for seeding the follow-on task, or are they redone in full? I don't recollect having seen a full distribution to determine whether an additional copy is sent out, or whether it ends there and a new seed is generated, like this sample:

FAH2_000124_avx17558cs-ls_000034_0000_026_wcgfahb00070000_0 11:26:37 (10:54:44) 95,36 70,526 04:46:57 9/12/2016 9:55:26 PM 03d,12:27:58 Running [49] 00:00:15 128.88 MB 61.00 MB. |
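One way to sanity-check the numbers in the post above is to convert the client's HH:MM:SS figures to decimal hours and to line up the two clocks. The sketch below rests on assumptions that the thread does not confirm: that the Results page shows UTC, that the client's event log is in local time two hours ahead of UTC, and that the figure in parentheses in the BOINCTasks line is CPU time. The helper name `hms_to_hours` is made up for illustration.

```python
from datetime import datetime, timedelta

def hms_to_hours(hms: str) -> float:
    """Convert an 'HH:MM:SS' string to decimal hours."""
    h, m, s = (int(part) for part in hms.split(":"))
    return h + m / 60 + s / 3600

# Figures copied from the post.
client_elapsed = hms_to_hours("23:08:11")  # BOINCTasks elapsed time
client_cpu = hms_to_hours("21:36:13")      # figure in parentheses (ASSUMED to be CPU time)
server_cpu = 21.60                         # hours shown on the Results page

# ASSUMPTION: Results page is UTC, event log is local time at UTC+2.
ASSUMED_LOCAL_OFFSET = timedelta(hours=2)

server_returned = datetime(2016, 9, 13, 3, 39, 56)        # Results page return time
client_unusable_local = datetime(2016, 9, 13, 5, 41, 53)  # event log entry 1721
client_unusable_utc = client_unusable_local - ASSUMED_LOCAL_OFFSET

# How far apart the two clocks are once both are on the same time base.
gap = client_unusable_utc - server_returned

print(f"client CPU time : {client_cpu:.2f} h (server shows {server_cpu:.2f} h)")
print(f"client elapsed  : {client_elapsed:.2f} h")
print(f"clock gap       : {gap}")
```

If those assumptions hold, the server's 21.60 hours lines up with the client's CPU figure rather than its elapsed time, and the server stamped the result roughly two minutes before the client logged it as no longer usable. That would narrow the question to why the server closed the result before the client's next contact, not settle it.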
||
|
Sgt.Joe
Ace Cruncher USA Joined: Jul 4, 2006 Post Count: 7687 Status: Offline |
That is a puzzler.
----------------------------------------
Cheers
Sgt. Joe
*Minnesota Crunchers* |
||
|
BobCat13
Senior Cruncher Joined: Oct 29, 2005 Post Count: 295 Status: Offline |
Was that the only task that got the "no longer usable" message?
Starting July 18, twice I have seen the server dump every task (including running tasks) on a machine displaying that message and then list them as Detached on the results page. Both times occurred during the middle of the night, so I know no one did anything with the client on that machine. |
||
|
SekeRob
Master Cruncher Joined: Jan 7, 2013 Post Count: 2741 Status: Offline |
Yup, the lone one on 8 cores, with emphasis on the blank result log and it still being declared valid, with zero elapsed but 21-something hours runtime. The other tasks went on unperturbed: Zika, OET, UGM, and one other parallel world FAH2. It's only because BOINCTasks so excellently logs the exit codes of results, and the entry was highlighted in the history (by an "if exit code is non-zero, mark it red" rule), that I saw it at all.
|
||
|
vepaul
Senior Cruncher Belgium Joined: Nov 17, 2004 Post Count: 261 Status: Offline |
Hello,
What is the difference between FAH2 and FAHV (Vina)? Thanks for an explanation. |
||
|
BobCat13
Senior Cruncher Joined: Oct 29, 2005 Post Count: 295 Status: Offline |
EDIT: The server finally aborted the _1. Guess I was being too impatient.
----------------------------------------
Not completely on topic to the original, but ever so loosely associated. My machine received a _1 because the _0 had not completed by the 4 day deadline. The _0 has now completed, and my machine has not started the _1 yet although it has contacted the server several times since the _0 was marked Valid. My _1 should be aborted by the server, correct? It has not been.

Project Name: FightAIDS@Home - Phase 2
Created: 06/01/2017 18:31:25
Name: FAH2_000952_zinc01801607_000004_0007_006
Minimum Quorum: 1
Replication: 1

FAH2_000952_zinc01801607_000004_0007_006_1-- Microsoft Windows XP Professional x86 Edition, Service Pack 3, (05.01.2600.00) - In Progress 6/7/17 04:09:31 6/11/17 04:09:31 0.00 0.0 / 0.0
FAH2_000952_zinc01801607_000004_0007_006_0-- Microsoft Windows 7 Home Premium x64 Edition, Service Pack 1, (06.01.7601.00) 714 Valid 6/3/17 04:09:20 6/7/17 08:52:26 17.89 357.2 / 357.2

[Edit 1 times, last edit by BobCat13 at Jun 7, 2017 6:46:53 PM] |
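The timing in the post above can be checked with a little date arithmetic. This is only a sketch: it assumes the two timestamps on the Valid row are the sent time and the return time, and that the deadline is exactly 4 days after sending, as the In Progress row suggests. The variable names are made up for illustration.

```python
from datetime import datetime, timedelta

FMT = "%m/%d/%y %H:%M:%S"  # timestamp layout used on the Results page

def parse(stamp: str) -> datetime:
    """Parse a Results page timestamp such as '6/3/17 04:09:20'."""
    return datetime.strptime(stamp, FMT)

DEADLINE = timedelta(days=4)  # ASSUMPTION: deadline is exactly 4 days after sending

copy0_sent = parse("6/3/17 04:09:20")      # _0, first timestamp (ASSUMED sent time)
copy0_returned = parse("6/7/17 08:52:26")  # _0, second timestamp (ASSUMED return time)
copy1_sent = parse("6/7/17 04:09:31")      # _1, sent time

copy0_due = copy0_sent + DEADLINE
overrun = copy0_returned - copy0_due   # how late the _0 came back
resend_lag = copy1_sent - copy0_due    # how soon after the deadline the _1 went out

print(f"_0 was due       : {copy0_due}")
print(f"_0 returned late : {overrun}")
print(f"_1 sent after    : {resend_lag}")
```

Under those assumptions the _0 came back a little under five hours past its deadline, and the _1 was issued seconds after that deadline passed, which fits the description of a resend triggered by the missed deadline.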
||