SekeRob
Master Cruncher
Joined: Jan 7, 2013
Post Count: 2741
FAH2 running job trickling regularly then server aborted (202), but not due.

The story is more than puzzling. The task was received on the 12th at 3:52:53 AM and immediately started by the client (yes, these jobs like to elbow their way through my four-project queue to the front of the line). Last night the job was server aborted with code 202 at 3:42 AM my time per the event log (the system clock is synchronized over the internet every few hours), yet the Results page shows 3:39:56 AM?

FAH2_ 000269_ avx387477-ls_ 000033_ 0003_ 024_ 1-- 3113135 Valid 9/12/16 03:52:53 9/13/16 03:39:56 21.60 / 0.00 310.8 / 272.0 < Mine
FAH2_ 000269_ avx387477-ls_ 000033_ 0003_ 024_ 0-- Microsoft Windows 7 x64 Edition, Service Pack 1, (06.01.7601.00) - No Reply 9/8/16 09:46:57 9/12/16 03:51:38 0.00 0.0 / 0.0

More of the puzzle is the alarming "unrecoverable error": the client logs code 202 (server aborted) and, just before that and right after a trickle, says the task is no longer usable, yet the task gets a Valid rating on the Results page. (That implies the last trickle was bad, or the trickle validator had other thoughts on the progress.) Then the server logs 21.6 hours of CPU time but no elapsed time, though the client log indicates 23:08:11.

World Community Grid 7.14 fahb FAH2_000269_avx387477-ls_000033_0003_024_1 23:08:11 (21:36:13) 9/13/2016 5:42:50 AM 9/13/2016 5:46:50 AM 93,38 Aborted (202)
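
As a quick sanity check on those figures, here is a minimal sketch that converts the two durations from the history line above. It assumes the first duration (23:08:11) is elapsed time, the one in parentheses (21:36:13) is CPU time, and the trailing "93,38" is a CPU percentage; the helper name `hms_to_seconds` is made up for illustration.

```python
def hms_to_seconds(hms: str) -> int:
    """Convert an 'HH:MM:SS' duration string to seconds."""
    hours, minutes, seconds = (int(part) for part in hms.split(":"))
    return hours * 3600 + minutes * 60 + seconds

# Values copied from the client history line (meaning of each field is an assumption).
elapsed_s = hms_to_seconds("23:08:11")  # assumed: elapsed (wall-clock) run time
cpu_s = hms_to_seconds("21:36:13")      # assumed: CPU time

cpu_hours = cpu_s / 3600
cpu_percent = 100 * cpu_s / elapsed_s

print(f"CPU time in hours: {cpu_hours:.2f}")    # 21.60
print(f"CPU / elapsed:     {cpu_percent:.2f}%")  # 93.38%
```

Under those assumptions the server's 21.60 hours agrees with the client's CPU time, and 93.38% agrees with the last numeric field of the history line, so only the elapsed time went missing on the server side.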

Event Log:
1714 World Community Grid 9/13/2016 5:41:48 AM [sched_op] Starting scheduler request
1715 World Community Grid 9/13/2016 5:41:48 AM [trickle] read trickle file projects/www.worldcommunitygrid.org/trickle_up_FAH2_000269_avx387477-ls_000033_0003_024_1_1473738106.xml
1716 World Community Grid 9/13/2016 5:41:48 AM Sending scheduler request: To send trickle-up message.
1717 World Community Grid 9/13/2016 5:41:48 AM Not requesting tasks: don't need (job cache full)
1718 World Community Grid 9/13/2016 5:41:48 AM [sched_op] CPU work request: 0.00 seconds; 0.00 devices
1719 World Community Grid 9/13/2016 5:41:53 AM Scheduler request completed
1720 World Community Grid 9/13/2016 5:41:53 AM [sched_op] Server version 701
1721 World Community Grid 9/13/2016 5:41:53 AM Result FAH2_000269_avx387477-ls_000033_0003_024_1 is no longer usable
1722 World Community Grid 9/13/2016 5:41:53 AM Project requested delay of 121 seconds
1723 World Community Grid 9/13/2016 5:41:53 AM [sched_op] Deferring communication for 00:01:11
1724 World Community Grid 9/13/2016 5:41:53 AM [sched_op] Reason: Unrecoverable error for task FAH2_000269_avx387477-ls_000033_0003_024_1
1725 World Community Grid 9/13/2016 5:41:53 AM [sched_op] Deferring communication for 00:02:01
1726 World Community Grid 9/13/2016 5:41:53 AM [sched_op] Reason: requested by project
1727 World Community Grid 9/13/2016 5:41:54 AM Finished upload of FAH2_000269_avx387477-ls_000033_0003_024_1_r13989270_8
1728 World Community Grid 9/13/2016 5:41:54 AM Started upload of FAH2_000269_avx387477-ls_000033_0003_024_1_r13989270_18
1729 World Community Grid 9/13/2016 5:41:55 AM Computation for task FAH2_000269_avx387477-ls_000033_0003_024_1 finished
1731 World Community Grid 9/13/2016 5:41:59 AM Finished upload of FAH2_000269_avx387477-ls_000033_0003_024_1_r13989270_18
1741 World Community Grid 9/13/2016 5:43:59 AM [sched_op] handle_scheduler_reply(): got ack for task FAH2_000269_avx387477-ls_000033_0003_024_1
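
One rough way to size up the time difference: the numeric suffix of the trickle file name in the log above (1473738106) looks like a Unix timestamp. Assuming that, and assuming the Results page reports UTC, the sketch below decodes it and compares it with the event-log and Results-page times. All variable names are invented for illustration.

```python
from datetime import datetime, timedelta, timezone

# Suffix of the trickle_up file name; ASSUMED to be Unix epoch seconds.
trickle_epoch = 1473738106
trickle_utc = datetime.fromtimestamp(trickle_epoch, tz=timezone.utc).replace(tzinfo=None)

# Event-log time (client local clock) at which that trickle file was read.
trickle_read_local = datetime(2016, 9, 13, 5, 41, 48)

# Client clock offset from UTC, rounded to the nearest whole hour.
raw_offset = trickle_read_local - trickle_utc
utc_offset = timedelta(hours=round(raw_offset.total_seconds() / 3600))

# "no longer usable" message in the event log, shifted to UTC.
abort_local = datetime(2016, 9, 13, 5, 41, 53)
abort_utc = abort_local - utc_offset

# Return time shown on the Results page; ASSUMED to be UTC.
results_page_time = datetime(2016, 9, 13, 3, 39, 56)
gap = abort_utc - results_page_time

print("Trickle file created (UTC):", trickle_utc)    # 2016-09-13 03:41:46
print("Client offset from UTC:    ", utc_offset)     # 2:00:00
print("Event log vs Results page: ", gap)            # 0:01:57
```

If those assumptions hold, the client clock runs two hours ahead of UTC and the Results page time is about two minutes earlier than the abort message in the event log. That only measures the gap; it does not say which side's clock or bookkeeping is responsible.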

Looking at the result log, it is a complete blank apart from the header:
Result Log

Result Name: FAH2_ 000269_ avx387477-ls_ 000033_ 0003_ 024_ 1--

What's going on with all this? There's zero indication of why the job was aborted just into its 26th hour on the client. And what's up with the time difference? In fact, I can't remember ever seeing a result throw up so many questions. Am I in La La Land?

The original, BTW, got a No Reply well before the 4-day deadline. It could have been crunching offline, though we all know the disastrous effect that has on getting anything on this project through the validator in good order. Are the good trickle parts of invalid results used to seed the follow-on task, or are they redone in full? I don't recall having seen a full distribution that would show whether an additional copy is sent out, or whether it ends there and a new seed is generated, like this sample:

FAH2_000124_avx17558cs-ls_000034_0000_026_wcgfahb00070000_0 11:26:37 (10:54:44) 95,36 70,526 04:46:57 9/12/2016 9:55:26 PM 03d,12:27:58 Running [49] 00:00:15 128.88 MB 61.00 MB.
[Sep 13, 2016 7:40:35 AM]
Sgt.Joe
Ace Cruncher
USA
Joined: Jul 4, 2006
Post Count: 7687

That is a puzzler. D'oh!
Cheers
----------------------------------------
Sgt. Joe
*Minnesota Crunchers*
[Sep 13, 2016 11:09:38 AM]
BobCat13
Senior Cruncher
Joined: Oct 29, 2005
Post Count: 295

Was that the only task that got the "no longer usable" message?

Since July 18, I have twice seen the server dump every task (including running tasks) on a machine that displayed that message, and then list them as Detached on the Results page. Both times it happened in the middle of the night, so I know no one did anything with the client on that machine.
[Sep 13, 2016 2:43:41 PM]
SekeRob
Master Cruncher
Joined: Jan 7, 2013
Post Count: 2741

Yup, it was the lone one on 8 cores, with emphasis on the blank result log and the task still being declared Valid, with zero elapsed time but 21-point-something hours of run time. The other tasks went on unperturbed: Zika, OET, UGM, and one other parallel-world FAH2. It's only because BOINCTasks so excellently logs the exit codes of results, and because the entry was highlighted in the history (my "if exit code is non-zero, mark it red" rule), that I noticed it; otherwise I'd never have seen it.
[Sep 13, 2016 3:18:31 PM]
vepaul
Senior Cruncher
Belgium
Joined: Nov 17, 2004
Post Count: 261

Hello,

What is the difference between FAH2 and FAHV, Vina?
Thanks for an explanation.
[May 27, 2017 2:01:47 PM]
BobCat13
Senior Cruncher
Joined: Oct 29, 2005
Post Count: 295

EDIT: The server finally aborted the _1. Guess I was being too impatient.

Not completely on topic to the original, but ever so loosely associated.

My machine received a _1 because the _0 had not completed by the 4-day deadline. The _0 has now completed, and my machine has not started the _1 yet, although it has contacted the server several times since the _0 was marked Valid. My _1 should be aborted by the server, correct? It has not been.


Project Name: FightAIDS@Home - Phase 2
Created: 06/01/2017 18:31:25
Name: FAH2_000952_zinc01801607_000004_0007_006
Minimum Quorum: 1
Replication: 1

FAH2_ 000952_ zinc01801607_ 000004_ 0007_ 006_ 1-- Microsoft Windows XP Professional x86 Edition, Service Pack 3, (05.01.2600.00) - In Progress 6/7/17 04:09:31 6/11/17 04:09:31 0.00 0.0 / 0.0

FAH2_ 000952_ zinc01801607_ 000004_ 0007_ 006_ 0-- Microsoft Windows 7 Home Premium x64 Edition, Service Pack 1, (06.01.7601.00) 714 Valid 6/3/17 04:09:20 6/7/17 08:52:26 17.89 357.2 / 357.2
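
To make the timing in the two rows above concrete, here is a small sketch that works out the deadline and how late the _0 came back. It assumes the first date in each row is the sent time, the second is the due or return time, and that the deadline is exactly 4 days after sending, as stated in the post; the variable names are invented for illustration.

```python
from datetime import datetime, timedelta

FMT = "%m/%d/%y %H:%M:%S"  # date layout as shown on the Results page rows

# Values copied from the two rows (field meanings are assumptions).
sent_0 = datetime.strptime("6/3/17 04:09:20", FMT)
returned_0 = datetime.strptime("6/7/17 08:52:26", FMT)
sent_1 = datetime.strptime("6/7/17 04:09:31", FMT)
due_1 = datetime.strptime("6/11/17 04:09:31", FMT)

deadline_0 = sent_0 + timedelta(days=4)  # assumed 4-day deadline

print("_0 deadline:             ", deadline_0)               # 2017-06-07 04:09:20
print("_1 sent after deadline:  ", sent_1 - deadline_0)      # 0:00:11
print("_0 returned after that:  ", returned_0 - deadline_0)  # 4:43:06
print("_1 due 4 days after send:", due_1 - sent_1 == timedelta(days=4))  # True
```

On those assumptions the _1 went out 11 seconds after the _0 missed its deadline, and the _0 was returned a little under five hours later.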
----------------------------------------
[Edit 1 times, last edit by BobCat13 at Jun 7, 2017 6:46:53 PM]
[Jun 7, 2017 2:11:31 PM]