Thread Status: Active
Total posts in this thread: 65
Posts: 65   Pages: 7
This topic has been viewed 8383 times and has 64 replies
Vester
Senior Cruncher
USA
Joined: Nov 18, 2004
Post Count: 325
Re: Validator not running...or something else?

I have some that are pending validation although someone else has completed the same WU. In all pending cases with two completions, there is a large disparity in completion times.
----------------------------------------

[Sep 1, 2022 8:35:42 PM]
adriverhoef
Master Cruncher
The Netherlands
Joined: Apr 3, 2009
Post Count: 2153

> I have some that are pending validation although someone else has completed the same WU.
Same here: I have many ARP1 tasks at Pending Validation, and so do their wingmen. None are Pending Verification.
> In all pending cases with two completions, there is a large disparity in completion times.

I'd say not in all cases. Here are examples from my results, selected from the output of the command wcgstats -wrrr -sP -aARP1:

From page 1/6:
<9> ARP1_0009008_133_0 Linux Ubuntu P.Val 2022-08-28T06:42:52 2022-09-01T01:26:19 10.18/10.21
<9> * ARP1_0009008_133_1 Fedora Linux P.Val 2022-08-28T10:04:42 2022-09-01T07:42:50 11.12/11.16

<10> * ARP1_0013492_133_0 Fedora Linux P.Val 2022-08-28T08:16:20 2022-09-01T07:00:57 11.55/11.60
<10> ARP1_0013492_133_1 Linux Ubuntu P.Val 2022-08-28T06:42:52 2022-09-01T17:13:54 10.59/10.61

From page 4/6:
<15> * ARP1_0007020_130_0 Fedora Linux P.Val 2022-08-26T20:21:08 2022-08-30T12:26:34 9.30/9.34
<15> ARP1_0007020_130_1 Linux Debian P.Val 2022-08-26T19:24:18 2022-08-30T12:00:57 8.60/8.64

From page 5/6:
<6> * ARP1_0032053_129_0 Fedora Linux P.Val 2022-08-26T13:18:07 2022-08-30T02:16:55 9.13/9.19
<6> ARP1_0032053_129_1 Linux Gentoo P.Val 2022-08-26T13:06:30 2022-08-27T04:20:15 8.67/8.78

<12> ARP1_0030897_129_0 Arch Linux P.Val 2022-08-26T04:57:12 2022-08-29T12:47:31 7.69/7.70
<12> * ARP1_0030897_129_1 Fedora Linux P.Val 2022-08-26T05:57:08 2022-08-29T13:52:29 8.38/8.44

Anyhow, you get the idea… :-D
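To put a number on "disparity", here is a small sketch that groups the rows above by work unit and compares the two replicas' run times. It assumes the final column is "CPU hours/elapsed hours" (the output isn't labelled in the post) and uses the second figure; the helper names are my own. On these rows the run times within each pair differ by under 10%, even where the return times are days apart.

```python
from collections import defaultdict

# Rows transcribed from the wcgstats output above: (task name, last column).
# Assumption: the last column is "<CPU hours>/<elapsed hours>".
ROWS = [
    ("ARP1_0009008_133_0", "10.18/10.21"),
    ("ARP1_0009008_133_1", "11.12/11.16"),
    ("ARP1_0013492_133_0", "11.55/11.60"),
    ("ARP1_0013492_133_1", "10.59/10.61"),
    ("ARP1_0007020_130_0", "9.30/9.34"),
    ("ARP1_0007020_130_1", "8.60/8.64"),
    ("ARP1_0032053_129_0", "9.13/9.19"),
    ("ARP1_0032053_129_1", "8.67/8.78"),
    ("ARP1_0030897_129_0", "7.69/7.70"),
    ("ARP1_0030897_129_1", "8.38/8.44"),
]


def workunit(task):
    """Task names are '<work unit>_<replica>'; drop the replica suffix."""
    return task.rsplit("_", 1)[0]


def elapsed_ratios(rows):
    """Map each work unit to (longest elapsed time / shortest elapsed time)."""
    by_wu = defaultdict(list)
    for task, hours in rows:
        by_wu[workunit(task)].append(float(hours.split("/")[1]))
    return {wu: max(h) / min(h) for wu, h in by_wu.items()}


for wu, ratio in elapsed_ratios(ROWS).items():
    print(f"{wu}: {ratio:.2f}")
```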
[Sep 1, 2022 9:42:56 PM]
alanb1951
Veteran Cruncher
Joined: Jan 20, 2006
Post Count: 945

A recent addition to my set of BOINC data collection scripts looks for validation issues; a sample item from it is shown below.

ARP1_0014435_129 (WU 155120082) was created 2022-08-25T14:27:14+0000
It has multiple Pending Validation results and none In Progress
First request 2022-08-26T13:24:21+0000;
last return 2022-08-27T20:36:13+0000

The script produces a summary thus (example is for 26th August returns):

Units validated:     6
Units stuck at PVal: 4
Units In Progress:   1

Note that "Units In Progress" refers to items where my result is at Pending Validation but there's at least one other result marked as In Progress...
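For readers wanting to do something similar, here is a minimal sketch of the classification described above. It is not Al's script: it assumes each work unit is given as my result's status plus a list of the other results' statuses, and the status spellings and function names are hypothetical. Following the sample report, "stuck" means my result and at least one other are Pending Validation with none In Progress; anything else (e.g. a retry still waiting to be sent) lands in "Other".

```python
from collections import Counter

# Status strings as shown on the results pages (assumed spellings).
VALID = "Valid"
PENDING_VALIDATION = "Pending Validation"
IN_PROGRESS = "In Progress"


def classify(my_status, other_statuses):
    """Return the summary bucket for one work unit (hypothetical helper)."""
    if my_status == VALID:
        return "Units validated"
    if my_status == PENDING_VALIDATION:
        # A wingman still running is not (yet) evidence of a stall.
        if IN_PROGRESS in other_statuses:
            return "Units In Progress"
        # Multiple Pending Validation results and none In Progress.
        if PENDING_VALIDATION in other_statuses:
            return "Units stuck at PVal"
    return "Other"


def summarise(units):
    """Count work units per bucket; `units` is a list of (mine, others) pairs."""
    return Counter(classify(mine, others) for mine, others in units)


sample = [
    (VALID, [VALID]),
    (PENDING_VALIDATION, [PENDING_VALIDATION]),
    (PENDING_VALIDATION, ["Error", PENDING_VALIDATION]),
    (PENDING_VALIDATION, [IN_PROGRESS]),
    (PENDING_VALIDATION, ["Error", "Waiting to be sent"]),
]
for bucket, count in sorted(summarise(sample).items()):
    print(f"{bucket}: {count}")
```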

The totals since 26th August (excluding work returned today) are shown below (correct as of about 21:30 UTC on 1st September):

Units validated:     20
Units stuck at PVal: 23
Units In Progress:   7

It appears that (at the time of posting) no work unit created after about 13:45 UTC on the 25th has been successfully validated. All the validated units were older than that (some of them considerably so, because of the number of No Reply and Not Started by Deadline tasks ARP1 keeps getting...)

I wonder whether a day's worth of these will suddenly be flushed out once they've been stalled for 6 days (the same interval as the deadline); the transitioner uses that interval as an inactivity retry time, so they may get a kick then. Whether that will be enough to get them past the blockage is another matter :-(
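As a rough check on when such a kick might arrive, here is the arithmetic for the example work unit above (ARP1_0014435_129). It assumes, as the post suggests, a 6-day inactivity retry interval, and it further assumes the clock starts at the last return; neither is documented WCG behaviour, and the helper name is hypothetical.

```python
from datetime import datetime, timedelta, timezone

# Assumption from the post: the idle-recheck interval equals the 6-day deadline.
RETRY_INTERVAL = timedelta(days=6)


def expected_kick(last_activity):
    """Earliest time a stalled work unit might be looked at again (hypothetical)."""
    return last_activity + RETRY_INTERVAL


# Last return for ARP1_0014435_129, as reported above.
last_return = datetime(2022, 8, 27, 20, 36, 13, tzinfo=timezone.utc)
print(expected_kick(last_return).isoformat())  # prints 2022-09-02T20:36:13+00:00
```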

Cheers - Al.
----------------------------------------
[Edit 2 times, last edit by alanb1951 at Sep 1, 2022 10:29:16 PM]
[Sep 1, 2022 10:19:34 PM]
MJH333
Senior Cruncher
England
Joined: Apr 3, 2021
Post Count: 266

Al,
I’ve also got some where my result is Pending Validation and one or more results have errored out (or no reply) and the next wingman is “Waiting to be sent”.
See e.g. https://www.worldcommunitygrid.org/contribution/workunit/155119949
Do you think that is a related issue?
Cheers,
Mark
[Sep 2, 2022 9:22:28 AM]
alanb1951
Veteran Cruncher
Joined: Jan 20, 2006
Post Count: 945

> Al,
> I’ve also got some where my result is Pending Validation and one or more results have errored out (or no reply) and the next wingman is “Waiting to be sent”.
> See e.g. https://www.worldcommunitygrid.org/contribution/workunit/155119949
> Do you think that is a related issue?
> Cheers,
> Mark
Mark,

I had the precursor to one of those show up overnight - my success return and one failure return and no retry! And yes, it is related, but tied to the transitioner rather than the validator.

This is not really the place for a [long] "how the transitioner works" piece, so I'll just cite an example of what can happen...

When MilkyWay@home had their disk crash and the subsequent excess work unit generation, it was common to see work units needing retries that either showed "Waiting to be sent" or didn't seem to have a retry readied at all. What was happening was that the queue of items waiting for transitioner access was so long that requests to generate a retry and/or submit it to the feeder weren't being handled in the usual timely fashion.

The transitioner has a built-in defence mechanism that it applies when it finishes looking at a request: if the new "transition time" would be in the past (and yes, if there's a backlog of any type, that can happen!), it pushes that time into the future. Unfortunately, the further in the past the time would have been, the further into the future it gets pushed! So if there's a genuine backlog, processed items can be pushed as much as a day into the future (which added to the delays when MilkyWay was in difficulties...)
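The rule described above can be sketched as follows. My recollection of the BOINC transitioner source is that it delays a work unit by twice the amount it is behind, clamped to between 60 seconds and one day; treat the factor of two and both limits as assumptions (the one-day cap does match the "as much as a day" above). Times are plain Unix seconds and the function name is hypothetical.

```python
MIN_DELAY = 60            # assumption: floor of one minute
MAX_DELAY = 24 * 60 * 60  # assumption: cap of one day


def next_transition_time(proposed, now):
    """Transition time to store after handling a work unit.

    If the proposed time is already in the past (the transitioner is behind),
    push it into the future by twice the lag, clamped to [MIN_DELAY, MAX_DELAY].
    """
    if proposed >= now:
        return proposed
    lag = now - proposed
    delay = min(max(2 * lag, MIN_DELAY), MAX_DELAY)
    return now + delay


now = 1_000_000
for lag in (10, 600, 3600, 86400):
    pushed = next_transition_time(now - lag, now) - now
    print(f"behind by {lag:>6} s -> pushed {pushed:>6} s into the future")
```

The point of the doubling is to back off when the system is overloaded, but it also means that the worse the backlog, the longer a stuck item waits before it is looked at again.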

As far as I know, MilkyWay only run one transitioner, but I suspect WCG run multiple transitioners. If any of those have crashed and failed to restart, a certain portion of work will not be able to advance, so that could be a reason for the problems; otherwise, they may need more transitioners :-)
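Why would one dead transitioner stall only "a certain portion" of work? BOINC lets a project split a daemon's workload across several instances; as I understand it, the transitioner accepts a `--mod n i` style option so that instance i handles only work units whose id mod n equals i. Whether WCG is configured like this is unknown. The sketch below (hypothetical function names) shows that under such partitioning a single dead instance stalls a fixed share, about 1/n, of work units while the rest advance normally.

```python
def owner(wu_id, n_instances):
    """Index of the instance that handles this work unit under id-mod-n partitioning."""
    return wu_id % n_instances


def stalled(wu_ids, n_instances, dead):
    """Work units whose owning instance is in the set `dead` (so they never advance)."""
    return [w for w in wu_ids if owner(w, n_instances) in dead]


ids = range(1000)
print(len(stalled(ids, 4, {2})))  # prints 250
```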

So it looks as if WCG may need some down time to either clear a transitioner backlog or reconfigure to run more transitioners; turning off various work unit generators instead might help, but it would only be a temporary fix...

And let's just hope that an overloaded database and/or file-store access problems aren't at the back of all this; if they are, we may be in for work-unit rationing... :-(

Cheers - Al.

[Edit; to add remark about database/filestore overload...]
----------------------------------------
[Edit 1 times, last edit by alanb1951 at Sep 2, 2022 1:24:14 PM]
[Sep 2, 2022 12:01:36 PM]
MJH333
Senior Cruncher
England
Joined: Apr 3, 2021
Post Count: 266

Al,
Many thanks for the explanation.
Cheers,
Mark
[Sep 2, 2022 5:05:58 PM]
MyrCu
Cruncher
Joined: Apr 9, 2020
Post Count: 43

I received a few ARP tasks over the last few days, 32 in total. All of those I sent back between 27 and 29 August are still "pending". Only one is "Valid" (1 Sept, 9:10); two others, sent back on 1 or 2 September, are still pending.
I haven't received any more ARPs in the last one or two days.
[Sep 2, 2022 8:15:16 PM]
sptrog1
Master Cruncher
Joined: Dec 12, 2017
Post Count: 1574

Do I understand that successful work is being held up by validation issues and it is not just a matter of credit being issued?
[Sep 2, 2022 9:21:36 PM]
D_S_Spence
Advanced Cruncher
Canada
Joined: Jan 5, 2017
Post Count: 107

I don't know if another example helps anything, but I have one here:
ARP1_0021214_128
https://www.worldcommunitygrid.org/contribution/workunit/155155979
[Sep 2, 2022 9:28:17 PM]
geophi
Advanced Cruncher
U.S.
Joined: Sep 3, 2007
Post Count: 102

> Do I understand that successful work is being held up by validation issues and it is not just a matter of credit being issued?

Yes, you understand correctly. I have completed about 20 ARP tasks where two or more computers have finished the tasks in each work unit, and all are pending validation.
[Sep 2, 2022 10:31:38 PM]