Total posts in this thread: 144
Posts: 144   Pages: 15   [ Previous Page | 6 7 8 9 10 11 12 13 14 15 | Next Page ]
This topic has been viewed 29257 times and has 143 replies
Barnsley_Tatts
Senior Cruncher
Joined: Nov 3, 2005
Post Count: 281
Re: Too many Pending Validation

I think the Validator needs a squirt of WD40 or a technical thump.

I typically have around 150-170 WUs in PV jail. Currently at 340 after the weekend's out(r)age!
[Mar 5, 2024 1:14:43 PM]
alanb1951
Veteran Cruncher
Joined: Jan 20, 2006
Post Count: 986
Re: Too many Pending Validation

Regarding tasks ending up in PV jail at present:

It isn't just happening to MCM1, though it is more obvious here than for OPNG, where the time spent in PV jail seems to be a lot shorter so folks may not notice!

I suspect that if we had server status information we would see that there is a transitioner backlog (perhaps due to the very large numbers of MCM1 tasks "reappearing" after assimilation[*1]); whenever that happens, it can have knock-on effects in other parts of the system because of the way the transitioner deals with overdue requests...

We've seen cases of high volumes of results stuck in PV jail several times in the past, usually after some sort of server issues resulting in days of no service (which would instantly cause a backlog, of course!); I guess we'll just have to wait it out if that's what is happening here...
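To make the backlog idea concrete, here is a toy estimate of how long a work unit might wait if the transitioner handles a fixed number of entries per pass in timestamp order. This is only a sketch: the function names, the batch size, the pass interval and the backlog figure are all made up for illustration and are not taken from the WCG or BOINC server configuration.

```python
import math


def passes_until_handled(backlog_ahead: int, batch_size: int) -> int:
    """Passes needed before a WU with `backlog_ahead` older entries
    queued in front of it is reached (hypothetical fixed-batch model)."""
    return math.ceil((backlog_ahead + 1) / batch_size)


def delay_hours(backlog_ahead: int, batch_size: int, pass_interval_s: float) -> float:
    """Waiting time in hours under the same toy model."""
    return passes_until_handled(backlog_ahead, batch_size) * pass_interval_s / 3600


# Invented numbers: 500,000 entries ahead, 1,000 handled per pass,
# one pass per minute.
print(delay_hours(500_000, 1_000, 60))
```

Under these invented numbers the wait is a matter of hours; the point is only that the delay grows with the size of the backlog, which fits the "wait it out" experience after an outage.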

Cheers - Al.

*1 When the validator decides a WU is validated, it marks the WU for the attention of the assimilator and sets the timestamp the transitioner uses to order its searches to a value that tells the transitioner to ignore the WU. Once the assimilator deals with the WU it marks the state the WU should move to and sets the timestamp to the current time, thus making it visible to the transitioner again!
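The handoff described in footnote *1 can be sketched as a small model. Everything here is an assumption for illustration: the class, the field names and the "never" sentinel are hypothetical and are not copied from the actual server code; the sketch only mirrors the sequence described above.

```python
from dataclasses import dataclass

# Hypothetical sentinel: a timestamp no "due now" query will ever match.
NEVER = float("inf")


@dataclass
class WorkUnit:
    wu_id: int
    transition_time: float            # timestamp the transitioner searches on
    needs_assimilation: bool = False  # flag for the assimilator


def validator_marks_valid(wu: WorkUnit) -> None:
    """Validator: hand the WU to the assimilator and hide it from the transitioner."""
    wu.needs_assimilation = True
    wu.transition_time = NEVER


def assimilator_finishes(wu: WorkUnit, now: float) -> None:
    """Assimilator: done with the WU, so make it visible to the transitioner again."""
    wu.needs_assimilation = False
    wu.transition_time = now


def transitioner_candidates(wus: list[WorkUnit], now: float) -> list[WorkUnit]:
    """Transitioner: pick WUs whose timestamp is due, oldest first."""
    due = [w for w in wus if w.transition_time <= now]
    return sorted(due, key=lambda w: w.transition_time)


wu = WorkUnit(wu_id=1, transition_time=100.0)
validator_marks_valid(wu)
hidden = transitioner_candidates([wu], now=200.0)    # WU is skipped
assimilator_finishes(wu, now=200.0)
visible = transitioner_candidates([wu], now=200.0)   # WU "reappears"
```

In this model a large batch of assimilated WUs all reappear with fresh timestamps at once, which is the kind of surge the post suggests could feed a transitioner backlog.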
[Mar 5, 2024 2:29:27 PM]
cz50975
Advanced Cruncher
Joined: Dec 9, 2004
Post Count: 95
Re: Too many Pending Validation

I see several MCM1 WUs in the validation queue with Error or No Reply statuses from my wingmen. There is another line in the result table, named *_2, with status Waiting to be sent for more than 10 days.
That should be enough time to call another wingman into action.

https://www.worldcommunitygrid.org/contribution/workunit/516743744
https://www.worldcommunitygrid.org/contribution/workunit/517997375
[May 24, 2024 2:25:33 PM]
cz50975
Advanced Cruncher
Joined: Dec 9, 2004
Post Count: 95
Re: Too many Pending Validation

Again, the same story. WUs from mid-June are still waiting for validation because the system has not called another wingman into action for more than 8 days.

https://www.worldcommunitygrid.org/contribution/workunit/542575320
https://www.worldcommunitygrid.org/contribution/workunit/542664297
[Jul 1, 2024 2:01:50 PM]
Grumpy Swede
Master Cruncher
Svíþjóð
Joined: Apr 10, 2020
Post Count: 2217
Re: Too many Pending Validation

Yes, same "Waiting to be sent" issue again.....
[Jul 1, 2024 3:20:45 PM]
alanb1951
Veteran Cruncher
Joined: Jan 20, 2006
Post Count: 986
Re: Too many Pending Validation

Did something change at WCG between about 13:40 and 18:20 (UTC) on 2024-06-22?

I ask because I have a couple of [initial] tasks I returned on 2024-06-16 which had the other initial wingman miss the deadline, and one of them got the retry out (though that also missed the deadline!) but the other, later, WU didn't (so there appears to be a boundary there)[*1]...

I've listed the critical times below. Note that the retry that got sent out appears to have gone out before the initial Error was returned; that's because it would've been flagged as No Reply, and eventually the client returned an Error message (consisting of nothing but the client version) which probably returned the code for "Not started by deadline" (which WCG doesn't identify as such in its displays).

[All times UTC -- key times marked in red]

Work-unit 542028083 (MCM1_0219115_3086) created 2024-06-16 13:26:36
#  Sent                 Due                  Returned/flagged     Status
-  -------------------  -------------------  -------------------  ------------
0  2024-06-16 13:26:43  2024-06-22 13:26:43  2024-06-22 16:41:50  Error (NSD?)
1  2024-06-16 13:26:46  2024-06-22 13:26:46  2024-06-16 16:18:57  Pending Val.
2  2024-06-22 13:35:19  2024-06-25 13:35:19  2024-06-26 21:33:24  Error (NSD?)
3  Not applicable - Waiting to be sent

Work-unit 542156289 (MCM1_0219137_3790) created 2024-06-16 13:26:36
#  Sent                 Due                  Returned/flagged     Status
-  -------------------  -------------------  -------------------  ------------
0  2024-06-16 18:23:59  2024-06-22 18:23:59  2024-06-16 20:09:52  Pending Val.
1  2024-06-16 18:23:59  2024-06-22 18:23:59  not available        No Reply
2  Not applicable - Waiting to be sent
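The timing argument for work-unit 542028083 can be checked with a little date arithmetic. This is just a sketch using the timestamps listed above; the variable names are mine, and it uses nothing beyond the Python standard library.

```python
from datetime import datetime

FMT = "%Y-%m-%d %H:%M:%S"


def parse(ts: str) -> datetime:
    """Parse a timestamp in the format used in the listing (all UTC)."""
    return datetime.strptime(ts, FMT)


# Task 0 of work-unit 542028083 (the initial wingman that missed the deadline)
sent_0 = parse("2024-06-16 13:26:43")
due_0 = parse("2024-06-22 13:26:43")
error_returned_0 = parse("2024-06-22 16:41:50")

# Task 2 (the retry that did get sent out)
sent_2 = parse("2024-06-22 13:35:19")
due_2 = parse("2024-06-25 13:35:19")

retry_lag = sent_2 - due_0            # how soon after the missed deadline the retry went out
error_lag = error_returned_0 - due_0  # how long after the deadline the Error came back
initial_window = due_0 - sent_0       # deadline length for an initial task
retry_window = due_2 - sent_2         # deadline length for the retry

print("retry sent after deadline:", retry_lag)
print("error returned after deadline:", error_lag)
print("initial deadline:", initial_window.days, "days")
print("retry deadline:", retry_window.days, "days")
```

The retry went out under nine minutes after the missed deadline, while the Error for the original task arrived more than three hours after it, so the retry must have been triggered by the No Reply state rather than by the Error. The listing also shows the retry with a 3-day deadline against 6 days for the initial tasks.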


During the period since 2024-06-22 I have observed a couple of retries going out but they were in response to "genuine" errors (failure to download, failure to start task) rather than missed deadlines. I also saw a stuck retry initiated because an initial wingman returned "User Aborted" past the point where a retry would get a shorter deadline... (Different behaviours reflect different code paths in the server, if I recall correctly[*2])

Cheers - Al.

*1 Someone with data for a lot more results than my few hundred a day may well be able to confirm or contradict this :-)

*2 I don't have time to do a repeat code-dive to verify that at present (in particular for the User Abort case, which I'd not researched previously...) -- if someone else wants to look at the [spaghetti] code to check this, I won't complain :-)

[Edited to replace the T between date and time (as produced by my scripts) with a space :-)]
----------------------------------------
[Edit 1 times, last edit by alanb1951 at Jul 1, 2024 10:51:53 PM]
[Jul 1, 2024 8:16:04 PM]
Sgt.Joe
Ace Cruncher
USA
Joined: Jul 4, 2006
Post Count: 7697
Re: Too many Pending Validation

I currently have 403 "pending validation." In checking the oldest 20 or so, I noted that any work unit which had reported an error, no reply, or user aborted type of non-completion had generated another work unit, but all of them were in the "waiting to be sent" category. If there was only one other work unit besides mine, it has been sent and shows "in progress."
I may be jumping to a conclusion with only limited data, but it appears any condition showing a non-completion will generate another work unit, but fail to send it out for processing.
Without knowing the ins and outs of the software it is impossible to determine exactly what is generating the "waiting to be sent" condition.

Edit: One other curious note. I have seen very few retries lately. On June 30, only two of the 324 units I completed were retries.

Cheers
----------------------------------------
Sgt. Joe
*Minnesota Crunchers*
----------------------------------------
[Edit 2 times, last edit by Sgt.Joe at Jul 1, 2024 8:58:45 PM]
[Jul 1, 2024 8:41:17 PM]
Vester
Senior Cruncher
USA
Joined: Nov 18, 2004
Post Count: 325
Re: Too many Pending Validation

Same thing for me, Sgt.Joe. Half of everything I have completed in the past week is awaiting validation. (About 450 tasks pending validation or waiting to be sent.)
[Jul 1, 2024 9:47:11 PM]
roundup
Veteran Cruncher
Switzerland
Joined: Jul 25, 2006
Post Count: 838
Re: Too many Pending Validation

1182 Work Units in 'Pending Validation'...
[Jul 2, 2024 1:58:12 PM]
alanb1951
Veteran Cruncher
Joined: Jan 20, 2006
Post Count: 986
Re: Too many Pending Validation

Some of the more ancient Waiting to be sent tasks seem to have got freed up this morning -- since about 0830 UTC I've received over 40 such retries (out of about 120 total tasks during that interval), and a small number of tasks where I was waiting for a retry to unblock are now showing the retry as In Progress...

I appear to have been getting a mix of [delayed] retries and new work for about 6 hours now -- I wonder how long that'll continue :-)

Cheers - Al.

P.S. If this is due to something the WCG tech team has done, my thanks! But if it has happened without intervention, I wonder what might have caused it -- it might be useful to know at some future point if [as seems possible] this issue happens again :-)
[Jul 3, 2024 1:40:40 PM]