Total posts in this thread: 144
This topic has been viewed 28375 times and has 143 replies
Speedy51
Veteran Cruncher
New Zealand
Joined: Nov 4, 2005
Post Count: 1293
Re: Too many Pending Validation

Speedy51, pls read again my original input.

Yes, I have reread your original post. I was purely pointing out what I said in my post above.
[Jul 26, 2024 9:40:08 PM]
Sgt.Joe
Ace Cruncher
USA
Joined: Jul 4, 2006
Post Count: 7670
Re: Too many Pending Validation

The last "Waiting to be sent" I had from April 28 was finally sent on August 29. It was crunched and became valid on August 30.


MCM1_0216199_5325_1 Linux 3.13.0-143-generic Valid 2024-08-29 20:12:47 UTC 2024-08-30 05:47:08 UTC 3.78 / 3.78 75.1 / 75.3

MCM1_0216199_5325_2 Linux Linuxmint LMDE 4 (debbie) [4.19.0-8-amd64|libc 2.28 (Debian GLIBC 2.28-10)] Valid 2024-04-28 09:35:11 UTC 2024-04-28 17:48:45 UTC 3.12 / 3.12
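
For a sense of scale, the gap between the _2 result being returned and the _1 retry being sent can be worked out from the two rows above. This is only a small sketch: the helper name `parse_utc` is made up for illustration, and the rows do not show exactly when the retry first entered "Waiting to be sent", so the figure is the return-to-resend gap rather than a precise queue time.

```python
from datetime import datetime, timezone

FMT = "%Y-%m-%d %H:%M:%S"

def parse_utc(stamp: str) -> datetime:
    """Parse a 'YYYY-MM-DD HH:MM:SS' stamp from the results page as UTC."""
    return datetime.strptime(stamp, FMT).replace(tzinfo=timezone.utc)

# Timestamps copied from the two result rows above.
returned_2 = parse_utc("2024-04-28 17:48:45")  # _2 returned
sent_1 = parse_utc("2024-08-29 20:12:47")      # _1 (the retry) sent

gap = sent_1 - returned_2
hours, rem = divmod(gap.seconds, 3600)
print(f"{gap.days} days, {hours} h {rem // 60} min")  # 123 days, 2 h 24 min
```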

Cheers
----------------------------------------
Sgt. Joe
*Minnesota Crunchers*
[Aug 31, 2024 2:46:11 AM]
alanb1951
Veteran Cruncher
Joined: Jan 20, 2006
Post Count: 971
Re: Too many Pending Validation

Finally!!!

I find it interesting that it seems to have needed that outage on the 29th to sort this out... It looks as if the stalled task was picked up almost as soon as contact with the scheduler resumed :-) -- coincidence???

It would be interesting to know why the system was off for those 5 hours, but I doubt we'll ever find out...

Cheers - Al.
[Aug 31, 2024 10:39:38 AM]
cz50975
Advanced Cruncher
Joined: Dec 9, 2004
Post Count: 95
Re: Too many Pending Validation

There has been a problem with some MCM validators since approximately 2024-11-17 14:51:31 UTC.

Here is an example:
https://www.worldcommunitygrid.org/contribution/workunit/628669472

MCM1_0227764_9064_0 Pending Validation 2024-11-12 17:40:16 UTC 2024-11-16 20:38:46 UTC 4.84 / 4.86 83.1 / 0
MCM1_0227764_9064_1 Pending Validation 2024-11-12 17:38:32 UTC 2024-11-17 14:51:31 UTC 1.89 / 2.12 84.3 / 0

Some WUs pass validation in seconds, while others wait for hours; the example above has been waiting for more than 15 hours.

Together with the "Waiting to be sent" error, this has rapidly increased my validation queue.

====================================

WU finally validated.
----------------------------------------
[Edit 1 times, last edit by cz50975 at Nov 18, 2024 1:59:30 PM]
[Nov 18, 2024 6:23:12 AM]
cz50975
Advanced Cruncher
Joined: Dec 9, 2004
Post Count: 95
Re: Too many Pending Validation

I still have a bulk of MCM WUs received in December 2024 and returned in early January 2025 that were affected by the "returned in 2018" problem and are now stuck in the validation queue because of the "Waiting to be sent" issue.

Here are examples from different devices:
https://www.worldcommunitygrid.org/contribution/workunit/642156202
https://www.worldcommunitygrid.org/contribution/workunit/642770207
https://www.worldcommunitygrid.org/contribution/workunit/642003685
https://www.worldcommunitygrid.org/contribution/workunit/643057568
[Jan 20, 2025 7:42:44 AM]
adriverhoef
Master Cruncher
The Netherlands
Joined: Apr 3, 2009
Post Count: 2167
Re: Too many Pending Validation

Lots of "Waiting to be sent" here, too: 197 tasks from December 2024; however, none "returned in 2018".
All of these 197 tasks have exactly one wingman that didn't return their task in time.

Adri
[Jan 20, 2025 1:22:56 PM]
alanb1951
Veteran Cruncher
Joined: Jan 20, 2006
Post Count: 971
Re: Too many Pending Validation

Yup, and this will continue to get worse until there is another one of those periods where there seem to be nothing but retries being sent out (along with the dreaded "Tasks are committed to other platforms" message).

A build-up can happen without there having been some sort of system outage; that seems to happen regularly based on the interval (deadline related?) since the previous blockage cleared. The algorithm they are using to decide on new work control seems to end up putting out enough work that some of the missed deadline tasks collected after the previous unblocking end up with delayed retries anyway (if enough miss deadline together, as often seems to happen), so another blockage starts...

For what it's worth, I'm getting a [very small] number of retries to process, but most of them were more or less instant turn-around cases and the most delayed one this time round was held up for just over a day -- not really surprising, as the older the work unit, the longer it seems to have to wait to get a retry :-(

Ah, well, either something they've already built into their automation will clear some of it out, or their Tech Team person will eventually get a chance to look into it.

Cheers - Al.

P.S. When I raised this in our "Project status" thread earlier in the week I also mentioned the 1-person Tech Team and the likelihood that he has multiple equally urgent tasks pending -- though it may not make us very happy, I think we need to bear that in mind :-)

[Edited the P.S. (regarding tasks)]
----------------------------------------
[Edit 1 times, last edit by alanb1951 at Jan 20, 2025 4:20:28 PM]
[Jan 20, 2025 2:12:58 PM]
Sgt.Joe
Ace Cruncher
USA
Joined: Jul 4, 2006
Post Count: 7670
Re: Too many Pending Validation

Yes, quite the buildup of "Pending Validation." The oldest one I have was returned on Dec 2, 2024. There was a "No Reply" task for that workunit, and the third one is in the "Waiting to be sent" status.

Cheers
----------------------------------------
Sgt. Joe
*Minnesota Crunchers*
----------------------------------------
[Edit 1 times, last edit by Sgt.Joe at Jan 20, 2025 10:25:59 PM]
[Jan 20, 2025 10:25:34 PM]
cz50975
Advanced Cruncher
Joined: Dec 9, 2004
Post Count: 95
Re: Too many Pending Validation

There is again a problem with MCM validations. Some WUs pass validation in seconds, while others wait for hours; the example below has been waiting for more than 20 hours.

https://www.worldcommunitygrid.org/contribution/workunit/655985720
----------------------------------------
[Edit 1 times, last edit by cz50975 at Jan 31, 2025 12:50:00 PM]
[Jan 31, 2025 10:24:54 AM]
alanb1951
Veteran Cruncher
Joined: Jan 20, 2006
Post Count: 971
Re: Too many Pending Validation

It isn't really surprising that the validators might struggle when very high numbers of results returned are retries or second results (as will be the case when there's a mass clear-out of delayed retries, no matter how that is achieved...)

The problem could be resolved by temporarily restarting the validators but running twice as many, each one looking at a smaller subset of work-units! However, the long-term solution would be to avoid the build-up of waiting retries in the first place. I don't know whether their existing control over the amount of new work generated actually looks at the number of retries waiting as well as the amount of work currently out in the field - if it doesn't, it probably should [if possible] :-)
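
As a rough illustration of the "twice as many validators, each looking at a smaller subset" idea: one common way to split such work is by work-unit ID modulo the number of validators. The sketch below is purely hypothetical; `validator_for` and `partition` are made-up names, and how WCG actually assigns work units to its validators is not known here. The IDs are the ones from the work-unit links posted earlier in this thread.

```python
from collections import defaultdict

def validator_for(workunit_id: int, n_validators: int) -> int:
    """Index of the (hypothetical) validator responsible for this work unit."""
    if n_validators < 1:
        raise ValueError("need at least one validator")
    return workunit_id % n_validators

def partition(workunit_ids, n_validators):
    """Group work-unit IDs by the validator that would handle them."""
    shards = defaultdict(list)
    for wu in workunit_ids:
        shards[validator_for(wu, n_validators)].append(wu)
    return dict(shards)

# Work-unit IDs taken from the links in earlier posts.
pending = [628669472, 642156202, 642770207, 642003685, 643057568, 655985720]

two = partition(pending, 2)   # e.g. the current number of validators (assumed)
four = partition(pending, 4)  # "twice as many", each with a smaller subset

for name, shards in (("2 validators", two), ("4 validators", four)):
    largest = max(len(ids) for ids in shards.values())
    print(f"{name}: largest shard holds {largest} work units")
```

Every work unit lands in exactly one shard, so doubling the validator count shrinks the largest shard without any two validators touching the same work unit. It does nothing about the build-up of waiting retries itself.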

However, there are two other aspects that are beyond the control of WCG; one is things like hardware outages (planned or otherwise), the other is the large number of results that come back No Reply or get killed off by the client (Not started by deadline) because users have accumulated too much work for one reason or another[*1]. In this case, we are still seeing the knock-on effects of the data centre outage :-( but some delayed retries are also coming back No Reply...

It would be nice to see the work cycle for MCM1 settle down, but (given the number of other calls on the time of the Tech Team member) I suspect we may be stuck with these cyclic events until MAM1 comes on stream (which, with luck, will somewhat reduce the general need for new MCM1 tasks!).

Cheers - Al.

*1 -- there are likely to be late returns for reasons other than over-large buffers; all it needs is a serious mis-estimate of expected run time for some project (here or elsewhere) to fill up smaller buffers with unneeded work (ask Einstein@home and MilkyWay@home how I know that one!)
[Jan 31, 2025 3:47:05 PM]