Total posts in this thread: 144
This topic has been viewed 27741 times and has 143 replies
wildhagen
Veteran Cruncher
The Netherlands
Joined: Jun 5, 2009
Post Count: 832
Re: Too many Pending Validation

Validation seems to be at least partially running now.

Earlier today I had 45 pages of WUs awaiting validation; that is back down to 'just' 37 pages now, and still dropping.
[Feb 8, 2023 3:28:53 PM]
TPCBF
Master Cruncher
USA
Joined: Jan 2, 2011
Post Count: 1951
Re: Too many Pending Validation

It seems there is still an overall issue with nobody properly monitoring the whole system, including a stuck validator.
Since yesterday, the number of my PV jail inmates has increased by about 50%, and my overall stats, both on the WCG contribution page and on external stats sites, have been cut at least in half.
I wonder for how long there's gonna be just crickets from Krembil... :-(

Ralf
[Feb 8, 2023 4:39:05 PM]
Sgt.Joe
Ace Cruncher
USA
Joined: Jul 4, 2006
Post Count: 7664
Re: Too many Pending Validation

I think it may still be stuck. If there is more than one validator, at least one is stuck. The midday update showed 221,000 for MCM, which is about 1/2 of normal. Personally, I still show 560 in pending validation status, which is still about twice normal.
Cheers
----------------------------------------
Sgt. Joe
*Minnesota Crunchers*
[Feb 8, 2023 6:54:17 PM]
TPCBF
Master Cruncher
USA
Joined: Jan 2, 2011
Post Count: 1951
Re: Too many Pending Validation

wildhagen wrote:
> Validation seems to be at least partially running now.
>
> Earlier today I had 45 pages of WUs awaiting validation; that is back down to 'just' 37 pages now, and still dropping.

It's not that there are no WUs validated at all, the problem is that there are more and more WUs which are getting stuck in PV jail.
Some are what bfmores also complained about, WUs of folks that are hoarding WUs "because we are running out of work!" and selfishly increase their buffers to rather unreasonable amounts, not realizing how this backfires on the system as a whole. Those then time out and result in resends (_2, possibly _3 and _4, in case of MCM), which then might take another 3 days or so to be returned (or not).

The other issue is that there seem to be more and more WUs where both results of the original WU have been successfully returned (in time) but do not validate within a reasonable amount of time.

Either way, this is a situation that needs to be looked at, and while Cyclops seems to have been lurking since the number of posts about this increased, there is not even a post acknowledging the problem... :-(


Ralf
[Feb 8, 2023 7:36:35 PM]
Cyclops
Senior Cruncher
Joined: Jun 13, 2022
Post Count: 295
Re: Too many Pending Validation

Hi all, the tech team and I are aware of the issue and are monitoring the constantly increasing pool of workunits that are stuck pending validation for MCM1. We are looking into ways to mitigate the effect that volunteers who hoard units they are too slow to actually return in time are having on the rest of the community. Once a solution has been agreed on and put into action, we will share it here. Sorry for the confusion.
----------------------------------------
[Edit 1 times, last edit by Cyclops at Feb 8, 2023 8:43:57 PM]
[Feb 8, 2023 8:37:10 PM]
TPCBF
Master Cruncher
USA
Joined: Jan 2, 2011
Post Count: 1951
Re: Too many Pending Validation

Cyclops wrote:
> Hi all, the tech team and I are aware of the issue and are monitoring the constantly increasing pool of workunits that are stuck pending validation for MCM1. We are looking into ways to mitigate the effect that volunteers who hoard units they are too slow to actually return in time are having on the rest of the community. Once a solution has been agreed on and put into action, we will share it here. Sorry for the confusion.

Well, I always wonder why it takes so long to even get an acknowledgement of a problem... :-(

Anyway, just to be clear, there seem to be at least two different issues here, which might not necessarily be connected.

The expiring, hoarded WUs and subsequent resends are just one of those issues. The other is that successfully returned WUs (with and without resends) are, apparently at random, getting stuck for no apparent reason.

Ralf
[Feb 8, 2023 9:13:12 PM]
cz50975
Advanced Cruncher
Joined: Dec 9, 2004
Post Count: 95
Re: Too many Pending Validation

There has been a problem with some MCM validators since approximately 2023-02-06 21:30:33 UTC.

Here is an example: WU MCM1_0196018_0172
https://www.worldcommunitygrid.org/contribution/workunit/260080169

MCM1_0196018_0172_0 Win11 PenVal 2023-02-05 07:42:05 UTC 2023-02-06 21:30:33 UTC 2.67 / 2.84 93.9 / 0
MCM1_0196018_0172_1 Win10 PenVal 2023-02-05 07:42:12 UTC 2023-02-05 14:22:48 UTC 1.32 / 1.34 75.2 / 0

The majority of uploaded MCM WUs are now waiting in the queue for validation, as in the example above.

They have been in the queue for more than 48 hours with no progress.
This is not just a delay in processing; it's stuck.

I have a bulk of such examples, but that is the oldest.
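
The "stuck" criterion described above (every result of a workunit returned, yet still pending validation more than 48 hours later) could be checked with a minimal sketch like the following. The `Result` record, the `is_stuck` helper, and the check time `now` are illustrative assumptions, not WCG or BOINC code; only the result names, statuses, and return timestamps come from the example workunit.

```python
from dataclasses import dataclass
from datetime import datetime, timedelta, timezone
from typing import Optional


@dataclass
class Result:
    """Hypothetical record for one result row of a workunit."""
    name: str
    status: str                   # e.g. "PenVal" as shown on the result page
    returned: Optional[datetime]  # None if the result has not come back yet


def is_stuck(results, now, threshold=timedelta(hours=48)):
    """Return True if every result is back and pending validation,
    and the last one was returned more than `threshold` ago."""
    if not results or any(r.returned is None for r in results):
        return False  # still waiting on a host: delayed, not stuck
    if any(r.status != "PenVal" for r in results):
        return False
    last_return = max(r.returned for r in results)
    return now - last_return > threshold


utc = timezone.utc
wu = [
    Result("MCM1_0196018_0172_0", "PenVal", datetime(2023, 2, 6, 21, 30, 33, tzinfo=utc)),
    Result("MCM1_0196018_0172_1", "PenVal", datetime(2023, 2, 5, 14, 22, 48, tzinfo=utc)),
]
# Assumed check time: 49 hours after the later of the two returns.
now = datetime(2023, 2, 8, 22, 30, 33, tzinfo=utc)
print(is_stuck(wu, now))  # True
```

A workunit that is still waiting on a resend is deliberately reported as not stuck here, which keeps the hoarding/resend case separate from the validator case.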
[Feb 8, 2023 9:44:36 PM]
alanb1951
Veteran Cruncher
Joined: Jan 20, 2006
Post Count: 953
Re: Too many Pending Validation

Cyclops wrote:
> Hi all, the tech team and I are aware of the issue and are monitoring the constantly increasing pool of workunits that are stuck pending validation for MCM1. We are looking into ways to mitigate the effect that volunteers who hoard units they are too slow to actually return in time are having on the rest of the community. Once a solution has been agreed on and put into action, we will share it here. Sorry for the confusion.

Cyclops,

It would be interesting to know how many validators are being run for MCM1 -- I suspect it needs more than one, but...

As for mitigations (other than more validators!) -- the obvious one is to put a fairly tight total jobs ceiling on issued tasks per host -- even the fastest current CPUs can't get through several MCM1 tasks per CPU per hour [at present] and big, powerful machines will tend to be permanently connected to the Internet. Unfortunately, someone will then complain that it will inconvenience them if there are internet/download issues. You can't please everyone all the time...

Tinkering with deadlines and grace days might reduce the number of No Reply retries that turn out to not be needed because the No Reply machine does return a [valid] result 24 hours or so later, but I doubt that would significantly reduce the number of excess tasks out in the field :-(

Good luck to the tech team in their quest -- sadly, this over-provisioning issue is not unique to WCG and I'm not convinced most projects have an answer that doesn't irritate folks with very fast-turnaround systems (or others who run large buffers with less possible justification..)

Cheers - Al.
[Feb 9, 2023 12:05:26 AM]
Sgt.Joe
Ace Cruncher
USA
Joined: Jul 4, 2006
Post Count: 7664
Re: Too many Pending Validation

One of my machines is a dual-CPU machine with 32 threads. I have it set to queue 64 MCM units. At present the turnaround for a unit is under 1.5 days. If the share of longer-running MCM units increases, this will degrade a bit, but will still be somewhere around 2 days. I know there are a few users with 128- or 256-thread machines who can safely and easily run queues of probably twice their thread count. These users should not be penalized by some hard limit.
I agree there should be some limit for those who over-queue the capacity of their machine, but I don't presently know how this might be implemented. Somebody smarter than me may be able to devise some feedback system where machines with chronic no-reply, late-reply, etc. results would cause the servers to involuntarily limit the number of work units issued to such machines, regardless of the individual machine setting. An override, so to speak. Some kind of enhancement or tweak to the reliable/not reliable parameter for the machine.
Just speculation and musings for the moment.
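
To make the musing a little more concrete, here is a rough sketch of what such a feedback limit might look like. Everything in it is hypothetical: the `HostHistory` counters, the `task_limit` function, and the one-to-two tasks-per-thread range are assumptions chosen for illustration (the upper bound mirrors the "twice the thread count" figure above), not how the WCG or BOINC servers actually work.

```python
from dataclasses import dataclass


@dataclass
class HostHistory:
    """Hypothetical per-host counters over some recent window."""
    threads: int
    returned_on_time: int
    late_or_no_reply: int


def task_limit(host, requested, per_thread_cap=2.0, min_per_thread=1.0):
    """Cap the number of tasks in progress for a host.

    The cap shrinks from `per_thread_cap` tasks per thread toward
    `min_per_thread` as the share of late/no-reply results grows.
    """
    total = host.returned_on_time + host.late_or_no_reply
    # With no history yet, assume the host is reliable.
    reliability = host.returned_on_time / total if total else 1.0
    per_thread = min_per_thread + (per_thread_cap - min_per_thread) * reliability
    server_cap = int(host.threads * per_thread)
    # The server-side cap overrides a larger per-machine setting.
    return min(requested, server_cap)


reliable = HostHistory(threads=32, returned_on_time=500, late_or_no_reply=0)
hoarder = HostHistory(threads=8, returned_on_time=50, late_or_no_reply=150)

print(task_limit(reliable, requested=64))   # 64
print(task_limit(hoarder, requested=200))   # 10
```

With these assumed numbers, a 32-thread machine that returns everything on time keeps its queue of 64, while an 8-thread machine that misses three out of four deadlines is held to 10 tasks no matter how large a buffer it asks for.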

Cheers
----------------------------------------
Sgt. Joe
*Minnesota Crunchers*
[Feb 9, 2023 3:40:07 AM]
Sgt.Joe
Ace Cruncher
USA
Joined: Jul 4, 2006
Post Count: 7664
Re: Too many Pending Validation

Does not appear fixed yet. 670 in PV this morning.
Cheers
----------------------------------------
Sgt. Joe
*Minnesota Crunchers*
[Feb 9, 2023 1:33:55 PM]