Thread Status: Active
Total posts in this thread: 207
This topic has been viewed 14155 times and has 206 replies
uglyphilbert
Cruncher
Joined: Mar 11, 2017
Post Count: 17
Status: Offline
Re: Project Status (First Post Updated)

For me, work units from the afternoon of 14/1/25 were not added to "History" or to the total in "Overview", even though they were added to the results of the individual "Devices". These are devices I have been using for a long time. Work units for 15/1/25 have been added, so it seems to have failed only for that 12-hour period. Has anyone else had this problem?
[Jan 15, 2025 10:01:21 PM]
adriverhoef
Master Cruncher
The Netherlands
Joined: Apr 3, 2009
Post Count: 2157
Status: Offline

What is 'the afternoon'? We live all across the globe, so it would help if you specified the time in UTC.

Adri
----------------------------------------
[Edit 1 times, last edit by adriverhoef at Jan 16, 2025 12:05:52 AM]
[Jan 16, 2025 12:03:38 AM]
uglyphilbert
Cruncher
Joined: Mar 11, 2017
Post Count: 17
Status: Offline

Yeah, sorry, I meant UTC: the 2nd update of the day, covering midday to midnight.
[Jan 16, 2025 2:43:43 AM]
Hans Sveen
Veteran Cruncher
Norge
Joined: Feb 18, 2008
Post Count: 820
Status: Offline

A newsletter was just published this evening:

https://www.worldcommunitygrid.org/newsletters
[Jan 16, 2025 7:25:03 PM]
Unixchick
Veteran Cruncher
Joined: Apr 16, 2020
Post Count: 955
Status: Recently Active

Thank you, Hans! You beat me to the post. I linked the PDF in the first post. I'll try to keep it updated. I hope these come out regularly.
[Jan 16, 2025 8:20:31 PM]
Grumpy Swede
Master Cruncher
Svíþjóð
Joined: Apr 10, 2020
Post Count: 2167
Status: Offline

Lots of "Waiting to be sent" tasks now.

One example: https://www.worldcommunitygrid.org/contribution/workunit/641039380
----------------------------------------
[Edit 1 times, last edit by Grumpy Swede at Jan 17, 2025 7:04:17 AM]
[Jan 17, 2025 7:02:45 AM]
uglyphilbert
Cruncher
Joined: Mar 11, 2017
Post Count: 17
Status: Offline

All the bugs are creeping back in again, and nobody is around to address them or even acknowledge them.
[Jan 17, 2025 1:54:31 PM]
alanb1951
Veteran Cruncher
Joined: Jan 20, 2006
Post Count: 953
Status: Offline

As noted, once again there are lots of "Waiting to be sent" tasks for MCM1; this happens at WCG when a serious backlog of retries for No Reply or Not Started by Deadline tasks gets a chance to build up :-( -- it is more common after outages, but it can happen at other times if there are enough users failing to return work within deadlines.

It seems to be a characteristic of the way work units are being scanned for potential tasks to send out. If precedence is being given to new work (e.g. by scanning work unit numbers in descending order), retries for older work units will be at the wrong end of the queue!
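
To illustrate the scan-order point, here is a toy sketch in Python. It is not WCG's or BOINC's actual feeder code: the `Task` fields, the `fill_feed_buffer` helper, the buffer size and the backlog numbers are all assumptions made up for the illustration.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Task:
    wu_id: int      # hypothetical work unit number (higher = newer)
    is_retry: bool  # True if this task is a resend for an older work unit


def fill_feed_buffer(pending, slots, newest_first):
    """Pick up to `slots` tasks by scanning work unit numbers in one direction."""
    ordered = sorted(pending, key=lambda t: t.wu_id, reverse=newest_first)
    return ordered[:slots]


# Toy backlog (assumed numbers): 5 old work units waiting for a retry,
# plus 20 newer work units with fresh tasks.
pending = [Task(wu_id=i, is_retry=True) for i in range(1, 6)]
pending += [Task(wu_id=i, is_retry=False) for i in range(100, 120)]

descending = fill_feed_buffer(pending, slots=10, newest_first=True)
ascending = fill_feed_buffer(pending, slots=10, newest_first=False)

# Count how many retries made it into the buffer under each scan order.
retries_desc = sum(t.is_retry for t in descending)
retries_asc = sum(t.is_retry for t in ascending)
print(retries_desc, retries_asc)
```

In this toy model the descending scan never reaches the retries while new work keeps arriving, whereas the ascending scan sends them out first.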

The normal expectation would be that work units are scanned in an order that gives precedence to retries[*1]. However, I suspect the ordering may have been influenced by user complaints about not being able to get work during a "retry storm", such as the one caused by that rogue cluster (or whatever it was!) a while ago. Changing the order that way runs the risk of creating a storm when there doesn't need to be one!

New MCM1 work is being poured into the system for issue so quickly that, unless they stop putting new work into the queues for a while, this problem will arise from time to time. Over the last year or so, we have then had periods where users complain about "Tasks are committed to other platforms", because only retries are available whilst such a backlog is being cleared, and there are so many that the feed buffer is quite likely to get work for only one platform on each load-up.
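
A toy sketch of how a feed buffer full of retries could leave one platform with nothing to fetch. It assumes that a retry is tied to the platform of the work unit's earlier results and that brand-new work has no platform yet; the `tasks_for_host` helper and the dictionary layout are invented for the illustration and are not the real scheduler code.

```python
def tasks_for_host(feed_buffer, host_platform):
    """Return the buffered tasks that a host on `host_platform` could be given.

    A task with platform None is new work that any platform may take
    (an assumption of this sketch).
    """
    return [t for t in feed_buffer if t["platform"] in (None, host_platform)]


# Toy buffer (assumed contents): every slot holds a retry committed to Linux.
feed_buffer = [{"wu_id": n, "platform": "linux"} for n in range(8)]

linux_tasks = tasks_for_host(feed_buffer, "linux")
windows_tasks = tasks_for_host(feed_buffer, "windows")
print(len(linux_tasks), len(windows_tasks))
```

Under these assumptions a Windows host asking for work at that moment finds no eligible task, even though the buffer is full.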

I wonder if they should simply let retries get out first, or whether a more frequent [but brief] suspension of new work supply should be considered after the next unblock effort? I'm not sure how their work generation tools work, but I suspect something of that ilk could be automated, and if there's no new work for an hour or so every so often it shouldn't concern most users too much given the typical run times for everything except OPNG (which was effectively rationed anyway!)

Cheers - Al.

P.S. If some users could be dissuaded from getting more work than they can possibly run within the deadlines, we'd be less likely to see build-ups of retries at times when there hasn't been an outage, but that's a different matter :-)

*1 -- Ascending order is probably a safe technique under normal circumstances as I believe there is a mechanism for giving urgent work priority despite work unit number order.
[Jan 17, 2025 4:14:58 PM]
alanb1951
Veteran Cruncher
Joined: Jan 20, 2006
Post Count: 953
Status: Offline

"All the bugs are creeping back in again and nobody around to address them, or even acknowledge them."
I rather suspect that the WCG Tech team is well aware of the retries issue, but that it's a case of "Which of these 10 equally urgent tasks do you want me to do next?" -- personally, I'd rather he just did something about it when he could than spend time communicating about it!

By the way, you are aware that the Tech Team is one person and that they no longer have an intern to do "communications" stuff, aren't you? :-)

Cheers - Al.

P.S. Having often been in that same "10 tasks at once" position when I was still working, I have to balance frustration and sympathy in such cases :-)
[Jan 17, 2025 4:23:20 PM]
Sgt.Joe
Ace Cruncher
USA
Joined: Jul 4, 2006
Post Count: 7664
Status: Offline

I am sure there are a number of ways to address the retry swarms, but not knowing how their feeder system is constructed, I would not even hazard any potential solutions. I would presume the person(s) in charge are probably much better versed than I am in queuing theory and the differences in serial vs. parallel flows, pipeline capacity, differing platforms and other items. Tweaks to these systems without adequate preparation and testing can lead to unintended consequences. I agree with Al that caching too many work units and not being able to complete them in a timely fashion can put added stress on the system. Occasionally it results in some work units being crunched 3 times due to timing issues, which is an unnecessary redundancy.
For myself, when the system is operating correctly, I rarely have any problems getting an adequate supply for either Windows or Linux. (No Mac systems here.) Whether they are normal units or re-sends is irrelevant to me.

Cheers
----------------------------------------
Sgt. Joe
*Minnesota Crunchers*
[Jan 17, 2025 5:51:55 PM]