Total posts in this thread: 155
Posts: 155   Pages: 16   [ Previous Page | 6 7 8 9 10 11 12 13 14 15 | Next Page ]
This topic has been viewed 755902 times and has 154 replies
Aperture_Science_Innovators
Advanced Cruncher
United States
Joined: Jul 6, 2009
Post Count: 139
Re: Daily WorkUnit Flow Information

Issue appears to have spread to SCC too. Only getting resends since about 07:00 UTC.
----------------------------------------

[Aug 12, 2023 1:35:40 PM]
Sgt.Joe
Ace Cruncher
USA
Joined: Jul 4, 2006
Post Count: 7664
Re: Daily WorkUnit Flow Information

Issue appears to have spread to SCC too. Only getting resends since about 07:00 UTC.

Same here. One machine dry, others to follow soon.
Cheers
----------------------------------------
Sgt. Joe
*Minnesota Crunchers*
[Aug 12, 2023 1:54:18 PM]
Unixchick
Veteran Cruncher
Joined: Apr 16, 2020
Post Count: 955
Re: Daily WorkUnit Flow Information

Welcome to the weekend. Looks like there is only a smattering of resends available for MCM and SCC.
[Aug 12, 2023 2:10:46 PM]
Unixchick
Veteran Cruncher
Joined: Apr 16, 2020
Post Count: 955
Re: Daily WorkUnit Flow Information

Bumping this back up in hopes we get fewer "No WUs" posts.
[Aug 14, 2023 1:50:39 PM]
TigerLily
Senior Cruncher
Joined: May 26, 2023
Post Count: 280
Re: Daily WorkUnit Flow Information

Hi everyone,

The issue of no work unit availability on weekends was raised with the team last week. They are currently investigating and working on a fix.
[Aug 14, 2023 2:41:20 PM]
Mike.Gibson
Ace Cruncher
England
Joined: Aug 23, 2007
Post Count: 12369
Re: Daily WorkUnit Flow Information

TigerLily

Thanks, but we have been reporting this for months.

Mike
[Aug 14, 2023 5:09:08 PM]
Unixchick
Veteran Cruncher
Joined: Apr 16, 2020
Post Count: 955
Re: Daily WorkUnit Flow Information

Enjoying the workunits we are getting now. Thank you TigerLily for letting the team know about the weekend issues.

I'm hoping for a good weekend without issues soon.
[Aug 16, 2023 10:32:46 PM]
adriverhoef
Master Cruncher
The Netherlands
Joined: Apr 3, 2009
Post Count: 2157
Re: Daily WorkUnit Flow Information

Regarding the issue of no workunit availability on weekends:

Lately, before each weekend, I've been seeing a buildup of tasks that are "Waiting to be sent".
Currently I am seeing a large number of "Waiting to be sent" tasks that never get sent out.

<10> * MCM1_0202379_7015_0 Fedora Linux Pending Validation 2023-08-10T17:06:19 2023-08-12T06:42:29
<10> MCM1_0202379_7015_1 Linux Pop No Reply 2023-08-10T17:05:49 2023-08-16T17:05:49
<10> MCM1_0202379_7015_2 Waiting to be sent

<44> SCC1_0004188_MyoD1-C_47804_0 Darwin Error 2023-08-08T16:29:33 2023-08-14T16:30:11
<44> * SCC1_0004188_MyoD1-C_47804_1 Fedora Linux Pending Verification 2023-08-14T16:38:30 2023-08-14T17:42:01
<44> SCC1_0004188_MyoD1-C_47804_2 Waiting to be sent

<49> * SCC1_0004201_MyoD1-C_94778_0 Fedora Linux Pending Verification 2023-08-12T03:31:40 2023-08-16T11:37:22
<49> SCC1_0004201_MyoD1-C_94778_1 Waiting to be sent

<50> * SCC1_0004198_MyoD1-C_27249_0 Fedora Linux Pending Verification 2023-08-11T11:35:51 2023-08-15T21:29:12
<50> SCC1_0004198_MyoD1-C_27249_1 Waiting to be sent

<51> * SCC1_0004167_MyoD1-C_65587_0 Fedora Linux Pending Verification 2023-08-11T01:33:55 2023-08-15T13:24:31
<51> SCC1_0004167_MyoD1-C_65587_1 Waiting to be sent

<52> * SCC1_0004202_MyoD1-C_42452_0 Fedora Linux Pending Verification 2023-08-10T22:24:49 2023-08-15T11:00:51
<52> SCC1_0004202_MyoD1-C_42452_1 Waiting to be sent

<53> * SCC1_0004204_MyoD1-C_29380_0 Fedora Linux Pending Verification 2023-08-10T21:45:34 2023-08-15T09:43:34
<53> SCC1_0004204_MyoD1-C_29380_1 Waiting to be sent

<54> * SCC1_0004202_MyoD1-C_38304_0 Fedora Linux Pending Verification 2023-08-10T21:06:35 2023-08-15T08:32:31
<54> SCC1_0004202_MyoD1-C_38304_1 Waiting to be sent

<55> * SCC1_0004204_MyoD1-C_24999_0 Fedora Linux Pending Verification 2023-08-10T20:14:25 2023-08-15T07:45:22
<55> SCC1_0004204_MyoD1-C_24999_1 Waiting to be sent

<56> * SCC1_0004190_MyoD1-C_77107_0 Fedora Linux Pending Verification 2023-08-10T18:45:14 2023-08-15T03:49:53
<56> SCC1_0004190_MyoD1-C_77107_1 Waiting to be sent

<57> * SCC1_0004171_MyoD1-C_95601_0 Fedora Linux Pending Verification 2023-08-10T15:20:47 2023-08-14T23:41:34
<57> SCC1_0004171_MyoD1-C_95601_1 Waiting to be sent

<58> * SCC1_0004178_MyoD1-C_63487_0 Fedora Linux Pending Validation 2023-08-09T14:07:44 2023-08-11T19:02:22
<58> SCC1_0004178_MyoD1-C_63487_1 Fedora Linux Error 2023-08-09T14:08:31 2023-08-15T17:14:47
<58> SCC1_0004178_MyoD1-C_63487_2 Waiting to be sent

<59> * SCC1_0004184_MyoD1-C_94193_0 Fedora Linux Pending Validation 2023-08-09T14:07:44 2023-08-11T20:43:11
<59> SCC1_0004184_MyoD1-C_94193_1 Linux Ubuntu Error 2023-08-09T14:08:28 2023-08-15T14:09:36
<59> SCC1_0004184_MyoD1-C_94193_2 Waiting to be sent

<60> * SCC1_0004182_MyoD1-C_65839_0 Fedora Linux Pending Validation 2023-08-09T14:05:38 2023-08-11T13:48:09
<60> SCC1_0004182_MyoD1-C_65839_1 Linux Ubuntu Error 2023-08-09T14:06:19 2023-08-15T14:07:27
<60> SCC1_0004182_MyoD1-C_65839_2 Waiting to be sent

(<60> … <73> looking all alike)

<73> * SCC1_0004177_MyoD1-C_76764_0 Fedora Linux Pending Validation 2023-08-09T14:05:35 2023-08-11T02:32:52
<73> SCC1_0004177_MyoD1-C_76764_1 Linux Ubuntu Error 2023-08-09T14:06:19 2023-08-15T14:07:27
<73> SCC1_0004177_MyoD1-C_76764_2 Waiting to be sent

<74> * SCC1_0004190_MyoD1-C_56791_0 Fedora Linux Pending Validation 2023-08-09T14:01:23 2023-08-10T12:05:47
<74> SCC1_0004190_MyoD1-C_56791_1 Linux Ubuntu No Reply 2023-08-09T14:03:36 2023-08-15T14:03:36
<74> SCC1_0004190_MyoD1-C_56791_2 Waiting to be sent

(<74> … <88> looking all alike)

<88> * SCC1_0004180_MyoD1-C_90992_0 Fedora Linux Pending Validation 2023-08-09T13:59:16 2023-08-10T06:29:30
<88> SCC1_0004180_MyoD1-C_90992_1 Linux Pop Error 2023-08-09T14:01:28 2023-08-15T14:03:34
<88> SCC1_0004180_MyoD1-C_90992_2 Waiting to be sent

<89> * SCC1_0004167_MyoD1-C_43291_0 Fedora Linux Pending Validation 2023-08-09T13:59:16 2023-08-10T05:22:06
<89> SCC1_0004167_MyoD1-C_43291_1 Linux Pop No Reply 2023-08-09T14:01:28 2023-08-15T14:01:30
<89> SCC1_0004167_MyoD1-C_43291_2 Waiting to be sent

<90> * SCC1_0004179_MyoD1-C_87530_0 Fedora Linux Pending Validation 2023-08-09T13:59:15 2023-08-10T00:28:33
<90> SCC1_0004179_MyoD1-C_87530_1 Linux Pop Error 2023-08-09T14:01:28 2023-08-15T14:03:34
<90> SCC1_0004179_MyoD1-C_87530_2 Waiting to be sent

<91> * SCC1_0004170_MyoD1-C_55865_0 Fedora Linux Pending Validation 2023-08-09T13:59:14 2023-08-09T19:54:22
<91> SCC1_0004170_MyoD1-C_55865_1 Linux Pop Error 2023-08-09T14:01:28 2023-08-15T14:03:34
<91> SCC1_0004170_MyoD1-C_55865_2 Waiting to be sent

<92> * SCC1_0004177_MyoD1-C_76359_0 Fedora Linux Pending Validation 2023-08-09T13:59:14 2023-08-09T20:11:27
<92> SCC1_0004177_MyoD1-C_76359_1 Linux Pop Error 2023-08-09T14:01:28 2023-08-15T14:03:34
<92> SCC1_0004177_MyoD1-C_76359_2 Waiting to be sent

<93> * SCC1_0004185_MyoD1-C_77338_0 Fedora Linux Pending Validation 2023-08-09T13:59:14 2023-08-09T23:25:32
<93> SCC1_0004185_MyoD1-C_77338_1 Linux Pop Error 2023-08-09T14:01:28 2023-08-15T14:03:34
<93> SCC1_0004185_MyoD1-C_77338_2 Waiting to be sent

(Generated by wcgstats -wsPQ -a0 -m0 -SS -P100, with nrs. 61-72 and 75-87 redacted.)

That's it: almost 50 tasks that seem to refuse to get sent out within a reasonable amount of time. Anyone else seeing such large numbers? (You have to dig deeper than just your own tasks, so a tool like wcgstats is handy to have/use.)
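For anyone who wants to do the same kind of digging, the tallying could be sketched roughly as below. The line format is inferred from the listing above; the regex and field layout are my assumptions, not wcgstats internals.

```python
# Tally result statuses from a wcgstats-style listing (format assumed from the
# paste above: "<nr> [*] task_name [os ...] status [sent-time due-time]").
import re
from collections import Counter

STATUSES = (
    "Waiting to be sent|Pending Validation|Pending Verification|No Reply|Error"
)
LINE_RE = re.compile(r"^<\d+>\s+(?:\*\s+)?(\S+)\s+(.*?)\s*(" + STATUSES + ")")

def count_statuses(lines):
    """Return a Counter mapping each status string to its occurrence count."""
    statuses = Counter()
    for line in lines:
        m = LINE_RE.match(line.strip())
        if m:
            statuses[m.group(3)] += 1
    return statuses

sample = [
    "<49> * SCC1_0004201_MyoD1-C_94778_0 Fedora Linux Pending Verification "
    "2023-08-12T03:31:40 2023-08-16T11:37:22",
    "<49> SCC1_0004201_MyoD1-C_94778_1 Waiting to be sent",
]
print(count_statuses(sample)["Waiting to be sent"])  # 1
```

Feed it the full redacted listing and the "Waiting to be sent" count should match the ~50 stuck tasks described above.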

Adri
[Aug 17, 2023 9:02:24 AM]
alanb1951
Veteran Cruncher
Joined: Jan 20, 2006
Post Count: 953
Re: Daily WorkUnit Flow Information

Adri,

I'm operating on a smaller sample than you are, but I can confirm the observation.

I suspect it's a characteristic of the current "feast or famine" supply of new work units. I posted a fairly long comment about the cyclic nature of this behaviour in the News thread "2023-07-31 Update (MCM1 issue resolved)". You might find it interesting...

Because Ralf's posts still seem to be moderated, I missed his response at the time[1]; it made a valid point about this being recent behaviour, which I might have commented on there if I'd seen it earlier!

This cyclic behaviour seems to have kicked in for MCM1 after the late July outage: users with larger buffers may have suddenly acquired far more units, all due at about the same time, once supplies were restored. If user buffers were being replenished at a steadier rate (as was probably the case before the outage), the time distribution of retry requests would be far more even (and less likely to be problematic...)

I am unsure whether the tools they use to control the issue of new work are flexible enough to deal with this -- only an insider would know. And I fear that the only ways to stop the cyclic work pattern once it has started would be either to put a [temporary] cap on the number of tasks any user could have (for MCM1) or to find a way to get the (MCM1) feeder mechanism to give [slightly?] less priority to retries...

I'll also note here that it's no surprise that a project (such as SCC1) that uses adaptive replication is more likely to simply run out of work than to build up a huge backlog of retries! MCM1, however, could easily end up with many, many thousands[2] of retries in a worst-case scenario...

Cheers - Al.

[1] If I'd seen it then I might have asked him whether he had actually read the bit that offered a possible explanation for the cyclic pattern, or whether he just thought it was irrelevant :-)

[2] With the large number of MCM1 work units processed each week, even a 1% retry rate would be a lot of tasks if most of them were asked for at about the same time. And judging by my recent wingmen, I suspect that the present retry rate is quite a bit higher :-(
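As a back-of-the-envelope illustration of footnote [2] -- the weekly volume below is a hypothetical round number, not an official MCM1 figure:

```python
# Rough retry-load arithmetic. The weekly volume is a made-up round number.
weekly_units = 1_000_000           # assumed MCM1 results per week (hypothetical)
retry_rate = 0.01                  # the 1% retry rate mentioned above
retries = int(weekly_units * retry_rate)

# If retries were spread evenly across the week, the hourly load is modest;
# the problem described above is all of them coming due at about the same time.
per_hour_if_even = retries / (7 * 24)
print(retries, round(per_hour_if_even, 1))  # 10000 59.5
```

So even at 1%, a bunched-up batch of retries is ~170x the load of the same retries arriving evenly.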
[Aug 17, 2023 1:37:26 PM]
Unixchick
Veteran Cruncher
Joined: Apr 16, 2020
Post Count: 955
Re: Daily WorkUnit Flow Information

Thank you Adri and Alan for taking the time to dig into the information you can gather and for posting it. I hope TigerLily will pass some of this additional info on to the team so they can solve the problem more quickly.

There is definitely an issue where a repeated server outage causes a massive cache-refill event, with all the tasks sharing the same return date, and also prompts users to expand their cache sizes to ride out the next outage. That is a valid reaction, but it makes the underlying problem worse and/or harder to diagnose.

I know on the SCC side the macOS valid/invalid mismatch issue is causing more resends, as my machine keeps flipping between being trusted and having to earn trust again.
[Aug 17, 2023 2:09:34 PM]