World Community Grid Forums
Thread Status: Active | Total posts in this thread: 155
Aperture_Science_Innovators
Advanced Cruncher | United States | Joined: Jul 6, 2009 | Post Count: 139 | Status: Offline
Issue appears to have spread to SCC too. Only getting resends since about 07:00 UTC.
Sgt.Joe
Ace Cruncher | USA | Joined: Jul 4, 2006 | Post Count: 7664 | Status: Offline
> Issue appears to have spread to SCC too. Only getting resends since about 07:00 UTC.

Same here. One machine dry, others to follow soon. Cheers
Sgt. Joe
*Minnesota Crunchers*
Unixchick
Veteran Cruncher | Joined: Apr 16, 2020 | Post Count: 955 | Status: Offline
Welcome to the weekend. Looks like there is only a smattering of resends for MCM and SCC available.
Unixchick
Veteran Cruncher | Joined: Apr 16, 2020 | Post Count: 955 | Status: Offline
Bumping this back up in hopes we'll see fewer "No WUs" posts.
TigerLily
Senior Cruncher | Joined: May 26, 2023 | Post Count: 280 | Status: Offline
Hi everyone,
The issue of no work unit availability on weekends was brought to the team last week. They are currently investigating and working on a fix.
Mike.Gibson
Ace Cruncher | England | Joined: Aug 23, 2007 | Post Count: 12369 | Status: Recently Active
TigerLily
Thanks, but we have been reporting this for months.

Mike
Unixchick
Veteran Cruncher | Joined: Apr 16, 2020 | Post Count: 955 | Status: Offline
Enjoying the workunits we are getting now. Thank you TigerLily for letting the team know about the weekend issues.
I'm hoping for a good weekend without issues soon.
adriverhoef
Master Cruncher | The Netherlands | Joined: Apr 3, 2009 | Post Count: 2157 | Status: Offline
Regarding the issue of no workunit availability on weekends.
Each time lately, before the weekend, I'm seeing a buildup of tasks that are "Waiting to be sent". Currently I am seeing a large number of "Waiting to be sent" tasks that don't get sent out again. A sample entry:

    <10> * MCM1_0202379_7015_0 Fedora Linux Pending Validation 2023-08-10T17:06:19 2023-08-12T06:42:29

(Generated by wcgstats -wsPQ -a0 -m0 -SS -P100, then redacted on nrs. 61-72 and 75-87.)

That's it, almost 50 tasks that look like they refuse to get sent within a reasonable amount of time. Anyone else seeing such large numbers? (You have to dig deeper than just looking at your own tasks, so a tool like wcgstats would be nice to have/use.)

Adri
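For anyone without wcgstats, here is a rough sketch of how one could tally statuses from a pasted task listing. The field layout (task name, OS, status, two timestamps) is only assumed from the sample entry above, not taken from any official format specification, and the status strings are illustrative:

```python
from collections import Counter

# Status strings assumed to match what the WCG task pages show;
# adjust to whatever your own listing actually contains.
STATUSES = ("Waiting to be sent", "Pending Validation", "In Progress", "Valid")

def tally(lines):
    """Count how many listing lines mention each known status."""
    counts = Counter()
    for line in lines:
        for status in STATUSES:
            if status in line:
                counts[status] += 1
                break
    return counts

# Two made-up lines in the same shape as the sample entry above.
sample = [
    "<10> * MCM1_0202379_7015_0 Fedora Linux Pending Validation "
    "2023-08-10T17:06:19 2023-08-12T06:42:29",
    "<11> * MCM1_0202380_0001_1 Fedora Linux Waiting to be sent "
    "2023-08-10T17:06:19 2023-08-12T06:42:29",
]
print(tally(sample))
```

This only does substring matching, so it is a quick triage aid rather than a parser; a real tool would split the fields properly.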
alanb1951
Veteran Cruncher | Joined: Jan 20, 2006 | Post Count: 953 | Status: Offline
Adri,
I'm operating on a smaller sample than you are, but I can confirm the observation. I suspect it's a characteristic of the current "feast or famine" supply of new work units.

I posted a quite long comment about this in the News thread "2023-07-31 Update (MCM1 issue resolved)" in which I commented on the cyclic nature of this behaviour. You might find it interesting... Because Ralf's posts still seem to be moderated, I missed his response at the time[1]; it made a valid point about this being recent behaviour, which I might have commented on there if I'd seen it earlier!

This cyclic behaviour seems to have kicked in for MCM1 after the late July outage, so users with larger buffers may have suddenly acquired far more units, all due at about the same time, once supplies were restored. If user buffers were being replenished at a steadier rate (as was probably the case before the outage), the time distribution of retry requests would be far more even (and unlikely to be as problematic...)

I am unsure whether the tools they use to control the issue of new work are flexible enough to deal with this -- only an insider would know. And I fear that the only ways to stop the cyclic work pattern once it has started would be either to put a [temporary] cap on the number of tasks any user could have (for MCM1) or to find a way to get the (MCM1) feeder mechanism to give [slightly?] less priority to retries...

I'll also note here that it's no surprise that a project (such as SCC1) that uses adaptive replication is more likely to simply run out of work than to go through a huge backlog of retries! MCM1, however, could easily end up with many, many thousands[2] of retries in a worst-case scenario...

Cheers - Al.
[1] If I'd seen it then, I might have asked him whether he had actually read the bit that offered a possible explanation for the cyclic pattern, or whether he just thought it was irrelevant :-)

[2] With the large number of MCM1 work units processed each week, even a 1% retry rate would be a lot of tasks if most of them were asked for at about the same time. And judging by my recent wingmen, I suspect that the present retry rate is quite a bit higher :-(
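The burst-refill effect described above can be illustrated with a toy back-of-the-envelope model. All figures here (1,000,000 units per week, a 1% retry rate, a 6-hour deadline window) are made-up assumptions for illustration, not actual WCG numbers; the point is only that the same retry total hurts far more when every deadline was set at once:

```python
# Toy model: steady replenishment vs. a post-outage burst refill.
# All numbers are illustrative assumptions, not real WCG figures.
weekly_units = 1_000_000   # assumed MCM1 throughput per week
retry_rate = 0.01          # assumed 1% of tasks need a resend
retries = int(weekly_units * retry_rate)

# Steady case: deadlines (and thus retries) spread over the whole week.
steady_per_hour = retries / (7 * 24)

# Burst case: an outage empties every buffer, the refill lands at once,
# so all those deadlines -- and the resulting retries -- fall due in a
# narrow window (assume 6 hours here).
burst_per_hour = retries / 6

print(f"{retries} retries in total")
print(f"steady: ~{steady_per_hour:.0f} retries/hour")
print(f"burst:  ~{burst_per_hour:.0f} retries/hour")
```

Even with identical totals, the burst case concentrates the load into a spike the feeder has to absorb all at once, which matches the "feast or famine" pattern Alan describes.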
Unixchick
Veteran Cruncher | Joined: Apr 16, 2020 | Post Count: 955 | Status: Offline
Thank you Adri and Alan for taking the time to dig into the info you can gather and for posting it. I hope TigerLily will pass some of this additional info on to the team so they can solve the problem quicker.

There is definitely an issue where a repeat server outage causes a massive cache-refill event, with everything due back at the same time, and also pushes users to expand their cache sizes to avoid being caught out the next time it happens. That is a valid reaction, but it makes the initial problem worse and/or harder to diagnose. I know that on the SCC side the macOS valid/invalid mismatch issue is causing more resends, as I personally flip between being a trusted machine and earning trust again.