World Community Grid - View Thread - 2022-08-26 Update (increased WU output & WCG backend changes)

Thread: 2022-08-26 Update (increased WU output & WCG backend changes)

Thread Status: Active
Total posts in this thread: 98

This topic has been viewed 930986 times and has 97 replies

Cyclops
Senior Cruncher
Joined: Jun 13, 2022
Post Count: 295
Status: Offline


2022-08-26 Update (increased WU output & WCG backend changes)

Dear volunteers,

We have taken additional measures to increase the quantity of WUs we can send out, and we have been able to increase the quantity of WUs in flight at any given time. Volunteers should see this reflected on their devices now, and perhaps even over this past week.

We are also relieved to share that the hosting data centre has assigned additional personnel on site to resolve our networking issues, meaning a fix is imminent. We will share with you any further updates we receive from the data centre. The network fix will allow us to bring our remaining servers online, stabilizing and further increasing the WU supply.

Until we are able to deploy all dedicated servers, we must continuously monitor and adjust the tasks scheduled in Aurora/Mesos to keep them balanced and the workunits flowing. So far this process has been unduly labour-intensive and sporadic. For example, a recurring job may saturate the scheduler by creating a large number of downstream jobs. This flood of new jobs might then throttle the processing rate of other waiting jobs and thereby interrupt the supply of work. To fix the problem, we need to temporarily deschedule the parent job, decrease its frequency, or decrease the priority of its children in a way that does not starve other stages of the pipeline.
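The flooding scenario described above can be illustrated with a toy model. This is only a sketch under stated assumptions, not how Aurora/Mesos works internally: the `Scheduler` class, the job names, and the numeric priorities are all hypothetical, and the queue is a plain priority heap in which a lower number runs first.

```python
import heapq
from itertools import count


class Scheduler:
    """Toy priority scheduler (hypothetical): lower number runs first,
    and jobs with equal priority run in submission order."""

    def __init__(self):
        self._heap = []
        self._seq = count()  # tie-breaker that preserves FIFO order

    def submit(self, name, priority):
        heapq.heappush(self._heap, (priority, next(self._seq), name))

    def run_all(self):
        """Drain the queue and return job names in execution order."""
        order = []
        while self._heap:
            _, _, name = heapq.heappop(self._heap)
            order.append(name)
        return order


def simulate(child_priority, n_children=5):
    """A recurring parent job floods the queue with children, then a job
    from another pipeline stage arrives with priority 1."""
    sched = Scheduler()
    for i in range(n_children):
        sched.submit(f"child-{i}", child_priority)
    sched.submit("make-workunits", 1)  # hypothetical downstream stage
    return sched.run_all()


if __name__ == "__main__":
    # Children at the same priority: the other stage waits behind the flood.
    print(simulate(child_priority=1))
    # Children demoted: the other stage runs first.
    print(simulate(child_priority=2))
```

In this model, demoting the children lets the waiting stage run ahead of the flood. A real fix would also have to make sure the demoted children are not starved in turn, which this sketch does not address.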

Last week, we mentioned that we have begun to investigate the concerns volunteers raised over statistics, credit, streaks, and database dumps. We will have an update on some of these issues next week. We also plan to release a more structured breakdown from the tech team, similar to a CHANGELOG, starting next week or the week after, so that we can increase the frequency and clarity of updates.

Future Plans for Aurora/Mesos Replacement by SLURM at the WCG
With the above in mind, although we should be able to deploy additional server resources for Aurora/Mesos job scheduling as soon as the networking issues are resolved, our team has greater familiarity and experience with the SLURM scheduler, an alternative to Aurora/Mesos. SLURM is a mature technology currently in use at many of the world’s foremost supercomputing centres, and we intend to make a full transition to SLURM soon after the full WCG restart.

Pending some investigation, we may also look to expand our message-passing layer and implement a publisher/subscriber model, with some notion of back-pressure, to govern the chain of downloading data from researchers and creating workunits with which to stock the feeder. From what we have observed, we expect that the move to SLURM will distribute our internal server resources more efficiently than Aurora/Mesos currently does, while losing no functionality. The port should be relatively straightforward, since it overlaps with the existing skill-set of the team.
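As a rough illustration of what back-pressure in a publisher/subscriber chain could look like, here is a minimal single-threaded sketch. It is an assumption-laden toy, not the WCG design: `BoundedTopic`, `run_pipeline`, the batch names, and the capacity values are all hypothetical. The publisher stands in for the step that downloads data from researchers, and the consumer stands in for the step that creates workunits for the feeder.

```python
from collections import deque


class BoundedTopic:
    """Hypothetical bounded buffer between a publisher and a subscriber."""

    def __init__(self, capacity):
        self.capacity = capacity
        self._items = deque()

    def __len__(self):
        return len(self._items)

    def try_publish(self, item):
        """Return False when full; that refusal is the back-pressure signal."""
        if len(self._items) >= self.capacity:
            return False
        self._items.append(item)
        return True

    def consume(self):
        """Return the oldest item, or None if the topic is empty."""
        return self._items.popleft() if self._items else None


def run_pipeline(batches, capacity, consume_every):
    """Publish one batch per step; consume one batch every `consume_every`
    steps. Returns the consumed batches and how often the publisher was
    told to wait."""
    topic = BoundedTopic(capacity)
    pending = deque(batches)
    consumed, deferrals = [], 0
    step = 0
    while pending or len(topic) > 0:
        step += 1
        if pending:
            if topic.try_publish(pending[0]):
                pending.popleft()
            else:
                deferrals += 1  # publisher holds the batch and retries later
        if step % consume_every == 0:
            item = topic.consume()
            if item is not None:
                consumed.append(item)
    return consumed, deferrals


if __name__ == "__main__":
    done, waits = run_pipeline(["b1", "b2", "b3", "b4"], capacity=2, consume_every=2)
    print(done, waits)
```

The point of the bounded buffer is that a fast upstream stage is slowed to the pace of the downstream stage instead of piling up work, and no batch is dropped or reordered along the way.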

However, this work is not a higher priority than addressing long-standing concerns of volunteers, which we are finally carving out the bandwidth to address.

Thanks for your patience and have a great weekend!
-WCG Tech Team

[Aug 27, 2022 2:45:58 AM]

dough boy
Cruncher
Joined: May 22, 2012
Post Count: 8
Status: Offline
Project Badges:

1 year badge for Human Proteome Folding - Phase 2

180 day badge for Help Fight Childhood Cancer

90 day badge for Help Cure Muscular Dystrophy - Phase 2

2 year badge for The Clean Energy Project - Phase 2

90 day badge for Computing for Clean Water

2 year badge for Drug Search for Leishmaniasis

1 year badge for GO Fight Against Malaria

14 day badge for Computing for Sustainable Water

20 year badge for Mapping Cancer Markers

90 day badge for Uncovering Genome Mysteries

1 year badge for Outsmart Ebola Together

1 year badge for FightAIDS@Home - Phase 2

2 year badge for Microbiome Immunity Project

2 year badge for OpenPandemics - COVID-19


Re: 2022-08-26 Update (increased WU output & WCG backend changes)

I have been able to download several days' worth of data for almost the last week.

[Aug 27, 2022 2:54:59 AM]

bfmorse
Senior Cruncher
US
Joined: Jul 26, 2009
Post Count: 296
Status: Offline
Project Badges:

14 day badge for Human Proteome Folding - Phase 2

14 day badge for Help Fight Childhood Cancer

14 day badge for Help Cure Muscular Dystrophy - Phase 2

14 day badge for Computing for Clean Water

200 year badge for Mapping Cancer Markers

180 day badge for FightAIDS@Home - Phase 2

10 year badge for Smash Childhood Cancer

180 day badge for Microbiome Immunity Project

10 year badge for Africa Rainfall Project

20 year badge for OpenPandemics - COVID-19


Re: 2022-08-26 Update (increased WU output & WCG backend changes)

Thank you for the detailed status report.

Many volunteers, myself included, will find this update refreshing and welcome: the additional details of what has been done and what will be done, the MORE DETAILED timeline, and the CHANGELOG info you will be implementing.

Thanks to the WCG Tech Team for contributing to this update! I have commented elsewhere on the WUs provided and the ongoing but lessening HTTP errors.

Thanks again,
Bruce

[Aug 27, 2022 4:17:56 AM]

danwat1234
Cruncher
Joined: Apr 18, 2020
Post Count: 39
Status: Offline
Project Badges:

50 year badge for Smash Childhood Cancer

50 year badge for OpenPandemics - COVID-19


Re: 2022-08-26 Update (increased WU output & WCG backend changes)

Thank you for the update! All of my machines seem to have been getting regular WCG work this past week or so. I'm surprised how smooth it was once the day-or-two server hiccup was through. Keep it up!

----------------------------------------
[Edit 1 times, last edit by danwat1234 at Aug 27, 2022 4:23:57 AM]

[Aug 27, 2022 4:23:20 AM]

Foxus
Cruncher
Joined: Oct 22, 2008
Post Count: 2
Status: Offline
Project Badges:

90 day badge for Human Proteome Folding - Phase 2

45 day badge for Help Fight Childhood Cancer

45 day badge for Help Cure Muscular Dystrophy - Phase 2

14 day badge for The Clean Energy Project - Phase 2

45 day badge for Computing for Clean Water

45 day badge for Drug Search for Leishmaniasis

14 day badge for GO Fight Against Malaria

14 day badge for Outsmart Ebola Together

90 day badge for FightAIDS@Home - Phase 2

90 day badge for Microbiome Immunity Project

90 day badge for Africa Rainfall Project

180 day badge for OpenPandemics - COVID-19


Re: 2022-08-26 Update (increased WU output & WCG backend changes)

Thanks for the update. I can confirm that I am getting some WUs; it is nice to see the new environment coming to life.

Great job by your team. We are keeping our fingers crossed that the remaining problems will soon vanish and that you get some sleep after all you have been through these last weeks.

Good luck and may a light shine on all your ways.

[Aug 27, 2022 7:02:15 AM]

phillipspencer
Advanced Cruncher
France
Joined: Apr 9, 2015
Post Count: 71
Status: Offline
Project Badges:

10 year badge for Mapping Cancer Markers

14 day badge for Uncovering Genome Mysteries

45 day badge for Outsmart Ebola Together

180 day badge for Smash Childhood Cancer

5 year badge for Microbiome Immunity Project

14 day badge for Africa Rainfall Project

5 year badge for OpenPandemics - COVID-19


Re: 2022-08-26 Update (increased WU output & WCG backend changes)

Appreciate the detailed update and the indication of future priorities. Good to understand your allocation of resources too.

[Aug 27, 2022 8:39:47 AM]

mdparkhill
Advanced Cruncher
Joined: May 2, 2007
Post Count: 60
Status: Offline
Project Badges:

2 year badge for Human Proteome Folding - Phase 2

14 day badge for Help Cure Muscular Dystrophy

90 day badge for Discovering Dengue Drugs - Together

1 year badge for Nutritious Rice for the World

90 day badge for The Clean Energy Project

2 year badge for Help Fight Childhood Cancer

90 day badge for Influenza Antiviral Drug Search

2 year badge for Help Cure Muscular Dystrophy - Phase 2

45 day badge for Discovering Dengue Drugs - Together - Phase 2

1 year badge for The Clean Energy Project - Phase 2

1 year badge for Computing for Clean Water

90 day badge for Drug Search for Leishmaniasis

45 day badge for GO Fight Against Malaria

45 day badge for Computing for Sustainable Water

45 day badge for Uncovering Genome Mysteries

180 day badge for Outsmart Ebola Together

1 year badge for Microbiome Immunity Project

5 year badge for Africa Rainfall Project

10 year badge for OpenPandemics - COVID-19


Re: 2022-08-26 Update (increased WU output & WCG backend changes)

Good to hear that someone is finally taking the networking issue seriously. It was nice to learn more about the backend for the scheduling. I had not even considered that as an issue. I do mainframes, and sometimes our schedulers go ape and it's all my fault even when it's user error. Thanks again for the updates. I just got 60-60 tasks; downloads are still a little slow, but it appears to be working better (I hope, fingers crossed).

[Aug 27, 2022 11:00:53 AM]

nivrip
Senior Cruncher
North Yorkshire
Joined: Sep 13, 2007
Post Count: 264
Status: Offline
Project Badges:

180 day badge for Human Proteome Folding - Phase 2

90 day badge for Nutritious Rice for the World

90 day badge for Help Fight Childhood Cancer

1 year badge for Help Cure Muscular Dystrophy - Phase 2

1 year badge for Uncovering Genome Mysteries

10 year badge for Outsmart Ebola Together

20 year badge for Smash Childhood Cancer


Re: 2022-08-26 Update (increased WU output & WCG backend changes)

Thanks for the info. Getting plenty of WUs now, but there are still occasional hiccups with some of them stuck in Transfers. Using the Retry button always does the trick within a minute or two.

----------------------------------------

ЮРКШИР КРУНЧЕР

[Aug 27, 2022 11:49:16 AM]

spRocket
Senior Cruncher
Joined: Mar 25, 2020
Post Count: 274
Status: Offline
Project Badges:

50 year badge for Mapping Cancer Markers


Re: 2022-08-26 Update (increased WU output & WCG backend changes)

Still having to babysit my zoo, but the work is coming in steadily. Thanks for the update!

[Aug 27, 2022 2:17:26 PM]

Unixchick
Veteran Cruncher
Joined: Apr 16, 2020
Post Count: 949
Status: Offline
Project Badges:

45 day badge for Microbiome Immunity Project

1 year badge for Africa Rainfall Project

1 year badge for OpenPandemics - COVID-19


Re: 2022-08-26 Update (increased WU output & WCG backend changes)

Love the detailed update. I'm getting WUs with some minor hiccups, but it is great to know my machine is useful again.

[Aug 27, 2022 2:22:22 PM]
