World Community Grid - View Thread - 2023-01-25 Update (ARP & OPN1 workunits)

Thread Status: Active
Total posts in this thread: 35

This topic has been viewed 80661 times and has 34 replies

Cyclops
Senior Cruncher
Joined: Jun 13, 2022
Post Count: 295
Status: Offline


2023-01-25 Update (ARP & OPN1 workunits)

ARP & OPN1 workunits

On Monday afternoon, many volunteers reported receiving new ARP1 and OPN1 workunits. These workunits are not from a new batch; these are older WUs that were never sent out due to an overloaded server causing problems in our workunit-distribution process. ARP1 and OPN1/OPNG teams remain on temporary pause, preparing new workunits.

In addition, this infusion of about 2 million WUs helped us to confirm that the networking/download issues we have in the data center persist under a normal load. Improvements made by the SHARCNET team did reduce network congestion. However, based on these results, they are now implementing further modifications to the network, which should resolve these issues going forward. We will keep you updated on the upcoming maintenance once we receive more information from the SHARCNET team.

Thank you for reporting the HTTP errors you experienced while processing the recent ARP1/OPN1 workunits; your reports helped us diagnose them. These errors are especially frequent after an outage, because of the pent-up demand from all the connected BOINC clients. The backlog of workunits released for distribution over the last few days produced the same effect. We continue to work with the SHARCNET team on improving our network. In parallel, we are finalizing the SSD storage upgrade we mentioned in December, which will also help improve WCG backend performance.

If you have any comments or questions, please leave them in this thread for us to answer. Thank you for your support, patience and understanding.

WCG team

[Jan 26, 2023 2:15:57 AM]

Hans Sveen
Veteran Cruncher
Norge
Joined: Feb 18, 2008
Post Count: 818
Status: Offline
Project Badges:

14 day badge for Human Proteome Folding - Phase 2

14 day badge for The Clean Energy Project - Phase 2

90 day badge for Uncovering Genome Mysteries

1 year badge for Outsmart Ebola Together

1 year badge for FightAIDS@Home - Phase 2

1 year badge for Microbiome Immunity Project

2 year badge for Africa Rainfall Project

5 year badge for OpenPandemics - COVID-19


Re: 2023-01-25 Update (ARP & OPN1 workunits)

Hello,
Thank you again for the information!
Looking forward to further information as the project gets back to running at full steam👍🤞🏻😊

With regards,
Hans S.
Oslo

[Jan 26, 2023 10:22:34 AM]

Sgt.Joe
Ace Cruncher
USA
Joined: Jul 4, 2006
Post Count: 7655
Status: Offline
Project Badges:

2 year badge for Human Proteome Folding - Phase 2

14 day badge for Help Cure Muscular Dystrophy

2 year badge for Discovering Dengue Drugs - Together

2 year badge for Nutritious Rice for the World

14 day badge for The Clean Energy Project

10 year badge for Help Fight Childhood Cancer

90 day badge for Influenza Antiviral Drug Search

2 year badge for Help Cure Muscular Dystrophy - Phase 2

45 day badge for Discovering Dengue Drugs - Together - Phase 2

2 year badge for The Clean Energy Project - Phase 2

2 year badge for Computing for Clean Water

5 year badge for Drug Search for Leishmaniasis

5 year badge for GO Fight Against Malaria

2 year badge for Computing for Sustainable Water

200 year badge for Mapping Cancer Markers

5 year badge for Uncovering Genome Mysteries

20 year badge for Outsmart Ebola Together

10 year badge for FightAIDS@Home - Phase 2

100 year badge for Smash Childhood Cancer

10 year badge for Microbiome Immunity Project

100 year badge for OpenPandemics - COVID-19


Re: 2023-01-25 Update (ARP & OPN1 workunits)

Thank you for the update.
Cheers

----------------------------------------

Sgt. Joe
*Minnesota Crunchers*

[Jan 26, 2023 2:50:49 PM]

ADDIE2014
Cruncher
Joined: Apr 13, 2019
Post Count: 31
Status: Offline
Project Badges:

10 year badge for Smash Childhood Cancer

2 year badge for Microbiome Immunity Project

1 year badge for Africa Rainfall Project

50 year badge for OpenPandemics - COVID-19


Re: 2023-01-25 Update (ARP & OPN1 workunits)

Thanks for the update, Cyclops.

[Jan 26, 2023 3:07:55 PM]

Aperture_Science_Innovators
Advanced Cruncher
United States
Joined: Jul 6, 2009
Post Count: 139
Status: Offline
Project Badges:

2 year badge for Help Fight Childhood Cancer

10 year badge for The Clean Energy Project - Phase 2

5 year badge for Computing for Clean Water

1 year badge for Computing for Sustainable Water

50 year badge for Uncovering Genome Mysteries

200 year badge for Outsmart Ebola Together

50 year badge for FightAIDS@Home - Phase 2

100 year badge for Microbiome Immunity Project

10 year badge for Africa Rainfall Project


Re: 2023-01-25 Update (ARP & OPN1 workunits)

Aw, I was enjoying seeing work from several sub-projects again :-)

Ty for the update regardless, and may the teams get their projects ready for more work soon!

[Jan 26, 2023 4:22:04 PM]

adriverhoef
Master Cruncher
The Netherlands
Joined: Apr 3, 2009
Post Count: 2153
Status: Recently Active
Project Badges:

5 year badge for Human Proteome Folding - Phase 2

90 day badge for Nutritious Rice for the World

14 day badge for Discovering Dengue Drugs - Together - Phase 2

180 day badge for The Clean Energy Project - Phase 2

1 year badge for Computing for Clean Water

1 year badge for Drug Search for Leishmaniasis

1 year badge for GO Fight Against Malaria

45 day badge for Computing for Sustainable Water

100 year badge for Mapping Cancer Markers

1 year badge for Uncovering Genome Mysteries

2 year badge for FightAIDS@Home - Phase 2

20 year badge for Smash Childhood Cancer

5 year badge for Microbiome Immunity Project


Re: 2023-01-25 Update (ARP & OPN1 workunits)

Thanks for informing us volunteers!

Cyclops:

On Monday afternoon, many volunteers reported receiving new ARP1 and OPN1 workunits. These workunits are not from a new batch; these are older WUs that were never sent out due to an overloaded server causing problems in our workunit-distribution process. ARP1 and OPN1/OPNG teams remain on temporary pause, preparing new workunits.

Does the last sentence ("teams … preparing new workunits") also apply to ARP1?
I imagine it only applies to OPN1/OPNG. ARP1 workunits are generated from the previous generation, unless they error out and get stuck, aren't they?
So, as soon as an ARP1-workunit has been declared Valid, you can generate the next generation on the server and there's no need for the ARP1-team to "remain on temporary pause", unless the ARP1-team still isn't ready for downloading Valid results, of course.
Is the ARP1-researchteam ready yet or is the ARP team still finalizing storage issues (see your post 681390)?

In addition, this infusion of about 2 million WUs helped us to confirm that the networking/download issues we have in the data center persist under a normal load.

Is it my imagination or have the transient HTTP errors already mostly disappeared? Since 10:44 UTC and after downloading 70 tasks (OPN1, MCM1) in 37 transfer-sessions I haven't seen any HTTP error. It has become a common experience: a few days after an outage, the (transient) HTTP errors disappear.
In my experience, this also happens when all ARP1-workunits from their current generation have been sent while no new generation is being generated; in other words, once all ARP1-workunits have been sent and distributed, turning in the computed results no longer triggers new generations, and the distribution of new ARP1-tasks eventually dries up.
Having said this, I haven't seen any new ARP1-tasks since 06:00 UTC this morning after turning in 13 ARP1-tasks during the past ten hours (at 16:11, 16:08, 15:43, 15:40, 15:32, 14:29, 14:24, 14:08, 13:33, 13:24, 12:54, 10:09 and 07:43 UTC).
Lately, it has also been a common experience that once the distribution of ARP1-tasks has completely dried up and a fresh restart of about 35,000 new generations happens, the HTTP errors rear their ugly heads again.

Finally, Cyclops, back in December you wrote (in post 680326) that you were thinking of starting to crunch in January. Have you had any luck yet installing BOINC?

[Jan 26, 2023 4:32:56 PM]

Cyclops
Senior Cruncher
Joined: Jun 13, 2022
Post Count: 295
Status: Offline


Re: 2023-01-25 Update (ARP & OPN1 workunits)

Hi adriverhoef,

You're right about that; we should have been clearer that "preparing new workunits" does not apply to ARP1. It would be more accurate to say that they are all on pause to varying degrees.

Is the ARP1-researchteam ready yet or is the ARP team still finalizing storage issues (see your post 681390)?

The ARP team is still working on their storage and will tell us when they are ready to send out new workunits.

The decrease in errors is likely because not all clients are asking for new workunits; some are processing existing ones, which puts less strain on the server. When EVERYONE is downloading new units, it becomes much more congested (like we saw when a lot of ARP/OPN units were downloaded earlier this week). We are working to improve our servers so that even at the height of activity, HTTP errors won't happen to such a degree.
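
As a rough illustration of the congestion effect described above, here is a minimal Python sketch comparing the peak number of requests per second when every client asks for work at the same instant with the peak when the same requests are spread randomly over a window. It is a toy model only: the client count, the 10-minute window, and the function names are hypothetical, and it does not reflect how the WCG servers or the BOINC client actually schedule requests.

```python
import random
from collections import Counter


def peak_requests_per_second(start_times):
    """Return the largest number of requests that fall into any one-second bucket."""
    buckets = Counter(int(t) for t in start_times)
    return max(buckets.values())


def simulate(num_clients, spread_seconds, seed=0):
    """Peak load when each client sends one request at a random time.

    spread_seconds == 0 models every client asking at the same instant
    (for example, right after an outage). Otherwise request times are
    drawn uniformly from the window [0, spread_seconds].
    """
    rng = random.Random(seed)  # fixed seed so the sketch is repeatable
    if spread_seconds == 0:
        times = [0.0] * num_clients
    else:
        times = [rng.uniform(0, spread_seconds) for _ in range(num_clients)]
    return peak_requests_per_second(times)


if __name__ == "__main__":
    clients = 10_000  # hypothetical number of connected clients
    print("all at once:", simulate(clients, 0))
    print("spread over 10 minutes:", simulate(clients, 600))
```

In this model the "all at once" case puts every request into a single second, while spreading the same requests over ten minutes lowers the peak by orders of magnitude, which matches the observation that errors fade once clients are no longer all downloading at the same time.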

Finally, Cyclops, back in December you wrote (in post 680326) that you were thinking of starting to crunch in January. Have you had any luck yet installing BOINC?

Thanks for asking, I did start crunching at the beginning of January. My progress isn't available yet since I asked the tech team to use my device as a testing ground to solve the ongoing missing devices/results situation.

[Jan 26, 2023 7:54:16 PM]

bfmorse
Senior Cruncher
US
Joined: Jul 26, 2009
Post Count: 296
Status: Offline
Project Badges:

14 day badge for Help Fight Childhood Cancer

14 day badge for Help Cure Muscular Dystrophy - Phase 2

14 day badge for Computing for Clean Water

180 day badge for FightAIDS@Home - Phase 2

180 day badge for Microbiome Immunity Project

20 year badge for OpenPandemics - COVID-19


Re: 2023-01-25 Update (ARP & OPN1 workunits)

Cyclops,

Have they made any progress on our systems? As you may recall, one of my recently added systems has also been volunteered for the same purpose.

[Jan 26, 2023 10:29:06 PM]

Mike.Gibson
Ace Cruncher
England
Joined: Aug 23, 2007
Post Count: 12349
Status: Offline
Project Badges:

1 year badge for Human Proteome Folding - Phase 2

45 day badge for Discovering Dengue Drugs - Together

14 day badge for Nutritious Rice for the World

180 day badge for Help Fight Childhood Cancer

90 day badge for Help Cure Muscular Dystrophy - Phase 2

5 year badge for The Clean Energy Project - Phase 2

90 day badge for Computing for Clean Water

180 day badge for GO Fight Against Malaria

20 year badge for Mapping Cancer Markers

5 year badge for Outsmart Ebola Together

5 year badge for FightAIDS@Home - Phase 2

5 year badge for Africa Rainfall Project

10 year badge for OpenPandemics - COVID-19


Re: 2023-01-25 Update (ARP & OPN1 workunits)

Cyclops, I presume that the recovery of download times is due to the servers running out of ARP1 units.

Now that you have cleared out those delayed units, will you be attempting to restart the extreme and accelerated units that have been stuck for some time? IBM managed to get previously stuck units going again by reducing the timestep from 36 seconds to 24 seconds. This applies especially to the 3 units stuck in generations 14, 16 & 17, otherwise known as ultra extremes.

Mike

[Jan 26, 2023 11:23:00 PM]