World Community Grid Forums
Thread Status: Active | Total posts in this thread: 118
thunder7
Senior Cruncher | Netherlands | Joined: Mar 6, 2013 | Post Count: 232 | Status: Offline
Still, I'm wondering what algorithm is used to hand out WUs (or not, as the case may be).

I have 3 Linux systems. Two always have work and full queues. The third usually has enough work, with a few WUs queued, but will, say twice a week, run out of work for a full night. When I notice that in the morning, I detach and re-attach the project, and suddenly it has work again within an hour. This has been happening for months now, so I'm starting to disbelieve the 'it's just the luck of the draw' theory of WU dispersal.
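(For reference, my morning detach/re-attach is just the equivalent of the following, assuming the stock boinccmd tool; <account_key> is a placeholder for your own account key:)

```
# detach from WCG (this throws away any queued work), then re-attach
boinccmd --project https://www.worldcommunitygrid.org/ detach
# <account_key> is a placeholder -- substitute your own key
boinccmd --project_attach https://www.worldcommunitygrid.org/ <account_key>
```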
Sgt.Joe
Ace Cruncher | USA | Joined: Jul 4, 2006 | Post Count: 7691 | Status: Offline
I have noticed something similar with one of my Linux systems. I have come to suspect the system itself encounters some condition (unknown at the moment) and creates some kind of internal blockage. If I initiate a manual update it takes right off and gets more workunits. As far as I can determine it happens randomly. It is a dedicated system which only runs WCG 24/7.
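(The manual update I do is the equivalent of this, assuming the stock boinccmd tool; the URL must match whatever boinccmd --get_project_status reports:)

```
# same as pressing "Update" in the BOINC Manager
boinccmd --project https://www.worldcommunitygrid.org/ update
```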
Cheers
Sgt. Joe
*Minnesota Crunchers*
Unixchick
Veteran Cruncher | Joined: Apr 16, 2020 | Post Count: 988 | Status: Offline
For me... I need to manually request work now and then. The gap between requests gets longer and longer the more often they are unsuccessful, so a manual request ups the frequency of asks again. I know some people have a script that keeps the asks coming every X minutes, and this helps them consistently get WUs. It might also increase the load on the machine, so I don't think all of us should do that.
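Something like this, I believe (an untested sketch, assuming the stock boinccmd tool on the same machine as the client; pick your own interval):

```
#!/bin/bash
# Nudge the BOINC client every 20 minutes so the request back-off
# never grows too long. PROJECT_URL should match what
# "boinccmd --get_project_status" reports.
PROJECT_URL="https://www.worldcommunitygrid.org/"

while true; do
    boinccmd --project "$PROJECT_URL" update
    sleep 1200   # 20 minutes
done
```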
I find it interesting that some of your machines do fine while one has issues. What is different about that machine? Back in the SETI days it seemed that the slower machines could get WUs more easily when work was scarce. That was probably just perception: a few SETI WUs could give a slow machine a nice queue, yet be done in a blip on a fast machine and never be noticed.
alanb1951
Veteran Cruncher | Joined: Jan 20, 2006 | Post Count: 977 | Status: Recently Active
Unixchick,
Nice summary of the client behaviour! It's worth noting that when the client reports completed work it is not guaranteed to ask for new work -- that can be quite annoying for a user with lots of threads who is depending on reports to double as requests. In my view, there is still room for improvement in client behaviour regarding requests :-)

Also (regarding "luck of the draw"), I suspect that when lots of simultaneous requests for work go in, smaller requests are more likely to be satisfied (fewer iterations in the work-issuing loop!) So folks who set up for small caches and fast turnaround may be less likely to run out of work, especially if they prod the client every 15, 20 or 30 minutes to wake up the request mechanism (a cron sketch is at the end of this post). However, large caches and prodding the client every 5 minutes or less may be counter-productive!

Cheers - Al.

P.S. I'd rather return work promptly and run the risk of running out than maintain the sort of cache size that some users seem to, judging by the 10 to 15% of my MCM1 tasks that see at least one wingman miss the deadline on a typical day... Big caches for huge machines, yes (though trying to fetch 1000 tasks at once may be doomed to failure); 5+ day caches for smaller machines, no :-)
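The cron sketch mentioned above -- assuming boinccmd lives in /usr/bin and the client runs on the same box (adjust the interval, path and URL to taste):

```
# crontab entry: ask WCG for work every 20 minutes
*/20 * * * * /usr/bin/boinccmd --project https://www.worldcommunitygrid.org/ update
```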
Mike.Gibson
Ace Cruncher | England | Joined: Aug 23, 2007 | Post Count: 12431 | Status: Offline
Yes, there are lots of quirks in the system, and they have mostly been reported many times. But I have been seeing one that I haven't noticed on the fora.

When I get a resend with a 3-day deadline, it doesn't get done in deadline order; it gets done when the equivalent normal units get done. That doesn't mean it misses the deadline, because I don't carry a large cache, but I would have thought resends would be done in deadline order.

Mike
Bryn Mawr
Senior Cruncher | Joined: Dec 26, 2018 | Post Count: 346 | Status: Offline
Mike.Gibson wrote:
> Yes, there are lots of quirks in the system. And they have mostly been reported many times. But I have been seeing one that I haven't noticed on the fora. When I get a resend with a 3-day deadline, it doesn't get done in deadline order. It gets done when the equivalent normal units get done. This doesn't mean it misses the deadline because I don't carry a large cache. I would have thought that they would be done in deadline order.

Resends only run in deadline order when they are put into high-priority mode, which only happens when the client sees them in danger of missing their deadline.
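Roughly like this, with made-up numbers (an illustration of the rule, not BOINC's actual code):

```
#!/bin/bash
# A resend only jumps the queue when the client predicts a deadline miss.
remaining_hours=6       # CPU time the resend still needs (made up)
queued_ahead_hours=40   # work scheduled to run before it (made up)
deadline_hours=72       # the 3-day resend deadline

if (( remaining_hours + queued_ahead_hours > deadline_hours )); then
    echo "predicted miss: run in high-priority (earliest deadline first)"
else
    echo "no predicted miss: run in normal queue order"
fi
```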
Unixchick
Veteran Cruncher | Joined: Apr 16, 2020 | Post Count: 988 | Status: Offline
I'm still stuck on something that was discussed on page 3 of this thread: why have they decreased the number of MCM WUs going out since Nov 27?

https://www.worldcommunitygrid.org/stat/viewP...&numRecordsPerPage=90

They used to send out many more WUs for MCM... then SOMETHING happened. The system got wonky, the website crashed a bit more, and I guess they cut way back on the WUs going out to keep the system stable. We used to return 1.3 million results in a day; now it is in the low 800k range. No wonder so many of us have empty caches. I worry that they won't be able to bring the other projects back; ARP has always taxed the current configuration. I'm preparing myself now for a bumpy ride.
thunder7
Senior Cruncher | Netherlands | Joined: Mar 6, 2013 | Post Count: 232 | Status: Offline
First of all, the hardware is different:

- the good one has two Xeon 2696 v2 CPUs (48 threads, 4023 Whetstone, 88393 Dhrystone) and 384 GB memory
- the bad one has two Xeon 2696 v4 CPUs (88 threads, 4923 Whetstone, 78313 Dhrystone) and 128 GB memory

Both are 100% available for WCG.

The good one consistently runs 48 WUs and has a buffer containing 950-1000 WUs. It completes 350-400 WUs per day (columns: date, runtime, points, results):

2023-12-27  0:047:22:19:00  203531  385
2023-12-26  0:044:03:29:52  188538  358
2023-12-25  0:048:10:42:13  202057  390
2023-12-24  0:048:03:43:20  210460  400
2023-12-23  0:045:11:33:05  196557  378
2023-12-22  0:045:11:53:00  193925  366
2023-12-21  0:047:21:24:56  203064  388

The bad one never has more than 200 WUs in the buffer, often only 30-50, and runs out of work frequently. Yesterday, for example, was a bad day:

2023-12-27  0:040:12:26:34  203124  387
2023-12-26  0:081:17:47:44  358064  711
2023-12-25  0:061:17:06:47  270355  512
2023-12-24  0:079:07:05:42  346041  649
2023-12-23  0:064:18:53:11  283429  545
2023-12-22  0:046:18:23:06  218234  429
2023-12-21  0:074:22:09:40  319908  607

There is no difference in global_prefs.xml or global_prefs_override.xml except for the max_cpus value, of course. Both have the same profile in the Device Manager (default, runs for maximum output). The only difference I can find is that, according to the Device Manager, the good one was installed in November 2022 and the bad one in November 2023 (well, it got a new motherboard then). All in all, I don't see a cause for the significant difference here.
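(If you want to average such rows yourself, paste them into a file and run something like this -- a quick sketch assuming the four columns shown above:)

```
# mean points/day and results/day from the pasted stats rows
awk '{points += $3; results += $4; n++}
     END { printf "avg points/day: %.0f, avg results/day: %.0f\n",
           points/n, results/n }' host_stats.txt
```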
Unixchick
Veteran Cruncher | Joined: Apr 16, 2020 | Post Count: 988 | Status: Offline
Thanks for the info, thunder7. I wish it had given us a clue, but it was still worth looking at in case we found a reason.
Sgt.Joe
Ace Cruncher | USA | Joined: Jul 4, 2006 | Post Count: 7691 | Status: Offline
Thunder 7,

In the profile you are using, have you set the project limits to "unlimited"? I have a 32-thread machine with a 2-day cache set that way. It works better than the setting of 64 and does not overload the machine.
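If you want to set that 2-day cache from the command line rather than the website, something like this should work (an untested sketch; the path is the Debian/Ubuntu default and may differ on your install, and the project limits themselves still live in the WCG device profile on the website):

```
# write a local preferences override: 1.0 + 1.0 is roughly a 2-day cache
cat > /var/lib/boinc-client/global_prefs_override.xml <<'EOF'
<global_preferences>
   <work_buf_min_days>1.0</work_buf_min_days>
   <work_buf_additional_days>1.0</work_buf_additional_days>
</global_preferences>
EOF
# tell the running client to pick up the new cache setting
boinccmd --read_global_prefs_override
```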
Cheers
Sgt. Joe
*Minnesota Crunchers*