World Community Grid - View Thread

World Community Grid Forums

Category: Active Research

Forum: Africa Rainfall Project

Thread: Work Available

Thread Status: Active
Total posts in this thread: 3204

This topic has been viewed 2785254 times and has 3203 replies

alanb1951
Veteran Cruncher
Joined: Jan 20, 2006
Post Count: 952
Status: Offline

Re: Work Available

Thank you, Al.

That is an unstuck unit from adriverhoef's September thread.

Yup; much to my chagrin I missed that one in my scan of that thread because it was in sam6861's forensic analysis post rather than as a failure report per se :-( -- as he'd just reported 33395 I probably misread the forensic one to be the same...

I could only guess that the reason for 32 bit might be because one of the 3 machines is 32 bit.

Mike

I'd assumed that, so I've been looking at wingmen for all my recent 32-bit jobs (rather than just the unstuck tasks...) and most of the time there's nothing obvious about the O/S in use - typically a 64-bit system... Sometimes, however, it's obvious what's going on as there's a 32-bit kernel in play, albeit not that often!

However, I suspect that some of those systems actually have a 32-bit client without the configuration adjustment for 64-bit tasks; that would go a long way to explaining it.

Cheers - Al.

[Feb 4, 2022 5:49:59 PM]

knreed
Former World Community Grid Tech
Joined: Nov 8, 2004
Post Count: 4504
Status: Offline

Re: Work Available

A quick update before we head into the weekend. There were three work units that we had to restart from the beginning. These have now been reset and generation 0 has been sent out. The units are:

ARP1_0034098
ARP1_0034244
ARP1_0034587

I have one final workunit that I'm rerunning clean jobs on that will get submitted into the grid tomorrow. At that point all of the units will be back running on the grid.*

One change that we have made is that because those three units have so far to catch up we need them to advance quickly. In order to do that we are reducing the report deadline for 'extreme' jobs. This will ensure that those three don't get stuck on a generation for very long and that they have the best chance to catch up with the pack before the project ends. Note that this change will impact all of the 'extreme' jobs. We will monitor things to make sure that we don't see a sudden spike in jobs not being finished by the deadline due to this change (we don't expect it to be a problem, but we will watch for it anyway).

*Note that the report that counts the number of work-units running as 'extreme' doesn't count the first generation after I put them on the grid. Once the first generation runs successfully, its child will get the flag that makes it run as an extreme job and then it will show up in the report.

[Feb 4, 2022 11:05:17 PM]

Mike.Gibson
Ace Cruncher
England
Joined: Aug 23, 2007
Post Count: 12359
Status: Offline

Re: Work Available

Thank you, Kevin.

If you are reducing the deadline for extreme case, are you reducing the definition of reliable to narrow the machines eligible?

As there are only 3 units restarting at 000, could you not have a short list of the fastest machines to receive them? That way they would close up faster.

There are currently only 261 days to my calculated completion date and those 3 units will need to be run through 183 generations, which means they will need to turn around every 1.43 days (34 hours) to meet that date.
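The turnaround figure above can be checked with a couple of lines of arithmetic. This is only a sketch using the numbers quoted in the post (261 days, 183 generations); the function name is made up for illustration.

```python
# Figures are the ones quoted in the post; the function name is illustrative only.
def required_turnaround(days_remaining: float, generations: int) -> tuple[float, float]:
    """Return the average time allowed per generation, in days and in hours."""
    days_per_generation = days_remaining / generations
    return days_per_generation, days_per_generation * 24


days, hours = required_turnaround(261, 183)
print(f"{days:.2f} days per generation ({hours:.0f} hours)")
# prints: 1.43 days per generation (34 hours)
```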

I would suggest that only machines regularly returning units within 18 hours should be eligible for those 3 units. The other extremes could continue as now.

That would rule me out, but help the project!

Mike

[Feb 5, 2022 12:33:50 AM]

Mike.Gibson
Ace Cruncher
England
Joined: Aug 23, 2007
Post Count: 12359
Status: Offline

Re: Work Available

We now have 3 new ultras in generation 000 which includes the one that was stuck in generation 079.

Only 2 of the old ultras are still discernable.

There will be a reduced reporting deadline for the extremes. This new deadline has not been notified so if anyone picks one up, please post here.

There is an error in the latest daily text file ....stats/state.txt. The max_generation for the extremes should be 118.

There are 123 units in the extreme range but only 114 listed as extreme. This means that there are 9 units which have yet to complete their first generation since being re-started. This includes the 3 in generation 000.

My current calculated completion date forecast for ARP is 22 October so those 3 units which have restarted will have to turn around very quickly (34 hours average for the entire period).

Mike

[Feb 5, 2022 1:33:24 PM]

knreed
Former World Community Grid Tech
Joined: Nov 8, 2004
Post Count: 4504
Status: Offline

Re: Work Available

Mike.Gibson wrote: "If you are reducing the deadline for extreme case, are you reducing the definition of reliable to narrow the machines eligible?"

No - while I would like to do this, the reliable mechanism in the BOINC code applies to everything on World Community Grid and making this change would cause there to be too many jobs needing a reliable host which would in turn gum up the works.

Mike.Gibson wrote: "As there are only 3 units restarting at 000, could you not have a short list of the fastest machines to receive them? That way they would close up faster."

I agree that would be nice, but no such mechanism exists in BOINC.

Mike.Gibson wrote: "There are currently only 261 days to my calculated completion date and those 3 units will need to be run through 183 generations, which means they will need to turn around every 1.43 days (34 hours) to meet that date."

That's in line with my estimate. Currently the extremes average 38 hours while the median is 29 hours (as an aside, 62% of the workunits finish within 34 hours). The difference between the average and the median is the hard luck cases that stretch out to 100+ hours. As long as these three jobs have a minimal number of hard luck cases, they should be able to be close by the end of the project. In the absence of better tools to target these at specific hosts, the next best tool is to make sure that the worst case time to finish a single generation isn't too bad. That way, if the average is pushed closer to the median (and in particular, below 34 hours), they should be able to catch up. Once most of the extreme workunits become accelerated we can look at tightening the report deadline for those three further to be only slightly longer than the reliable threshold.

We have also discussed this with the Delft team and they know that these 3 might take another 3-4 weeks to finish after the bulk are done.
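As a rough cross-check of the figures in this post, the sketch below projects how long 183 generations would take at the quoted average (38 hours) and median (29 hours) and compares that with the 261 days available. It assumes each generation starts as soon as the previous one finishes and that the pace of all extremes applies to these three units, neither of which is guaranteed. The names are illustrative only.

```python
GENERATIONS = 183      # generations the restarted units still need (from the thread)
DAYS_AVAILABLE = 261   # days to the forecast completion date (from the thread)


def days_needed(avg_hours_per_generation: float) -> float:
    """Total days to run all generations back to back at the given pace."""
    return GENERATIONS * avg_hours_per_generation / 24


for label, hours in [("average", 38), ("median", 29)]:
    total = days_needed(hours)
    lag_weeks = (total - DAYS_AVAILABLE) / 7
    print(f"{label} pace ({hours} h): {total:.1f} days, {lag_weeks:+.1f} weeks vs. forecast")
# prints:
# average pace (38 h): 289.8 days, +4.1 weeks vs. forecast
# median pace (29 h): 221.1 days, -5.7 weeks vs. forecast
```

At the current average the three units would trail the pack by about four weeks, which is roughly the lag mentioned above. At the median pace they would finish ahead of the forecast date.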

[Feb 5, 2022 2:09:17 PM]

knreed
Former World Community Grid Tech
Joined: Nov 8, 2004
Post Count: 4504
Status: Offline

Re: Work Available

Mike.Gibson wrote: "There will be a reduced reporting deadline for the extremes. This new deadline has not been notified so if anyone picks one up, please post here."

The base deadline assigned to the workunit is changed from 7 -> 3.5 days. However, there are some complications due to the way that the deadline is modified based on the reliable mechanism. Most of the jobs are getting assigned a 2.75 day deadline.

[Feb 5, 2022 2:11:53 PM]

Mike.Gibson
Ace Cruncher
England
Joined: Aug 23, 2007
Post Count: 12359
Status: Offline

Re: Work Available

Kevin

Thank you for your replies. I note that 7 divided by 2 is 3.5 and 3.5 divided by 2 is 1.75. Is the 2.75 arrived at by adding the 1.75 and the 1 leeway allowed part way through the project?

Did you find the error in ....stats/state.txt?

Cheers

Mike

[Feb 5, 2022 3:42:45 PM]

Unixchick
Veteran Cruncher
Joined: Apr 16, 2020
Post Count: 951
Status: Offline

Re: Work Available

Thank you Kevin for taking the time to let us know what is going on with the project. I love the updates, and getting to see what gen each of the tasks is on.

Thanks to Mike also for posting quick updates.

I think the best thing we can all do is keep our own queues as small as possible, and keep reminding others to keep their queues small. This project is very stable (unlike SETI@home, where a queue was needed), so one really doesn't need much of a queue.

So close to getting my year badge, but I'm going to stick with this project, as it is the only one on WCG with feedback. Hopefully there will be a new project to jump to when this one is done.

[Feb 5, 2022 4:16:32 PM]

Sgt.Joe
Ace Cruncher
USA
Joined: Jul 4, 2006
Post Count: 7660
Status: Offline

Re: Work Available

I have ARP1_0000026_123. I don't recall seeing a number as low as 0000026 in a long time. It has a 5 day deadline.
Cheers

----------------------------------------

Sgt. Joe
*Minnesota Crunchers*

[Feb 5, 2022 4:17:30 PM]

Robokapp
Senior Cruncher
Joined: Feb 6, 2012
Post Count: 249
Status: Offline

Re: Work Available

Woke up this morning to find ARP1_0031093_123_0 and ARP1_0019553_123_2 had failed to initiate. I was squinting at my list, wondering "why does it look like there are fewer than 8 running", and then I found them when I scrolled down the task list, stuck at 0.000% and "High Priority".

I abandoned both so... these two buggers are still in limbo. Sorry all. My little Intel couldn't. :D

[Feb 5, 2022 4:20:13 PM]
