Thread Status: Active
Total posts in this thread: 3204
Posts: 3204   Pages: 321   [ Previous Page | 187 188 189 190 191 192 193 194 195 196 | Next Page ]
This topic has been viewed 2785254 times and has 3203 replies.
alanb1951
Veteran Cruncher
Joined: Jan 20, 2006
Post Count: 952
Status: Offline
Re: Work Available

Quote:
Thank you, Al.

That is an unstuck unit from adriverhoef's September thread.

Yup; much to my chagrin, I missed that one in my scan of that thread because it was in sam6861's forensic analysis post rather than in a failure report per se :-( -- as he'd just reported 33395, I probably misread the forensic one as being the same...

Quote:
I could only guess that the reason for 32 bit might be because one of the 3 machines is 32 bit.

Mike

I'd assumed that, so I've been looking at the wingmen for all my recent 32-bit jobs (not just the unstuck tasks...), and most of the time there's nothing obvious about the OS in use; it's typically a 64-bit system... Sometimes, however, it's obvious what's going on because there's a 32-bit kernel in play, albeit not that often!

However, I suspect that some of those systems actually have a 32-bit client without the configuration adjustment for 64-bit tasks; that would go a long way to explaining it.

Cheers - Al.
[Feb 4, 2022 5:49:59 PM]
knreed
Former World Community Grid Tech
Joined: Nov 8, 2004
Post Count: 4504
Status: Offline
Re: Work Available

A quick update before we head into the weekend. There were three work units that we had to restart from the beginning. These have now been reset, and generation 0 has been sent out. They are the units:

  • ARP1_0034098
  • ARP1_0034244
  • ARP1_0034587


I am rerunning clean jobs for one final workunit, which will be submitted to the grid tomorrow. At that point, all of the units will be back running on the grid.*

One change we have made: because those three units have so far to catch up, we need them to advance quickly. To do that, we are reducing the report deadline for 'extreme' jobs. This will ensure that those three don't get stuck on a generation for very long and that they have the best chance to catch up with the pack before the project ends. Note that this change will affect all of the 'extreme' jobs. We will monitor things to make sure that we don't see a sudden spike in jobs not being finished by the deadline due to this change (we don't expect it to be a problem, but we will watch for it anyway).


*Note that the report that counts the number of workunits running as 'extreme' doesn't count the first generation after I put them on the grid. Once the first generation runs successfully, its child gets the flag that makes it run as an extreme job, and it then shows up in the report.
[Feb 4, 2022 11:05:17 PM]
Mike.Gibson
Ace Cruncher
England
Joined: Aug 23, 2007
Post Count: 12359
Status: Offline
Re: Work Available

Thank you, Kevin.

If you are reducing the deadline for the extreme cases, are you reducing the definition of reliable to narrow the machines eligible?

As there are only 3 units restarting at 000, could you not have a short list of the fastest machines to receive them? That way they would close up faster.

There are currently only 261 days to my calculated completion date and those 3 units will need to be run through 183 generations, which means they will need to turn around every 1.43 days (34 hours) to meet that date.
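The 1.43-day (34-hour) figure can be checked with a quick calculation. This is a minimal sketch that assumes each generation is sent out as soon as the previous one finishes, so the whole remaining window is split evenly across the 183 generations; the function name is illustrative only.

```python
# Back-of-envelope check of the turnaround needed by the three restarted units.
# Assumes generations run back to back with no idle time between them.
DAYS_REMAINING = 261      # Mike's calculated days to project completion
GENERATIONS_NEEDED = 183  # generations each restarted unit still has to run


def required_turnaround_hours(days_remaining: float, generations: int) -> float:
    """Average hours a single generation may take if all must finish in time."""
    return days_remaining * 24 / generations


hours = required_turnaround_hours(DAYS_REMAINING, GENERATIONS_NEEDED)
print(f"{hours / 24:.2f} days per generation (about {hours:.0f} hours)")
# 1.43 days per generation (about 34 hours)
```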

I would suggest that only machines regularly returning units within 18 hours should be eligible for those 3 units. The other extremes could continue as now.

That would rule me out, but help the project!

Mike
[Feb 5, 2022 12:33:50 AM]
Mike.Gibson
Ace Cruncher
England
Joined: Aug 23, 2007
Post Count: 12359
Status: Offline
Re: Work Available

We now have 3 new ultras in generation 000, including the one that was stuck in generation 079.

Only 2 of the old ultras are still discernible.

There will be a reduced reporting deadline for the extremes. This new deadline has not been announced, so if anyone picks one up, please post it here.

There is an error in the latest daily text file ....stats/state.txt. The max_generation for the extremes should be 118.

There are 123 units in the extreme range but only 114 listed as extreme. This means that there are 9 units which have yet to complete their first generation since being re-started. This includes the 3 in generation 000.

My current calculated completion date forecast for ARP is 22 October so those 3 units which have restarted will have to turn around very quickly (34 hours average for the entire period).

Mike
[Feb 5, 2022 1:33:24 PM]
knreed
Former World Community Grid Tech
Joined: Nov 8, 2004
Post Count: 4504
Status: Offline
Re: Work Available

Quote:
If you are reducing the deadline for the extreme cases, are you reducing the definition of reliable to narrow the machines eligible?

No. While I would like to do this, the reliable mechanism in the BOINC code applies to everything on World Community Grid, and making this change would leave too many jobs needing a reliable host, which would in turn gum up the works.

Quote:
As there are only 3 units restarting at 000, could you not have a short list of the fastest machines to receive them? That way they would close up faster.

I agree that would be nice, but no such mechanism exists in BOINC.

Quote:
There are currently only 261 days to my calculated completion date and those 3 units will need to be run through 183 generations, which means they will need to turn around every 1.43 days (34 hours) to meet that date.

That's in line with my estimate. Currently the extremes average 38 hours, while the median is 29 hours (as an aside, 62% of the workunits finish within 34 hours). The difference between the average and the median comes from the hard-luck cases that stretch out to 100+ hours. As long as these three jobs have a minimal number of hard-luck cases, they should be able to close the gap by the end of the project. In the absence of better tools to target these at specific hosts, the next best tool is to make sure that the worst-case time to finish a single generation isn't too bad. That way, if the average is pushed closer to the median (and in particular, below 34 hours), they should be able to catch up. Once most of the extreme workunits become accelerated, we can look at tightening the report deadline for those three further, to only slightly longer than the reliable threshold.
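To see why pulling the average toward the median matters, the sketch below projects how long 183 back-to-back generations would take at the 29-hour median, the 34-hour target and the 38-hour average mentioned in this thread, against the 261-day window. It assumes, as a simplification, that every generation of these three units takes exactly the stated time; in reality per-generation times vary, which is exactly the hard-luck effect described above. The function name is illustrative only.

```python
# Projection of total time for 183 sequential generations at a fixed
# per-generation turnaround. Simplification: every generation takes exactly
# the given number of hours, with no idle time between generations.
DAYS_REMAINING = 261
GENERATIONS_NEEDED = 183


def days_to_finish(hours_per_generation: float,
                   generations: int = GENERATIONS_NEEDED) -> float:
    """Total days needed to run `generations` back-to-back generations."""
    return hours_per_generation * generations / 24


# Figures from the thread: 29 h median, 34 h target, 38 h average.
for label, hours in [("median", 29), ("target", 34), ("average", 38)]:
    days = days_to_finish(hours)
    verdict = "within" if days <= DAYS_REMAINING else "beyond"
    print(f"{label:>7}: {hours} h/generation -> {days:.1f} days "
          f"({verdict} the {DAYS_REMAINING}-day window)")
```

At the 38-hour average the three units would overrun the window by roughly a month, while at the median they would finish with several weeks to spare, which is why trimming the long tail of slow generations matters more than the typical case.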

We have also discussed this with the Delft team and they know that these 3 might take another 3-4 weeks to finish after the bulk are done.
[Feb 5, 2022 2:09:17 PM]
knreed
Former World Community Grid Tech
Joined: Nov 8, 2004
Post Count: 4504
Status: Offline
Re: Work Available

Quote:
There will be a reduced reporting deadline for the extremes. This new deadline has not been announced, so if anyone picks one up, please post it here.

The base deadline assigned to the workunit has been changed from 7 to 3.5 days. However, there are some complications due to the way the deadline is modified by the reliable mechanism. Most of the jobs are being assigned a 2.75-day deadline.
[Feb 5, 2022 2:11:53 PM]
Mike.Gibson
Ace Cruncher
England
Joined: Aug 23, 2007
Post Count: 12359
Status: Offline
Re: Work Available

Kevin

Thank you for your replies. I note that 7 divided by 2 is 3.5, and 3.5 divided by 2 is 1.75. Is the 2.75 arrived at by adding the 1.75 and the 1-day leeway that was allowed partway through the project?
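The guess about where 2.75 days comes from can be written out as plain arithmetic. This is only Mike's hypothesis, not the confirmed BOINC or World Community Grid rule: it assumes the new 3.5-day base is halved again for reliable hosts and that a 1-day leeway is then added. The function name is hypothetical.

```python
# Hypothetical reconstruction of Mike's guess; NOT the actual BOINC/WCG rule.
OLD_BASE_DEADLINE_DAYS = 7.0


def guessed_deadline(old_base_days: float, leeway_days: float = 1.0) -> float:
    """Return the deadline (days) implied by Mike's hypothesis."""
    new_base = old_base_days / 2           # 7 -> 3.5 days (the stated change)
    reliable_deadline = new_base / 2       # assumed halving for reliable hosts: 3.5 -> 1.75
    return reliable_deadline + leeway_days  # assumed 1-day leeway: 1.75 -> 2.75


print(guessed_deadline(OLD_BASE_DEADLINE_DAYS))  # 2.75
```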

Did you find the error in ....stats/state?

Cheers

Mike
[Feb 5, 2022 3:42:45 PM]
Unixchick
Veteran Cruncher
Joined: Apr 16, 2020
Post Count: 951
Status: Offline
Re: Work Available

Thank you Kevin for taking the time to let us know what is going on with the project. I love the updates, and getting to see what gen each of the tasks is on.

Thanks to Mike also for posting quick updates.

I think the best thing we can all do is keep our own queues as small as possible and keep reminding others to keep theirs small. This project is very stable (unlike SETI, where a queue was needed), so one really doesn't need much of a queue.

So close to getting my year badge, but I'm going to stick with this project, as it is the only one on WCG with feedback. Hopefully there will be a new project to jump to when this one is done.
[Feb 5, 2022 4:16:32 PM]
Sgt.Joe
Ace Cruncher
USA
Joined: Jul 4, 2006
Post Count: 7660
Status: Offline
Re: Work Available

I have ARP1_0000026_123. I don't recall seeing a number as low as 0000026 in a long time. It has a 5 day deadline.
Cheers
----------------------------------------
Sgt. Joe
*Minnesota Crunchers*
[Feb 5, 2022 4:17:30 PM]
Robokapp
Senior Cruncher
Joined: Feb 6, 2012
Post Count: 249
Status: Offline
Re: Work Available

Woke up this morning to find that ARP1_0031093_123_0 and ARP1_0019553_123_2 had failed to start. I was squinting at my list thinking "why does it look like there are fewer than 8 running?", and then, when I scrolled down the task list, I found them stuck at 0.000% and "High Priority".

I abandoned both, so... these two buggers are still in limbo. Sorry, all. My little Intel couldn't. :D
[Feb 5, 2022 4:20:13 PM]