World Community Grid Forums
Thread Status: Active Total posts in this thread: 3317
KerSamson
Master Cruncher Switzerland Joined: Jan 29, 2007 Post Count: 1677 Status: Offline |
+1
---------------------------------------- |
||
|
halldor.usa
Advanced Cruncher USA Joined: Nov 24, 2006 Post Count: 115 Status: Offline |
Thanks again for the update!
|
||
|
Mike.Gibson
Ace Cruncher England Joined: Aug 23, 2007 Post Count: 12436 Status: Offline |
Another Sunday, so another full report. This one will be the last before shutdown.
There have been no validations reported for the last 2 days, although a few results have been returned (3 yesterday!). Maybe they were for units already validated.
The 3 'new' ultras need to advance by 5 generations per week to catch up to the calculated completion date. They are currently in generations 008, 010 & 011, so they have not progressed this week. Only the last of the 'old' ultras is still discernible, now in generation 093. The rest are mixed in with the other unstuck units.
There are 117 units in the extreme range, all of which are listed by WCG. The definitions of extreme, accelerated (priority) & normal have remained at 125, 135 & 120, respectively.
3,472 units have validated in the week (down to 496.0 per day), so we now have 1,953,789 remaining until the end of the project. We have cracked the 2 million mark.
My calculated completion date has been suspended due to the wind down. I now expect the project to end early next year, depending on how many crunchers return after the 2 month break.
Enjoy your holidays!
Mike |
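The per-day figure in the report can be checked with a few lines of Python. This is only a sketch: the two input numbers come from the report above, while the function names and the naive projection are illustrative assumptions. The projection simply divides the remaining units by the wind-down rate, so it is not a forecast.

```python
def daily_rate(validated_in_week: int, days: int = 7) -> float:
    """Average number of validations per day over the reporting window."""
    return validated_in_week / days


def naive_days_remaining(remaining_units: int, rate_per_day: float) -> float:
    """Days left if the current rate never changed (an assumption, not a forecast)."""
    return remaining_units / rate_per_day


weekly_validated = 3_472     # figure from the report
remaining_units = 1_953_789  # figure from the report

rate = daily_rate(weekly_validated)
print(f"{rate:.1f} validations per day")  # prints "496.0 validations per day"
print(f"{naive_days_remaining(remaining_units, rate):.0f} days at that rate")
```

At the wind-down rate the naive projection runs to thousands of days, which fits with the calculated completion date being suspended until normal crunching resumes.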
||
|
Mike.Gibson
Ace Cruncher England Joined: Aug 23, 2007 Post Count: 12436 Status: Offline |
Just a suggestion, but when we return, might it not be better to reduce the number of extreme units?
When the category was introduced, there were only about 56 units in the category and some of them were stuck. We now have 117 in the category, so there are at least twice as many redundant copies out there. Also, 3 units had to be restarted from scratch.
I would suggest that the definition for accelerated (priority) units remain at -10 generations or move to -15 generations, and that the definition for extremes move to -30 generations. This would mean 56 in the extremes as before and therefore fewer of the faster machines crunching redundant units.
As before, these definitions could close up as the stragglers catch up.
Mike |
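The proposed cut-offs can be sketched as a small classifier. Everything here is hypothetical: the thread does not say how WCG measures "generations behind" or what reference generation it uses, so `generations_behind`, the function name and the sample values are assumptions for illustration only.

```python
from collections import Counter


def classify(generations_behind: int,
             accelerated_at: int = 10,
             extreme_at: int = 30) -> str:
    """Place a unit in a category by how far it lags the reference generation.

    The thresholds default to the -10 / -30 suggestion in the post above;
    pass accelerated_at=15 for the alternative -15 definition.
    """
    if generations_behind >= extreme_at:
        return "extreme"
    if generations_behind >= accelerated_at:
        return "accelerated"
    return "normal"


# Hypothetical lags for a handful of units, not real project data.
sample_lags = [2, 9, 12, 18, 31, 40]
counts = Counter(classify(lag) for lag in sample_lags)
print(counts)
```

Raising `extreme_at` shrinks the extreme pool, which is the effect the suggestion is after: fewer redundant copies going to the fastest machines.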
||
|
alanb1951
Veteran Cruncher Joined: Jan 20, 2006 Post Count: 982 Status: Offline |
I suspect the best way to free up some of the work-load on "faster machines" might be to reduce the normal deadline to five or six days (with or without a grace day) rather than the existing 7+1. Yes, it would upset certain users (not all of them with small/slow systems!) but there are probably quite a lot of retries out there most of the time, and not all because of less powerful systems...
Under normal running conditions my machines were getting 25% or more tasks that had wingmen responding 6.5 days or more after receiving tasks (and the response was often No Reply rather than Error - Not started by deadline). And the vast majority of those late/non-existent replies came from a very small set of systems that never seemed to return work for any project in a timely fashion(*)! Once we got onto clean-up, well over 50% of my retries were for that small subset of devices!
Of course, if some way could be found of making sure that the maximum "upper limit" on ARP1 tasks could be set to [say] twice the number of announced CPUs, a lot of the late (and non-existent) returns would go away!
Cheers - Al.
(*) That's based on an analysis of wingman responses for tasks I've run on projects during late January 2022 and February 2022 up to the shutdown... |
||
|
geophi
Advanced Cruncher U.S. Joined: Sep 3, 2007 Post Count: 107 Status: Offline |
"And the vast majority of those late/non-existent replies came from a very small set of systems that never seemed to return work for any project in a timely fashion(*)! Once we got onto clean-up, well over 50% of my retries were for that small subset of devices."
That's what I saw also. There was a PC with Fedora 34 Server OS that time and time again was late with its units, such that I had to crunch them after its deadlines passed. Sometimes there would be an error, or a No Reply, and often it would return the work, uselessly, well after the work unit had been validated. Or I would be paired with it as one of the 2 units first sent out, and then have to wait 10 days for my task to be validated because it couldn't return work, and it took the reliable PC that picked it up 3 more days to return it.
The Fedora 34 PC was the worst, but for a long time there was a CentOS 8 PC and an Ubuntu Jammy Jellyfish PC doing similar. They must have been many-core systems, given how often I saw these PCs late or in error because they didn't start the task by some deadline.
[Edit 1 times, last edit by geophi at Feb 28, 2022 1:37:49 AM] |
||
|
Mike.Gibson
Ace Cruncher England Joined: Aug 23, 2007 Post Count: 12436 Status: Offline |
I use Windows and didn't see that sort of problem until the wind down. The problem I saw then could have stemmed from crunchers loading up their machines before the wind down and then having the deadlines cut from 8 to 6 days.
Restricting machines to total threads would be a big help, but the overloaders would not get the extremes or accelerated units as they would not be considered reliable. The plus 1 day was not a good idea as it encouraged people to return slowly. However, it didn't matter too much as they would only be crunching normal units. Mike |
||
|
KerSamson
Master Cruncher Switzerland Joined: Jan 29, 2007 Post Count: 1677 Status: Offline |
My machines are not the youngest, i.e. they need between 16 and 27 hours to complete an ARP1 WU.
Except during these shutdown weeks, I usually keep 1, at most 2, ARP1 WUs in the buffer in addition to the WUs being computed. It is easy to set up using the WCG device profile and app_config.xml locally on the machines.
I would also advocate limiting the deadline to 5 (+1?) days for ARP1 and to 7 days for the other projects. I do not really see the need for a 10-day deadline when WUs require between 2 and 4 hours of computation time. The only exception is OPN1 on RPi (3), requiring 12 to 20 hours (sometimes 25 hours) of computation. Nevertheless, a 5-7 day deadline is more than sufficient even for such slow machines.
Cheers,
Yves
---------------------------------------- |
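For readers who have not used BOINC's per-project `app_config.xml`, here is a minimal sketch of generating one in Python. It is not Yves's actual configuration. The application name `arp1` and the limit of 2 are assumptions, and `max_concurrent` caps how many tasks of that application run at once, not how many sit in the buffer; buffer depth is controlled through the device profile settings.

```python
import xml.etree.ElementTree as ET


def build_app_config(app_name: str, max_concurrent: int) -> str:
    """Return the text of a BOINC app_config.xml limiting one application."""
    root = ET.Element("app_config")
    app = ET.SubElement(root, "app")
    ET.SubElement(app, "name").text = app_name
    ET.SubElement(app, "max_concurrent").text = str(max_concurrent)
    ET.indent(root)  # pretty-print; requires Python 3.9 or newer
    return ET.tostring(root, encoding="unicode")


# "arp1" is assumed to be the application's short name; check it against
# the names the BOINC client reports before relying on it.
xml_text = build_app_config("arp1", 2)
print(xml_text)
```

The resulting text would be saved as `app_config.xml` in the project's folder under the BOINC data directory, after which the client needs to re-read its config files or be restarted.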
||
|
Unixchick
Veteran Cruncher Joined: Apr 16, 2020 Post Count: 994 Status: Offline |
I'm so excited to have ARPs again. I have 2 and they are both 125. Does anyone know if the script telling us where all the WUs are is still running? I'm feeling lazy and don't want to hunt back through this thread for the link to the scripts. If it is running and you have the links handy can someone post them again??
|
||
|
TonyEllis
Senior Cruncher Australia Joined: Jul 9, 2008 Post Count: 261 Status: Offline |
"The only exception is OPN1 on RPi (3), requiring 12 to 20 hours (sometimes 25 hours) computation. Nevertheless, 5-7 days delay is more than sufficient even for such slow machines. Cheers, Yves"
My Pi Zero averages about 58 hours.
Run Time Stats https://grassmere-productions.no-ip.biz/
|
||
|