World Community Grid Forums
Thread Status: Active Total posts in this thread: 3161
Mike.Gibson
Ace Cruncher England Joined: Aug 23, 2007 Post Count: 12333 Status: Offline |
It is difficult to judge as we have just had a 24 hour hiatus.
However, we have nearly completed half of 141, so I suspect it will not be long before they start to release 142. Mike |
||
|
spRocket
Senior Cruncher Joined: Mar 25, 2020 Post Count: 274 Status: Offline |
I managed to pick up some ARP work yesterday, and there's one that came in very early this morning (US Central time). I think it was a general WCG issue, since MCM also had a hiccup over the weekend.
Edit: All of my currently processing ARP WUs are _0 and _1. [Edit 1 times, last edit by spRocket at Feb 17, 2025 2:50:44 PM] |
||
|
Mike.Gibson
Ace Cruncher England Joined: Aug 23, 2007 Post Count: 12333 Status: Offline |
Out of 849 units listed as being in extreme or accelerated generations, only 350 appear to be available. The rest are probably stuck.
Mike |
||
|
Maxxina
Advanced Cruncher Joined: Jan 5, 2008 Post Count: 124 Status: Offline |
For these extremes they should create a setting: send them only to people who have set Africa (ARP) as the only project their computer works on, and who return the units within 24 hours, so that it catches up.
|
||
|
catchercradle
Advanced Cruncher Joined: Jan 16, 2009 Post Count: 125 Status: Offline |
It is the only project I crunch when nothing is available from CPDN. Even when I have work from there, I return work here in less than 12 hours. CPDN often goes weeks or even months without any work. An alternative would be some way of looking for trusted machines that always return work in a given time. |
||
|
MJH333
Senior Cruncher England Joined: Apr 3, 2021 Post Count: 265 Status: Offline |
There already is a mechanism for speeding up the units that have fallen behind. The "normals" are sent out as 2 copies with a 6 day deadline, the "accelerated" also as 2 copies but with a 3 day deadline, and the "extremes" as 3 copies with a 1.5 day deadline. Any extra copies sent because of errors or No Reply from "normal" or "accelerated" units are sent with the original deadline halved.

IBM also made a system under which "extreme" and (I think) "accelerated" units were only sent to "reliable" machines. I believe this system still operates, but I am not sure exactly how "reliable" is defined. I guess that a machine becomes "reliable" if it returns a certain number (10?) of valid ARP1 results, and stops being "reliable" if it returns an invalid result (or, perhaps, fails to return a result on time?).

The problem Mike.Gibson is concerned about is with units that have become "stuck". In other words, these are units which have not validated successfully but have instead returned the maximum number of permitted error results. We assume that this happens in places where the weather patterns being modelled are unusual in some way, perhaps because the area has mountains.

In order to restart "stuck" units, IBM altered the "time_step" for the units and ran them again. This basically means that the computations are done at a more detailed level, and take longer. Often this got units going. But it seems that quite a few have now become stuck again.

I hope this is helpful.
Cheers, Mark [Edit 1 times, last edit by MJH333 at Feb 18, 2025 11:39:35 AM] |
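For anyone who finds it easier to read as code, here is a minimal sketch of the copy counts and deadlines described in the post above. The names `ReleasePolicy`, `POLICIES` and `resend_deadline_days` are made up for illustration and are not part of any WCG or BOINC API. The post does not say how resends of "extreme" units are handled, so the sketch simply assumes their deadline stays the same.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class ReleasePolicy:
    """Hypothetical record of how one category of ARP unit is sent out."""
    copies: int
    deadline_days: float


# Figures as quoted in the post; the dictionary itself is illustrative only.
POLICIES = {
    "normal": ReleasePolicy(copies=2, deadline_days=6.0),
    "accelerated": ReleasePolicy(copies=2, deadline_days=3.0),
    "extreme": ReleasePolicy(copies=3, deadline_days=1.5),
}


def resend_deadline_days(category: str) -> float:
    """Deadline for an extra copy sent after an error or No Reply."""
    policy = POLICIES[category]
    if category in ("normal", "accelerated"):
        # Per the post: resends go out with the original deadline halved.
        return policy.deadline_days / 2
    # Assumption: the post does not cover extreme resends, so keep the deadline.
    return policy.deadline_days


if __name__ == "__main__":
    for name, policy in POLICIES.items():
        print(name, policy.copies, policy.deadline_days, resend_deadline_days(name))
```

Under these figures, a resent "normal" copy gets the same 3 day deadline as a first-issue "accelerated" copy, and a resent "accelerated" copy gets the same 1.5 day deadline as an "extreme" one.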
||
|
Mike.Gibson
Ace Cruncher England Joined: Aug 23, 2007 Post Count: 12333 Status: Offline |
As Mark says, the main problem now is the stuck units. Unless something is done about them, the project will not finish.
There are 336 units in the extreme category but only about 10 of them are moving. There is also a substantial portion of the accelerated units which are not moving. Mike |
||
|
hchc
Veteran Cruncher USA Joined: Aug 15, 2006 Post Count: 793 Status: Offline |
Are WCG staff not tending to the flock of stuck work units?
|
||
|
Mike.Gibson
Ace Cruncher England Joined: Aug 23, 2007 Post Count: 12333 Status: Offline |
The stats have not been run today.
Mike |
||
|
Unixchick
Veteran Cruncher Joined: Apr 16, 2020 Post Count: 927 Status: Recently Active |
I haven't gotten an ARP in 24 hours. My machine is not "reliable", so it could be that there aren't many "normal" units going out.
|
||