World Community Grid Forums
Thread Status: Active Total posts in this thread: 3174
Speedy51
Veteran Cruncher New Zealand Joined: Nov 4, 2005 Post Count: 1286 Status: Offline |
If you finish your ARPs in the allotted time, then no worries. I completely agree, and I know that I am not causing a bunch of resends. Since I am not processing any work, it is slowing the process down, imho, because they have one less reliable host processing work. However, since Mike has said it is not recommended to run the project if you are not able to run 24/7, why would I waste resources? Hence the reason I have asked Mike to back up his claim on page 147 that it is not recommended to run ARP if you do not run 24/7. [Edit 1 times, last edit by Speedy51 at Jan 26, 2025 3:08:36 AM] |
||
|
Mike.Gibson
Ace Cruncher England Joined: Aug 23, 2007 Post Count: 12340 Status: Offline |
The recommendation stems from the early days of ARP. In those days most machines finished units in 15-27 hours.
If you shut down a machine, any crunching after the most recent checkpoint is lost, which slows down the completion of the units. Now, some machines can finish units in as little as 8 hours, so they would not have the same problems with shutdowns, although prolonged shutdowns could still cause problems for units with 36-hour deadlines. 3 or 6 day deadlines would be less of a problem, except for the many slower machines still crunching here. Mine take 24 hours but crunch 24/7. However, we have had problem units which have had their TimeStep shortened in order to overcome compatibility problems. Halving the TimeStep results in a doubling of the calculations required. So, basically, it is a matter of how fast your machine is, how long the shutdowns are and how long the deadlines are. The 24/7 recommendation is only a broad brush approach. See how you get on, but be warned that deadlines might be exceeded if you shut down. The alternative is to use hibernate, which retains the crunching after the last checkpoint. Mike |
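To make the trade-off above concrete, here is a rough Python sketch, not anything taken from the ARP application itself. The function names, the one-shutdown-per-day pattern, and all the numbers in the usage lines (simulated period, TimeStep values, 8 hours of daily uptime, half an hour lost per shutdown) are assumptions chosen purely for illustration. Only the 24-hour unit and the 36-hour, 3-day and 6-day deadlines come from the post.

```python
import math


def steps_required(simulated_seconds: float, timestep_seconds: float) -> int:
    """Number of model steps needed to cover a simulated period (hypothetical helper)."""
    return math.ceil(simulated_seconds / timestep_seconds)


def elapsed_hours(cpu_hours: float, uptime_per_day: float, lost_per_shutdown: float) -> float:
    """Estimate wall-clock hours to finish one unit.

    Assumes (for illustration only) one shutdown per day, and that each
    shutdown discards `lost_per_shutdown` hours of work done since the
    last checkpoint.
    """
    if uptime_per_day >= 24:
        return cpu_hours  # machine never shuts down, nothing is lost
    useful_per_day = uptime_per_day - lost_per_shutdown
    if useful_per_day <= 0:
        raise ValueError("no net progress: every session is lost at shutdown")
    remaining = cpu_hours
    elapsed = 0.0
    # Each full day contributes only the work that survived the shutdown.
    while remaining > uptime_per_day:
        remaining -= useful_per_day
        elapsed += 24
    # The unit finishes during the final session, before any shutdown.
    return elapsed + remaining


def meets_deadline(cpu_hours: float, uptime_per_day: float,
                   lost_per_shutdown: float, deadline_hours: float) -> bool:
    """True if the estimated wall-clock time fits inside the deadline."""
    return elapsed_hours(cpu_hours, uptime_per_day, lost_per_shutdown) <= deadline_hours


# Halving the TimeStep doubles the number of steps (values are made up).
print(steps_required(86400, 60), steps_required(86400, 30))

# A 24-hour unit on a machine that is up 8 hours a day.
for deadline in (36, 72, 144):
    print(deadline, meets_deadline(24, 8, 0.5, deadline))
```

Under these assumed numbers the 24-hour unit needs about 73.5 hours of wall-clock time, so it misses both a 36-hour and a 3-day deadline and only fits the 6-day one. A faster machine or hibernating instead of shutting down (zero lost work) changes the picture, which is the point of the "broad brush" remark.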
||
|
Speedy51
Veteran Cruncher New Zealand Joined: Nov 4, 2005 Post Count: 1286 Status: Offline |
So, basically it is a matter of how fast your machine is, how long the shutdowns are and how long the deadlines are. The 24/7 recommendation is only a broad brush approach. See how you get on but be warned that deadlines might be exceeded if you shutdown. Thank you for clarifying. I agree with this. Shame it is not in your original post. I am going to stick to my original decision and no longer work for this project because of what was said in your first post (link in first paragraph). In the event this advice changes, my point of view may also change. |
||
|
catchercradle
Advanced Cruncher Joined: Jan 16, 2009 Post Count: 125 Status: Offline |
The other point is that, historically, the code ARP is based on is intolerant of restarts and will crash if not kept in memory. I mostly use hibernate or sleep rather than shutting down completely and don't seem to lose tasks doing that. I do find that if, when shutting down, I pause tasks and wait long enough to be sure they have stopped, then shut down the client before shutting the machine down, it gets the fails down to under one in 50. I do the same if I ever have to shut down with CPDN, which has had similar problems in the past, though recent changes to the code for some models have greatly reduced the problem. |
||
|
Mike.Gibson
Ace Cruncher England Joined: Aug 23, 2007 Post Count: 12340 Status: Offline |
Sunday Report
This report has been rebased since the restart earlier this week. Any forecasts will be based on average throughput for the weeks available, starting this week, until 5 weeks are available, after which they will be based on the last 5 weeks. There are still 3 ultras in generations 21 & 22. The other classifications have stayed put. There are now 634 extremes in generations 104 to 131, the decrease of 2,073 being due to some now being classified as Accelerated. There are now 5,675 accelerated units in generations 132 to 136 and 29,297 normal units in generations 137 to 146. The highest generation to have had validations is 145. Based on this week, we would complete ARP1 in 2028. Mike |
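The forecasting rule described in the report (average over whatever weeks exist since the rebase, capped at the last 5) can be sketched as below. This is only an illustration of that rule: the function names, the idea of counting "remaining results", and the weekly figures in the usage lines are assumptions, since the report does not give the underlying throughput numbers.

```python
def average_throughput(weekly_validated: list[float], window: int = 5) -> float:
    """Mean weekly throughput over the most recent `window` weeks.

    With fewer than `window` weeks recorded, all available weeks are used,
    matching the rule described in the report.
    """
    recent = weekly_validated[-window:]
    if not recent:
        raise ValueError("need at least one week of data")
    return sum(recent) / len(recent)


def weeks_to_complete(remaining_results: float,
                      weekly_validated: list[float],
                      window: int = 5) -> float:
    """Weeks left at the current average rate (hypothetical helper)."""
    rate = average_throughput(weekly_validated, window)
    if rate <= 0:
        raise ValueError("throughput must be positive to forecast")
    return remaining_results / rate


# Made-up weekly counts: two weeks since the rebase, so both are averaged.
print(average_throughput([100, 200]))
# Once more than five weeks exist, only the last five count.
print(average_throughput([1, 2, 3, 4, 5, 6]))
print(weeks_to_complete(1500, [100, 200]))
```

With only one week of data, as in this report, the forecast is simply the remaining work divided by that single week's throughput, so it can swing a lot until more weeks accumulate.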
||
|
MJH333
Senior Cruncher England Joined: Apr 3, 2021 Post Count: 266 Status: Offline |
Many thanks for the report, Mike.
It is good to see the progress on the Extremes. I am puzzled as to why there is no progress on the 3 ultra Extremes in generations 21 and 22. I assume that WCG hasn't released these units. I wonder why not? Cheers, Mark |
||
|
geophi
Advanced Cruncher U.S. Joined: Sep 3, 2007 Post Count: 102 Status: Offline |
Or it's possible these are work units that suffer instability and error out, and the timestep duration will need to be decreased in order to get them past those generations.
|
||
|
Mike.Gibson
Ace Cruncher England Joined: Aug 23, 2007 Post Count: 12340 Status: Offline |
There has never been any explanation for the 3 ultras.
Reducing the TimeStep has worked in other cases. Maybe they haven't tried a short enough TimeStep. I suspect that they have restarted with the main extremes to start slowly and help them to catch up. Mike |
||
|
Unixchick
Veteran Cruncher Joined: Apr 16, 2020 Post Count: 935 Status: Offline |
I fear they have given up on the ultras.
I'm now getting gen 133 ARPs. I suggest they go back to issuing the lower generations again once they finish the 136s. Let's see if we can get all WUs in the normal group. |
||
|
Unixchick
Veteran Cruncher Joined: Apr 16, 2020 Post Count: 935 Status: Offline |
It looks like fewer ARP WUs are going out. You can see the decline in the number of results returned in the stats on the project itself as well as the status thread. The number in the image was up in the 600s, but now it is in the 400s. I'm still getting a few ARP WUs here and there, so I don't see the decline personally, just in the data.
|
||