World Community Grid Forums
Thread Status: Active. Total posts in this thread: 3195
TPCBF
Master Cruncher · USA · Joined: Jan 2, 2011 · Post Count: 1948
>> Well... I am seriously questioning the wisdom of sending out resends with a 36h deadline when it might take 24h (or more) just to get the WU downloaded, if you can't babysit each and every workstation and manually retry the downloads... :( And it seems that MCM1 WU downloads get hung up more often than not as well, so this will be a rough holiday season in WCG land...
>> Ralf
>
> I don't worry about the due dates too much, as the odds are pretty slim that the next WU would get downloaded and executed before you return yours, even though yours would be late.

If a resend WU (_3 and higher) with a 36h deadline takes 24h just to download, you have only 12h left to actually crunch it. Most workstations take more than 12h to process that one ARP1 WU, so the whole effort is wasted: the result that is eventually returned gets labeled "too late" and is not counted. This happened, for example, with the very first WU I got from the "trial batch" before the start of the weekend...
Ralf
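The arithmetic in Ralf's reply can be written out as a tiny Python sketch. It assumes download time and crunch time simply add up (no waiting behind other tasks on the host), and the 16 h runtime is an illustrative figure borrowed from another post in this thread, not a measured ARP1 value. `crunch_window` and `will_be_late` are hypothetical helper names.

```python
def crunch_window(deadline_h: float, download_h: float) -> float:
    """Hours left for computing once the download has finished (never negative)."""
    return max(0.0, deadline_h - download_h)


def will_be_late(deadline_h: float, download_h: float, runtime_h: float) -> bool:
    """True if download time plus runtime overshoots the deadline."""
    return download_h + runtime_h > deadline_h


# Ralf's example: 36 h deadline, 24 h spent downloading.
print(crunch_window(36.0, 24.0))   # 12.0
# Assumed 16 h runtime for one ARP1 WU (illustrative only; varies by host).
print(will_be_late(36, 24, 16))    # True
```

Any host that needs more than the 12 h window returns the result after the deadline, which is the "too late" case described above.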
catchercradle
Advanced Cruncher · Joined: Jan 16, 2009 · Post Count: 126
I am inclined to agree. Several hours ago a bunch of tasks started to download, and I still have a bunch of files (over 40) that are either not downloading at all or only download one at a time if I keep hitting the "Retry pending transfers" button. On days when I work that won't be possible, so most of the time nothing will be downloading, and I can see tasks passing the deadline without even starting to crunch. Download speeds are mostly even lower than my broadband upload speed of 100KB/s. I get that they are working with what they have, but it seems they really don't have the infrastructure to run this project.
savas
Cruncher · Joined: Sep 21, 2021 · Post Count: 34
We will be extending the deadlines on all workunits ASAP to account for these issues while we investigate and attempt to solve the problem.
gj82854
Advanced Cruncher · Joined: Sep 26, 2022 · Post Count: 102
>>> Well... I am seriously questioning the wisdom of sending out resends with a 36h deadline when it might take 24h (or more) just to get the WU downloaded, if you can't babysit each and every workstation and manually retry the downloads... :( And it seems that MCM1 WU downloads get hung up more often than not as well, so this will be a rough holiday season in WCG land...
>>> Ralf
>>
>> I don't worry about the due dates too much, as the odds are pretty slim that the next WU would get downloaded and executed before you return yours, even though yours would be late.
>
> If a resend WU (_3 and higher) with a 36h deadline takes 24h just to download, you have only 12h left to actually crunch it. Most workstations take more than 12h to process that one ARP1 WU, so the whole effort is wasted: the result that is eventually returned gets labeled "too late" and is not counted. This happened, for example, with the very first WU I got from the "trial batch" before the start of the weekend...
> Ralf

I had a _2 WU for batch 124 with a 36-hour return deadline. I didn't return it until 3 hours after the deadline, yet it validated and was counted, because the _3 WU had not been downloaded and executed before mine was returned. The server probably cancelled the _3 as no longer needed.
imakuni
Advanced Cruncher · Joined: Jun 11, 2009 · Post Count: 103
> Several hours ago a bunch of tasks started to download, and I still have a bunch of files (over 40) that are either not downloading at all or only download one at a time if I keep hitting the "Retry pending transfers" button. On days when I work that won't be possible, so most of the time nothing will be downloading, and I can see tasks passing the deadline without even starting to crunch. Download speeds are mostly even lower than my broadband upload speed of 100KB/s. I get that they are working with what they have, but it seems they really don't have the infrastructure to run this project.

Use an "autoclicker", i.e. a script that makes BOINC retry network communication more often. Remember, the more tasks a single user holds (as long as they can complete them, of course), the better.

> We will be extending the deadlines on all workunits ASAP to account for these issues while we investigate and attempt to solve the problem.

The solution is pretty simple, actually: WCG does not have the resources for full production of ARP, so you need to severely throttle work generation and distribution. Limit how many jobs are ready to be sent (say... 10 at a time?) and prevent the generation of new units unless network usage is below a certain threshold.

Want to have an image of yourself like this one? Check this thread: https://secure.worldcommunitygrid.org/forums/wcg/viewthread_thread,29840
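imakuni's throttling proposal (a capped "ready to send" queue plus a bandwidth gate on generating new units) can be modeled in a few lines of Python. This is a toy sketch, not WCG or BOINC server code: `ThrottledFeeder`, `maybe_generate` and `dispatch` are hypothetical names, and treating bandwidth use as a 0–1 fraction of link capacity with a 0.8 threshold is an assumption.

```python
from collections import deque
from typing import Deque, Optional


class ThrottledFeeder:
    """Toy model: keep at most `max_ready` units waiting to be sent, and only
    create new units while link utilisation is below `bandwidth_limit`."""

    def __init__(self, max_ready: int = 10, bandwidth_limit: float = 0.8) -> None:
        self.max_ready = max_ready              # "say... 10 at a time?"
        self.bandwidth_limit = bandwidth_limit  # assumed: fraction of link capacity in use
        self.ready: Deque[str] = deque()
        self.next_id = 0

    def maybe_generate(self, bandwidth_use: float) -> int:
        """Top up the ready queue unless the link is busy; return how many units were created."""
        if bandwidth_use >= self.bandwidth_limit:
            return 0
        created = 0
        while len(self.ready) < self.max_ready:
            self.ready.append(f"wu_{self.next_id}")
            self.next_id += 1
            created += 1
        return created

    def dispatch(self) -> Optional[str]:
        """Hand out the oldest ready unit, or None if nothing is queued."""
        return self.ready.popleft() if self.ready else None


feeder = ThrottledFeeder()
print(feeder.maybe_generate(0.95))  # 0  (link busy: nothing generated)
print(feeder.maybe_generate(0.30))  # 10 (queue filled to the cap)
print(feeder.dispatch())            # wu_0
print(feeder.maybe_generate(0.30))  # 1  (only the freed slot is refilled)
```

A real server would measure utilisation itself; passing it in as an argument just keeps the sketch easy to test.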
Mike.Gibson
Ace Cruncher · England · Joined: Aug 23, 2007 · Post Count: 12349
I received 8 units to download over 8 hours ago. I have downloaded about a third of the files but haven't completely downloaded a whole unit, so I have not been able to start crunching any of them.
I suspect I got 8 because that is how many threads are on the machine. If we were sent a smaller number at a time, we might be able to start crunching sooner.
Mike
Unixchick
Veteran Cruncher · Joined: Apr 16, 2020 · Post Count: 946
I got sent 4 WUs even though I can only run 2 at a time. I forced 2 to download manually so they could run now, but I'm letting the other 2 back off and be downloaded by the system. Hopefully the other 2 will finish downloading before the 2 running now are done (16 hours). I wish the download files were more logically named, so it was easier to force one WU to download completely.

They could set everyone to download just one WU at a time. You could have a longer queue, but download only one at a time. I know someone said it was better to have one person HOG them all, as that would mean only two connections, but I'm not convinced that is good. Won't this bottleneck ease once we all have some WUs? I'm barely knocking, as I just have a few files left to download and they have long backoff times. Also, there is a limit: each new generation is based on the previous one, so only so many ARP WUs can be out at a time. We are getting a full batch right now, but it will be more spread out as results are returned and new WUs are made.
[Edit 2 times, last edit by Unixchick at Nov 4, 2024 6:58:10 PM]
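The "download one WU at a time" idea, and Mike's report of a third of the files downloaded but no complete unit, can be illustrated with a small Python simulation. It assumes, without knowing how the BOINC client actually orders its transfers, that files arrive either round-robin across units or unit by unit, that all files are the same size, and that exactly one file arrives per step. `completion_times` is a hypothetical name and the 8 units × 6 files are illustrative numbers.

```python
from typing import List


def completion_times(units: int, files_per_unit: int, sequential: bool) -> List[int]:
    """Step at which each work unit has all of its input files."""
    if sequential:
        # Fetch every file of unit 0, then every file of unit 1, and so on.
        order = [u for u in range(units) for _ in range(files_per_unit)]
    else:
        # Round-robin: one file from each unit in turn.
        order = [u for _ in range(files_per_unit) for u in range(units)]
    remaining = [files_per_unit] * units
    done = [0] * units
    for step, unit in enumerate(order, start=1):
        remaining[unit] -= 1
        if remaining[unit] == 0:
            done[unit] = step
    return done


# 8 units with 6 files each (illustrative numbers only)
print(completion_times(8, 6, sequential=True))   # [6, 12, 18, 24, 30, 36, 42, 48]
print(completion_times(8, 6, sequential=False))  # [41, 42, 43, 44, 45, 46, 47, 48]
```

Both orders finish at step 48, but fetching unit by unit lets the first unit start crunching at step 6 instead of step 41. In the round-robin case, a third of the way through (step 16) no unit is complete, which matches Mike's observation.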
TPCBF
Master Cruncher · USA · Joined: Jan 2, 2011 · Post Count: 1948
> We will be extending the deadlines on all workunits ASAP to account for these issues while we investigate and attempt to solve the problem.

Thanks!
Ralf
TPCBF
Master Cruncher · USA · Joined: Jan 2, 2011 · Post Count: 1948
> The solution is pretty simple, actually: WCG does not have the resources for full production of ARP, so you need to severely throttle work generation and distribution. Limit how many jobs are ready to be sent (say... 10 at a time?) and prevent the generation of new units unless network usage is below a certain threshold.

Well, your auto-clicker scripts are just making things worse. The basic issue is not only the network bandwidth; there are also limits on the number of concurrent connections, and those get flooded when everyone issues automated retry requests far too quickly and too often...
Ralf
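Ralf's point about retry floods is the usual argument for exponential backoff with random jitter instead of retrying on a short fixed timer. Below is a minimal Python sketch of the "full jitter" variant. The 60 s base, one-hour cap and 30 s auto-clicker interval are assumptions for illustration. The BOINC client applies its own backoff to failed transfers (the "long backoff times" mentioned above), and this sketch does not reproduce BOINC's actual parameters. `backoff_delays` and `retries_within` are hypothetical names.

```python
import random
from typing import List, Optional


def backoff_delays(attempts: int, base: float = 60.0, cap: float = 3600.0,
                   rng: Optional[random.Random] = None) -> List[float]:
    """Full-jitter exponential backoff: the n-th wait is drawn uniformly from
    [0, min(cap, base * 2**n)] seconds, so clients that failed together
    spread their retries out instead of all retrying at once."""
    rng = rng or random.Random()
    return [rng.uniform(0.0, min(cap, base * 2 ** n)) for n in range(attempts)]


def retries_within(delays: List[float], window: float) -> int:
    """How many retries a client sends inside `window` seconds."""
    elapsed, count = 0.0, 0
    for d in delays:
        elapsed += d
        if elapsed > window:
            break
        count += 1
    return count


# A hypothetical auto-clicker retrying every 30 s hits the server 120 times an hour:
print(retries_within([30.0] * 200, 3600.0))  # 120
# With backoff the hourly count is typically far lower and differs per client:
print(retries_within(backoff_delays(20, rng=random.Random(1)), 3600.0))
```

Because the waits grow and are randomised, the number of simultaneous connection attempts falls off after an outage instead of staying pinned at the server's connection limit.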
TPCBF
Master Cruncher · USA · Joined: Jan 2, 2011 · Post Count: 1948
> I received 8 units to download over 8 hours ago. I have downloaded about a third of the files but haven't completely downloaded a whole unit, so I have not been able to start crunching any of them.

But that means you have "messed" with the default settings for ARP, as the default is one WU per host, not per thread.

> I suspect I got 8 because that is how many threads are on the machine. If we were sent a smaller number at a time, we might be able to start crunching sooner.
> Mike

I have multiple hosts with 10 cores/20 threads as well as a couple of older ones (or low-power laptops) that have only 2, and all of them are receiving just 1 ARP1 WU. All of my hosts are on default settings...
Ralf