Thread Status: Active
Total posts in this thread: 3195
Posts: 3195   Pages: 320   [ Previous Page | 281 282 283 284 285 286 287 288 289 290 | Next Page ]
This topic has been viewed 2710231 times and has 3194 replies
TPCBF
Master Cruncher
USA
Joined: Jan 2, 2011
Post Count: 1948
Status: Offline
Re: Work Available

Well, ....

I am seriously questioning the wisdom of sending out resends with a 36h deadline when it might take 24h (or more) just to get the WU downloaded, if you can't babysit each and every workstation and manually retry the downloads... :(

And it seems that downloads of MCM1 WUs get hung up more often than not as well, so this will be a rough holiday season in WCG land... :(

Ralf

I don't worry about the due dates too much, as the odds are pretty slim that the next WU would get downloaded and executed before you return yours, even though yours would be late.
You are missing the point here.
If a resend WU (_3 and higher) with a 36h deadline takes 24h just to download, you have only 12h left to actually crunch that WU. Most workstations will take more than 12h to process a single ARP1 WU, so the whole work is for naught: the result that is eventually returned gets labeled "too late" and is not counted.
This happened, for example, with the very first WU I got from the "trial batch" before the start of the weekend....
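
To put numbers on it, here is a minimal sketch of the deadline arithmetic. The function names are made up for illustration, and the 16h runtime is an assumed figure, not a measured one.

```python
def crunch_window_hours(deadline_h: float, download_h: float) -> float:
    """Hours left for computation once the download has finished."""
    return max(0.0, deadline_h - download_h)


def can_finish(deadline_h: float, download_h: float, runtime_h: float) -> bool:
    """True if the work unit can still be returned before the deadline."""
    return runtime_h <= crunch_window_hours(deadline_h, download_h)


# 36h deadline and 24h download are the figures from this post.
window = crunch_window_hours(36, 24)
# 16h is an assumed runtime, used only as an example.
print(window, can_finish(36, 24, 16))
```

With a 24h download the window is 12h, so any WU that needs longer than that misses a 36h deadline no matter how quickly it is started.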

Ralf
[Nov 4, 2024 5:05:56 PM]
catchercradle
Advanced Cruncher
Joined: Jan 16, 2009
Post Count: 126
Status: Offline
Re: Work Available

I am inclined to agree. Several hours ago a bunch of tasks started to download, and I still have a bunch of files (over 40) that are not downloading, or are only downloading one at a time if I keep hitting the "retry pending transfers" button. On days when I work, that won't be possible, so most of the time nothing will be downloading and I can see tasks passing the deadline without even starting to crunch. Download speeds are mostly even lower than my broadband upload speed of 100KB/s. I get that they are working with what they have, but it seems they really don't have the infrastructure to run this project.
[Nov 4, 2024 5:43:23 PM]
savas
Cruncher
Joined: Sep 21, 2021
Post Count: 34
Status: Offline
Re: Work Available

We will be extending the deadlines on all workunits ASAP to account for these issues while we investigate and attempt to solve the problem.
[Nov 4, 2024 5:56:58 PM]
gj82854
Advanced Cruncher
Joined: Sep 26, 2022
Post Count: 102
Status: Offline
Re: Work Available

Well, ....

I am seriously questioning the wisdom of sending out resends with a 36h deadline when it might take 24h (or more) just to get the WU downloaded, if you can't babysit each and every workstation and manually retry the downloads... :(

And it seems that downloads of MCM1 WUs get hung up more often than not as well, so this will be a rough holiday season in WCG land... :(

Ralf

I don't worry about the due dates too much, as the odds are pretty slim that the next WU would get downloaded and executed before you return yours, even though yours would be late.
You are missing the point here.
If a resend WU (_3 and higher) with a 36h deadline takes 24h just to download, you have only 12h left to actually crunch that WU. Most workstations will take more than 12h to process a single ARP1 WU, so the whole work is for naught: the result that is eventually returned gets labeled "too late" and is not counted.
This happened, for example, with the very first WU I got from the "trial batch" before the start of the weekend....

Ralf

I had a _2 WU for batch 124 with a 36-hour return deadline. I didn't return that WU until 3 hours after the deadline, and it still validated and was counted. The reason was that the _3 WU was never downloaded and executed before mine was returned, so the server probably cancelled it as not needed.
[Nov 4, 2024 6:03:23 PM]
imakuni
Advanced Cruncher
Joined: Jun 11, 2009
Post Count: 103
Status: Offline
Re: Work Available

Several hours ago a bunch of tasks started to download, and I still have a bunch of files (over 40) that are not downloading, or are only downloading one at a time if I keep hitting the "retry pending transfers" button. On days when I work, that won't be possible, so most of the time nothing will be downloading and I can see tasks passing the deadline without even starting to crunch. Download speeds are mostly even lower than my broadband upload speed of 100KB/s. I get that they are working with what they have, but it seems they really don't have the infrastructure to run this project.

Use an "autoclicker", i.e. a script that makes BOINC retry network communications more often. Remember, the more tasks a single user has (as long as they can complete them, of course), the better.

We will be extending the deadlines on all workunits ASAP to account for these issues while we investigate and attempt to solve the problem.

The solution is pretty simple, actually: WCG does not have the resources for full production of ARP, so you need to severely throttle work generation and distribution. Limit how many jobs are ready to be sent (say... 10 at a time?) and prevent the generation of new units unless network usage is below a certain threshold.
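
A minimal sketch of that throttle, just to show the idea. The names and the 0.8 usage limit are hypothetical, and this is not how the WCG servers are actually configured.

```python
MAX_READY_TO_SEND = 10       # the "say... 10 at a time?" figure from above
NETWORK_USAGE_LIMIT = 0.8    # hypothetical: fraction of link capacity in use


def may_generate_work(ready_to_send: int, network_usage: float) -> bool:
    """Allow a new work unit only if both limits still leave headroom."""
    return (ready_to_send < MAX_READY_TO_SEND
            and network_usage < NETWORK_USAGE_LIMIT)


def units_to_generate(ready_to_send: int, network_usage: float) -> int:
    """How many units to create now so the ready queue is topped up to the cap."""
    if network_usage >= NETWORK_USAGE_LIMIT:
        return 0  # link is busy: generate nothing until usage drops
    return max(0, MAX_READY_TO_SEND - ready_to_send)


print(units_to_generate(3, 0.5))   # queue has room and the link is quiet
print(units_to_generate(3, 0.95))  # link is saturated
```

The point of checking both conditions is that a short queue alone does not help if the link is already saturated by transfers in progress.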
----------------------------------------

Want to have an image of yourself like this on? Check this thread: https://secure.worldcommunitygrid.org/forums/wcg/viewthread_thread,29840
[Nov 4, 2024 6:17:00 PM]
Mike.Gibson
Ace Cruncher
England
Joined: Aug 23, 2007
Post Count: 12349
Status: Offline
Re: Work Available

I received 8 units to download over 8 hours ago. I have downloaded about a third of the files but haven't completely downloaded a whole unit, so I have not been able to start crunching any of them.

I suspect I got 8 because that is how many threads are on the machine.

If we were to be sent a smaller number at a time we might be able to start crunching sooner.

Mike
[Nov 4, 2024 6:34:43 PM]
Unixchick
Veteran Cruncher
Joined: Apr 16, 2020
Post Count: 946
Status: Recently Active
Re: Work Available

I got sent 4 WUs even though I can only run 2 at a time. I forced 2 to be downloaded manually to run now, but I'm letting the other 2 back off and be downloaded by the system. Hopefully the other 2 will download before I finish the 2 running now (16 hours). I wish the download files were more logically named, so it was easier to force one WU to download completely.

They could set everyone to just download one WU at a time. You could have a longer queue, but just download one at a time. I know someone said it was better to have one person HOG them all, as this would only be two connections, but I'm not convinced that is good.
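
A minimal sketch of what "one WU at a time" could look like for the transfer queue: finish the work unit that is closest to complete before touching the next one. The `(work_unit, file_name)` pairs and the function name are made up for illustration; this is not how the BOINC client actually orders transfers.

```python
from collections import defaultdict


def download_order(pending: list[tuple[str, str]]) -> list[tuple[str, str]]:
    """Order pending (work_unit, file_name) pairs so that the work unit
    with the fewest files still missing is fetched first."""
    by_wu: dict[str, list[str]] = defaultdict(list)
    for wu, name in pending:
        by_wu[wu].append(name)
    # Fewest remaining files first; ties are broken by work-unit name.
    ordered_wus = sorted(by_wu, key=lambda wu: (len(by_wu[wu]), wu))
    return [(wu, name) for wu in ordered_wus for name in sorted(by_wu[wu])]


# Hypothetical queue: wu_a needs one more file, wu_b still needs three.
pending = [("wu_b", "f3"), ("wu_a", "f1"), ("wu_b", "f1"), ("wu_b", "f2")]
print(download_order(pending))
```

Ordered this way, a host can start crunching as soon as the first unit is complete instead of holding partial downloads for every unit.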

Won't this bottleneck ease as we all get some WUs?? I'm barely knocking as I just have a few files left to download and they have long backoff times.

Also, there is a limit, as the next gen is based on previous gen, so only so many ARP WUs can be out at a time. We are just getting a full batch at this moment, but it will be more spread out as they are returned and new WUs are made.
----------------------------------------
[Edit 2 times, last edit by Unixchick at Nov 4, 2024 6:58:10 PM]
[Nov 4, 2024 6:50:46 PM]
TPCBF
Master Cruncher
USA
Joined: Jan 2, 2011
Post Count: 1948
Status: Offline
Re: Work Available

We will be extending the deadlines on all workunits ASAP to account for these issues while we investigate and attempt to solve the problem.
Thanks!

Ralf
[Nov 4, 2024 7:33:24 PM]
TPCBF
Master Cruncher
USA
Joined: Jan 2, 2011
Post Count: 1948
Status: Offline
Re: Work Available

The solution is pretty simple, actually: WCG does not have the resources for full production of ARP, so you need to severely throttle work generation and distribution. Limit how many jobs are ready to be sent (say... 10 at a time?) and prevent the generation of new units unless network usage is below a certain threshold.
Well, your auto-clicker scripts are just making things worse. The basic issue is not only the network bandwidth; there are also limits on the number of concurrent connections. And the latter will be flooded when everyone is issuing automated retry requests far too quickly and too often...
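
If a script has to retry at all, it should at least spread its requests out. A minimal sketch of exponential backoff with random jitter follows; the function name and the default values are hypothetical, and this is not a description of how the BOINC client computes its own backoff times.

```python
import random
from typing import Optional


def retry_delay(attempt: int, base_s: float = 60.0, cap_s: float = 3600.0,
                rng: Optional[random.Random] = None) -> float:
    """Seconds to wait before retry number `attempt` (0-based).

    The upper bound doubles with each attempt up to `cap_s`, and the
    actual delay is drawn uniformly from [0, bound] so that many hosts
    do not all retry at the same moment.
    """
    rng = rng if rng is not None else random.Random()
    bound = min(cap_s, base_s * (2 ** attempt))
    return rng.uniform(0.0, bound)


# Print the delays for the first five retries of one host.
for attempt in range(5):
    print(attempt, round(retry_delay(attempt), 1))
```

Compared with a fixed-interval auto-clicker, each failed attempt makes the host wait longer on average, which keeps the number of simultaneous connection attempts down.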

Ralf
[Nov 4, 2024 7:36:47 PM]
TPCBF
Master Cruncher
USA
Joined: Jan 2, 2011
Post Count: 1948
Status: Offline
Re: Work Available

I received 8 units to download over 8 hours ago. I have downloaded about a third of the files but haven't completely downloaded a whole unit, so I have not been able to start crunching any of them.

I suspect I got 8 because that is how many threads are on the machine.

If we were to be sent a smaller number at a time we might be able to start crunching sooner.

Mike
But that means that you have "messed" with the default settings for ARP, as the default is one WU per host, not per thread.
I have multiple hosts with 10 cores/20 threads as well as a couple of older ones (or low power laptops) that have only 2, and all of them are just receiving 1 ARP1 WU. All of my hosts are on default settings...

Ralf
[Nov 4, 2024 7:39:43 PM]