Thread Status: Active
Total posts in this thread: 54
This topic has been viewed 67047 times and has 53 replies
Richard Haselgrove
Senior Cruncher
United Kingdom
Joined: Feb 19, 2021
Post Count: 360
Download issues with data files

I have been allocated a number of OPNG tasks today - this batch, for Intel GPU under Windows.

The allocation is received OK, but BOINC struggles to download the associated data files.

The server response is

19/08/2022 15:00:50 | World Community Grid | [http] [ID#8341] Received header from server: HTTP/1.0 503 Service Unavailable
19/08/2022 15:00:50 | World Community Grid | [http] [ID#8341] Received header from server: No server is available to handle this request.

and multiple retries are needed.
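
To get a feel for how often this happens, the 503 responses can be counted from a saved copy of the event log. This is only a rough sketch: it assumes the log was saved as plain text with the same [http] line layout as above, and the function name count_503 is made up for illustration.

```shell
#!/bin/sh
# count_503: count "503" status lines in a BOINC event log read from
# standard input. Assumes the [http] debug line layout quoted above.
# (count_503 is a hypothetical helper name, not part of BOINC.)
count_503() {
    grep -c 'Received header from server: HTTP/[0-9.]* 503'
}

# Example using the two lines quoted above; only the first is a status line.
printf '%s\n' \
  '19/08/2022 15:00:50 | World Community Grid | [http] [ID#8341] Received header from server: HTTP/1.0 503 Service Unavailable' \
  '19/08/2022 15:00:50 | World Community Grid | [http] [ID#8341] Received header from server: No server is available to handle this request.' \
  | count_503
```

With a real log, the same function can be fed from a file, for example `count_503 < saved_log.txt` (the file name is a placeholder).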
[Aug 19, 2022 2:07:58 PM]
TPCBF
Master Cruncher
USA
Joined: Jan 2, 2011
Post Count: 1948
Re: Download issues with data files

Yes, that's what everybody is dealing with. Looks like Krembil's hardware is not up to snuff to handle the workload of sending out the WUs. Happens to all of us...

Ralf
[Aug 19, 2022 2:32:27 PM]
Richard Haselgrove
Senior Cruncher
United Kingdom
Joined: Feb 19, 2021
Post Count: 360
Re: Download issues with data files

It's the first time I've been allocated enough tasks to notice since the project restart. If they're ramping up the work creation, they need to beef up, or fine-tune, the download servers at the same rate.

Exactly the same error message occurs under Linux, but Linux seems better at holding on to an open connection, and re-using it, once it's been established. That might be one specific detail to explore.
[Aug 19, 2022 2:42:40 PM]
Unixchick
Veteran Cruncher
Joined: Apr 16, 2020
Post Count: 946
Re: Download issues with data files

I'm hoping they have started sending us more tasks so that a network tech can work on the issues. You can't find the leaks in the hose if the water is turned off. The OPNGs are now going out, so we can check that issue off the list.

Again, it would be nice if we knew what was going on officially instead of guessing.
[Aug 19, 2022 2:48:01 PM]
TPCBF
Master Cruncher
USA
Joined: Jan 2, 2011
Post Count: 1948
Re: Download issues with data files

Exactly the same error message occurs under Linux, but Linux seems better at holding on to an open connection, and re-using it, once it's been established. That might be one specific detail to explore.
Well, no. Those errors are HTTPS errors, clearly defined protocol errors, and they are the same regardless of which OS is involved on either side...

Ralf
[Aug 19, 2022 3:02:05 PM]
Richard Haselgrove
Senior Cruncher
United Kingdom
Joined: Feb 19, 2021
Post Count: 360
Re: Download issues with data files

Another group of messages that prompted that remark was:

19/08/2022 15:21:16 | World Community Grid | [http] [ID#21847] Info: Too old connection (133 seconds), disconnect it
19/08/2022 15:21:16 | World Community Grid | [http] [ID#21847] Info: Connection 19130 seems to be dead!
19/08/2022 15:21:16 | World Community Grid | [http] [ID#21847] Info: Closing connection 19130
19/08/2022 15:21:16 | World Community Grid | [http] [ID#21847] Info: Found bundle for host download.worldcommunitygrid.org: 0x55951fe30e40 [can multiplex]
19/08/2022 15:21:16 | World Community Grid | [http] [ID#21847] Info: Re-using existing connection! (#19129) with host download.worldcommunitygrid.org
19/08/2022 15:21:16 | World Community Grid | [http] [ID#21847] Info: Connected to download.worldcommunitygrid.org (199.241.167.118) port 443 (#19129)
19/08/2022 15:21:16 | World Community Grid | [http] [ID#21847] Info: Using Stream ID: 53 (easy handle 0x55951fd74840)

Disconnecting after 133 seconds when it is still needed seems inefficient. It's all about how well the HTTPS tools are being used. Windows seems to drop connections after 20 seconds, which is even worse.
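
For anyone who wants to compare how long connections survive on different machines, the ages can be pulled out of a saved log. Again just a sketch: it assumes the "Too old connection (N seconds)" wording shown above, and connection_ages is a name invented for this example.

```shell
#!/bin/sh
# connection_ages: print the age in seconds from each
# "Too old connection (N seconds)" line read on standard input.
# (connection_ages is a hypothetical helper name.)
connection_ages() {
    sed -n 's/.*Too old connection (\([0-9][0-9]*\) seconds).*/\1/p'
}

# Example using the first log line quoted above.
printf '%s\n' \
  '19/08/2022 15:21:16 | World Community Grid | [http] [ID#21847] Info: Too old connection (133 seconds), disconnect it' \
  | connection_ages
```

Running it over logs from a Linux host and a Windows host would show whether the disconnect ages really differ.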
[Aug 19, 2022 3:14:44 PM]
TPCBF
Master Cruncher
USA
Joined: Jan 2, 2011
Post Count: 1948
Re: Download issues with data files

Those are not HTTP errors; they probably come from the BOINC client and refer to BOINC transfer protocol issues.

By and large, the current problems that we all experience do not have anything to do with the OS involved, and I am pretty sure that the WCG stuff is running on Linux-based servers...
[Aug 19, 2022 3:39:01 PM]
narf57
Cruncher
Joined: Dec 19, 2014
Post Count: 3
Re: Download issues with data files

Also getting lots of new WU, but all of them have transient HTTP errors, and require multiple retries to download. At least I now have 22 WU in the queue.
[Aug 19, 2022 4:05:11 PM]
Richard Haselgrove
Senior Cruncher
United Kingdom
Joined: Feb 19, 2021
Post Count: 360
Re: Download issues with data files

Well, there's HTTP, and there's HTTPS. HTTPS has much greater overheads in establishing each separate connection, which probably limits the number of concurrent connections to any one given server.

We're getting into network wrangling here, and that's an extreme discipline even within the general area of data management. In going on about it, I'm drawing on 15 years' experience computing with BOINC, and during that time it's become clear to me that even the most experienced BOINC project administrators and server operators have very little direct knowledge of the BOINC client behaviour as seen from our end of the cable. [I once heard an enthusiastic and emphatic 'hear, hear' down the line, when I made a statement like that on a BOINC teleconference. It came from one of the most experienced BOINC project administrators of all, overseeing a network that was comparably busy or even busier than WCG at its peak.]

I'm simply trying to give the Krembil team some experience by proxy of what we can see here, and what they might want to think about. And I think I've said enough for now.
[Aug 19, 2022 4:07:32 PM]
Ian-n-Steve C.
Senior Cruncher
United States
Joined: May 15, 2020
Post Count: 180
Re: Download issues with data files

they get through eventually if you just keep retrying them.

for example, this worked fine to hammer through my stuck transfers on linux using the boinccmd tool.

watch -n 30 ./boinccmd --network_available


once the files downloaded, the tasks processed and uploaded normally.
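
if you'd rather not leave watch running forever, a bounded loop does the same job and stops on its own. rough sketch only: retry_every is a helper name made up here, and the 30 second / 20 tries numbers in the usage line are arbitrary.

```shell
#!/bin/sh
# retry_every INTERVAL COUNT CMD [ARGS...]
# Run CMD up to COUNT times, sleeping INTERVAL seconds between runs.
# Unlike `watch`, this stops by itself after COUNT runs.
# (retry_every is a hypothetical helper, not part of BOINC.)
retry_every() {
    interval=$1
    count=$2
    shift 2
    i=1
    while [ "$i" -le "$count" ]; do
        # Report a failed run on stderr but keep going.
        "$@" || echo "run $i: command exited with status $?" >&2
        # No need to sleep after the final run.
        if [ "$i" -lt "$count" ]; then
            sleep "$interval"
        fi
        i=$((i + 1))
    done
}

# Usage (same command as above, every 30 seconds, 20 times):
# retry_every 30 20 ./boinccmd --network_available
```

the usage line is commented out so the snippet can be pasted into a shell without touching a running client.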
----------------------------------------

EPYC 7V12 / [5] RTX A4000
EPYC 7B12 / [5] RTX 3080Ti + [2] RTX 2080Ti
EPYC 7B12 / [6] RTX 3070Ti + [2] RTX 3060
[2] EPYC 7642 / [2] RTX 2080Ti
[Aug 19, 2022 4:33:59 PM]