Thread Status: Active
Total posts in this thread: 3312
Posts: 3312   Pages: 332
This topic has been viewed 3301789 times and has 3311 replies
Jarl Ole Hank-Jensen
Senior Cruncher
Norway
Joined: Jun 8, 2019
Post Count: 223
Re: Work Available

[May 6, 2023 5:35:13 AM]
Crystal Pellet
Veteran Cruncher
Joined: May 21, 2008
Post Count: 1323
Re: Work Available

I got four 138s. All four failed to download 1 or more files due to several transient HTTP errors.

WU download error: couldn't get input files:
<file_xfer_error>
<file_name>b71989a66747b997eed0d720ba70cf74.7z</file_name>
<error_code>-119 (md5 checksum failed for file)</error_code>


After a download error, a zero-byte file with the correct project filename is created, and it looks like this zero-byte file causes the problem when it's time to retry.
When I delete that 0-byte file before a transfer retry, the problem does not appear.
Unfortunately, I'm not always there to babysit the transfer process.
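For anyone else who can't babysit the transfers: a minimal Python sketch of the manual workaround, which lists the 0-byte files in a project folder so they can be deleted before the retry. It only looks at file sizes, and deletion stays off unless you pass dry_run=False, because a 0-byte file is not necessarily a failed download. The function names are hypothetical.

```python
from pathlib import Path


def find_zero_byte_files(project_dir):
    """Return the zero-byte regular files directly inside project_dir, sorted by path."""
    return sorted(
        path for path in Path(project_dir).iterdir()
        if path.is_file() and path.stat().st_size == 0
    )


def remove_zero_byte_files(project_dir, dry_run=True):
    """Print (and, only if dry_run is False, delete) the zero-byte files found.

    Returns the list of paths that were found.
    """
    found = find_zero_byte_files(project_dir)
    for path in found:
        print(("would delete " if dry_run else "deleting ") + str(path))
        if not dry_run:
            path.unlink()
    return found
```

Usage, assuming a Linux install whose BOINC data directory is /var/lib/boinc-client (adjust to your own setup): remove_zero_byte_files("/var/lib/boinc-client/projects/www.worldcommunitygrid.org") does a dry run first, so you can check the list before deleting anything.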

Edit: I was able to save a fifth task by doing as described above. The _1 is mine and is running.
I had to delete two 0-byte files that needed a retry download.
I don't have any issues with my internet connection.

I needed the same treatment for a 6th task, for 2 files even twice.
06 May 09:28:14	Scheduler request completed: got 1 new tasks	
06 May 09:28:22 Temporarily failed download of ARP1_0009184_138_ARP1_0009184_138.input: transient HTTP error
06 May 09:29:54 Temporarily failed download of 11a303958ca095ce1e437405905965b9.: transient HTTP error
06 May 09:34:54 Temporarily failed download of ARP1_0009184_138_ARP1_0009184_input_d01: transient HTTP error
06 May 09:35:38 Temporarily failed download of ARP1_0009184_138_ARP1_0009184_input_d03: transient HTTP error
06 May 09:35:43 Temporarily failed download of ARP1_0009184_138_ARP1_0009184_138.input: transient HTTP error
06 May 09:38:27 Temporarily failed download of ARP1_0009184_138_ARP1_0009184_input_d03: transient HTTP error
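To see which files needed repeated retries, the event-log lines can be tallied per file name. Here is a small Python sketch, assuming every failure line has exactly the "Temporarily failed download of <file>: transient HTTP error" form shown above. The function names are hypothetical.

```python
import re
from collections import Counter

# Matches event-log lines of the form
# "06 May 09:28:22 Temporarily failed download of <file>: transient HTTP error"
FAILED_DOWNLOAD = re.compile(r"Temporarily failed download of (\S+): transient HTTP error")


def count_transient_failures(log_lines):
    """Return a Counter of transient-HTTP-error download failures per file name."""
    counts = Counter()
    for line in log_lines:
        match = FAILED_DOWNLOAD.search(line)
        if match:
            counts[match.group(1)] += 1
    return counts


def files_failing_more_than_once(log_lines):
    """File names that hit a transient HTTP error at least twice, in first-seen order."""
    return [name for name, n in count_transient_failures(log_lines).items() if n > 1]
```

Run over the six failure lines above, it counts ARP1_0009184_138_ARP1_0009184_138.input and ARP1_0009184_138_ARP1_0009184_input_d03 twice each: the 2 files that failed twice.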

----------------------------------------
[Edit 3 times, last edit by Crystal Pellet at May 6, 2023 7:45:40 AM]
[May 6, 2023 7:07:42 AM]
Crystal Pellet
Veteran Cruncher
Joined: May 21, 2008
Post Count: 1323
Re: Work Available

I'm unfortunately not always there to babysit the transfer process.
But I had the time to dig into this issue. :)
A transfer retry found an already-existing file with that name in the project folder, mostly zero bytes in size.
Because I had, for testing purposes, the setting <dont_check_file_sizes>1</dont_check_file_sizes> in my cc_config.xml, the retransferred file did not overwrite the 0-byte file, resulting in the checksum errors.
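Why a stale file produces error -119: judging by the error text, the client checks each download against an MD5 checksum. An empty file always has the same MD5 (d41d8cd98f00b204e9800998ecf8427e), so a 0-byte file that never gets overwritten can never pass. Below is a minimal Python illustration of that comparison. It is not BOINC's actual code, and the helper names are hypothetical.

```python
import hashlib


def md5_of_file(path, chunk_size=1 << 20):
    """Return the hex MD5 digest of a file, reading it in chunks."""
    digest = hashlib.md5()
    with open(path, "rb") as f:
        for chunk in iter(lambda: f.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def checksum_ok(path, expected_md5):
    """True if the file's MD5 matches the expected hex digest (case-insensitive)."""
    return md5_of_file(path) == expected_md5.lower()


# A leftover 0-byte file always hashes to the MD5 of empty input,
# so it cannot match the checksum of a real, non-empty input file.
EMPTY_MD5 = hashlib.md5(b"").hexdigest()
```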
[May 6, 2023 1:00:43 PM]
adriverhoef
Master Cruncher
The Netherlands
Joined: Apr 3, 2009
Post Count: 2171
Re: Work Available

Thank you, Jarl Ole! Thanks to your contribution, I was able to find some other ARP1 workunits below generation 124:
ARP1_0034244_019   ARP1_0033791_106   ARP1_0033952_117   ARP1_0034317_120
ARP1_0034098_019   ARP1_0034251_111   ARP1_0033870_118   ARP1_0034319_122
ARP1_0033793_101   ARP1_0034391_113   ARP1_0033558_119   ARP1_0034389_122
ARP1_0034320_104   ARP1_0034247_113   ARP1_0034316_119   ARP1_0034646_123

I chose those, all in the Extreme category, because they have fewer than 10 workunits in their generations (see generations.txt and state.txt).
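The selection rule can be sketched in Python. I don't know the layout of generations.txt or state.txt, so this assumes only a plain list of workunit names in the ARP1_<number>_<generation> form. The thresholds (below generation 124, fewer than 10 workunits per generation) come from the post above, and the function names are hypothetical.

```python
import re
from collections import Counter

# Workunit names look like ARP1_<number>_<generation>, e.g. ARP1_0034244_019.
WORKUNIT_NAME = re.compile(r"ARP1_(\d+)_(\d+)")


def generation_of(name):
    """Return the generation number of an ARP1 workunit name."""
    match = WORKUNIT_NAME.fullmatch(name)
    if match is None:
        raise ValueError(f"not an ARP1 workunit name: {name!r}")
    return int(match.group(2))


def sparse_low_generations(names, below_generation=124, fewer_than=10):
    """Workunits below `below_generation` whose generation has fewer than `fewer_than` workunits."""
    per_generation = Counter(generation_of(name) for name in names)
    return [
        name for name in names
        if generation_of(name) < below_generation
        and per_generation[generation_of(name)] < fewer_than
    ]
```

Counting per generation over the whole list first, then filtering, keeps the "fewer than 10 in their generation" test independent of the order in which names arrive.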

Adri
[May 6, 2023 1:10:19 PM]
Mike.Gibson
Ace Cruncher
England
Joined: Aug 23, 2007
Post Count: 12434
Re: Work Available

Thanks, guys. It is good to see those Extremes hopefully moving again, especially the 3 Ultras.

Some have validated already.

Mike
[May 6, 2023 3:15:09 PM]
MJH333
Senior Cruncher
England
Joined: Apr 3, 2021
Post Count: 268
Re: Work Available

Hi Adri,
I've now picked up the next generation of one of the tasks Jarl Ole mentioned.
See ARP1_0033555_102
Cheers,
Mark
[May 6, 2023 6:04:25 PM]
alanb1951
Veteran Cruncher
Joined: Jan 20, 2006
Post Count: 979
Re: Work Available

That's very interesting... Your WU was created about half an hour after it became possible to validate the previous generation, so it is [still] possible for a given cell to move up several places in a short time (as used to be the case!)

If that is so, what still puzzles me is this: why have there been far fewer generation movements for the Extreme and Accelerated cells over the last few months?[1] Here's hoping that from now on the Normal units won't grab quite so much of the limelight; even a couple of percentage points less would go a long way towards catching up everything except our ultra-laggards (which will probably need special attention!)

Cheers - Al.

[1] Witness the fact that they made up less than 0.5% of all WUs between the return from the crash and the apparent end of the batch a couple of days ago.
----------------------------------------
[Edit 1 times, last edit by alanb1951 at May 6, 2023 7:49:26 PM]
[May 6, 2023 7:47:59 PM]
adriverhoef
Master Cruncher
The Netherlands
Joined: Apr 3, 2009
Post Count: 2171
Re: Work Available

Marvellous, Mark! :) Thanks.
And Al, it's good to think about the time between the validation of (the completed tasks in) a workunit and its next release. Thanks for investigating.

I'm very glad to see progress in the advancement to the next generation.
If we keep reporting Extreme workunits here, we can track their incremental 'growth', hopefully towards a non-Extreme (e.g. Accelerated) status.

As an aside, I think (or seem to remember from somewhere) that the WCG Team doesn't have any means of favouring particular generations. However, if workunits keep advancing the way we've just seen, with the next generation released right after a workunit is completed and validated, then such a 'favouring means' wouldn't be necessary, of course. ;)

Adri
PS: Does anyone remember what we called the number between 'ARP1' and the generation number in a workunit name? (E.g. the number 0033555 in the name of task ARP1_0033555_101_2.)
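Whatever that middle number is called, the names split cleanly on underscores. Here is a small Python sketch that uses `number` as a placeholder name for it. It assumes generations are zero-padded to three digits, as in the names in this thread, and treats the trailing _1/_2 as what I believe is the task suffix distinguishing copies of one workunit. All names in the code are hypothetical.

```python
from typing import NamedTuple, Optional


class ArpName(NamedTuple):
    project: str                # "ARP1"
    number: str                 # placeholder name for the middle number; a string keeps leading zeros
    generation: int             # e.g. 101
    task_suffix: Optional[int]  # e.g. 2 for ..._101_2; None for a bare workunit name


def parse_arp_name(name: str) -> ArpName:
    """Split 'ARP1_<number>_<generation>[_<suffix>]' into its parts."""
    parts = name.split("_")
    if len(parts) not in (3, 4) or parts[0] != "ARP1" or not all(p.isdigit() for p in parts[1:]):
        raise ValueError(f"unexpected ARP1 name: {name!r}")
    suffix = int(parts[3]) if len(parts) == 4 else None
    return ArpName(parts[0], parts[1], int(parts[2]), suffix)


def next_generation(name: str) -> str:
    """The workunit name one generation later, same number (assumes 3-digit padding)."""
    p = parse_arp_name(name)
    return f"{p.project}_{p.number}_{p.generation + 1:03d}"
```

For example, next_generation("ARP1_0033555_101_2") gives ARP1_0033555_102, the workunit Mark picked up.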
[May 6, 2023 9:17:17 PM]