alanb1951
Veteran Cruncher
Joined: Jan 20, 2006
Post Count: 982
Re: Work Available

Adri,

Yes, I have been archiving copies of generations.txt since the ARP1 restart, and have been watching for discrepancies such as the above. Your report elsewhere provided some possible evidence for the stalled job hypothesis but, unfortunately, without work-unit numbers for other [non-generation-135? :-)] tasks whilst the throughput is low we can't sleuth this much further...

I have a spreadsheet -- much simpler than Mike's, I suspect :-) -- in which I simply watch the day-by-day movements of units in generation 121 and above; it helps with spotting these discrepancies and I've noted every one I've spotted. (The choice of 121 was based on where the boundaries were when I started monitoring.)
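
For anyone wanting to try something similar without a spreadsheet, here is a minimal sketch of a day-to-day comparison. It assumes each day's data has already been reduced to a mapping of generation number to unit count; the layout of generations.txt is not shown in this thread, so no parsing is attempted. The function name and the sample counts are made up for illustration. Treating a non-zero net change as a possible discrepancy is an assumption too, not necessarily the method described above.

```python
def daily_movement(previous, current):
    """Return per-generation changes and the net change between two snapshots.

    Both arguments map generation number -> number of units (hypothetical layout).
    """
    generations = sorted(set(previous) | set(current))
    deltas = {}
    for generation in generations:
        change = current.get(generation, 0) - previous.get(generation, 0)
        if change != 0:
            deltas[generation] = change
    # If every unit only ever moves up one generation, the changes cancel out.
    net = sum(deltas.values())
    return deltas, net


# Made-up counts for illustration only (not real generations.txt data).
monday = {121: 40, 122: 10, 123: 5}
tuesday = {121: 37, 122: 12, 123: 5}

deltas, net = daily_movement(monday, tuesday)
print(deltas)  # three units left generation 121, two arrived in 122
print(net)     # one unit is unaccounted for in this made-up example
```

A net change of zero over the whole table would mean every unit that left one generation turned up in another. When only generations from 121 upward are watched, units arriving from generation 120 will also push the net figure up.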

Unfortunately, there are some holes in the data as supplied (the WCG access outage [24th..25th July 2022] and the apparent non-running of the WCG scripts [10th..19th December 2022]), so who knows how many apparent stalls I might have missed... I have seen and counted nearly 100. Whether these are all accounted for by units stalling is, of course, speculation, as my methodology may be flawed.

Cheers - Al.
[Jan 9, 2023 3:04:41 PM]

Mike.Gibson
Ace Cruncher
England
Joined: Aug 23, 2007
Post Count: 12436

My spreadsheet started on 18 July 2022 and I have data for every Sunday thereafter. It is the basis of my Sunday Report. The current week gets updated daily until Sunday and then I start a new set of columns. The data records the number of units in each generation.

It includes 11 & 18 December with the previous data and generations 0 to 182 for completeness. It calculates the outstanding units and forecasts the end date.
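
One way the outstanding figure could be calculated is sketched below. It assumes a unit sitting in generation g still has to run g through 182 inclusive; whether the spreadsheet counts the current generation this way is a guess. The function name and the sample counts are hypothetical.

```python
LAST_GENERATION = 182  # from the post: generations 0 to 182


def outstanding_units(counts, last_generation=LAST_GENERATION):
    """Total work still to run, given a mapping of generation -> unit count.

    Assumption: a unit in generation g still needs generations g..last_generation,
    i.e. (last_generation - g + 1) results.
    """
    total = 0
    for generation, units in counts.items():
        remaining = last_generation - generation + 1
        total += units * remaining
    return total


# Made-up counts for illustration only.
example = {180: 2, 182: 5}
print(outstanding_units(example))  # 2 units x 3 generations + 5 units x 1 generation
```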

The extreme & accelerated generations are highlighted in colours.

I very much tend to agree with the hypothesis that the discrepancy of 2 that I reported was probably due to units being withdrawn for having too many actual errors ('No Reply' results not counting).

Mike
[Jan 9, 2023 9:12:23 PM]

adriverhoef
Master Cruncher
The Netherlands
Joined: Apr 3, 2009
Post Count: 2171

Maybe there's someone who remembers alanb1951's post 681268 in which Al referenced my report on workunit ARP1_0031151_128 (see post 681205), that segfaulted on 07-01-2023 …

Well, I have received the same workunit again (with a different ID, of course) on Friday the 13th! laughing

As Cyclops explained in post 681390, they seem to have adjusted the granularity/time-step before re-sending them to resolve the problem.

The result is that this workunit made it all the way (100% smile) now:

workunit 245445898
App: Africa Rainfall Project
Workunit: ARP1_0031151_128
Created: 2023-01-13T08:22:06
Quorum: 2
Replication: 2

ARP1_0031151_128_0 Linux Ubuntu Valid 2023-01-13T08:33:58 2023-01-14T06:48:34 22.01/22.22 787.0/761.3
ARP1_0031151_128_1 Fedora Linux Valid 2023-01-13T08:34:00 2023-01-14T07:55:28 23.05/23.30 735.6/761.3
dancing
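
For anyone logging these names, here is a small sketch that splits an ARP1 name into its parts. The reading of the fields (unit number, then generation, then an optional replica suffix) is inferred from the names quoted in this thread and is not an official specification. `ArpName` and `parse_arp_name` are hypothetical names.

```python
import re
from typing import NamedTuple, Optional


class ArpName(NamedTuple):
    unit: str                # kept as text so leading zeros survive
    generation: int
    replica: Optional[int]   # None for a workunit name without a task suffix


# Field meanings are an assumption based on the names seen in this thread.
_PATTERN = re.compile(r"^ARP1_(\d+)_(\d+)(?:_(\d+))?$")


def parse_arp_name(name):
    """Split e.g. 'ARP1_0031151_128_0' into unit, generation and replica."""
    match = _PATTERN.match(name)
    if match is None:
        raise ValueError(f"not an ARP1 name: {name!r}")
    unit, generation, replica = match.groups()
    return ArpName(unit, int(generation), None if replica is None else int(replica))


print(parse_arp_name("ARP1_0031151_128_0"))
print(parse_arp_name("ARP1_0031151_128"))
```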
[Jan 14, 2023 8:38:09 AM]

alanb1951

Adri, it's useful that you got a task from the re-submitted work unit as it means we know that progress has been made, rather than just seeing it mentioned in a WCG-authored forum post :-) -- I've added a "fixed now" annotation to my database of known stalled/freed units.

As an aside, I wonder how many of those occasional singleton work units that seem to be sent out are manual re-submissions of units that failed, either because of real errors (as in this case) or because the system eventually gave up after too many retries...

Cheers - Al.

P.S. I'm jealous - you seem to get the occasional ARP1 WU even though the supply is limited, and I get none :-(
----------------------------------------
[Edit 1 times, last edit by alanb1951 at Jan 14, 2023 9:38:57 AM]
[Jan 14, 2023 9:38:04 AM]

Mike.Gibson

Just received one!

ARP1_0018148_135_0 Microsoft Windows 10 Professional x64 Edition, (10.00.19044.00) In Progress 2023-01-14 07:34:11 UTC 2023-01-20 07:34:11 UTC
ARP1_0018148_135_1 Microsoft Windows 7 Professional x64 Edition, Service Pack 1, (06.01.7601.00) In Progress 2023-01-14 07:34:11 UTC 2023-01-20 07:34:11 UTC

We now need them to unstick generations 14, 16 & 17.

It would seem that re-sends caused by 'No Reply' don't count when judging whether a unit is stuck; only 'Error' results do.

Adjusting the timestep from 36 seconds to 24 seconds worked for IBM at the beginning of last year. They then reverted to 36 seconds after about 3 generations. With a timestep of 36 seconds, the progress updates in steps of 0.02083333% whereas with 24 seconds it is 0.01388888%.
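
As a rough check of those step sizes: both percentages are consistent with each task covering 48 simulated hours (172,800 seconds), which gives 4,800 steps at 36 seconds and 7,200 steps at 24 seconds. The 48-hour span is inferred from the figures above, not taken from any project documentation, and the function name is just for illustration.

```python
SIMULATED_SECONDS = 48 * 60 * 60  # assumption inferred from the quoted percentages


def progress_step_percent(timestep_seconds, simulated_seconds=SIMULATED_SECONDS):
    """Percentage of the task completed by a single model timestep."""
    steps = simulated_seconds / timestep_seconds
    return 100.0 / steps


for timestep in (36, 24):
    print(timestep, f"{progress_step_percent(timestep):.8f}%")
```

The 24-second value prints rounded in its last digit, whereas the figure quoted above is truncated.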

Mike
----------------------------------------
[Edit 2 times, last edit by Mike.Gibson at Jan 14, 2023 10:02:14 AM]
[Jan 14, 2023 9:53:06 AM]

Mike.Gibson

The number of units registered as being in the extreme and accelerated categories has risen, indicating that some stuck units have moved on with the help of the timestep adjustment.

Mike
[Jan 14, 2023 12:38:39 PM]

Mike.Gibson

A few days ago, the 2 units in generation 98 completed without moving on a generation. The same has happened today with 1 of them. I presume that it errored out.

Mike
[Jan 15, 2023 6:30:24 PM]

Mike.Gibson

Sunday Report

Only 202 units validated in the week, an average of 28.9 per day. Of these, 2 were extremes & 31 were accelerated, based on their generations.

Assuming that a full generation 182 will be the last, there are 1,694,956 units still outstanding, so my forecast end date would now be 25 May 2027. However, we are still coming out of testing so we should finish well before then.
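
The arithmetic behind these figures can be sketched as follows. The rate used for the forecast is not stated in the report, so the last helper works backwards from the quoted end date to the daily rate it would imply if counted from the report date of 15 January 2023. All function names are hypothetical.

```python
import math
from datetime import date, timedelta


def average_per_day(validated, days=7):
    """Average validated units per day over the reporting period."""
    return validated / days


def forecast_end(start, outstanding, per_day):
    """Date on which the outstanding units would finish at a constant daily rate."""
    return start + timedelta(days=math.ceil(outstanding / per_day))


def implied_rate(start, end, outstanding):
    """Daily rate needed to clear the outstanding units between two dates."""
    return outstanding / (end - start).days


report_date = date(2023, 1, 15)  # assumption: the forecast counts from the report date
print(round(average_per_day(202), 1))
print(round(implied_rate(report_date, date(2027, 5, 25), 1_694_956)))
```

On these assumptions the quoted end date corresponds to roughly a thousand units per day. That suggests the forecast rests on a longer-term rate, not on this week's 28.9 per day.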

The definitions of normal, accelerated & extreme have remained generations 142, 132 & 127, respectively.

There are now 33 Extremes and 60 Accelerated units listed as there has been some re-starting, although the numbers in their generations are 1,461 & 4,373 due to lack of movement/change of definition.

Mike
[Jan 15, 2023 6:48:33 PM]

Mike.Gibson

Another one:

ARP1_0033327_135_0 Microsoft Windows 10 Education x64 Edition, (10.00.19044.00) In Progress 2023-01-17 07:26:24 UTC 2023-01-23 07:26:24 UTC
ARP1_0033327_135_1 Microsoft Windows 7 Professional x64 Edition, Service Pack 1, (06.01.7601.00) In Progress 2023-01-17 07:26:24 UTC 2023-01-23 07:26:24 UTC

Mike
[Jan 17, 2023 4:05:08 PM]

Taurus Oldbull
Advanced Cruncher
US
Joined: Nov 26, 2020
Post Count: 53

First one in more than a week:

ARP1_0024705_135 Darwin 17.7.0 In Progress 2023-01-17 00:34:47 UTC 2023-01-23 00:34:47 UTC
[Jan 18, 2023 6:55:32 AM]