Thread Status: Active
Total posts in this thread: 51
This topic has been viewed 200623 times and has 50 replies
Cyclops
Senior Cruncher
Joined: Jun 13, 2022
Post Count: 295
2023-04-06 Update (WU Distribution Update)

WU Distribution Update

We are working towards resuming a consistent WU supply similar to what we had before the storage system failure. The recent scarcity of OPN1 WUs was caused by a batch that blocked the create-work process for all other projects. We have found and fixed the glitch, and the system is busy creating work for OPN1 right now. We still have an ARP1 backlog of unsent results (see ARP project update), but we now have spare capacity for a larger backlog. After OPN1 work units are prepared, the system will prepare ARP1 work units.

On the back end, we still had to finalize setup of the new storage as there was a networking issue that was preventing us from accessing the tape archive. Data center admins have helped to fix it, and the production system on the new storage is being backed up.

We continue to investigate the errors in the BOINC system services, specifically assimilators and validators. Unfortunately, the application is written such that an unexpected error halts the service (which happened when our storage system failed). We are attempting to clear out the problematic data to allow the applications to continue processing other results, but BOINC doesn't seem to have an easy method of flushing specific workunits or results out of its system.

If you have any comments or questions, please leave them in this thread for us to answer. Thank you for your support, patience and understanding.

WCG team
[Apr 6, 2023 7:44:16 PM]
Speedy51
Veteran Cruncher
New Zealand
Joined: Nov 4, 2005
Post Count: 1220
Re: 2023-04-06 Update (WU Distribution Update)

Thank you for the update. I have 6 pages of OPN1 (CPU tasks) pending verification, returned on 4/3 or 4/4. Hopefully these will clear in the coming day or two.

Here are a few of the task names that I have "pending verification":


OPN1_0128917_01594_0
OPN1_0128917_01584_0
OPN1_0128917_01589_0

Update: work has started to be verified. Keep up the great work; it is much appreciated.
[Edit 3 times, last edit by Speedy51 at Apr 7, 2023 9:26:44 AM]
[Apr 6, 2023 11:17:00 PM]
Mike.Gibson
Ace Cruncher
England
Joined: Aug 23, 2007
Post Count: 11791
Re: 2023-04-06 Update (WU Distribution Update)

I seem to have a problem with priority units. They show the correct deadline of +3 days but are not being crunched any earlier than they would be with a +6 day deadline. This has been occurring on MCM1 & OPN1. I am connected to 7.20.2.

Mike
[Apr 7, 2023 1:32:19 AM]
Mike.Gibson
Ace Cruncher
England
Joined: Aug 23, 2007
Post Count: 11791
Re: 2023-04-06 Update (WU Distribution Update)

I have my cache set to 5+1 days. With an 8-thread machine that should result in at least 40 CPU days work.

However, I only have 12 days of ARP, 7 hours OPN and 3 hours MCM but my event log says not requesting tasks:don't need!

I have app_config set to a maximum of 4 ARP, 2 OPN & 2 MCM. That means 3 days work for my ARP threads, only 2 OPN units for another 2.5 hours and only 1 MCM unit for another 80 minutes.
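As a rough check of the arithmetic in this post, here is a small Python sketch. It assumes "5+1" means a 5-day minimum buffer plus up to 1 additional day, and the function names are made up for illustration. It only multiplies and divides the figures quoted above; it does not model how the BOINC client actually decides when to fetch work.

```python
def cache_cpu_days(threads, min_days, extra_days):
    """Return (minimum, maximum) buffered work in CPU days for a cache setting."""
    return threads * min_days, threads * (min_days + extra_days)


def wall_days(queued_cpu_days, max_concurrent):
    """Days needed to drain a queue when at most max_concurrent tasks run at once."""
    return queued_cpu_days / max_concurrent


# Figures taken from the post: 8 threads, cache of 5+1 days.
low, high = cache_cpu_days(threads=8, min_days=5, extra_days=1)
print(low, high)          # 40 48

# 12 CPU days of ARP work with app_config limiting ARP to 4 concurrent tasks.
print(wall_days(12, 4))   # 3.0
```

So the queued ARP work alone covers only about 3 wall-clock days on the 4 threads allowed to run it, which matches the figure given above.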

I am crunching version 7.20.2 with Windows 7.

Mike
[Apr 7, 2023 2:51:36 PM]
Crystal Pellet
Veteran Cruncher
Joined: May 21, 2008
Post Count: 1294
Re: 2023-04-06 Update (WU Distribution Update)

Mike.Gibson wrote: "I have my cache set to 5+1 days. With an 8-thread machine that should result in at least 40 CPU days work."
This is only true if that machine was running 24/7 for a long period before asking for more work.

Mike.Gibson wrote: "However, I only have 12 days of ARP, 7 hours OPN and 3 hours MCM but my event log says not requesting tasks:don't need!"
You could check in client_state.xml whether the on_frac value in time_stats is near 1 or much lower.
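For anyone who would rather not read the file by eye, the following is a minimal Python sketch of that check. It assumes client_state.xml parses as well-formed XML and contains a time_stats element with numeric children such as on_frac. The function name and the sample text are only illustrative, and the location of the real file depends on the BOINC installation.

```python
import xml.etree.ElementTree as ET


def parse_time_stats(xml_text):
    """Return the numeric children of <time_stats> as a dict of floats."""
    root = ET.fromstring(xml_text)
    stats = root if root.tag == "time_stats" else root.find(".//time_stats")
    result = {}
    if stats is None:
        return result
    for child in stats:
        try:
            result[child.tag] = float(child.text)
        except (TypeError, ValueError):
            # Skip empty or non-numeric entries.
            continue
    return result


# Sample text standing in for the relevant part of client_state.xml.
sample = """
<client_state>
  <time_stats>
    <on_frac>0.973955</on_frac>
    <connected_frac>0.999969</connected_frac>
  </time_stats>
</client_state>
"""

stats = parse_time_stats(sample)
print(stats["on_frac"])  # 0.973955
```

To run it against a real file, read the file contents into a string first and pass that to parse_time_stats.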
[Apr 7, 2023 4:03:09 PM]
Mike.Gibson
Ace Cruncher
England
Joined: Aug 23, 2007
Post Count: 11791
Re: 2023-04-06 Update (WU Distribution Update)

Crystal Pellet wrote:
    Mike.Gibson wrote: "I have my cache set to 5+1 days. With an 8-thread machine that should result in at least 40 CPU days work."
    This is only true if that machine was running 24/7 for a long period before asking for more work.

    Mike.Gibson wrote: "However, I only have 12 days of ARP, 7 hours OPN and 3 hours MCM but my event log says not requesting tasks:don't need!"
    You could check in client_state.xml whether the on_frac value in time_stats is near 1 or much lower.

<on_frac>0.973955</on_frac>
<connected_frac>0.999969</connected_frac>
<cpu_and_network_available_frac>0.999966</cpu_and_network_available_frac>

Thanks

Mike
[Apr 7, 2023 4:45:57 PM]
Grumpy Swede
Master Cruncher
Svíþjóð
Joined: Apr 10, 2020
Post Count: 1866
Re: 2023-04-06 Update (WU Distribution Update)

Our old "foe", "transient HTTP error", is back with a vengeance. I think they started sending out OPNG tasks, at the same time as ARP1 tasks are being sent out. That hasn't worked before, and it sure doesn't work now.
[Edit 2 times, last edit by Grumpy Swede at Apr 8, 2023 9:06:14 AM]
[Apr 8, 2023 9:04:14 AM]
adriverhoef
Master Cruncher
The Netherlands
Joined: Apr 3, 2009
Post Count: 1978
Re: 2023-04-06 Update (WU Distribution Update)

Well, Grumpy Swede, although I'm seeing "transient HTTP error", too, the problem is not downloading tasks, but it is uploading them instead. Will post back when the problem is resolved for me or when my upload queue is empty again.

Anyway, the problems are transient, so there's hope. :)

UPDATE:
The problem started for me at 08:14 UTC today:
08-Apr-2023 10:14:16 [World Community Grid] Temporarily failed upload of OPN1_0129309_01811_0_r2116729635_0: transient HTTP error
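To see how often this happens per project application, event log lines of this shape can be tallied with a short script. This is only a sketch: the regular expression is modelled on the single line quoted above, the function name is made up, and other log lines may be formatted differently.

```python
import re
from collections import Counter

# Pattern modelled on the one event log line quoted in this post.
FAILED_UPLOAD = re.compile(
    r"^(?P<stamp>\d{2}-\w{3}-\d{4} \d{2}:\d{2}:\d{2}) "
    r"\[(?P<project>[^\]]+)\] "
    r"Temporarily failed upload of (?P<file>\S+): (?P<reason>.+)$"
)


def count_failed_uploads(lines):
    """Count failed uploads per application prefix (e.g. OPN1, ARP1)."""
    counts = Counter()
    for line in lines:
        match = FAILED_UPLOAD.match(line.strip())
        if match:
            app = match.group("file").split("_")[0]
            counts[app] += 1
    return counts


log = [
    "08-Apr-2023 10:14:16 [World Community Grid] Temporarily failed upload of "
    "OPN1_0129309_01811_0_r2116729635_0: transient HTTP error",
    "some unrelated line",
]
print(count_failed_uploads(log))  # Counter({'OPN1': 1})
```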

Sometimes an upload succeeds.

The upload speed of ARP1-files is nearly abysmally slooooow:
$ wcgresults -X
Up/Down  Speed    Sticky  Active  Elapsed   Xferred  Filename
up       0 B/s    no      no      0:00:21    130724  ARP1_0003625_133_1_r415689557_1
up       0 B/s    no      no      0:00:24       107  ARP1_0003625_133_1_r415689557_4
up       0 B/s    no      no      0:00:13       107  ARP1_0004384_134_0_r1959148253_1
up       0 B/s    no      no      0:00:15       107  ARP1_0004384_134_0_r1959148253_4
up       0 B/s    no      no      0:00:04       107  ARP1_0004384_134_0_r1959148253_5
up       0 B/s    no      no      0:00:05       239  ARP1_0004384_134_0_r1959148253_6
up       14 kB/s  no      yes     0:15:28  13041011  ARP1_0013565_134_0_r1936221568_1
up       14 kB/s  no      yes     0:02:32   2031131  ARP1_0013565_134_0_r1936221568_2
up       0 B/s    no      no      0:00:00         0  ARP1_0013565_134_0_r1936221568_3
up       0 B/s    no      no      0:00:00         0  ARP1_0013565_134_0_r1936221568_4
up       0 B/s    no      no      0:00:00         0  ARP1_0013565_134_0_r1936221568_5
up       0 B/s    no      no      0:00:00         0  ARP1_0013565_134_0_r1936221568_6
up       0 B/s    no      no      0:00:24      8990  MCM1_0197383_4956_0_r1912020586_0
up       0 B/s    no      no      0:00:08     27272  OPN1_0129258_00073_0_r1447773770_0
up       0 B/s    no      no      0:00:00         0  OPN1_0129258_00077_0_r758778455_0
up       0 B/s    no      no      0:00:00         0  OPN1_0129258_00087_0_r713744843_0
up       0 B/s    no      no      0:00:34     26848  OPN1_0129258_00104_0_r466033273_0
up       0 B/s    no      no      0:00:08     27008  OPN1_0129305_00969_0_r1403190389_0
up       0 B/s    no      no      0:00:10       107  OPN1_0129305_01013_0_r1962884788_0
up       0 B/s    no      no      0:00:05       107  OPN1_0129309_02241_0_r1350402589_0
up       0 B/s    no      no      0:00:00         0  OPN1_0129339_01892_0_r2032851162_0
up       0 B/s    no      no      0:00:00         0  OPN1_0129339_01901_0_r1248929576_0
up       0 B/s    no      no      0:00:00         0  OPN1_0129339_01937_0_r1682900680_0
up       0 B/s    no      no      0:00:05       107  OPN1_0129339_01944_0_r176053620_0
up       0 B/s    no      no      0:00:24       107  OPNG_0172814_00033_1_r1183631947_0
up       0 B/s    no      no      0:00:28       107  OPNG_0172814_00033_1_r1183631947_1
up       0 B/s    no      no      0:00:22       107  OPNG_0172814_00037_1_r684796484_0
up       0 B/s    no      no      0:00:00         0  OPNG_0172814_00060_1_r302969735_0
up       0 B/s    no      no      0:00:00         0  OPNG_0172814_00060_1_r302969735_1
up       0 B/s    no      no      0:00:04       107  OPNG_0172818_00020_0_r1169934065_0
up       0 B/s    no      no      0:00:00         0  OPNG_0172818_00030_0_r943811290_0
up       0 B/s    no      no      0:00:00         0  OPNG_0172818_00030_0_r943811290_1
up       0 B/s    no      no      0:00:19       107  OPNG_0172818_00033_0_r1622361912_0
up       0 B/s    no      no      0:00:06    130723  OPNG_0172818_00045_0_r236970297_1
up       0 B/s    no      no      0:00:00         0  OPNG_0172818_00048_0_r16789234_0
up       0 B/s    no      no      0:00:00         0  OPNG_0172818_00048_0_r16789234_1
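A listing like this can be summarised with a few lines of Python. The sketch below assumes each data row has exactly the eight whitespace-separated fields seen above (direction, speed value, speed unit, sticky, active, elapsed, bytes transferred, filename). The function names are hypothetical and are not part of wcgresults.

```python
def parse_transfers(text):
    """Parse rows of the transfer listing into dictionaries."""
    rows = []
    for line in text.splitlines():
        parts = line.split()
        # Skip the header and anything that does not have the expected 8 fields.
        if len(parts) != 8 or parts[0] not in ("up", "down"):
            continue
        direction, rate, unit, sticky, active, elapsed, xferred, name = parts
        rows.append({
            "direction": direction,
            "speed": f"{rate} {unit}",
            "active": active == "yes",
            "elapsed": elapsed,
            "bytes": int(xferred),
            "app": name.split("_")[0],
            "name": name,
        })
    return rows


def summarise(rows):
    """Return (total, currently active, not yet started) transfer counts."""
    active = sum(1 for r in rows if r["active"])
    not_started = sum(1 for r in rows if r["bytes"] == 0)
    return len(rows), active, not_started


# Three rows copied from the listing above, plus its header.
sample = """Up/Down Speed Sticky Active Elapsed Xferred Filename
up 0 B/s no no 0:00:21 130724 ARP1_0003625_133_1_r415689557_1
up 14 kB/s no yes 0:15:28 13041011 ARP1_0013565_134_0_r1936221568_1
up 0 B/s no no 0:00:00 0 OPN1_0129258_00077_0_r758778455_0
"""

print(summarise(parse_transfers(sample)))  # (3, 1, 1)
```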


Adri
[Edit 2 times, last edit by adriverhoef at Apr 8, 2023 10:50:20 AM]
[Apr 8, 2023 10:32:56 AM]
Dayle Diamond
Senior Cruncher
Joined: Jan 31, 2013
Post Count: 440
Re: 2023-04-06 Update (WU Distribution Update)

It's not just you, GS, I'm getting it when I try to upload OPN work.
[Apr 8, 2023 10:37:03 AM]
Grumpy Swede
Master Cruncher
Svíþjóð
Joined: Apr 10, 2020
Post Count: 1866
Re: 2023-04-06 Update (WU Distribution Update)

@adriverhoef

Yes, it's uploading I was talking about. I failed to say that. Downloading is pretty OK here. The last time we had the same transient crap, it was for both downloading and uploading.
[Apr 8, 2023 10:58:55 AM]