Thread Status: Active
Total posts in this thread: 265
Posts: 265   Pages: 27
This topic has been viewed 72814 times and has 264 replies
TPCBF
Master Cruncher
USA
Joined: Jan 2, 2011
Post Count: 1842
Re: Unplanned website outage 2017-06-25.

My tasks are still stuck trying to upload results. Is there an ETA?
None has been posted so far; it seems to be a deeper-lying issue with the storage system.
Just be patient. You are not the only one affected; so are pretty much all of the half a million folks participating in WCG...

Ralf
----------------------------------------

[Jun 26, 2017 4:28:25 AM]
Traveller42
Cruncher
Joined: May 7, 2017
Post Count: 21
Re: Unplanned website outage 2017-06-25.

They are far beyond "Have you tried turning it off and then on again".

Uploading, Task Scheduling, and Downloading are unavailable until they resolve the issue.

As someone who supports large distributed systems, I can tell you that even the most reliable system with the most redundancy can fail. Almost by definition, it will fail in a way they have never seen before.

While head-slappers are possible, they are actually rare. It is only the sheer number of systems out there that provides the stories we all read about.

I thank them for their efforts and for the updates they have provided, and in advance for the updates to come.

[Edit to correct typo.]
----------------------------------------
[Edit 1 times, last edit by Traveller42 at Jun 26, 2017 4:27:54 PM]
[Jun 26, 2017 4:39:19 AM]
Wisesooth
Cruncher
United States
Joined: Aug 5, 2016
Post Count: 9
Re: Unplanned website outage 2017-06-25.

RPI's project milkyway@home had a similar problem. They were getting too much traffic. Although their receiver could keep up, their database server was brought to its knees. Ditto for seti@home. When IBM turns its receiver back on, the backlog of uploading tasks will overwhelm your database server unless you take precautions. It took seti over a week to rebuild their database. Hope this helps you avoid that possibility. RPI's scientists solved their problem using a technique they called "bundling."
[Jun 26, 2017 5:04:00 AM]
uplinger
Former World Community Grid Tech
Joined: May 23, 2005
Post Count: 3952
Re: Unplanned website outage 2017-06-25.

I have extended the deadlines for all results on the database that are in progress. Your clients will not see this change, but the backend jobs will read this update when we get those processes back online.

Thanks,
-Uplinger
[Jun 26, 2017 5:16:20 AM]
mano_mk
Cruncher
Joined: Aug 13, 2010
Post Count: 2
Re: Unplanned website outage 2017-06-25.

Hi there,

Just checked in to see how things stand and I see you guys are still hard at work.

Like others here, I also work in IT, and while I never had to deal with a system of that magnitude, I've encountered enough annoyingly complex failures that just happened. So, no pressure from me; I hope for your sake you fix it soon, and thank you for your work!
----------------------------------------
[Edit 1 times, last edit by mano_mk at Jun 26, 2017 6:21:54 AM]
[Jun 26, 2017 5:28:23 AM]
CandymanWCG
Senior Cruncher
Romania
Joined: Dec 20, 2010
Post Count: 421
Re: Unplanned website outage 2017-06-25.

Thanks for the update, Uplinger. Hope you guys get a handle on things soon. My one and only machine has been out of work since midnight. That makes me reconsider how much cached work I keep from now on.

Cheers!

Edit 1: Oh, right, the systems are down, so I can't save the changes to my profile. Should've seen that coming.
----------------------------------------
Knowledge is limited. Imagination encircles the world! - Albert Einstein



----------------------------------------
[Edit 1 times, last edit by CandymanWCG at Jun 26, 2017 7:06:28 AM]
[Jun 26, 2017 7:02:42 AM]
Warped@RSA
Senior Cruncher
South Africa
Joined: Jan 15, 2006
Post Count: 418
Re: Unplanned website outage 2017-06-25.

I have extended the deadlines for all results on the database that are in progress.
Thanks,
-Uplinger

Thanks Keith.
Every cloud has a silver lining.
----------------------------------------
Dave


[Jun 26, 2017 7:25:10 AM]
robertmiles
Senior Cruncher
US
Joined: Apr 16, 2008
Post Count: 442
Re: Unplanned website outage 2017-06-25.

One thing is for sure: it is not a fake problem.

I will speculate (since it is free) that it is not a problem they have seen before, or they would have provided some redundancy. So it is a deep underlying incompatibility that will take a while to fix.

Unless someone tripped over the power cord.


I'd expect them to use computers big enough that the power cords would use explosion-proof connectors like those I saw on a computer power cord years ago; those WON'T pull out of the socket unless you first spend a minute or two unscrewing the outer shell.

An idea for avoiding unneeded resends: disable the program that decides what to resend for about 24 hours after uploads are working again.
[Jun 26, 2017 7:33:44 AM]
Former Member
Cruncher
Joined: May 22, 2018
Post Count: 0
Re: Unplanned website outage 2017-06-25.

Are you guys running this on ZFS, or something less reliable?
I can't find the post/announcement right now, but there was a scheduled downtime a year or two ago for the very purpose of moving the databases to ZFS...

Ralf

Interesting. At least we can rule out a filesystem issue, assuming they were using ZFS properly.
----------------------------------------
[Edit 2 times, last edit by res1233 at Jun 26, 2017 8:18:18 AM]
[Jun 26, 2017 8:17:18 AM]
BQL_FFM
Cruncher
Germany
Joined: Jun 16, 2016
Post Count: 15
Re: Unplanned website outage 2017-06-25.

I have extended the deadlines for all results on the database that are in progress. Your clients will not see this change, but the backend jobs will read this update when we get those processes back online.

Thanks,
-Uplinger

That's great, thanks!
[Jun 26, 2017 8:30:34 AM]