Total posts in this thread: 265
This topic has been viewed 638886 times and has 264 replies
adriverhoef
Master Cruncher
The Netherlands
Joined: Apr 3, 2009
Post Count: 2172
Status: Offline
Re: Unplanned website outage 2017-06-25.

To understand what is going on, I found this:
"You have a high chan[c]e to run into soft lockups if you have no CPU cycles available. When I/O intensive tasks run, most of your CPU cycles are blocked contending to get an ack for the write() call."
http://lists.openstack.org/pipermail/openstack/2015-January/011089.html

I'm hoping that fixing the problem would be as easy as this:
https://unix.stackexchange.com/questions/354368/nmi-watchdog-bug-soft-lockup
[Jun 26, 2017 12:34:55 AM]
SekeRob
Master Cruncher
Joined: Jan 7, 2013
Post Count: 2741
Status: Offline
Re: Unplanned website outage 2017-06-25.

Funny, I hit on the exact same thread at SE somewhat earlier... I hope it is indeed a slap-to-the-forehead fix once it is pointed out.
----------------------------------------
[Edit 1 times, last edit by SekeRob* at Jun 26, 2017 12:48:29 AM]
[Jun 26, 2017 12:47:18 AM]
littlepeaks
Veteran Cruncher
USA
Joined: Apr 28, 2007
Post Count: 748
Status: Offline
Re: Unplanned website outage 2017-06-25.

Thanks for the timely info, Kevin. I was worried that the problems were on my end. My ISP sent me an email a few days ago, telling me to reboot my gateway for a huge speed increase -- I thought it might have had to do with that.
[Jun 26, 2017 2:18:25 AM]
bluestang
Senior Cruncher
USA
Joined: Oct 1, 2010
Post Count: 272
Status: Offline
Re: Unplanned website outage 2017-06-25.

Shout-out to all the men and women hard at work trying to figure this out, especially during the weekend.
[Jun 26, 2017 2:37:40 AM]
TPCBF
Master Cruncher
USA
Joined: Jan 2, 2011
Post Count: 1957
Status: Offline
Re: Unplanned website outage 2017-06-25.

Quote: But it was the storage cluster that failed.
I am aware of that. But my statement still stands...
Quote: There is no "on the fly" if there aren't any servers in the storage cluster running. If they went down hard and the filesystem wasn't unmounted cleanly, it will have to be checked and then ALL the nodes (servers in the cluster) will need to see the same consistent view of the filesystem and that is just the storage cluster backend. What the cluster presents to the other servers could be a whole different thing.
That is supposed to be the advantage of clustered storage servers and the associated distributed file systems. You will get degraded performance while all nodes in the cluster get "back on the same page", but there is no waiting for a file system check or the like, as with other/older/non-distributed file systems...

Ralf
[Jun 26, 2017 3:22:28 AM]
Former Member
Cruncher
Joined: May 22, 2018
Post Count: 0
Status: Offline
Re: Unplanned website outage 2017-06-25.

How were the EOD updates run if the system is down?
[Jun 26, 2017 3:28:36 AM]
Former Member
Cruncher
Joined: May 22, 2018
Post Count: 0
Status: Offline
Re: Unplanned website outage 2017-06-25.

Are you guys running this on ZFS, or something less reliable?
[Jun 26, 2017 3:41:34 AM]
Patrickkellysyduni
Cruncher
Joined: Feb 21, 2016
Post Count: 1
Status: Offline
Re: Unplanned website outage 2017-06-25.

Does that mean no uploading and downloading of tasks? All tasks are done but are stuck.
[Jun 26, 2017 4:18:51 AM]
bfborden
Cruncher
Joined: Sep 9, 2009
Post Count: 1
Status: Offline
Re: Unplanned website outage 2017-06-25.

My tasks are still stuck trying to upload results. Is there an ETA?
[Jun 26, 2017 4:20:55 AM]
TPCBF
Master Cruncher
USA
Joined: Jan 2, 2011
Post Count: 1957
Status: Offline
Re: Unplanned website outage 2017-06-25.

Quote: Are you guys running this on ZFS, or something less reliable?
I can't find the post/announcement right now, but there was a scheduled downtime a (couple of?) year(s) ago for the very purpose of moving the databases to ZFS...

Ralf
[Jun 26, 2017 4:24:10 AM]