RTS48
Veteran Cruncher
Bolivia
Joined: Aug 2, 2009
Post Count: 1350
Re: Scheduled Maint. July 18, 14:00 UTC, extended?

Yes, we'll address the invalids once we restore operations. We'll also extend task deadlines.

Thanks everyone for your patience and support!

Juan

Juan, you should have said....

Thanks for your warm support - I shall always wear it! :D
----------------------------------------
Rod Peel
Santa Cruz
Bolivia
South America

[Jul 19, 2017 3:18:17 PM]
erich56
Senior Cruncher
Austria
Joined: Feb 24, 2007
Post Count: 295
Re: Scheduled Maint. July 18, 14:00 UTC, extended?

any rough idea when it will work again?
[Jul 19, 2017 3:38:34 PM]
jhindo
Former World Community Grid Admin
Joined: Aug 25, 2009
Post Count: 250
Re: Scheduled Maint. July 18, 14:00 UTC, extended?

any rough idea when it will work again?


We're making good progress, but no estimate yet. We will keep everyone posted.

Thanks,
Juan
[Jul 19, 2017 4:22:47 PM]
knreed
Former World Community Grid Tech
Joined: Nov 8, 2004
Post Count: 4504
Re: Scheduled Maint. July 18, 14:00 UTC, extended?

So the data that you access when you upload and download files sits on a clustered file system. The maintenance window yesterday was scheduled to install the latest kernel on the servers. We completed all the servers associated with our databases, load balancing and website with no issue. We updated the first server associated with this file system with no issue.

However, after the second server was rebooted, it marked its disks as 'unrecovered'. The clustered file system has a mechanism for recovering and restoring normal operations, but a second issue is causing that process to run at a much slower pace. We are talking to level 3 support for the clustered file system software to find out whether there is a faster way to run the recovery utility.

We do not expect any loss of data, but the utility is extremely careful, which makes it very slow to run.


GPFS? I like GPFS :) :) :D


Yes - IBM Spectrum Scale FPO (i.e. shared nothing). IBM Spectrum Scale is the new marketing name for GPFS.

We really like it too. We have been talking to level 3 support, and it appears that some configuration options were set incorrectly, which made these events much more likely to occur. They have given us a disk check command that runs MUCH faster (to give you an idea, before we talked to them the check was taking about 3.5 hours per 1% scanned). We restarted it based on their recommendations and it is now 65% complete, so we hope to be back online in the next couple of hours.
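
For a sense of scale, here is a rough back-of-the-envelope sketch of what the original rate implies. It assumes the scan ran at a constant 3.5 hours per 1%; the function name and the constant-rate assumption are illustrative only and are not taken from the recovery utility.

```python
# Rough estimate of how long the original disk check would have taken.
# HOURS_PER_PERCENT is the figure quoted in the post; a constant scan
# rate is an assumption made purely for illustration.
HOURS_PER_PERCENT = 3.5


def hours_to_scan(percent: float, hours_per_percent: float = HOURS_PER_PERCENT) -> float:
    """Hours needed to scan `percent` of the disks at a constant rate."""
    return percent * hours_per_percent


full_scan_hours = hours_to_scan(100)
print(f"Full scan: {full_scan_hours:.0f} hours (~{full_scan_hours / 24:.1f} days)")
# prints: Full scan: 350 hours (~14.6 days)
```

At that rate a full scan would have taken roughly two weeks, which is why the faster check command makes such a difference.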

We will also be fixing those configuration options, which should give us better stability for this cluster.

All of this means that yes, this is related to the move, but no, it isn't related to the cloud.

We are on a new version and using the shared-nothing options, so we are encountering some new things. Once we get them resolved, we expect this to become a distant memory (well, at least distant once some distance has occurred).
[Jul 19, 2017 4:35:17 PM]
Former Member
Cruncher
Joined: May 22, 2018
Post Count: 0
Re: Scheduled Maint. July 18, 14:00 UTC, extended?

is it possible to send out a notice to boinc manager when there are issues like this?
[Jul 19, 2017 5:15:17 PM]
Halo Jones
Cruncher
Joined: Mar 29, 2015
Post Count: 31
Re: Scheduled Maint. July 18, 14:00 UTC, extended?

is it possible to send out a notice to boinc manager when there are issues like this?

+1!
[Jul 19, 2017 5:22:06 PM]
AmigaForever
Cruncher
Germany
Joined: Aug 25, 2011
Post Count: 13
Re: Scheduled Maint. July 18, 14:00 UTC, extended?

is it possible to send out a notice to boinc manager when there are issues like this?

AFAIK it IS possible.....

Anyway, a big +1 from me.
[Jul 19, 2017 5:45:15 PM]
Former Member
Cruncher
Joined: May 22, 2018
Post Count: 0
Re: Scheduled Maint. July 18, 14:00 UTC, extended?

Not only is it possible, but a notice WAS sent to each machine, and it could be read in your BOINC Manager notices: "World Community Grid: Short Planned Outage for Tuesday, July 18"
[Jul 19, 2017 6:01:29 PM]
Former Member
Cruncher
Joined: May 22, 2018
Post Count: 0
Re: Scheduled Maint. July 18, 14:00 UTC, extended?

Valid -> Invalid -> Other :D
Brilliant!
e.g.
SCC1_ 0000585_ Bct-E_ 17562_ 0-- Microsoft Windows 10 Core x64 Edition, (10.00.14393.00) 708 Other 7/16/17 22:22:29 7/18/17 14:21:56 0.78 18.4 / 0.0
[Jul 19, 2017 6:16:35 PM]
nivrip
Senior Cruncher
North Yorkshire
Joined: Sep 13, 2007
Post Count: 264
Re: Scheduled Maint. July 18, 14:00 UTC, extended?

The techies do a great job. :)

Give them time. :)
----------------------------------------
YORKSHIRE CRUNCHER
[Jul 19, 2017 6:23:51 PM]