World Community Grid Forums
Thread Status: Active Total posts in this thread: 143
RTS48
Veteran Cruncher Bolivia Joined: Aug 2, 2009 Post Count: 1350 Status: Offline |
> Yes, we'll address the invalids once we restore operations. We'll also extend task deadlines. Thanks everyone for your patience and support! Juan

Juan, you should have said.... Thanks for your warm support - I shall always wear it!
Rod Peel
Santa Cruz Bolivia South America |
||
|
erich56
Senior Cruncher Austria Joined: Feb 24, 2007 Post Count: 295 Status: Offline |
any rough idea when it will work again?
|
||
|
jhindo
Former World Community Grid Admin Joined: Aug 25, 2009 Post Count: 250 Status: Offline |
> any rough idea when it will work again?

We're making good progress, but no estimate yet. We will keep everyone posted. Thanks, Juan
||
|
knreed
Former World Community Grid Tech Joined: Nov 8, 2004 Post Count: 4504 Status: Offline |
> So the data that you access when you upload and download files sits on a clustered file system. The maintenance window yesterday was scheduled to install the latest kernel on the servers. We completed all the servers associated with our databases, load balancing and website with no issue. We updated the first server associated with this file system with no issue. However, after rebooting the second server, it marked its disks as 'unrecovered'. The cluster file system has a mechanism for recovering and restoring normal operations, but there was a second issue that is causing that process to run at a much slower pace. We are working on talking to 3rd layer support for the clustered file system software to find out if there is a faster way that we can run the recovery utility. We do not expect any loss of data, but the utility is extremely careful, which makes it very slow in running.
>
> GPFS? I like GPFS

Yes - IBM Spectrum Scale FPO (i.e. shared nothing). IBM Spectrum Scale is the new marketing name for GPFS. We really like it also.

We have been talking to level 3 support and it appears that there are some configuration options set wrong that have made these events much more likely to occur. They have given us a disk check command that is running MUCH faster (to give you an idea, before we were talking to them the check was taking about 3.5 hours per 1% scanned). We restarted it based on their recommendations and it is now at 65% complete, so we are hopeful to be back online in the next couple of hours. We will also be working to fix the configuration options, which should give us better stability for this cluster.

All of this means that - yes, this is related to the move, but no, it isn't related to the cloud. We are on a new version and using the shared nothing options, so we are encountering some new things. Once we get them resolved, we expect this to become a distant memory (well, at least distant once some distance has occurred).
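For a sense of scale, the old rate quoted above (about 3.5 hours per 1% scanned) can be extrapolated to a full scan. This is only a back-of-the-envelope sketch: it assumes the rate would have stayed constant, and the function name is made up for illustration rather than taken from any WCG or Spectrum Scale tooling.

```python
def full_scan_hours(hours_per_percent: float) -> float:
    """Projected time for a 100% scan, assuming a constant scan rate."""
    return hours_per_percent * 100


# Rate quoted in the post for the original (slow) check.
old_total = full_scan_hours(3.5)
print(f"{old_total:.0f} hours, about {old_total / 24:.1f} days")
# prints "350 hours, about 14.6 days"
```

In other words, at the original pace the check alone would have run for roughly two weeks, which is why the faster command from level 3 support mattered so much.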
||
|
Former Member
Cruncher Joined: May 22, 2018 Post Count: 0 Status: Offline |
is it possible to send out a notice to boinc manager when there are issues like this?
|
||
|
Halo Jones
Cruncher Joined: Mar 29, 2015 Post Count: 31 Status: Offline |
> is it possible to send out a notice to boinc manager when there are issues like this?

+1!
||
|
AmigaForever
Cruncher Germany Joined: Aug 25, 2011 Post Count: 13 Status: Offline |
> is it possible to send out a notice to boinc manager when there are issues like this?

AFAIK it IS possible..... Anyway, a big +1 from me.
||
|
Former Member
Cruncher Joined: May 22, 2018 Post Count: 0 Status: Offline |
Not only is it possible, but a notice WAS also sent to each machine, which could be read in your BOINC Manager notices: "World Community Grid: Short Planned Outage for Tuesday, July 18"
|
||
|
Former Member
Cruncher Joined: May 22, 2018 Post Count: 0 Status: Offline |
Valid -> Invalid -> Other
Brilliant! e.g.
SCC1_ 0000585_ Bct-E_ 17562_ 0-- Microsoft Windows 10 Core x64 Edition, (10.00.14393.00) 708 Other 7/16/17 22:22:29 7/18/17 14:21:56 0.78 18.4 / 0.0
||
|
nivrip
Senior Cruncher North Yorkshire Joined: Sep 13, 2007 Post Count: 264 Status: Offline |
The techies do a great job. Give them time.
----------------------------------------
ЮРКШИР КРУНЧЕР (Yorkshire Cruncher)
|
||