World Community Grid Forums
Thread Status: Active | Total posts in this thread: 143
knreed
Former World Community Grid Tech | Joined: Nov 8, 2004 | Post Count: 4504 | Status: Offline
So the data that you access when you upload and download files sits on a clustered file system. The maintenance window yesterday was scheduled to install the latest kernel on the servers. We completed all the servers associated with our databases, load balancing and website with no issue. We updated the first server associated with this file system with no issue.

However, after rebooting the second server, it marked its disks as 'unrecovered'. The clustered file system has a mechanism for recovering and restoring normal operations, but a second issue is causing that process to run at a much slower pace. We are talking to third-level support for the clustered file system software to find out if there is a faster way to run the recovery utility. We do not expect any loss of data, but the utility is extremely careful, which makes it very slow to run.
[Edit 1 times, last edit by knreed at Jul 19, 2017 11:26:08 AM]
duanebong
Advanced Cruncher | Singapore | Joined: Apr 25, 2009 | Post Count: 134 | Status: Offline
"and there is no setting to control it in BOINC's preferences"

"There is, I think: if you tick the 'Show advanced...' box, you get to set the space or percent."

"You can set the max used storage space, but it doesn't download extra WUs. If that's what this setting is for, it doesn't work."

Same for me. In the advanced settings we can control the minimum remaining disk space BOINC will leave free on the phone, and also the maximum amount of space BOINC is allowed to consume. But there is no way to control how many days of WUs Android phones keep in their buffer. Generally the phone crunches 1 WU and then downloads 1 or 2 extra WUs in reserve. It adjusts for the number of cores you've allowed BOINC to use, so if you have an 8-core phone and allow 2 cores to be used for crunching, the phone will have a total of 4-6 WUs downloaded (working on 2 WUs + another 2-4 in reserve). It would be useful to be able to control this more - not just to cover server outages, but sometimes you could be on the road with no Wi-Fi access. It would be nice to pre-load the buffer before leaving the house.
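For what it's worth, the desktop BOINC client does expose these buffer depths through a local preference override file. A minimal sketch of a `global_prefs_override.xml`, assuming a stock client (the day values are examples, and the Android client may not honour this file):

```xml
<!-- global_prefs_override.xml, placed in the BOINC data directory (example values) -->
<global_preferences>
   <work_buf_min_days>1.0</work_buf_min_days>               <!-- keep at least ~1 day of work queued -->
   <work_buf_additional_days>2.0</work_buf_additional_days> <!-- fetch up to ~2 extra days on top -->
</global_preferences>
```

Restarting the client (or telling it to re-read its config files) applies the override, and the scheduler then requests enough work to fill the combined buffer.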
SekeRob
Master Cruncher | Joined: Jan 7, 2013 | Post Count: 2741 | Status: Offline
"So the data that you access when you upload and download files sits on a clustered file system. The maintenance window yesterday was scheduled to install the latest kernel on the servers. We completed all the servers associated with our databases, load balancing and website with no issue. We updated the first server associated with this file system with no issue. However, after rebooting the second server, it marked its disks as 'unrecovered'. The cluster file system has a mechanism for recovering and restoring normal operations, but there was a second issue that is causing that process to run at a much slower pace. We are working on talking to 3rd layer support for the clustered file system software to find out if there is a faster way that we can run the recovery utility. We do not expect any loss of data, but the utility is extremely careful which makes it very slow in running."

I used KSplice for a long time until Oracle made it proprietary, but Ubuntu now has reboot-less kernel updating too.
[Edit 1 times, last edit by SekeRob* at Jul 19, 2017 11:43:15 AM]
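For the Ubuntu route mentioned above, Canonical's Livepatch service applies kernel security fixes without a reboot. A sketch of the usual setup on a supported Ubuntu release with snapd; the token is a placeholder you obtain from your Ubuntu account:

```shell
# Install the Livepatch client (distributed as a snap)
sudo snap install canonical-livepatch

# Enable the service with your account token (placeholder shown)
sudo canonical-livepatch enable <YOUR-TOKEN>

# Check which live kernel patches are currently applied
canonical-livepatch status --verbose
```

Livepatch only covers security fixes between point releases; a full kernel version upgrade, like the one described in this thread, still needs the reboot.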
SekeRob
Master Cruncher | Joined: Jan 7, 2013 | Post Count: 2741 | Status: Offline
"and there is no setting to control it in BOINC's preferences"

"There is, I think: if you tick the 'Show advanced...' box, you get to set the space or percent."

"You can set the max used storage space, but it doesn't download extra WUs. If that's what this setting is for, it doesn't work. My unit had one active WU (only using one core) and it has been idle since it completed. Max storage space is set to 90% and there's plenty of free space, but it isn't being used for more WUs. I wonder if this is a bug?"

The space in which BOINC is installed by default is limited. I think they're working towards a new release that removes some overdone controls. /OT
mmonnin
Advanced Cruncher | Joined: Jul 20, 2016 | Post Count: 148 | Status: Offline
"Also, why do maintenance in the MIDDLE of the week? Unless it was an absolute emergency, maintenance should wait until WEEKENDS."

"Deferring planned system maintenance to the weekend makes perfect sense for an enterprise that runs at most five days a week with reduced usage during the weekend; doing so minimizes the impact by affecting only the small number of weekend workers. With an operation that runs 24/7 with users around the globe, it makes no sense to defer planned work to any specific day. When the work comes up on the schedule and the manpower to do it is available, it makes sense to do it during the staff's regular work day, because there is no time of reduced use."

This project may run 24/7, but that doesn't mean the admins work 24/7. The people actually doing the upgrade, I'm guessing, do not have full-staff 24/7 coverage. Tuesday is the typical upgrade day of the week - as an example, Microsoft releases patches on Tuesdays. Monday is a day to clean up anything from the weekend, when most do not work. Patch on Tuesday so you have 3-4 days to fix, roll back or verify the patch before the next weekend.
Dark Angel
Veteran Cruncher | Australia | Joined: Nov 11, 2005 | Post Count: 721 | Status: Offline
aaaand it's still broke
Currently being moderated under false pretences
Former Member
Cruncher | Joined: May 22, 2018 | Post Count: 0 | Status: Offline
Ouch, all my Valid workunits uploaded at or after 14:21:56 UTC 18 July have turned Invalid. More tidying up to do, techs!
gb009761
Master Cruncher | Scotland | Joined: Apr 6, 2005 | Post Count: 2982 | Status: Offline
"Ouch, all my Valid workunits uploaded at or after 14:21:56 UTC 18 July have turned Invalid. More tidying up to do, techs!"

Yes, same here - so much for the statement that there'll be no loss of data! Bang goes my machines' reliability status 😫
Sandvika
Advanced Cruncher | United Kingdom | Joined: Apr 27, 2007 | Post Count: 112 | Status: Offline
I don't like to be profane in a forum.

As an IT tech with 30+ years behind me, I've been wary of the dash to cloud technology. There's a lot to be said for spreading the risk of unavailability across multiple vendors, let alone multiple data centres, and if something is absolutely mission critical, as in this case, then having it in a hybrid environment, so that overall control and therefore availability is ensured, surely makes sense.

I appreciate that cluster replication is limited to wire speeds unless it is within a single virtualised host, but there's no noticeable loss of performance when my ZFS is resilvering after replacing a failed drive, and I'd expect the same of a well-implemented clustered filesystem. In this scenario I'd not expect failure of a single server to degrade performance, let alone to be fatal. I'd expect it to be more like an air crash investigation: very seldom is a whole fleet grounded after a single catastrophe, and the investigative emphasis is on preventing a recurrence and developing contingencies, not a dash to get airborne again.

Meanwhile I have discovered how to recover my Rosetta@home profile from the BOINC client config files, so I no longer have 40 idle cores!
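The ZFS drive-replacement flow mentioned above can be sketched as follows; the pool name `tank` and the device names are example values for illustration:

```shell
# Swap the failed disk for the new one; ZFS begins resilvering automatically
zpool replace tank da2 da5

# Watch progress - the 'scan:' line reports percent done, speed, and time remaining,
# while the pool stays online and serving reads/writes throughout
zpool status tank
```

Resilvering only copies live data rather than the whole device, which is part of why the performance impact on a lightly loaded pool is barely noticeable.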
kkwok
Cruncher | Joined: Nov 23, 2004 | Post Count: 5 | Status: Offline
Care to share how to add another project under BOINC client?