World Community Grid Forums
Thread Status: Active Total posts in this thread: 61
[CSF] Thomas Dupont
Veteran Cruncher Joined: Aug 25, 2013 Post Count: 685 Status: Offline
Quote:
Greetings,
We are extremely sorry about the work unit outage that occurred this weekend. My initial investigation shows that the scripts that push the work units into the database were stuck on an illegal lock file. It appears the lock file was illegal because both servers that attempt to load work happened to create it at the same time. This is the same mechanism we have used for many years without issue.
I will be adding code to the create-work scripts to check whether the lock file has existed for an extended period of time and to send alerts to our team if it is illegal. I will also see if I can add monitoring from an external server to make sure there is work available on the servers. At the moment, I need to think about the best way of doing this.
On a personal note, I usually log in and check the grid health every day. Starting Thursday of last week, I was spending a long weekend with my twin brother. Since, as mentioned before, there were no checks for the feeder being completely empty, I did not get any alerts and assumed all was well.
Again, we will be adding more monitoring so this kind of issue can be caught earlier and work can flow consistently. Thank you for your patience with us on this.
Thanks,
-Uplinger

Thanks Keith for this detailed report!
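The post does not say how the create-work scripts take their lock, so the following is only a rough illustration of the two ideas it mentions: creating a lock file so that two loaders cannot both succeed, and alerting when a lock file has been sitting around too long. It is a minimal Python sketch under stated assumptions: the file name, the two-hour threshold and the `send_alert` helper are hypothetical, and the `O_CREAT | O_EXCL` approach is a swapped-in technique, not a description of WCG's actual scripts. That flag combination is atomic on a local filesystem but may not be on some network filesystems, so if both servers share one lock path over the network, something like a database-level lock may be safer.

```python
import os
import tempfile
import time
from typing import Optional

# Hypothetical threshold: the real scripts' timing is not stated in the post.
STALE_AFTER_SECONDS = 2 * 60 * 60


def try_acquire_lock(path: str) -> bool:
    """Create the lock file atomically; return False if it already exists."""
    try:
        # O_CREAT | O_EXCL fails if the file already exists, so two loaders
        # racing on the same local filesystem cannot both "win" the lock.
        fd = os.open(path, os.O_CREAT | os.O_EXCL | os.O_WRONLY, 0o644)
    except FileExistsError:
        return False
    with os.fdopen(fd, "w") as f:
        f.write(f"{os.getpid()} {time.time()}\n")  # owner info for debugging
    return True


def release_lock(path: str) -> None:
    os.remove(path)


def lock_is_stale(path: str, now: Optional[float] = None,
                  max_age: float = STALE_AFTER_SECONDS) -> bool:
    """True if the lock file exists and is older than max_age seconds."""
    try:
        mtime = os.path.getmtime(path)
    except FileNotFoundError:
        return False
    if now is None:
        now = time.time()
    return now - mtime > max_age


def send_alert(message: str) -> None:
    # Hypothetical placeholder: a real deployment would page or email the team.
    print(f"ALERT: {message}")


def check_lock(path: str) -> None:
    # Intended to run periodically (e.g. from cron) alongside the loaders.
    if lock_is_stale(path):
        send_alert(f"lock file {path} looks stuck; create-work may be blocked")


if __name__ == "__main__":
    lock = os.path.join(tempfile.mkdtemp(), "create_work.lock")
    print(try_acquire_lock(lock))  # first loader gets the lock
    print(try_acquire_lock(lock))  # second loader is refused
    check_lock(lock)               # lock is fresh, so no alert is sent
    release_lock(lock)
```

The stale check is kept separate from acquisition so that a watchdog process can run it on a schedule without ever touching the lock itself.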
||
|
uplinger
Former World Community Grid Tech Joined: May 23, 2005 Post Count: 3952 Status: Offline
Quote:
Unfortunately there was (and still is) no news, statement or report on https://secure.worldcommunitygrid.org/about_us/displayNews.do or on Twitter.

mibere,
My main goal is getting things back up and running. My secondary goal is updating the members, and the forums are a quick way for me to do this. News and Twitter posts take a little more time because they require input from other WCG team members. We are working on that communication now.
Thanks,
-Uplinger
||
|
Former Member
Cruncher Joined: May 22, 2018 Post Count: 0 Status: Offline
I just got an MCM_2.
Thanks for jumping headlong into work. That's a terrible way to start a Monday morning. <wink> Good job guys. <fist bump>
||
|
uplinger
Former World Community Grid Tech Joined: May 23, 2005 Post Count: 3952 Status: Offline
Quote:
I just got an MCM_2. Thanks for jumping headlong into work. That's a terrible way to start a Monday morning. <wink> Good job guys. <fist bump>

Thanks! And it is definitely not how I was hoping my Monday would go. Or any day of the week :) But fires do happen, and I am sorry I did not catch it sooner.
Thanks,
-Uplinger
||
|
[CSF] Thomas Dupont
Veteran Cruncher Joined: Aug 25, 2013 Post Count: 685 Status: Offline
Quote:
mibere,
My main goal is getting things back up and running. My secondary goal is updating the members, and the forums are a quick way for me to do this. News and Twitter posts take a little more time because they require input from other WCG team members. We are working on that communication now.
Thanks,
-Uplinger

The WCG team communicates very effectively with volunteers on Twitter. Always! And they are very responsive. Proof? Look at this: https://twitter.com/TEAM_CSF/status/595107923161919488
||
|
uplinger
Former World Community Grid Tech Joined: May 23, 2005 Post Count: 3952 Status: Offline
Thomas,
You have Erika to thank for the Twitter communications. She is our primary person for social media communications.
Thanks,
-Uplinger
||
|
[CSF] Thomas Dupont
Veteran Cruncher Joined: Aug 25, 2013 Post Count: 685 Status: Offline
Quote:
Thomas,
You have Erika to thank for the Twitter communications. She is our primary person for social media communications.
Thanks,
-Uplinger

Thhhhhaaaannnnnkkksssss my dear Erika! Your work on Twitter is awesome :)
Thanks Keith for pointing this out.
Tom
||
|
ErikaT
Former World Community Grid Admin USA Joined: Apr 27, 2009 Post Count: 912 Status: Offline
Quote:
Thhhhhaaaannnnnkkksssss my dear Erika! Your work on Twitter is awesome :)
Thanks Keith for pointing this out.
Tom

ErikaT
||
|
yoro42
Ace Cruncher United States Joined: Feb 19, 2011 Post Count: 8979 Status: Offline
Quote:
On a personal note, I usually log in and check the grid health every day. Starting Thursday of last week, I was spending a long weekend with my twin brother. Since, as mentioned before, there were no checks for the feeder being completely empty, I did not get any alerts and assumed all was well.

Uplinger,
How systems know when we stop monitoring them, we will never know. Given time, code will find a path not accounted for. Fortunately, the result was an inconvenience and not a catastrophe.
I trust you had a great weekend with your brother,
Rory
||
|
uplinger
Former World Community Grid Tech Joined: May 23, 2005 Post Count: 3952 Status: Offline
Quote:
Uplinger,
How systems know when we stop monitoring them, we will never know. Given time, code will find a path not accounted for. Fortunately, the result was an inconvenience and not a catastrophe.
I trust you had a great weekend with your brother,
Rory

Rory,
I did have a great weekend with my brother, which is probably another reason why I stayed disconnected :P
As for the monitoring, the fun part is that even though you may have something in place, the only way to test it is for there to be an actual issue or error, which in a production system isn't something you want to test live :) I believe I have a plan going forward to improve our monitoring and to test it properly with little to no interruption to member services.
Thanks again,
-Uplinger
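One common way to exercise a monitor without breaking production is to keep the check separate from the data source it reads. The following is a minimal Python sketch of that idea, not WCG's plan: it assumes a hypothetical `get_ready_count` callable that, in production, would ask the server how many work units are ready to send. A test passes in a fake source that reports an empty feeder, so the alert path runs without touching the live servers. The function names and the threshold are illustrative only.

```python
from typing import Callable, List

# Hypothetical: in production this callable would query the server for the
# number of work units ready to send; injecting it lets tests fake an outage.
ReadyCount = Callable[[], int]
Alert = Callable[[str], None]


def check_feeder(get_ready_count: ReadyCount, alert: Alert, minimum: int = 1) -> bool:
    """Alert and return False if fewer than `minimum` work units are ready."""
    count = get_ready_count()
    if count < minimum:
        alert(f"feeder low: {count} work units ready (minimum {minimum})")
        return False
    return True


if __name__ == "__main__":
    sent: List[str] = []
    check_feeder(lambda: 0, sent.append)     # simulated empty feeder
    check_feeder(lambda: 5000, sent.append)  # simulated healthy state
    print(sent)
```

Because the alert function is also injected, the same check can be pointed at a real pager in production and at a simple list in tests.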
||