Thread Status: Active
Total posts in this thread: 61
Posts: 61   Pages: 7   [ Previous Page | 1 2 3 4 5 6 7 | Next Page ]
This topic has been viewed 65301 times and has 60 replies
[CSF] Thomas Dupont
Veteran Cruncher
Joined: Aug 25, 2013
Post Count: 685
Status: Offline
Re: No Work Available for ALL Projects

Greetings,

We are extremely sorry about the work unit outage that occurred this weekend. My initial investigation shows that the scripts that push the workunits into the database were stuck on an illegal lock file. It appears that the lock file was illegal because both servers that attempt to load work happened to create it at the same time. This is the same mechanism we have used for many years without issue.
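
For illustration only, here is a minimal Python sketch of one common way to keep two processes from both believing they created a lock file: create it atomically with `O_CREAT | O_EXCL`, so exactly one creator succeeds. This is not the WCG create work script. The function names and the lock path are hypothetical, and on a shared network filesystem the atomicity guarantee depends on the filesystem in use.

```python
import os
import socket
import time


def try_acquire_lock(path):
    """Try to create the lock file atomically; return True if we now own it."""
    try:
        # O_EXCL makes creation fail if the file already exists, so two
        # processes racing on the same path cannot both succeed.
        fd = os.open(path, os.O_CREAT | os.O_EXCL | os.O_WRONLY, 0o644)
    except FileExistsError:
        return False
    with os.fdopen(fd, "w") as f:
        # Record the owner so a stuck lock can be diagnosed later.
        f.write(f"{socket.gethostname()} {os.getpid()} {int(time.time())}\n")
    return True


def release_lock(path):
    """Remove the lock file; a missing file is not an error."""
    try:
        os.remove(path)
    except FileNotFoundError:
        pass
```

A loader would call `try_acquire_lock` before pushing work, skip the run if it returns `False`, and call `release_lock` in a `finally` block when done.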

I will be adding additional code to the create work scripts to check whether the lock file has been illegal for an extended period of time and, if so, send alerts to our team. I will also see if I can add additional monitoring from an external server to make sure there is work available on the servers. At the moment, I need to think about what the best possible way of doing this is.
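
As a rough sketch of the kind of check described above (not the actual WCG code), a stale-lock monitor could compare the lock file's modification time against a threshold and call an alert hook when it is exceeded. The names `lock_age_seconds`, `check_lock`, the `alert` callback, and the threshold are all assumptions made for this example.

```python
import os
import time


def lock_age_seconds(path, now=None):
    """Return the lock file's age in seconds, or None if it does not exist."""
    try:
        mtime = os.stat(path).st_mtime
    except FileNotFoundError:
        return None
    current = time.time() if now is None else now
    return max(0.0, current - mtime)


def check_lock(path, max_age_seconds, alert, now=None):
    """Call alert(message) and return True if the lock is older than allowed."""
    age = lock_age_seconds(path, now=now)
    if age is not None and age > max_age_seconds:
        alert(f"lock file {path} has been held for {age:.0f}s "
              f"(limit {max_age_seconds}s)")
        return True
    return False
```

Run periodically (for example from cron), this only reports a lock that has outlived a normal load cycle. The `alert` callable is left abstract so it can send email, page someone, or simply log.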

On a personal note, I usually log in and check the grid health every day. Starting Thursday of last week, I was spending some time with my twin brother for a long weekend. Since there were no checks for the feeder being completely empty, as mentioned before, I did not get any alerts and assumed all was well.

Again, we will be adding some more monitoring so this issue can be caught earlier and work can be consistently flowing. Thank you for your patience with us on this.

Thanks,
-Uplinger


Thanks Keith for this detailed report!
----------------------------------------
[May 4, 2015 2:34:16 PM]
uplinger
Former World Community Grid Tech
Joined: May 23, 2005
Post Count: 3952
Status: Offline
Re: No Work Available for ALL Projects

Unfortunately there were/are no news items, statements or reports on https://secure.worldcommunitygrid.org/about_us/displayNews.do and Twitter.


mibere,

My main goal is getting things back up and running. My secondary goal is updating the members, and the forums are a quick way for me to do this. News/Twitter require a little more time to post as they require input from other WCG team members. We are working on that communication now.

Thanks,
-Uplinger
[May 4, 2015 2:40:33 PM]
Former Member
Cruncher
Joined: May 22, 2018
Post Count: 0
Status: Offline
Re: No Work Available for ALL Projects

I just got a MCM_2.

Thanks for jumping headlong into work. That's a terrible way to start a Monday morning. <wink>

Good job guys. <fist bump>
[May 4, 2015 2:49:57 PM]
uplinger
Former World Community Grid Tech
Joined: May 23, 2005
Post Count: 3952
Status: Offline
Re: No Work Available for ALL Projects

I just got a MCM_2.

Thanks for jumping headlong into work. That's a terrible way to start a Monday morning. <wink>

Good job guys. <fist bump>


Thanks! And it is definitely not how I was hoping my Monday would go. Or any day of the week :) But fires do happen. I am sorry I did not catch it sooner.

Thanks,
-Uplinger
[May 4, 2015 3:04:31 PM]
[CSF] Thomas Dupont
Veteran Cruncher
Joined: Aug 25, 2013
Post Count: 685
Status: Offline
Re: No Work Available for ALL Projects

Unfortunately there were/are no news items, statements or reports on https://secure.worldcommunitygrid.org/about_us/displayNews.do and Twitter.


mibere,

My main goal is getting things back up and running. My secondary goal is updating the members, and the forums are a quick way for me to do this. News/Twitter require a little more time to post as they require input from other WCG team members. We are working on that communication now.

Thanks,
-Uplinger

The WCG team communicates very effectively with volunteers on Twitter.
Always :D
And they are very responsive.
Proof? Look at this >> https://twitter.com/TEAM_CSF/status/595107923161919488
----------------------------------------
[May 4, 2015 3:15:46 PM]
uplinger
Former World Community Grid Tech
Joined: May 23, 2005
Post Count: 3952
Status: Offline
Re: No Work Available for ALL Projects

Thomas,

You have Erika to thank for the Twitter communications. She is the primary on the social media communications.

Thanks,
-Uplinger
[May 4, 2015 3:29:12 PM]
[CSF] Thomas Dupont
Veteran Cruncher
Joined: Aug 25, 2013
Post Count: 685
Status: Offline
Re: No Work Available for ALL Projects

Thomas,

You have Erika to thank for the Twitter communications. She is the primary on the social media communications.

Thanks,
-Uplinger

Thhhhhaaaannnnnkkksssss my dear Erika !
Your job on Twitter is awesome :)
Thanks Keith for pointing this out B-)
Tom
----------------------------------------
[May 4, 2015 3:40:52 PM]
ErikaT
Former World Community Grid Admin
USA
Joined: Apr 27, 2009
Post Count: 912
Status: Offline
Re: No Work Available for ALL Projects

Thomas,

You have Erika to thank for the Twitter communications. She is the primary on the social media communications.

Thanks,
-Uplinger

Thhhhhaaaannnnnkkksssss my dear Erika !
Your job on Twitter is awesome :)
Thanks Keith for pointing this out B-)
Tom
Why, thank you :D
ErikaT
[May 4, 2015 4:00:35 PM]
yoro42
Ace Cruncher
United States
Joined: Feb 19, 2011
Post Count: 8979
Status: Offline
Re: No Work Available for ALL Projects

On a personal note, I usually log in and check the grid health every day. Starting Thursday of last week, I was spending some time with my twin brother for a long weekend. Since there were no checks for the feeder being completely empty, as mentioned before, I did not get any alerts and assumed all was well.

Uplinger,
How systems know that we have stopped monitoring them, we will never know. Given time, code will find a path not accounted for. Fortunately the result was an inconvenience and not a catastrophe.

I trust you had a great weekend with your brother,
Rory
----------------------------------------

[May 4, 2015 7:51:27 PM]
uplinger
Former World Community Grid Tech
Joined: May 23, 2005
Post Count: 3952
Status: Offline
Re: No Work Available for ALL Projects

On a personal note, I usually log in and check the grid health every day. Starting Thursday of last week, I was spending some time with my twin brother for a long weekend. Since there were no checks for the feeder being completely empty, as mentioned before, I did not get any alerts and assumed all was well.

Uplinger,
How systems know that we have stopped monitoring them, we will never know. Given time, code will find a path not accounted for. Fortunately the result was an inconvenience and not a catastrophe.

I trust you had a great weekend with your brother,
Rory


Rory, I did have a great weekend with my brother, probably another reason why I stayed disconnected :P

As for the monitoring, the fun part is that even though you may have something in place, the only way to test it is for there to be an actual issue/error, which in a production system isn't something you want to test live :) I believe I have a plan going forward to help us with our monitoring and how to properly test it with little to no interruption to the member services.
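
One general way to exercise an alerting path without causing a real outage, offered here only as a sketch and not necessarily the plan referred to above, is to pass the data source into the check as a parameter. A test can then substitute a stub that reports an empty feeder. The names `check_work_available`, `get_ready_count`, and `alert` are hypothetical.

```python
def check_work_available(get_ready_count, alert, minimum=1):
    """Alert if fewer than `minimum` work units are ready to send.

    get_ready_count is a hypothetical callable returning the number of
    work units currently available. In production it might query the
    database; in a test it is a stub.
    """
    try:
        count = get_ready_count()
    except Exception as exc:
        # Failing to measure is itself worth an alert.
        alert(f"could not determine available work: {exc}")
        return False
    if count < minimum:
        alert(f"only {count} work units available (minimum {minimum})")
        return False
    return True
```

Calling it with `lambda: 0` as the count source triggers the alert path on demand, so the alert delivery itself can be verified while the real servers keep handing out work.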

Thanks again,
-Uplinger
[May 5, 2015 1:40:07 AM]