World Community Grid Forums
Thread Status: Active Total posts in this thread: 143
|
Author |
|
knreed
Former World Community Grid Tech Joined: Nov 8, 2004 Post Count: 4504 Status: Offline |
> What is going on? After 2 hrs of Uploads and Downloads restarting, the workdone field of the Project view has not updated since yesterday. Normally through the course of the day it updates as work is uploaded.

It will resume; the system just needs a little time to catch up on things. |
||
|
DCS1955
Veteran Cruncher USA Joined: May 24, 2016 Post Count: 668 Status: Offline |
Thanks. |
||
|
mmonnin
Advanced Cruncher Joined: Jul 20, 2016 Post Count: 148 Status: Offline |
Looks like all mine have uploaded.
For those that had CPUs go idle, it's a good idea to have a backup project set to 0% Share so they download tasks only when needed. All projects go down, have maintenance or just run out of work. It happens. |
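The backup-project idea above can be illustrated with a small model. This is only a sketch under stated assumptions, not the BOINC client's real work-fetch code: `pick_project` is a hypothetical helper, "Backup Project" is a placeholder name, and choosing the largest share among primaries is a simplification. It shows only the property described in the post, namely that a project with 0 share is asked for work only when no project with a non-zero share has any.

```python
def pick_project(projects):
    """Pick the project to request work from.

    projects: list of (name, resource_share, has_work) tuples.
    Simplified model (an assumption, not BOINC's actual scheduler):
    projects with a non-zero share are always preferred, and a
    zero-share project is used only as a fallback.
    """
    primaries = [p for p in projects if p[1] > 0 and p[2]]
    if primaries:
        # Simplification: take the primary with the largest share.
        return max(primaries, key=lambda p: p[1])[0]
    backups = [p for p in projects if p[1] == 0 and p[2]]
    return backups[0][0] if backups else None


normal = [("World Community Grid", 100, True), ("Backup Project", 0, True)]
outage = [("World Community Grid", 100, False), ("Backup Project", 0, True)]

print(pick_project(normal))  # World Community Grid
print(pick_project(outage))  # Backup Project
```

In this model the backup project never competes for CPU time while the main project has work, which is the point of setting it to 0% Share.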
||
|
jhindo
Former World Community Grid Admin Joined: Aug 25, 2009 Post Count: 250 Status: Offline |
Thank you all for remaining patient during this outage. Happy to report the system recovery is complete.
As mentioned in one of our earlier updates, it appears that there were some incorrect configuration options in our new server setup that made these types of events more likely to occur. We are working to fix these configuration options, which will give us better stability going forward.
One of the reasons for the delay was that we wanted to make sure that the research data were completely recovered. We are happy to report that we did not lose any data.
Thanks again for your support,
Juan |
||
|
3A4scLiRhJVcdT2K9q9kQNxzxYJ9
Advanced Cruncher Joined: Nov 16, 2009 Post Count: 72 Status: Offline |
^ Thanks for the update
|
||
|
SekeRob
Master Cruncher Joined: Jan 7, 2013 Post Count: 2741 Status: Offline |
Post Mortem (something you do after a recovery): Of 260 results, all FAHV, 3 went Invalid across 2 machines, 2 of them timestamped as received the very second the BOINC services went out, at 15:58:34 UTC. This kicked the remaining 257, mostly Outcome 0, into Pending Verification state, and all of them are now waiting on a new wingman, i.e. 257 results got a wingman due to 'coincidence' in the order of processing.
In this particular case, as the machines lost their FAHV reliability, deselect the science and wait until all PVer results have validated, or maybe until some more turn Invalid too, to set off a reset of the consecutive valid counter. Lesson drawn: When it says maintenance is scheduled, no matter the 'you don't have to do anything', take clients offline from well before to well after. When it is unscheduled, take clients offline and wait until the green light comes on. Data may not have been lost, but the processing created loss in the form of wasteful duplication. In Don speak, trivial on the whole. [Edit 1 times, last edit by SekeRob* at Jul 20, 2017 7:37:33 AM] |
||
|
Former Member
Cruncher Joined: May 22, 2018 Post Count: 0 Status: Offline |
And I'm surprised not to have seen any Server Aborted cases for all the units that had extra (unnecessary) copies sent out because of the recent problem. Does this still need some tech attention?
e.g.
Project Name: Smash Childhood Cancer
Created: 07/17/2017 19:26:31
Name: SCC1_0000573_Bct-E_17534
Minimum Quorum: 2
Replication: 3

| Result Name | OS type | OS version | App Version Number | Status | Sent Time | Time Due / Return Time | CPU Time / Elapsed Time (hours) | Claimed / Granted BOINC Credit |
|---|---|---|---|---|---|---|---|---|
| SCC1_0000573_Bct-E_17534_2-- | Microsoft Windows | 7 x64 Edition, Service Pack 1, (06.01.7601.00) | - | In Progress | 7/20/17 01:54:34 | 7/30/17 01:54:34 | 0.00 | 0.0 / 0.0 |
| SCC1_0000573_Bct-E_17534_3-- | Microsoft Windows | 10 Core x64 Edition, (10.00.14393.00) | 708 | Valid | 7/20/17 01:54:34 | 7/20/17 06:51:26 | 0.89 | 21.5 / 22.5 |
| SCC1_0000573_Bct-E_17534_4-- | Microsoft Windows | Server 2008 "R2" Enterprise x64 Edition, Service Pack 1, (06.01.7601.00) | - | In Progress | 7/20/17 01:54:33 | 7/30/17 01:54:33 | 0.00 | 0.0 / 0.0 |
| SCC1_0000573_Bct-E_17534_1-- | Microsoft Windows | 10 Professional x64 Edition, (10.00.10586.00) | 708 | Valid | 7/18/17 02:05:18 | 7/18/17 07:34:04 | 0.45 | 24.3 / 22.5 |
| SCC1_0000573_Bct-E_17534_0-- | Microsoft Windows | 7 Professional x64 Edition, Service Pack 1, (06.01.7601.00) | 708 | Valid | 7/18/17 02:05:09 | 7/18/17 16:23:08 | 0.73 | 20.7 / 22.5 |
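The "extra copies" point in the workunit listing above can be checked mechanically. The following is a minimal sketch, assuming that a copy still In Progress is unnecessary once the number of Valid results has reached the Minimum Quorum. The `Result` class and `redundant_copies` function are hypothetical names for illustration, not part of BOINC or of World Community Grid's server code. The result names and statuses are the ones listed above.

```python
from dataclasses import dataclass


@dataclass
class Result:
    name: str
    status: str  # value of the Status column, e.g. "Valid" or "In Progress"


def redundant_copies(results, min_quorum):
    """Return names of in-progress copies that are no longer needed.

    Assumption for this sketch: once at least `min_quorum` results are
    Valid, every copy still In Progress is surplus.
    """
    valid = sum(1 for r in results if r.status == "Valid")
    if valid < min_quorum:
        return []
    return [r.name for r in results if r.status == "In Progress"]


# Statuses taken from the listing above (Minimum Quorum: 2).
workunit = [
    Result("SCC1_0000573_Bct-E_17534_2--", "In Progress"),
    Result("SCC1_0000573_Bct-E_17534_3--", "Valid"),
    Result("SCC1_0000573_Bct-E_17534_4--", "In Progress"),
    Result("SCC1_0000573_Bct-E_17534_1--", "Valid"),
    Result("SCC1_0000573_Bct-E_17534_0--", "Valid"),
]

print(redundant_copies(workunit, min_quorum=2))
```

Under that assumption, copies _2 and _4 are the ones that would be expected to show up as Server Aborted.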
||
|
SekeRob
Master Cruncher Joined: Jan 7, 2013 Post Count: 2741 Status: Offline |
Inventory says 27 were server aborted, exit code 202, with a validation state of 2, which is Invalid, across all the crunched sciences (zika, scc1, fahv). These look to be 1 of a pair of verification copies against zero redundant results by other members, the pair sent out AFTER the service resumed. ValidationState=2 bugs me greatly, regardless of these being server aborted with no processing time, i.e. unstarted.
[Edit 1 times, last edit by SekeRob* at Jul 20, 2017 8:10:22 AM] |
||
|
TonyEllis
Senior Cruncher Australia Joined: Jul 9, 2008 Post Count: 261 Status: Recently Active |
Hmm - Qty 118 'Server Aborts' so far and still happening and counting up... these were all downloaded earlier today... waste of bandwidth at both ends...
----------------------------------------
Run Time Stats https://grassmere-productions.no-ip.biz/
|
||
|
nivrip
Senior Cruncher North Yorkshire Joined: Sep 13, 2007 Post Count: 264 Status: Offline |
Things started moving about 11 hours ago, mainly uploads. Then over the next 4-5 hours I started getting downloads in stages. Seems to be back to normal now.
Great stuff, Techies.
ЮРКШИР КРУНЧЕР
|
||
|
|