World Community Grid Forums
Category: Support | Forum: BOINC Agent Support | Thread: WU's getting stuck and not completing
Thread Status: Active Total posts in this thread: 5
stoneysilence
Cruncher Joined: May 2, 2007 Post Count: 10
Since the server migration, I have been having a lot of issues with WUs getting stuck at 99%-plus or 100%. They still show as "running", but they are not using any CPU time. I've been running BOINC/WCG for over 10 years and this is the first time I've had any issues.

I can "Abort" the stuck WUs, but then I lose the time I spent on them and they don't get done. And it's not very long before I get another stuck WU. I don't want to be checking BOINC every day for stuck WUs.

Things I have done:
Uninstalled BOINC.
Reinstalled BOINC.
Cleaned the WCG cache.
Stopped all new WUs, finished the existing ones, then removed WCG and added it again.

Specs:
Windows 10
AMD FX-8370, OC'ed to 4.2 GHz
Corsair H100i CPU cooler
AMD R9 280X
16 GB RAM
512MB SSD
2 TB HDD

I have attached screenshots of my stuck WUs, one of the details of the stuck work units, and one of my task manager.
https://www.dropbox.com/s/s19lg3dwna65gew/Scr...06-04%2023.03.52.jpg?dl=0
https://www.dropbox.com/s/et5dnputk1a3e14/Scr...06-04%2023.04.15.jpg?dl=0
https://www.dropbox.com/s/sc2x7fztaapf53h/Scr...06-04%2023.04.42.jpg?dl=0
https://www.dropbox.com/s/okiyunpth42s2mi/Scr...06-04%2023.14.47.jpg?dl=0
[Edit 3 times, last edit by stoneysilence at Jun 5, 2017 6:30:37 AM]
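For anyone who would rather not eyeball BOINC Manager every day, the symptom described above (a task near 100%, still "running", CPU time not moving) could be checked by comparing two snapshots of `boinccmd --get_tasks` output taken a few minutes apart. This is only a rough sketch: the field names (`name`, `active_task_state`, `current CPU time`, `fraction done`) are assumed from typical boinccmd output and should be checked against your own client, and the task names and numbers in the sample text are made up.

```python
def parse_tasks(text):
    """Split boinccmd-style task output into one dict per task."""
    tasks, current = [], None
    for raw in text.splitlines():
        line = raw.strip()
        # A line such as "1) -----------" starts a new task block.
        if ")" in line and line.endswith("-----"):
            current = {}
            tasks.append(current)
        elif current is not None and ": " in line:
            key, value = line.split(": ", 1)
            current[key] = value
    return tasks


def find_stalled(before, after, min_fraction=0.99):
    """Names of executing tasks near completion whose CPU time did not advance."""
    cpu_before = {t["name"]: float(t.get("current CPU time", 0))
                  for t in before if "name" in t}
    stalled = []
    for task in after:
        name = task.get("name")
        if name not in cpu_before:
            continue
        if task.get("active_task_state") != "EXECUTING":
            continue
        done = float(task.get("fraction done", 0))
        cpu = float(task.get("current CPU time", 0))
        if done >= min_fraction and cpu <= cpu_before[name]:
            stalled.append(name)
    return stalled


# Hypothetical snapshots taken a few minutes apart (task names are made up).
SNAPSHOT_BEFORE = """
1) -----------
   name: taskA_0
   active_task_state: EXECUTING
   current CPU time: 3600.000000
   fraction done: 0.995000
2) -----------
   name: taskB_0
   active_task_state: EXECUTING
   current CPU time: 1200.000000
   fraction done: 0.500000
"""
# Only taskB_0 accumulates CPU time between the two snapshots.
SNAPSHOT_AFTER = SNAPSHOT_BEFORE.replace("1200.000000", "1260.000000")

stalled = find_stalled(parse_tasks(SNAPSHOT_BEFORE), parse_tasks(SNAPSHOT_AFTER))
print(stalled)
```

In real use the two text snapshots would come from running `boinccmd --get_tasks` twice; the 0.99 threshold is an arbitrary choice matching the "99%+" symptom.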
||
|
SekeRob
Master Cruncher Joined: Jan 7, 2013 Post Count: 2741
I'm running 7.6.33 64-bit in W8.1 and W10 environments without issue and, AFAIK, you're the first to report the matter, which in your case goes across multiple sciences. I'm pretty sure the correlation with the migration is not causation. The usual way to get stuck units running again is to suspend the tasks in question for 30 seconds (with 'Leave non-GPU tasks in memory while suspended' off) and then resume them. Verify in Task Manager that these tasks have actually left memory; they must, or else they remain stuck. What you did to resolve the matter has certainly erased all traces of a culprit, if it was any piece of BOINC.

When a task finishes, it does a little housekeeping, zipping up result files and such, so the question is whether something on the host is blocking that. E.g. check the security software, noting that WCG swapped IP addresses during the migration, although you don't seem to have upload/download problems. I strongly recommend setting a scanning exception in the AV for the BOINC data directory and its subdirectories [it is sandboxed], usually C:\ProgramData\BOINC. As for the uninstall/reinstall, it's important to reboot between these two steps, as special 'limited rights' BOINC accounts are created by the installer as part of the sandboxing. Uninstalling and then not rebooting does not remove those from Windows memory, i.e. it is potentially still a polluted environment. At the moment I can't think of other things to check.
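The suspend, wait 30 seconds, resume routine above can also be driven from the command line with `boinccmd --task <project_url> <task_name> suspend|resume`. Below is a minimal sketch of that, assuming `boinccmd` is on the PATH and the client accepts the connection; the helper names (`task_command`, `cycle_task`) are hypothetical, as are the URL and task name in the usage note. The 'leave in memory' preference still has to be off for the task to actually leave memory.

```python
import subprocess
import time


def task_command(project_url, task_name, op, boinccmd="boinccmd"):
    """Build the boinccmd argument list for a single task operation."""
    if op not in ("suspend", "resume"):
        raise ValueError("op must be 'suspend' or 'resume'")
    return [boinccmd, "--task", project_url, task_name, op]


def cycle_task(project_url, task_name, pause_seconds=30,
               run=subprocess.run, sleep=time.sleep):
    """Suspend a task, wait, then resume it.

    `run` and `sleep` are injectable so the sequence can be dry-run
    without a BOINC client present.
    """
    run(task_command(project_url, task_name, "suspend"), check=True)
    sleep(pause_seconds)
    run(task_command(project_url, task_name, "resume"), check=True)
```

Usage would look like `cycle_task("https://example.org/project/", "taskA_0")`, with the project URL and task name taken from your own `boinccmd --get_tasks` output. While the task is suspended, check Task Manager to confirm the science app process has really gone.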
||
|
stoneysilence
Cruncher Joined: May 2, 2007 Post Count: 10
I think I fixed it. I had to uncheck 'leave in memory' like you said. Then I suspended all tasks except the stuck ones and forced BOINC to shut down (the client itself, not just the GUI). I checked Task Manager to make sure everything was gone, but even after all that there were still 4 BOINC-related processes listed. I force-ended those processes. I then restarted BOINC and it started working on the stuck ones again, completed the two that were at 100% almost immediately, and started finishing the others.
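The "check Task Manager for leftovers" step can be roughly scripted by parsing the CSV output of the Windows `tasklist /FO CSV` command after BOINC has been shut down. This is a sketch under assumptions: the default prefixes (`boinc`, `wcgrid_`) are a guess at how the client and WCG science app executables are named on a given machine, and the sample rows are invented. Adjust the prefixes to whatever your own Task Manager shows.

```python
import csv
import io


def leftover_processes(tasklist_csv, prefixes=("boinc", "wcgrid_")):
    """Return (image_name, pid) for rows whose image name starts with a prefix.

    Expects text in the layout produced by `tasklist /FO CSV`, where the
    first two columns are the image name and the PID.
    """
    wanted = tuple(p.lower() for p in prefixes)
    found = []
    for row in csv.reader(io.StringIO(tasklist_csv)):
        # Skip blank lines and the header row (its PID column is not numeric).
        if len(row) < 2 or not row[1].isdigit():
            continue
        image = row[0]
        if image.lower().startswith(wanted):
            found.append((image, int(row[1])))
    return found


# Invented sample output; process names and PIDs are placeholders.
SAMPLE = '''"Image Name","PID","Session Name","Session#","Mem Usage"
"System Idle Process","0","Services","0","8 K"
"boinc.exe","4312","Services","0","14,200 K"
"wcgrid_example_app.exe","5120","Services","0","88,640 K"
"explorer.exe","6004","Console","1","120,332 K"
'''

leftovers = leftover_processes(SAMPLE)
print(leftovers)
```

On Windows the input text could be obtained with `subprocess.run(["tasklist", "/FO", "CSV"], capture_output=True, text=True).stdout`. Anything still listed after the client has exited is a candidate for an orphaned process like the four found here.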
||
|
SekeRob
Master Cruncher Joined: Jan 7, 2013 Post Count: 2741
Stranger things happen, but science app processes getting orphaned is the strangest. My recommendation to set scan exclusions on the BOINC data dir stands, though that does not stop in-memory scanning. Also, suspending the stuck tasks through BOINC Manager with LAIM ('Leave non-GPU tasks in memory while suspended') off would have shown whether the client was still in control of those processes, without having to exit BOINC.

Right now the immediate issue is solved, but not the root cause of why they get stuck; I'm sure it's something in your local environment. Report back if these stuckers return.
||
|
stoneysilence
Cruncher Joined: May 2, 2007 Post Count: 10
OK, so far no new stuck WUs. I added an exception in ESET to exclude the BOINC data directory.