World Community Grid Forums
Thread Status: Active Total posts in this thread: 18
Col323
Senior Cruncher Joined: Nov 4, 2008 Post Count: 372 Status: Offline
I did a quick look back through the past year of threads and didn't see anything in relation to this. Forgive me if this is a known problem.
I have one computer which runs HCC exclusively. Every now and then it will hang at 99.818% complete. I've seen work stall and continue to increment time for over a day. My solution is to close the Boinc manager, make sure all processes are shut down, then reopen Boinc. It will restore from the checkpoint and happily complete. I have never seen this behavior on any other computer.
I can't pin this down to a specific behavior or pattern. There's no "uh oh, it hung on a WU, I'd better reboot before I start getting a stream of hung WUs." A WU could hang 5 mins after a fresh boot, and then not hang again for weeks. I only bring this up now because after the impressive improvement for Windows crunchers, this computer effectively doubled its chances of hanging on a WU.
If it's a known issue, perhaps someone could point me in the right direction. I'm guessing it's just a unique behavior, in which case I can do my best to keep an eye on it!
This is on an AMD 3800x2 running Win XP 32 bit. I fully realize this could be caused by antiquated hardware protesting its continued service. Since its sole purpose is to sit in the corner and slice through cancer WUs at 1:45 each, I'm reluctant to shut it down.
||
|
sk..
Master Cruncher Joined: Mar 22, 2007 Post Count: 2324 Status: Offline
Boinc Version?
Did you check your log files, just in case a system process or another app is causing this? Start, Run, type eventvwr.exe
[Edit 1 times, last edit by skgiven at Apr 11, 2011 6:54:15 PM]
||
|
z2000
Advanced Cruncher Joined: Feb 27, 2011 Post Count: 116 Status: Offline |
Mine did that too just now: it paused at 99.818% complete, with 26 seconds remaining. The pause lasted for 5 minutes but it did complete and upload.
I have Boinc 6.10.58, so I don't think it's Boinc. I think it's just the way the HCC program runs.
||
|
uplinger
Former World Community Grid Tech Joined: May 23, 2005 Post Count: 3952 Status: Offline
Greetings,
The workunit is actually not hanging. It is doing some calculations outside of the main programming loops that update percentage complete. I actually just noticed this on my machine but it only took 2 minutes to complete this section of code. This is normal behavior though.
Thanks,
-Uplinger
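A minimal sketch of the pattern described in the post above, assuming a generic work unit; this is not the actual HCC code, and `run_work_unit`, `finalize`, and `report` are hypothetical names. Progress is reported only from inside the main loop, so the displayed fraction stays at its last value while the final step runs.

```python
def run_work_unit(steps, finalize, report):
    """Hypothetical work-unit skeleton; not the actual HCC code."""
    for i in range(steps):
        # ... the main computation for step i would go here ...
        # Progress is only reported from inside this loop.
        report((i + 1) / (steps + 1))
    # Post-processing runs outside the loop, so the reported
    # fraction stays at its last value until this call returns.
    finalize()
    report(1.0)
```

If `finalize` takes a few minutes, the display would sit just below 100% for that long, which matches a short pause but not a stall lasting many hours.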
||
|
Col323
Senior Cruncher Joined: Nov 4, 2008 Post Count: 372 Status: Offline
I'm running Boinc 6.10.58. I've not checked my log files on this.
I don't know that I'd call it normal behavior as the WU will sit on 99.818% complete for hours on end. (Last episode it was at 15 hours and I've seen it over 24 hours before.) And then, if I shut down and restart, the last checkpoint resets the run time to about 1:40-1:50, which it then completes in a couple minutes.
But if I'm the only one having this problem, I can deal with it. :-)
||
|
armstrdj
Former World Community Grid Tech Joined: Oct 21, 2004 Post Count: 695 Status: Offline
Col323,
I just want to verify a couple of things to make sure I understand.
1. This isn't a new occurrence with the latest update that just went into production; it has always occurred on this machine with HCC?
2. However, with the recent update it seems to occur more frequently?
When this happens, is the process continuing to use CPU time? Have you run any tests on your memory or disk on this machine?
Thanks,
armstrdj
||
|
Col323
Senior Cruncher Joined: Nov 4, 2008 Post Count: 372 Status: Offline
1. Correct. It's happened all along on this machine. Nothing new here.
2. I wouldn't say it occurs more frequently, just that the machine can now complete about 2x the units it did prior to the update. To me, that's 2x the chances for whatever fluke causes this to occur and hang.
I'll have to double check to see if it's using CPU time. I believe it is not, but next time it happens I'll be sure. Usually the OCD in me sees wasted CPU time and quickly shuts down Boinc.
I have not recently run any tests. I wouldn't be too surprised if the disk controller is dying. I had a similar board lose a disk controller a couple years ago, although the errors were much more severe and noticeable.
Judging from the responses here, I'd say the trouble lies on my end. Thanks for all the input, and hopefully the techs can spend time on something much more important than my decrepit hardware. And for me, it's just further motivation for some new hardware.
||
|
KWSN - A Shrubbery
Master Cruncher Joined: Jan 8, 2006 Post Count: 1585 Status: Offline |
This is total speculation but I'd go with the disk controller as you suggested. At 99.818 the work unit has basically finished computing and is merely working with the files to prepare them for transmittal to the servers. My guess is something is hanging when it goes to compress the files and they're getting stuck between finished computing and closing.
Distributed computing volunteer since September 27, 2000
||
|
z2000
Advanced Cruncher Joined: Feb 27, 2011 Post Count: 116 Status: Offline |
"Boinc Version? Did you check your log files, just in case a system process or another app is causing this? Start, Run, type eventvwr.exe"
Good info, thanks. I ran this event viewer and that is how I noticed that there were no errors logged for a day in which I noticed this completion pause.
Col323, did you try looking for the error log?
||
|
Col323
Senior Cruncher Joined: Nov 4, 2008 Post Count: 372 Status: Offline
I just had it happen, so I thought I'd answer a couple outstanding questions.
1. I checked the task manager and that cancer process is idle. Time continues to increment. This morning it was at 5 hours when a WU should complete in the 1:45 to 2:00 range. /edited to add: After a restart, time reset and the WU did complete in 1:46:41.
2. I checked eventvwr.exe. There were no entries under any branch of the tree for the past 24 hours.
[Edit 1 times, last edit by Col323 at Apr 15, 2011 10:09:18 AM]
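The check in point 1 (elapsed time keeps climbing while the process sits idle) can be written down as a simple rule. The sketch below is only an illustration: `looks_stalled` and both thresholds are assumptions, not anything BOINC provides, and the samples would have to be read by hand from the task manager or gathered by some other tool.

```python
def looks_stalled(samples, min_wall_seconds=600.0, max_cpu_seconds=1.0):
    """Return True if CPU time barely moved over a long wall-clock span.

    samples: list of (wall_clock_seconds, cpu_seconds) pairs in time order.
    The default thresholds are arbitrary assumptions for illustration.
    """
    if len(samples) < 2:
        return False  # not enough data to compare
    wall_delta = samples[-1][0] - samples[0][0]
    cpu_delta = samples[-1][1] - samples[0][1]
    # Stalled: plenty of wall time passed, almost no CPU time consumed.
    return wall_delta >= min_wall_seconds and cpu_delta <= max_cpu_seconds
```

For example, two readings taken 15 minutes apart with the same CPU time would be flagged, while a task that is still computing (CPU time advancing roughly in step with the clock) would not.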