Thread Status: Active
Total posts in this thread: 18
Posts: 18   Pages: 2
This topic has been viewed 2516 times and has 17 replies.
sk..
Master Cruncher
Joined: Mar 22, 2007
Post Count: 2324
Status: Offline
Re: WU hangs at 99.818% complete

Are files uploading at the time this pause starts?
I noticed this and reported it in the last week or so.
I thought it was specific to one project (not this one).
What I saw was a task starting to upload, then the CPU usage dropping and staying low for a minute or more, in the middle of the upload.
It might be totally unrelated, or it might be of some interest.
[Apr 15, 2011 9:37:07 PM]
gamebox
Cruncher
Joined: May 12, 2011
Post Count: 1
Status: Offline
Re: WU hangs at 99.818% complete

I have the same problem, also at 99.818%. I've already been waiting about 15 minutes, but nothing else has happened.
[May 13, 2011 8:44:56 AM]
Former Member
Cruncher
Joined: May 22, 2018
Post Count: 0
Status: Offline
Re: WU hangs at 99.818% complete

The only time I've ever seen HCC stop at 99.818% for longer than a minute or two [which it has done since day 1, as noted] was when the job was preempted.

On Linux I've recently run about 50, though I think it's the old version, which simply had a point release to sync with the Windows release. Never an issue, and that said, I think Linux overall has the lowest incident rate at WCG.

Leaving no stone unturned: one of the usual suspects is AV, so set exceptions for the BOINC data directory, the 5 BOINC elements and the science apps. I don't see these app-specific exceptions as an option in Avast 6, but my Kerio firewall certainly had them, to ensure that BOINC and the apps could communicate.

col323 reports that unloading BOINC (e.g. stopping the service via Task Manager in admin mode and starting it again) lets the tasks finish as normal: "And then, if I shut down and restart, the last checkpoint resets the run time to about 1:40-1:50, which it then completes in a couple of minutes."
That is something HPF2 had/has been doing at times, but reports of it are few and far between. Maybe the techs would be interested to learn the names of such tasks so they can run them in a loop with some debug settings. With 325,000+ tasks per day, though, the chance of hitting one in the lab is low, and so is the chance of having the same environment it happens in.

--//--
[May 13, 2011 11:03:03 AM]
Former Member
Cruncher
Joined: May 22, 2018
Post Count: 0
Status: Offline
Re: WU hangs at 99.818% complete

I just had one happen: 99.818% progress with 5 seconds remaining, stuck there both in time remaining and in progress. I'm not sure exactly when it happened, but since the WU eventually completed after I had waited about 1 minute on it once I noticed it, I did not bother to check details like whether it continued to consume CPU cycles, the duration of the run, etc. If at this stage the WU "is doing some calculations outside of the main programming loops that update percentage complete", as uplinger's [Apr 12, 2011 3:44:26 AM] post puts it, then it is quite harmless.

Now, on that outsideOfTheMainProgrammingLoops idea: what else is going on there? May I suggest that, at the cost of some CPU cycles that would otherwise have gone into crunching, this stage be expanded to do more client-side assessments, including pre-validation of the completed WU, measuring the cruncher machine's performance on each latest completed WU, etc. I would like to think there is a lot of room for improvement in cruncher-machine-to-WCG-server communication, which would help immensely to head off many WU-crunching issues and to match WU workloads to cruncher machines.
[Jun 12, 2011 8:23:38 AM]
sk..
Master Cruncher
Joined: Mar 22, 2007
Post Count: 2324
Status: Offline
Re: WU hangs at 99.818% complete

As Uplinger said "It is doing some calculations outside of the main programming loops that update percentage complete... This is normal behavior though".
I watched this happen for 72 sec on one of my systems. No problems, and I have seen this behavior many times on other projects (usually at 100% or close to it).
It's only a problem if it goes on continuously, i.e. the task does not finish (and I have seen that on a few other projects too).
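To tell the normal end-of-task pause from a real hang, one could record the progress fraction over time and flag the task only if it stays frozen past some limit. A minimal sketch, assuming you already have (time in seconds, fraction done) samples from somewhere; the function names and the 15-minute threshold are hypothetical, the threshold merely echoing the wait reported earlier in this thread rather than any BOINC or WCG setting.

```python
from typing import List, Tuple

# Each sample is (seconds since start, fraction done in [0, 1]), in time order.
Sample = Tuple[float, float]


def seconds_frozen(samples: List[Sample]) -> float:
    """Return how long the most recent progress value has been unchanged."""
    if not samples:
        return 0.0
    last_time, last_frac = samples[-1]
    frozen_since = last_time
    # Walk backwards until the fraction differs from the latest one.
    for t, frac in reversed(samples):
        if frac != last_frac:
            break
        frozen_since = t
    return last_time - frozen_since


def looks_hung(samples: List[Sample], threshold_s: float = 15 * 60) -> bool:
    """Hypothetical rule: only treat a frozen task as hung after threshold_s seconds."""
    return seconds_frozen(samples) > threshold_s


# A 72-second pause at 99.818%, as observed above, is well under the threshold.
pause = [(0.0, 0.50), (60.0, 0.99818), (132.0, 0.99818)]
print(seconds_frozen(pause), looks_hung(pause))  # 72.0 False
```

Only the length of the freeze matters here, not the value it freezes at, which fits the point that 99.818%, 100% or beyond is of little importance in itself.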

On the OutsideOfTheMainProgrammingLoops explanation: while all the data has to be collated and zipped for uploading, I think this is just non-loop processing. It's common for non-loop processes to run towards the end of a task, and sometimes just before checkpoints. I did not notice the CPU usage drop during this time, so it was clearly doing some calculations, and I did not notice much in the way of I/O overhead or changes in RAM usage. Whether the % complete sits at 99.818%, 100% or progresses beyond 100% is of little importance really (all of these are seen in various projects).
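Why the display freezes just below 100% can be illustrated with a toy task: progress is reported only from inside the main loop, so any collating or packaging done after the loop leaves the last reported value on screen. This is a hypothetical sketch; `run_task`, the `report` callback and the 0.99818 share given to the loop are assumptions chosen to mirror this thread, not how the HCC application actually weights its progress. (In a real BOINC science app the report would go through the BOINC API's `boinc_fraction_done()` call.)

```python
from typing import Callable, Tuple


def run_task(n_steps: int, loop_share: float,
             report: Callable[[float], None]) -> Tuple[int, str]:
    """Toy science app: the main loop reports progress, the wrap-up step does not."""
    total = 0
    for i in range(n_steps):
        total += i  # stand-in for one unit of main-loop science work
        # Scale progress so the loop alone accounts for loop_share of the task.
        report(loop_share * ((i + 1) / n_steps))
    # Non-loop work (e.g. collating and compressing results for upload).
    # Nothing calls report() here, so the displayed value stays at loop_share.
    packaged = f"result={total}"
    return total, packaged


progress = []
total, packaged = run_task(1000, 0.99818, progress.append)
print(progress[-1], packaged)  # 0.99818 result=499500
```

The CPU stays busy during the wrap-up step even though the percentage does not move, which matches the observation that CPU usage did not drop during the pause.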
[Jun 12, 2011 3:40:37 PM]
Former Member
Cruncher
Joined: May 22, 2018
Post Count: 0
Status: Offline
Re: WU hangs at 99.818% complete

[rhetorical] With 4 months, if that, of full-day work supply left on HCC1, what exactly are we talking about? I don't think the jockey will change the riding gear so close to the end of the steeplechase. The owner of the horse would probably have fits before that.

--//--
[Jun 12, 2011 3:51:43 PM]
Former Member
Cruncher
Joined: May 22, 2018
Post Count: 0
Status: Offline
Re: WU hangs at 99.818% complete

The jockey does not change the riding gear so close to the end of the steeplechase for the current competition; it is the horse owner who labors to improve the riding gear for future competitions.
[Jun 12, 2011 4:11:33 PM]
Former Member
Cruncher
Joined: May 22, 2018
Post Count: 0
Status: Offline
Re: WU hangs at 99.818% complete

... implicit in what I said: yes, that's what horse owners do. What you suggest in your previous post is already done to a good extent, including that the actual performance is reported to the servers [real fpops, real time] and the Duration Correction Factor [DCF] is adjusted on your client upon completion of each task. Some of the validation can't be done on the client, and some of it one would not want to code into the science app. It is much easier to make a single change on the server validator to add another conditional exception. We would not want a wingnut to figure out the mechanics and then tamper with the output, which is another reason why this type of science/database building requires a quorum of 2, although skill is advancing to make that less and less of a requirement for volunteer crunching on uncontrolled devices.
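Two of the mechanisms mentioned in this post can be sketched in a few lines. The DCF update below is modeled loosely on the rule older BOINC clients used (raise the factor quickly when a task runs long, lower it slowly when it runs short); the exact thresholds and weights here are assumptions, not a quote of the client source. The quorum-2 check is a generic stand-in that treats two returned results as agreeing when their numbers match within a relative tolerance, whereas the real WCG validators are project-specific. All function names and the tolerance are hypothetical.

```python
import math
from typing import Sequence


def update_dcf(dcf: float, elapsed_s: float, raw_estimate_s: float) -> float:
    """Hypothetical Duration Correction Factor update after one finished task.

    raw_estimate_s is the run-time estimate before correction; dcf scales it.
    """
    raw_ratio = elapsed_s / raw_estimate_s          # DCF that would have made the estimate exact
    adj_ratio = elapsed_s / (raw_estimate_s * dcf)  # how far off the corrected estimate was
    if adj_ratio > 1.1:
        # Underestimating run time is worse, so jump straight to the observed ratio.
        new_dcf = raw_ratio
    elif adj_ratio < 0.1:
        # Tasks that finish far earlier than expected get very little weight.
        new_dcf = 0.99 * dcf + 0.01 * raw_ratio
    else:
        # Otherwise drift slowly towards the observed ratio.
        new_dcf = 0.9 * dcf + 0.1 * raw_ratio
    # Keep the factor in a sane range.
    return min(max(new_dcf, 0.01), 100.0)


def quorum2_agree(a: Sequence[float], b: Sequence[float], rel_tol: float = 1e-6) -> bool:
    """Generic quorum-of-2 check: both results must match value by value within rel_tol."""
    if len(a) != len(b):
        return False
    return all(math.isclose(x, y, rel_tol=rel_tol) for x, y in zip(a, b))


print(update_dcf(1.0, 3600.0, 1800.0))  # 2.0 (task ran twice as long as estimated)
print(quorum2_agree([1.0, 2.0], [1.0, 2.0000000001]))  # True
```

The asymmetry in `update_dcf` reflects the trade-off described above: the client corrects its own estimates locally, while deciding whether a result is acceptable stays on the server side, where a tampered result from one host still has to agree with its wingman's.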

--//--
[Jun 13, 2011 6:56:14 AM]