Sekerob
BOINC: Task Appears to be making No Progress / Seems Stuck

Observed on all the platforms HPF2 runs on, occasionally with HFCC and DDD-T as well, and with GFAM/DSFL when CPU time is not set to 100%: the task may hang in a loop indefinitely. (FA@H is an exception and will show retreating % progress at times, which is normal!) One or more of the following symptoms appear:
  • 100% CPU time is used for longer than the normal period, even when the throttle is set to a lower percentage.
  • No percent progress is shown in the Tasks tab of BOINC Manager (BM), or in the progress bar of the Graphics View or Simple View.
  • CPU / run time appears substantially longer in relation to the total expected time (e.g. 10 hours at 20%, where normally 2 hours at 40%). Watch the progress percent carefully to see whether it still increments, even by fractions of a percent.
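
The symptoms above boil down to "CPU time keeps growing while progress does not". A minimal Python sketch of that check follows. The `Sample` type, the `looks_stuck` name and both thresholds are hypothetical choices for illustration, not anything BOINC provides; the readings would be noted by hand from the Tasks tab.

```python
from dataclasses import dataclass


@dataclass
class Sample:
    cpu_seconds: float    # CPU time shown for the task when the reading was taken
    fraction_done: float  # progress as a fraction, 0.0 to 1.0


def looks_stuck(samples, min_cpu_seconds=1800.0, min_gain=0.0001):
    """Return True when CPU time grew by at least min_cpu_seconds between
    the first and last reading while progress gained less than min_gain.

    Both thresholds are illustrative assumptions, not BOINC settings.
    """
    if len(samples) < 2:
        return False  # a single reading says nothing about progress
    first, last = samples[0], samples[-1]
    cpu_used = last.cpu_seconds - first.cpu_seconds
    gain = last.fraction_done - first.fraction_done
    return cpu_used >= min_cpu_seconds and gain < min_gain


# Two readings an hour apart, both at 20%: a candidate for the workaround.
readings = [Sample(36000, 0.20), Sample(39600, 0.20)]
print(looks_stuck(readings))
```

This check does not suit FA@H, where retreating progress is normal and would be flagged here.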

    This is a known issue with this science whose cause has yet to be identified. If the CPU time use seems normal yet not even a fractional percent of progress is observed, apply the following workaround to get the program out of the almost endless loop:

      • If Leave Application In Memory while Suspended (LAIM) is on, switch it off through the Local Preferences screen (v5.10 and up; the default is off).
      • Stop the WCG project completely in the Projects tab of the BM Advanced View by selecting WCG and clicking the 'Suspend' button in the left margin.
      • Wait 30 seconds to a minute (watch for the hard disk light to stay off) so that the science is purged from memory. **
      • Start WCG again in the Projects tab of the BM Advanced View by selecting WCG again and clicking the 'Resume' button.
      • If LAIM was on prior to step 1, switch it on again.

    ** On a multiprocessor device, suspending WCG will unload all WCG sciences in progress from memory!
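
For clients without the BM at hand, the suspend / wait / resume steps above can be sketched with the boinccmd tool, whose `--project URL suspend` and `--project URL resume` operations correspond to the two buttons. The Python wrapper below is only a sketch: `WCG_URL` is an assumption (use the URL exactly as the project is attached on your client), the function names are hypothetical, and the LAIM preference is not touched, so it still has to be off for the wait to purge anything. boinccmd must also be able to authenticate to the client, which usually means running it from the BOINC data directory.

```python
import subprocess
import time

# Assumption: the project URL as attached on this client; check the Projects tab.
WCG_URL = "http://www.worldcommunitygrid.org/"


def project_command(operation, url=WCG_URL, boinccmd="boinccmd"):
    """Build the boinccmd argument list that suspends or resumes one project."""
    if operation not in ("suspend", "resume"):
        raise ValueError("operation must be 'suspend' or 'resume'")
    return [boinccmd, "--project", url, operation]


def suspend_wait_resume(wait_seconds=60, run=subprocess.run, sleep=time.sleep):
    """Suspend the project, wait for the science to unload, then resume it.

    run and sleep are parameters so the sequence can be tried without a client.
    """
    run(project_command("suspend"), check=True)
    sleep(wait_seconds)  # the post suggests 30 seconds to a minute
    run(project_command("resume"), check=True)


if __name__ == "__main__":
    # Print the two commands instead of executing them.
    print(" ".join(project_command("suspend")))
    print(" ".join(project_command("resume")))
```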

    Alternatively, if applicable, do step 1 above for LAIM. Then suspend computing through the Activity menu of the BM. After a few moments, switch the Activity back to Run Always or Run Based On Preferences, as applicable, and restore LAIM if need be.
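
The Activity menu choices have a command line counterpart in boinccmd's `--set_run_mode` option, which accepts `never` (suspend), `auto` (run based on preferences) and `always` (run always). The small Python helper below only builds the argument list; its name is hypothetical, and as with any boinccmd call the tool needs access to the client's GUI RPC password.

```python
def run_mode_command(mode, boinccmd="boinccmd"):
    """Build the boinccmd argument list for one Activity menu choice.

    never  -> Suspend
    auto   -> Run Based On Preferences
    always -> Run Always
    """
    if mode not in ("never", "auto", "always"):
        raise ValueError("mode must be 'never', 'auto' or 'always'")
    return [boinccmd, "--set_run_mode", mode]


# Suspend all computing, then hand control back to the preferences.
for mode in ("never", "auto"):
    print(" ".join(run_mode_command(mode)))
```

Allow a few moments between the two commands, as in the BM procedure, so the science has time to unload.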

    In the above cases it is not required to stop the service if BOINC was installed that way (in the BOINC 6 installer the service installation is called 'protected'). If you do have to, go to the BM Advanced menu and select "shut down connected client...". Restarting the service on Windows then has to be done through the Services.msc control applet.

    Methods differ depending on the operating system, e.g. using the boinccmd command line program.

After the above procedure, the task resumes from the last good checkpoint that was saved to the hard disk. All computing time since that restore point is lost.
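
As a worked example of that loss, with made-up numbers: a task showing 10 hours of CPU time whose last checkpoint was written at 9.5 hours gives up half an hour on restart. The helper name below is hypothetical.

```python
def lost_cpu_seconds(current_cpu_seconds, checkpoint_cpu_seconds):
    """CPU time discarded when a task restarts from its last checkpoint."""
    return max(0.0, current_cpu_seconds - checkpoint_cpu_seconds)


# Hypothetical task: 10 h of CPU time, last checkpoint at 9.5 h.
print(lost_cpu_seconds(10 * 3600, 9.5 * 3600) / 60, "minutes lost")
```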

If this situation is not caught by observation, the HPF2 science application will time out based on the estimated fpops in a task. Just before this post, the time-out factor was reduced from 10 to 6, based on statistical information. The 6x means that a job whose ESTIMATED operations, stored in <rsc_fpops_est>, are known to vary widely (by up to 10x and more) is cut off on reaching 6 times that estimate. This multiple of <rsc_fpops_est> is stored in the <rsc_fpops_bound> variable for a task.
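
The relation between the two values can be sketched in a few lines of Python. The numbers are invented for illustration, and converting the bound into hours by dividing by the host's speed is a simplifying assumption about how the client applies it.

```python
def fpops_bound(rsc_fpops_est, factor=6.0):
    """Bound on operations: the estimate multiplied by the time-out factor."""
    return factor * rsc_fpops_est


def approx_cutoff_hours(rsc_fpops_est, host_flops, factor=6.0):
    """Rough run time at which the bound is reached on a given host.

    Assumption: host_flops is the sustained speed the client credits the
    application with; actual client accounting may differ.
    """
    return fpops_bound(rsc_fpops_est, factor) / host_flops / 3600.0


# Hypothetical job estimated at 3.6e13 operations on a 1 GFLOPS host:
# about 10 hours expected, cut off near 60 hours (100 hours under the old 10x).
est = 3.6e13
print(approx_cutoff_hours(est, 1e9))
print(approx_cutoff_hours(est, 1e9, factor=10.0))
```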

Large completion time fluctuations are normal for 'non-deterministic' computations. Instances above 6x the estimated calculations are so very rare, though, that it is deemed prudent not to waste extended time, as upon reissue to another computer HPF2 will usually finish in a normally expected time frame. Off-site / unattended clients in particular benefit from this reduction in max flops / time-out.

Sample of the BOINC Manager Projects tab. Note that the 'Resume' button, with its own function, appears in place of the 'Suspend' button after the first step!



Even if BOINC is installed as a service (no graphics available), the BOINC Manager (BOINCmgr.exe) can be started separately. The user interface makes control significantly simpler than the command line instructions of the boinccmd.exe tool.

[Edit 14 times, last edit by Former Member at Jan 19, 2012 1:46:56 PM]
[Sep 18, 2007 1:25:04 PM]