Thread Status: Active
Total posts in this thread: 22
This topic has been viewed 4274 times and has 21 replies
PMH_UK
Veteran Cruncher
UK
Joined: Apr 26, 2007
Post Count: 771
Status: Recently Active
Re: Strange "running order" of DSFL jobs

In my experience some versions of BOINC can go into panic mode when the "switch between" value is high compared to the cache size.
For me the fix was to reduce the "switch between" value and update.
The new value takes effect immediately and, if low enough, BOINC stops panicking and lets the active tasks run to completion, followed by any waiting to run.

Paul.
----------------------------------------
Paul.
[Dec 7, 2011 3:42:40 PM]
joeperry39@gmail.com
Advanced Cruncher
USA
Joined: Nov 22, 2006
Post Count: 140
Status: Offline
Re: Strange "running order" of DSFL jobs

SekeRob said: Having the system date right is important. If that work was fetched with a wrong system date and the date then moved forward, it could also have kicked this off [the inflated switch time won't have helped the situation]. Abort all unstarted jobs, reduce cache to 1 day and let it run a few days. If it does not right itself, check in again.

--//--

Prior to KeithHenry posting a suggested solution, I did abort all of the jobs (109 of them, all due on the 15th except 1 for the 16th) and am forcing the previously started jobs to clear out. As soon as they are completed, I'll let everything run normally.

I do believe the "wrong date & time" may well have been the problem. I remember that, at some point, both of those were incorrect, and the date was several days prior to the then current date. When I noticed that problem I immediately corrected the date & time. Problem is I don't remember if BOINC was or was not "up-and-running" at that point.

At any rate, I'll continue with the solution suggested by SekeRob and see what happens. The "started but not completed and then restarted jobs" should clear out later today and everything will then, hopefully, be back to normal.

Thanks to all that replied with suggestions for correcting this problem. I'll report back after a few days as to the then-current status.
----------------------------------------


"Everything in moderation, including moderation" -- Mark Twain
----------------------------------------
[Edit 1 times, last edit by osugrad at Dec 7, 2011 4:28:21 PM]
[Dec 7, 2011 3:51:26 PM]
Former Member
Cruncher
Joined: May 22, 2018
Post Count: 0
Status: Offline
Re: Strange "running order" of DSFL jobs

Exactly, the 6000 minute switch was designed to let Beta/Repairs jump the queue automatically; 18000 minutes (12.5 days) will definitely have the panic persist until the cache is completely empty. And when there's an all-core panic, some versions will stop fetching work until a core goes idle. Aborting is easiest (manure happens) and does not risk the tasks at the end of the line going overdue anyhow. The wingmen will appreciate it. The daily quota is big enough not to cut out anyone who has the incidental mishap, and aborting does not eat into any reliability rating for getting repairs, which of course with a cache of 3 days or bigger is nowhere land.

--//--

edit: This was a follow-on comment to PMH_UK's post. Had not seen osugrad's reply till now.
----------------------------------------
[Edit 1 times, last edit by Former Member at Dec 8, 2011 12:14:52 PM]
[Dec 7, 2011 4:00:07 PM]
depriens
Senior Cruncher
The Netherlands
Joined: Jul 29, 2005
Post Count: 350
Status: Offline
Re: Strange "running order" of DSFL jobs

It sometimes happens (out-of-the-blue) to some of my machines as well, particularly to those with a larger workunit cache. BOINC then pauses the units running and starts running other units with a later deadline. It then runs them for an hour and then pauses them to start some other units. This will go on until eventually all units are finished.

_IF_ I notice it, I manually pause all units not started yet until most halfway units are finished. Then I unpause all units. The workfetch resumes, BOINC finishes the units it's working on at that moment and then continues in the correct sequence.
----------------------------------------

[Dec 8, 2011 8:24:46 AM]
Former Member
Cruncher
Joined: May 22, 2018
Post Count: 0
Status: Offline
Re: Strange "running order" of DSFL jobs

It only happens on my i7 when I set the cache to 1.20 days or higher.
[Dec 8, 2011 12:08:44 PM]
Former Member
Cruncher
Joined: May 22, 2018
Post Count: 0
Status: Offline
Re: Strange "running order" of DSFL jobs

Makes no sense in the least, unless running with keithhenry's 18000 minute switch or similar. Why BOINC does not have the lucency to read and execute as: "Hey, total TTC of running and ready-to-start tasks is less than any deadline of a task in cache, just continue as normal"... is beyond me. Ingleside may have the answer as to why BOINC acts irrationally on that.
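A minimal Python sketch of the check wished for above. It only illustrates the idea and is not how BOINC's scheduler actually decides; the `Task` fields, the `can_run_in_order` name and the division of total TTC by the core count are assumptions made for this example.

```python
from dataclasses import dataclass
from typing import List


@dataclass
class Task:
    name: str
    ttc_hours: float       # estimated time to completion (TTC), in hours
    deadline_hours: float  # hours from now until the task's deadline


def can_run_in_order(tasks: List[Task], cores: int = 1) -> bool:
    """Return True if the whole cache fits before the earliest deadline.

    Hypothetical rule: total TTC of all running and ready-to-start tasks,
    spread over the available cores (an assumption), must be less than the
    nearest deadline of any task in the cache.
    """
    if cores < 1:
        raise ValueError("cores must be at least 1")
    if not tasks:
        return True  # nothing queued, nothing to panic about
    total_ttc = sum(t.ttc_hours for t in tasks) / cores
    earliest_deadline = min(t.deadline_hours for t in tasks)
    return total_ttc < earliest_deadline


if __name__ == "__main__":
    cache = [Task("dsfl_1", 5.0, 96.0), Task("dsfl_2", 5.0, 240.0)]
    print(can_run_in_order(cache, cores=1))
```

If a check like this returned True, there would be no need for deadline-driven switching; real clients also apply safety margins and run-time estimates that this sketch leaves out.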

For sure, it's possible that events such as a stuck WU, restarted and completed with huge elapsed but normal CPU time, kick the inflation off, but however big the inflation, a 1.2 day cache is still a 1.2 day cache. Work fetch suspends until the cache sinks below that value. Do check in the task properties how Elapsed and CPU time relate. Has the lost CPU time devil returned for some? It's one of the reasons why I barely touch the standard BM and only use BOINCTasks. This tool shows both values side by side, so it's easy to follow whether there is a stalling/performance issue on the CPU time side, and it's certainly great for following each task's efficiency.
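As a rough illustration of comparing the two values, here is a small Python sketch that treats efficiency as CPU time divided by elapsed time. The function names and the 0.9 threshold are assumptions for this example, not anything taken from BOINC or BOINCTasks.

```python
def cpu_efficiency(cpu_seconds: float, elapsed_seconds: float) -> float:
    """Fraction of the elapsed (wall clock) time that was spent on the CPU."""
    if elapsed_seconds <= 0:
        raise ValueError("elapsed_seconds must be positive")
    return cpu_seconds / elapsed_seconds


def looks_stalled(cpu_seconds: float, elapsed_seconds: float,
                  threshold: float = 0.9) -> bool:
    """Flag a task whose CPU time lags well behind its elapsed time.

    The threshold is an arbitrary illustrative value.
    """
    return cpu_efficiency(cpu_seconds, elapsed_seconds) < threshold


if __name__ == "__main__":
    # A task that ran for 2 hours of wall clock but only got 1 hour of CPU.
    print(cpu_efficiency(3600, 7200), looks_stalled(3600, 7200))
```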

(I'm still presuming all these comments and observations are from known stable BOINC versions and not any in-between alpha/beta).

--//--

Edit: a comma
----------------------------------------
[Edit 1 times, last edit by Former Member at Dec 8, 2011 12:30:07 PM]
[Dec 8, 2011 12:27:22 PM]
Former Member
Cruncher
Joined: May 22, 2018
Post Count: 0
Status: Offline
Re: Strange "running order" of DSFL jobs

BOINC 6.10.60, elapsed time usually 1 - 2 min, switch between apps 7200.
Running CEP2, GFAM and DSFL on my i7.
[Dec 8, 2011 2:36:44 PM]
Former Member
Cruncher
Joined: May 22, 2018
Post Count: 0
Status: Offline
Re: Strange "running order" of DSFL jobs

Well, the combination of variable time with this 7200 minute switch is what does it. Since the Beta/Repairs have a 4 day deadline and BOINC uses a safety margin on top, of hours to a day [varies per version], try 4400 min. Cache plus switch time then adds up to > 5760 minutes. Since you run a cache of 1.2 days, your device, if stable, is getting repairs which carry... a 4 day deadline.

Let us know if the EDF still hits when tasks with those 4 day deadlines arrive, but not with 7 or 10 day deadlines, with that 1.2 day cache of course.
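The arithmetic behind those numbers, as a small Python sketch: a 1.2 day cache is 1728 minutes, so with a 4400 minute switch the sum is 6128 minutes, against 5760 minutes for a 4 day deadline. The helper names are made up for this example, and whether BOINC compares exactly this sum against the deadline is not confirmed here.

```python
MINUTES_PER_DAY = 24 * 60  # 1440


def days_to_minutes(days: float) -> float:
    """Convert a number of days to minutes."""
    return days * MINUTES_PER_DAY


def cache_plus_switch_minutes(cache_days: float, switch_minutes: float) -> float:
    """Cache size (given in days) plus the 'switch between' time, in minutes."""
    return days_to_minutes(cache_days) + switch_minutes


if __name__ == "__main__":
    deadline = days_to_minutes(4)  # 4 day Beta/Repair deadline
    for switch in (4400, 7200):
        total = cache_plus_switch_minutes(1.2, switch)
        print(f"switch={switch} min: cache+switch={total:.0f} min, "
              f"4 day deadline={deadline:.0f} min")
```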

--//--

edit: math adj.
----------------------------------------
[Edit 1 times, last edit by Former Member at Dec 8, 2011 3:57:59 PM]
[Dec 8, 2011 3:12:19 PM]
pcwr
Ace Cruncher
England
Joined: Sep 17, 2005
Post Count: 10903
Status: Offline
Re: Strange "running order" of DSFL jobs

Mine has been doing it for the last few days (I had 6 days of cache to do in only 7 calendar days). Somehow it works it out to meet the deadline for all the WUs. All my DDDT2 WUs got in on time. Hopefully all my HCMD2 will as well.

Cache set back to 2 days until Beta comes again.

Patrick
----------------------------------------

[Dec 8, 2011 3:17:16 PM]
Former Member
Cruncher
Joined: May 22, 2018
Post Count: 0
Status: Offline
Re: Strange "running order" of DSFL jobs

Errata: I've actually backed off from the long switch time on a permanent basis. I'll only activate this trick when Betas are announced. I don't like [detest] racing through the repairs and finding out that the original still arrived within the grace period (which lasts as long as the validated results show on the RS pages... are "live"). By letting repairs complete at normal pace, the original has about a day extra (my cache size), and WCG gets the chance to tell my client to abort the redundant task.

--//--
[Dec 8, 2011 3:18:05 PM]