World Community Grid - View Thread - Strange "running order" of DSFL jobs

World Community Grid Forums

Category: Completed Research

Forum: Drug Search for Leishmaniasis Forum

Thread: Strange "running order" of DSFL jobs

Thread Status: Active
Total posts in this thread: 22

This topic has been viewed 4274 times and has 21 replies

PMH_UK
Veteran Cruncher
UK
Joined: Apr 26, 2007
Post Count: 771
Status: Recently Active

Re: Strange "running order" of DSFL jobs

In my experience, some versions of BOINC can go into panic mode when the "switch between" value is high relative to the cache size.
For me, the fix was to reduce "switch between" and update.
The new value takes effect immediately and, if it is low enough, BOINC stops panicking and lets the active tasks run to completion, followed by any that are waiting to run.

Paul.


[Dec 7, 2011 3:42:40 PM]

joeperry39@gmail.com
Advanced Cruncher
USA
Joined: Nov 22, 2006
Post Count: 140
Status: Offline

Re: Strange "running order" of DSFL jobs

SekeRob said: Getting the system date right is important. If that work was fetched with a wrong system date and the date then moved forward, it could also have kicked this off [the inflated switch time won't have helped the situation]. Abort all unstarted jobs, reduce the cache to 1 day and let it run for a few days. If it does not right itself, check in again.

--//--

Prior to KeithHenry posting a suggested solution, I did abort all of the jobs (109 of them, all due on the 15th except 1 for the 16th) and am forcing the previously started jobs to clear out. As soon as they are completed, I'll let everything run normally.

I do believe the "wrong date & time" may well have been the problem. I remember that, at some point, both of those were incorrect, and the date was several days prior to the then current date. When I noticed that problem I immediately corrected the date & time. Problem is I don't remember if BOINC was or was not "up-and-running" at that point.

At any rate, I'll continue with the solution suggested by SekeRob and see what happens. The "started but not completed and then restarted jobs" should clear out later today and everything will then, hopefully, be back to normal.

Thanks to all that replied with suggestions for correcting this problem. I'll report back after a few days as to the then-current status.

----------------------------------------

"Everything in moderation, including moderation" -- Mark Twain

----------------------------------------
[Edit 1 times, last edit by osugrad at Dec 7, 2011 4:28:21 PM]

[Dec 7, 2011 3:51:26 PM]

Former Member
Cruncher
Joined: May 22, 2018
Post Count: 0
Status: Offline


Re: Strange "running order" of DSFL jobs

Exactly. The 6000-minute switch was designed to let Beta/Repair tasks jump the queue automatically; 18000 minutes (12.5 days) will definitely make the panic persist until the cache is completely empty. And when all cores are in panic mode, some versions will stop fetching work until a core goes idle. Aborting is easiest (manure happens), and it avoids the risk of the tasks at the end of the line going overdue. The wingmen will appreciate it. The daily quota is big enough not to cut out anyone who has an incidental mishap, and aborting does not eat into the reliability rating needed to get repairs, which, of course, is nowhere land with a cache of 3 days or bigger.
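
As a quick check of the figures above, here is a minimal Python sketch that converts a switch time from minutes to days. The helper name `minutes_to_days` is made up for illustration and is not part of BOINC.

```python
MINUTES_PER_DAY = 24 * 60  # 1440 minutes in a day


def minutes_to_days(minutes: float) -> float:
    """Convert a "switch between" value given in minutes to days."""
    return minutes / MINUTES_PER_DAY


# The two switch times mentioned in the post above.
for switch in (6000, 18000):
    print(f"{switch} min = {minutes_to_days(switch):.2f} days")
```

So the 6000-minute switch is a little over 4 days, and 18000 minutes is 12.5 days.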

--//--

edit: This was a follow-on comment to PMH_UK's post. I had not seen osugrad's reply until now.

----------------------------------------
[Edit 1 times, last edit by Former Member at Dec 8, 2011 12:14:52 PM]

[Dec 7, 2011 4:00:07 PM]

depriens
Senior Cruncher
The Netherlands
Joined: Jul 29, 2005
Post Count: 350
Status: Offline

Re: Strange "running order" of DSFL jobs

It sometimes happens (out of the blue) to some of my machines as well, particularly to those with a larger workunit cache. BOINC then pauses the running units and starts running other units with a later deadline. It runs those for an hour and then pauses them to start some other units. This goes on until eventually all units are finished.

_IF_ I notice it, I manually pause all units not yet started until most of the half-finished units are done. Then I unpause all units. Work fetch resumes, BOINC finishes the units it's working on at that moment and then continues in the correct sequence.

[Dec 8, 2011 8:24:46 AM]

Former Member
Cruncher
Joined: May 22, 2018
Post Count: 0
Status: Offline


Re: Strange "running order" of DSFL jobs

It only happens to my i7 when I set the cache to 1.20 or higher.

[Dec 8, 2011 12:08:44 PM]

Former Member
Cruncher
Joined: May 22, 2018
Post Count: 0
Status: Offline


Re: Strange "running order" of DSFL jobs

Makes no sense in the least, unless running with keithhenry's 18000-minute switch or similar. Why BOINC does not have the lucidity to read the situation and act on it ("Hey, the total TTC of running and ready-to-start tasks is less than any deadline of a task in the cache, just continue as normal")... beyond me. Ingleside may have the answer as to why BOINC acts irrationally on that.
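
The sanity check wished for above could be sketched roughly as follows. This is only an illustration of that rule of thumb, not how BOINC's scheduler actually decides; the `Task` class, its fields, and the choice to spread the total TTC evenly over the cores are all assumptions.

```python
from dataclasses import dataclass
from typing import List


@dataclass
class Task:
    name: str
    remaining_hours: float  # estimated time to completion (TTC)
    deadline_hours: float   # hours from now until the deadline


def can_continue_normally(tasks: List[Task], cores: int) -> bool:
    """True if the total TTC, spread over the cores, is below every deadline."""
    if not tasks:
        return True
    total_ttc = sum(t.remaining_hours for t in tasks) / cores
    return total_ttc < min(t.deadline_hours for t in tasks)


# Hypothetical cache: three tasks due in 4, 7 and 10 days on a 4-core machine.
cache = [Task("a", 6, 96), Task("b", 8, 168), Task("c", 10, 240)]
print(can_continue_normally(cache, cores=4))
```

With numbers like these there is plenty of slack, so by this rule nothing would need to jump the queue.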

For sure, it's possible that an event such as a stuck WU, restarted and completed with a huge elapsed time but normal CPU time, kicks the inflation off, but however big the inflation, a 1.2-day cache is still a 1.2-day cache. Work fetch suspends until the cache sinks below that value. Do check in the task properties how Elapsed and CPU time relate. Has the lost-CPU-time devil returned for some? It's one of the reasons why I barely touch the standard BM and only use BOINCTasks. This tool shows both values side by side, so it's easy to see whether there is a stalling/performance issue on the CPU time side, and it's certainly great for following each task's efficiency.
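
To make the Elapsed versus CPU time comparison concrete, here is a small Python sketch. The function name and the sample times are invented for illustration; the real values would be read by hand from the task properties.

```python
def cpu_efficiency(cpu_seconds: float, elapsed_seconds: float) -> float:
    """Fraction of the elapsed (wall-clock) time that was actual CPU time."""
    if elapsed_seconds <= 0:
        raise ValueError("elapsed time must be positive")
    return cpu_seconds / elapsed_seconds


# Hypothetical tasks: (name, CPU hours, elapsed hours).
samples = [("normal task", 6.9, 7.0), ("stuck task", 7.0, 30.0)]
for name, cpu_h, elapsed_h in samples:
    eff = cpu_efficiency(cpu_h * 3600, elapsed_h * 3600)
    print(f"{name}: {eff:.0%} efficient")
```

A task whose elapsed time is far larger than its CPU time shows up with a low percentage, which is the kind of stall described above.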

(I'm still presuming all these comments and observations are from known stable BOINC versions and not any in-between alpha/beta).

--//--

Edit: a comma

----------------------------------------
[Edit 1 times, last edit by Former Member at Dec 8, 2011 12:30:07 PM]

[Dec 8, 2011 12:27:22 PM]

Former Member
Cruncher
Joined: May 22, 2018
Post Count: 0
Status: Offline


Re: Strange "running order" of DSFL jobs

BOINC 6.10.60, elapsed time usually 1 - 2 min, switch between apps 7200.
Running CEP2, GFAM and DSFL on my i7.

[Dec 8, 2011 2:36:44 PM]

Former Member
Cruncher
Joined: May 22, 2018
Post Count: 0
Status: Offline


Re: Strange "running order" of DSFL jobs

Well, the combination of variable time with this 7200-minute switch is what does it. Since Beta/Repair tasks have a 4-day deadline and BOINC adds a safety margin of hours to a day on top [varies per version], try 4400 min. Cache plus switch time then adds up to > 5760 minutes. Since you run a cache of 1.2 days, your device, if stable, is getting repairs, which carry... a 4-day deadline.

Let us know if the EDF still hits when tasks with a 4-day deadline arrive, but not when the deadline is 7 or 10 days, with that 1.2-day cache of course.
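
The arithmetic behind the 4400-minute suggestion can be sketched in Python as below. It only compares cache plus switch time against the deadline; it ignores the version-dependent safety margin and is not BOINC's actual scheduling logic. The function names are made up for illustration.

```python
MINUTES_PER_DAY = 24 * 60  # 1440 minutes in a day


def cache_plus_switch_minutes(cache_days: float, switch_minutes: float) -> float:
    """Cache size (in days) converted to minutes, plus the switch time."""
    return cache_days * MINUTES_PER_DAY + switch_minutes


def exceeds_deadline(cache_days: float, switch_minutes: float,
                     deadline_days: float) -> bool:
    """True if cache plus switch time is longer than the task deadline."""
    total = cache_plus_switch_minutes(cache_days, switch_minutes)
    return total > deadline_days * MINUTES_PER_DAY


# 1.2-day cache and 4400-minute switch against 4, 7 and 10 day deadlines.
for deadline in (4, 7, 10):
    print(f"{deadline}-day deadline exceeded:",
          exceeds_deadline(1.2, 4400, deadline))
```

With a 1.2-day cache (1728 minutes) and a 4400-minute switch, the total of about 6128 minutes is above the 5760 minutes of a 4-day deadline but well below 7 or 10 days.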

--//--

edit: math adj.

----------------------------------------
[Edit 1 times, last edit by Former Member at Dec 8, 2011 3:57:59 PM]

[Dec 8, 2011 3:12:19 PM]

pcwr
Ace Cruncher
England
Joined: Sep 17, 2005
Post Count: 10903
Status: Offline

Re: Strange "running order" of DSFL jobs

Mine has been doing it for the last few days (I had 6 days of cache to do in only 7 calendar days). Somehow it works it out to meet the deadline for all the WUs. All my DDDT2 WUs got in on time. Hopefully all my HCMD2 will as well.

Cache set back to 2 days until Beta comes again.

Patrick

[Dec 8, 2011 3:17:16 PM]

Former Member
Cruncher
Joined: May 22, 2018
Post Count: 0
Status: Offline


Re: Strange "running order" of DSFL jobs

Errata: I've actually backed off from the long switch time on a permanent basis. I'll only activate this trick when Betas are announced. I don't like [detest] racing through the repairs and finding out that the original still arrived within the grace period (which lasts as long as the validated results show on the RS pages... are "live"). By letting repairs complete at a normal pace, the original has about a day extra (my cache size), and WCG gets the chance to tell my client to abort the redundant task.

--//--

[Dec 8, 2011 3:18:05 PM]
