World Community Grid Forums
Thread Status: Active | Total posts in this thread: 49
keithhenry
Ace Cruncher | Senile old farts of the world ....uh.....uh..... nevermind | Joined: Nov 18, 2004 | Post Count: 18665 | Status: Offline
> .... or try this option by inserting it into the cc_config.xml: `<work_request_factor>2</work_request_factor>`. The amount of work requested from projects will be multiplied by this number. Use a number larger than one if your computer often runs out of work.

Sek, my initial reaction is that you're flirting with danger using that. It would seem to be all too easy to pull down more work than you can complete by the deadline. Given the way the computational deadline affects whether you get new work, I suspect that what you would see in practice is that you'd get that factor much more work whenever the computational deadline allowed you to fetch new work at all. That would tend to bunch up your report deadlines, so that instead of missing on one, you'd miss on several.
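For anyone wanting to try it, the option goes inside the `<options>` section of cc_config.xml in the BOINC data directory. A minimal sketch, assuming the standard layout of that file for clients of this era (the client must be told to re-read the file, or be restarted, to pick it up):

```xml
<!-- Minimal cc_config.xml sketch. Only work_request_factor is the option
     under discussion; the enclosing tags are the file's standard structure. -->
<cc_config>
  <options>
    <!-- Multiplies the amount of work requested from projects.
         Values above 1 make the client ask for extra work. -->
    <work_request_factor>2</work_request_factor>
  </options>
</cc_config>
```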
keithhenry
Ace Cruncher | Senile old farts of the world ....uh.....uh..... nevermind | Joined: Nov 18, 2004 | Post Count: 18665 | Status: Offline
> No, you're not senile. Remember that 5.8.9 is still not "officially" released and it's in test. Just last night, I found a bug in the scheduler, and David fixed it and got 5.8.11 out this morning. I don't want to send everyone off jumping up to 5.8.11, but I would say if you have problems with 5.8.8 keeping work and problems with 5.8.9 refusing to actually run work, then you could *try* 5.8.11. But it has *just* come out and there could be bugs. But, I guess you could go back to 5.4.11 as well. - John Watzke

Well, it's one of those things where you're not sure if you're doing something wrong or if it's the code changes. Since the changes to the work fetch policy and the scheduler appear rather significant, the chance of a bug would seem higher, but so would the chance that I'm not understanding the changes under the covers.

It appears that BOINC is trying to move in a direction where it "helps" the user who, for whatever reason, gets into trouble and has WUs at risk of missing deadlines. It seems determined that if you have a WU at risk of missing the report deadline, it's going to do whatever it can to get that one completed before that happens, even if that means going without any new work for some number of days and having your queue of WUs worked down to just that one WU.

What I don't see is much, if anything, in the way of communication or indication to make the user aware of that condition. Before, you could see in the messages that you were overcommitted. Now, asking for more work in that situation simply doesn't get more work, so a user who is used to getting new work with an update request has no indication of why reality didn't meet their expectations. Most folks will be inclined to blame BOINC in that case and conclude that it's not working right.

Since you could have 10 or even 20 WUs in your queue, and the one WU that is keeping you from getting new work isn't necessarily the oldest (though it most likely would be if you're a single-project cruncher), could BOINC Manager provide some simple way of flagging that WU? Perhaps display it on the Tasks tab in red, along the lines of the sketch below? A simple version of the robot waving its arms yelling "Warning! Warning! Danger, Will Robinson!"
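Roughly what such a flag would have to compute. This is a minimal sketch of the idea only, not BOINC's actual code; the task fields, the crunch-in-queue-order assumption, and the `cpu_fraction` availability figure are all illustrative:

```python
from dataclasses import dataclass

@dataclass
class Task:
    name: str
    remaining_cpu_hours: float    # client's estimate of crunching left
    report_deadline_hours: float  # hours from now until the report deadline

def at_risk(queue: list[Task], cpu_fraction: float = 0.8) -> list[Task]:
    """Flag tasks whose projected finish, crunching the queue in order at
    the given CPU availability, lands past their own report deadline."""
    flagged, elapsed = [], 0.0
    for task in queue:
        elapsed += task.remaining_cpu_hours / cpu_fraction
        if elapsed > task.report_deadline_hours:
            flagged.append(task)  # candidate for a red row on the Tasks tab
    return flagged

queue = [Task("faah_001", 8.0, 40.0), Task("faah_002", 8.0, 12.0)]
print([t.name for t in at_risk(queue)])  # ['faah_002']: done ~20h in, due in 12h
```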
Sekerob
Ace Cruncher | Joined: Jul 24, 2005 | Post Count: 20043 | Status: Offline
> > .... or try this option by inserting it into the cc_config.xml: `<work_request_factor>2</work_request_factor>`. [...]
>
> Sek, my initial reaction is that you're flirting with danger using that. It would seem to be all too easy to pull down more work than you can complete by the deadline. [...]

Depends on the size of the task buffer. At 2.5 days and a deadline of 7 days, it should not put you in harm's way with a 30 WU per core, per day limit. It's one of those manual emergency settings: you add it, you manage it!
WCG
----------------------------------------
Please help to make the Forums an enjoyable experience for All!
[Edit 2 times, last edit by Sekerob at Feb 10, 2007 10:45:25 AM]
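Sekerob's arithmetic, made explicit. A back-of-envelope sketch; treating the fetch factor as doubling the whole buffer in the worst case is an assumption for illustration, not guaranteed client behavior:

```python
# Figures from the post above: 2.5-day buffer, 7-day report deadline,
# work_request_factor of 2, against WCG's 30 WUs per core per day cap.
buffer_days = 2.5
work_request_factor = 2
deadline_days = 7

# Worst case: the client asks for the whole buffer, multiplied by the
# factor, and everything arrives stamped with the same 7-day deadline.
worst_case_days_of_work = buffer_days * work_request_factor  # 5.0
print(worst_case_days_of_work < deadline_days)  # True: still inside the deadline
```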
David Autumns
Ace Cruncher | UK | Joined: Nov 16, 2004 | Post Count: 11062 | Status: Offline
It's been like this since BOINC was first introduced on WCG; this is not a new 5.8.8 issue.

For my sins, I have 5 boxes on BOINC, and it appears to be completely random as to which box suffers from this symptom. One or two of my boxes were running in the just-in-time fashion before the 5.8.8 upgrade, and after the recent BOINC issues I had updated my cache to get 5 days' worth, just in case. After the 5.8.8 upgrade they all went out and got 5 days' worth, and I thought: great. Now, a week later, 3 of the 5 are still OK, but of the other 2, one has only the current WU available and the other has just a day's worth in reserve.

I've lived with this since the start; nothing's really changed with 5.8.8. I'd upgrade if you haven't already.

Dave
keithhenry
Ace Cruncher | Senile old farts of the world ....uh.....uh..... nevermind | Joined: Nov 18, 2004 | Post Count: 18665 | Status: Offline
Actually, when I found the DCF fix in 5.8.9 that I mentioned earlier in this thread, I upgraded to that.

With the new scheduler and work fetch policy in 5.8.x, using that option seems a lot riskier to me, even if the option itself is not new. It appears that all you need is one WU beyond the computational deadline to stop you from getting any new work until it completes. There is NOT any easy way of identifying that WU, either. Normally, you'd expect it to be the oldest WU, but with WCG's multiple projects, I'm not sure you can count on that. I'd like to be wrong, though, and be able to say that, given WCG's seven-day reporting deadline, you simply don't want any incomplete WUs received more than three days ago.

Take the new scheduler and work fetch policy, and combine that with the desire to maintain a queue to avoid any crunching downtime: the ideal would seem to be that each time you return X hours of work, you get X hours of new work. If a WU in your queue hits the computational deadline and new work fetch gets suspended, there is a good probability that, by the time that WU completes, another WU in the queue behind it will be in the same state, so work fetch will remain suspended until you clear out your queue. At that point, the queue will fill back up, but now you have Z days' worth of work all due at the same time, seven days out. Just the opposite of the evenly spread distribution of report deadlines that you ideally want.

I'm still sorting this all out, but if there are bugs in the scheduler, those could make the situation even worse. Trouble is, when I experiment with some part of my queue, I have to let that part complete before I can try again. Twice now, I have seen BOINC Manager start a 'Ready to run' WU over an earlier 'Waiting to run' (i.e., preempted) WU. I've also seen two instances where a WU was in Running status but BOINC was actually idle, and one instance where I had TWO WUs in Running status on a single-core machine. Hopefully, I'll get to a point where I have a scenario of "steps A, B, and C give X", where X does not appear to be a correct state, and it can be determined whether that is a bug or not.
Former Member
Cruncher | Joined: May 22, 2018 | Post Count: 0 | Status: Offline
> > > .... or try this option by inserting it into the cc_config.xml: `<work_request_factor>2</work_request_factor>`. [...]
> >
> > Sek, my initial reaction is that you're flirting with danger using that. [...]
>
> Depends on the size of the task buffer. At 2.5 days and a deadline of 7 days, it should not put you in harm's way with a 30 WU per core, per day limit. It's one of those manual emergency settings: you add it, you manage it!

This setting will get you in trouble the same way a large queue setting will. Once it downloads the work, the client will do the same estimating of how long it will take, and it will suspend work fetch from the problem project until the work queue is reduced to a manageable size (roughly the check sketched below).
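A simplified sketch of that check. This is the idea only, not the client's actual code; the tuple layout and the `cpu_fraction` figure are illustrative:

```python
# Each task: (estimated CPU-hours left, hours until its report deadline).
def should_fetch_work(tasks, buffer_days, cpu_fraction=0.8):
    """Ask for more work only if the queue is neither overcommitted
    (drain time pushes past some deadline) nor already full."""
    hours_to_drain = sum(left for left, _ in tasks) / cpu_fraction
    overcommitted = any(hours_to_drain > due for _, due in tasks)
    return (not overcommitted) and hours_to_drain < buffer_days * 24

# Six 8-hour WUs on one core: ~60 h to drain, first deadline only 48 h out.
queue = [(8, 48), (8, 96), (8, 120), (8, 144), (8, 160), (8, 168)]
print(should_fetch_work(queue, buffer_days=2.5))  # False: fetch stays suspended
```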
Sekerob
Ace Cruncher | Joined: Jul 24, 2005 | Post Count: 20043 | Status: Offline
There are so many permutations and possibilities that nothing can be said in absolutes, but I can definitely say: I've never missed a deadline while on site.
Anyway, I like experimenting without changing the client every 5 minutes, so I set this variable in cc_config.xml, forced BOINC to read it, and let it sit for 48 hours.

To back up a bit: this machine had been on 5.8.8 for weeks and has now been on 5.8.9 for a fair part of the week, buffer at 2.5, permanent connection, and an ONLY-FAAH diet with highly variable WU lengths, operating at 10 in / 10 out and not fetching work until the very last job had started or even one core was idling. No other projects, 100% WCG.

Now, after 2 days and another 17 done, BOINC woke up, sent a message to fetch 4.04 million seconds of work, and got 10 more... for the first time in a month, 13 FAAHs in the buffer, which, given the length dependence, will finish by Tuesday morning, where the deadline is the 19th! DCF is 1.3, and all other client state readings indicate normality, including the fractions.

Given that the single-core P4 behaves normally and trickles, I have a stronger belief that on a C2D, where one core is fully dedicated to crunching and the other is 'used' quite actively besides crunching, it's forming a problem for the new work fetch policy algorithms. The fetch problems only started with the 5.8 development versions and persist through 5.8.9.

BOINCview, my beloved crunching companion, computed that it had a 1d23h05m36s buffer at this moment, just after sending off (yes, it was a long one again) an 8:05-hour FAAH. The next one finishing will be under 6! Figure what that does to the DCF :O (advice: do not use a crowbar to open up my mouth).

Luctor et Emergo (I struggle and emerge)
WCG
----------------------------------------
Please help to make the Forums an enjoyable experience for All!
[Edit 1 times, last edit by Sekerob at Feb 13, 2007 3:03:51 AM]
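For readers wondering what an 8-hour FAAH followed by a sub-6-hour one does to the DCF: a minimal sketch of the commonly described correction behavior (jump up fast on underestimates, drift down slowly on overestimates). The update rule and constants here are assumptions for illustration, not the client's actual code:

```python
def update_dcf(dcf, estimated_hours, actual_hours):
    """Duration correction factor: scales the client's future runtime estimates."""
    ratio = actual_hours / estimated_hours
    if ratio > dcf:
        return ratio                   # ran long: correct upward immediately
    return 0.99 * dcf + 0.01 * ratio   # ran short: ease downward slowly

dcf = 1.3                              # Sekerob's reading above
dcf = update_dcf(dcf, 6.0, 8.08)       # the 8:05 FAAH against a ~6 h estimate
print(round(dcf, 2))                   # jumps to ~1.35
dcf = update_dcf(dcf, 6.0, 5.5)        # the next, shorter one
print(round(dcf, 2))                   # eases back only slightly: ~1.34
```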
keithhenry
Ace Cruncher | Senile old farts of the world ....uh.....uh..... nevermind | Joined: Nov 18, 2004 | Post Count: 18665 | Status: Offline
I am getting VERY LEERY of the entire 5.8.x code stream.

A couple of days ago, I dropped my "Connect to server" setting to 1.0 (was 3.0) and haven't tried testing anything more, to let things crunch through and clear out any WUs that were in my queue previously. I did reboot my machine today, but BOINC did start up and has crunched at least one WU to completion since then. All I've done today with the client is hit the Update button some: it's a handy way to see if any Pending Validation WUs have completed.

A few hours ago, I noticed that all crunching was suspended. Checked messages and saw that benchmarks were running. Made a mental note that I probably wanted to force them to run again late tonight so they wouldn't be running in the middle of the day. Benchmarks completed and it went back to crunching the WU it was on before.

Just now, I noticed that my times weren't updating again, even though the WU was showing Running status. It was the first WU in my queue, with a report deadline of 02/19/2007 2:59:32 PM, so the computational deadline should not be an issue there. I have five other WUs in the queue after it, all with later report deadlines, and all five in Ready to Start status. BOINC was not crunching the WU at all despite the Running status. I shut down and restarted BOINC, and now it's crunching again.

I only lost a couple of hours of crunching time, but when real life allows, I expect I'm gonna call it quits on the entire 5.8.x code stream for at least 3-4 more releases, until what are clearly serious and major problems are worked out. Looks like I'll be back at 5.4.11 for a good while. At this point, I think anyone running any 5.8.x level needs to keep a very close eye on their machines. I won't be recommending anyone upgrade for a while.
Diana G.
Master Cruncher | Joined: Apr 6, 2005 | Post Count: 3003 | Status: Offline
I'm cheering you guys on!
----------------------------------------
Diana G.
keithhenry
Ace Cruncher | Senile old farts of the world ....uh.....uh..... nevermind | Joined: Nov 18, 2004 | Post Count: 18665 | Status: Offline
Thanks, Diana, but cheer Sek on; I've thrown in the towel. Just when I think I'm beginning to get a handle on the new work fetch policy, I'm finding more than enough problems with the new scheduler. The worst is having BOINC report that you're actively crunching when in fact you're sitting dead in the water.

I see that 5.8.11 is out. I'd expect to see quite a few point releases for a while, so I'm going back and staying on nice, stable 5.4.11 until the "Martian dust storm" (the kind that can engulf the entire planet) settles down. The changes to the work fetch policy and the scheduler aren't just major; if you ask me, I would classify them as complete rewrites. That sort of thing is going to have more than its share of problems, and it's clearly going to take time to resolve them all and get a clear understanding of what the new, reasonable expectations of behavior are.

You're a better, braver man than I, Sek! I've had enough for me.