Total posts in this thread: 7
This topic has been viewed 1674 times and has 6 replies
IBM01902
Cruncher
Joined: Aug 13, 2017
Post Count: 11
Status: Offline
Centrino Duo, Remaining time runs backwards

I have two old Centrino Duo computers running Linux Mint 17. One originally ran Windows Vista and runs fine. The other is older and ran XP, but now runs Mint 17 as well.
The older one, which ran fine for a while, has started behaving oddly. Mapping Cancer Markers tasks are estimated to take approximately 6 hours. The remaining time counts down to approximately 2.5 hours, then turns around and counts back up, but at a slow rate: about 1 second for every 5 or 10. I've tried resetting the project, and even unloading and reloading BOINC. I've let tasks run a few hours over the estimate, but haven't yet just let one run to see what might really happen. I'm not sure whether this is an error that is causing retries, or where to look.
[Nov 9, 2018 2:52:45 AM]
Former Member
Cruncher
Joined: May 22, 2018
Post Count: 0
Status: Offline
Re: Centrino Duo, Remaining time runs backwards

MCM doesn't checkpoint very often, and it's only at checkpoints that end-times are re-estimated based on time taken and % done. Between times, an extrapolation is performed. Based on my observations, not a reading of the code, it seems to me that initially the extrapolation assumes that the end-time is correct, but as time goes by it changes towards assuming the % complete is correct. So, for long-running WUs whose initial estimate is too short, you will see time running down normally at first, then slower, and finally backwards -- then a big jump at the next checkpoint.

If you ask for WU properties in BOINC Manager, it will tell you when the last checkpoint was and what %/min it estimates progress to be. You might be able to work out what's happening from that. But if you can also see that the task is progressing, albeit slowly, I'd let it run. It's not that uncommon for estimates to be out by several multiples, especially initially.
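To sanity-check the displayed remaining time against the elapsed time and % done shown in the properties, a rough estimate can be worked out by hand. The sketch below is not BOINC's own calculation; it simply assumes the task keeps progressing at its average rate so far. The function name and the sample numbers are made up for illustration.

```python
def remaining_from_progress(elapsed_s: float, fraction_done: float) -> float:
    """Estimate seconds remaining, assuming progress continues at the
    average rate seen so far (an assumption, not BOINC's algorithm)."""
    if not 0.0 < fraction_done <= 1.0:
        raise ValueError("fraction_done must be in (0, 1]")
    projected_total_s = elapsed_s / fraction_done
    return projected_total_s - elapsed_s


# Hypothetical reading: 9 hours elapsed, 37.5% done.
hours_left = remaining_from_progress(9 * 3600, 0.375) / 3600
print(f"about {hours_left:.1f} hours left")  # about 15.0 hours left
```

If this figure is far above the displayed remaining time, the initial estimate was probably too short rather than the task being stuck.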

Good luck
[Nov 9, 2018 9:38:05 AM]
IBM01902
Cruncher
Joined: Aug 13, 2017
Post Count: 11
Status: Offline
Re: Centrino Duo, Remaining time runs backwards

You mentioned the checkpoint process. I checked BOINC's config against the other computer, and the slow computer was set to checkpoint every 30 seconds versus every 30 minutes. I set that to 30 minutes to make them consistent. Still, the job was slow and ran well past the estimate. I let it run and noticed that Percent Complete was at least incrementing, even though the time was running backwards. You are correct: eventually the time hit a peak and started turning back around, although a "second" was definitely more of a random tick size. Anyhow, the job took 4 - 5 times longer than the estimate to complete. I'll just have to learn to use the % Complete and have faith it's doing something useful, not just generating heat. The next two tasks came in with an estimate that's 2x that of the previous tasks, so maybe it's learning something, or the job is just that much bigger. Thanks for your help and for restoring my faith in this old computer.
[Nov 11, 2018 2:32:47 PM]
Former Member
Cruncher
Joined: May 22, 2018
Post Count: 0
Status: Offline
Re: Centrino Duo, Remaining time runs backwards

Yup, it's learning! So next time the progress will more closely reflect reality, and it will keep getting better (within the natural variability of the tasks), but the long checkpoint time still causes some slightly odd behaviour from time to time.

I have to say that 30 min is quite a long minimum checkpoint time, unless you're running your machine 24/7 or suspending it rather than shutting it down. Don't forget that if it is shut down, either intentionally or because of a failure of some sort, you will lose an average of 15 min for each task running. The checkpoint overhead is not large. I have mine set longer than the default but not as high as yours: 5 min, though I have 8 threads.
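The 15 min figure is just half the checkpoint interval. A minimal sketch of that arithmetic follows, assuming a shutdown is equally likely at any moment between two checkpoints and that tasks really do checkpoint at the configured interval (the setting is only a minimum, so actual losses may be larger). The function name and the two-task count are illustrative.

```python
def expected_loss_minutes(checkpoint_interval_min: float, running_tasks: int) -> float:
    """Average work lost on an unplanned shutdown, summed over all running tasks.

    Assumes the shutdown lands, on average, halfway between two checkpoints.
    """
    return checkpoint_interval_min / 2 * running_tasks


# Two running tasks, as on a dual-core machine (an assumption).
for interval in (5, 30):
    lost = expected_loss_minutes(interval, running_tasks=2)
    print(f"{interval} min interval: about {lost:.0f} min of work lost")
```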

I'm glad you stuck with it. Just bear in mind how much the leccy is costing you though ...
[Nov 11, 2018 9:15:29 PM]
IBM01902
Cruncher
Joined: Aug 13, 2017
Post Count: 11
Status: Offline
Re: Centrino Duo, Remaining time runs backwards

So this old computer is sicker than I thought. I ran BOINC's benchmarks right after the computer started up and got around 2000 MFlops on the first test. I let the crunching go on for only a couple of minutes, and the CPU must be going into self-preservation mode: a subsequent test knocks out only 400 MFlops. That's why the work units are taking so long compared to the estimate. So I took the thing apart and now have the little fan blowing directly on the CPU rather than through the convoluted copper tube heat sink system, and it maintains the 2000 MFlops. I've ordered some heat sink grease; we'll see if that's enough, as the old stuff has gotten cakey. The poor performance is despite a working fan and no dust, so it must be a contact problem, or it's time to put this one in the recycle bin.
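Those two benchmark figures line up with the 4 - 5x overrun. A rough sketch of the arithmetic, assuming run time scales inversely with the benchmark score and that the 6 hour estimate was based on the cold (unthrottled) figure; both are simplifications, and the function names are illustrative:

```python
def slowdown_factor(cold_mflops: float, hot_mflops: float) -> float:
    """How many times slower the throttled CPU is than the cold one."""
    return cold_mflops / hot_mflops


def projected_hours(estimate_hours: float, factor: float) -> float:
    """Run time if the whole task ran at the throttled speed."""
    return estimate_hours * factor


factor = slowdown_factor(2000, 400)
print(factor)                      # 5.0
print(projected_hours(6, factor))  # 30.0
```

The first couple of minutes run at full speed, so the real overrun should come in slightly under 5x.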
[Nov 19, 2018 2:46:35 AM]
IBM01902
Cruncher
Joined: Aug 13, 2017
Post Count: 11
Status: Offline
Re: Centrino Duo, Remaining time runs backwards

A little more searching: the copper tube is called a heat pipe. Apparently they're filled and sealed with a small amount of water, which vaporizes with the CPU heat and carries it over to the fins. Some eventually dry out, and this must be one of them. So it will carry on with its keyboard removed and the little fan blowing directly on the CPU area. Speeds are normal this way, and I have a USB keyboard.
[Dec 1, 2018 4:14:53 PM]
Former Member
Cruncher
Joined: May 22, 2018
Post Count: 0
Status: Offline
Re: Centrino Duo, Remaining time runs backwards

Glad you found the culprit. Laptops are difficult to cool, but if it's used as a dedicated cruncher, it should be fine to keep it taken apart. I have an old dual-core Pentium laptop I dragged out to try to reach a goal, and I leave the bottom panel off to give more air to the CPU.
[Dec 1, 2018 4:37:40 PM]