World Community Grid Forums
Thread Status: Active Thread Type: Sticky Thread Total posts in this thread: 427
alanb1951
Veteran Cruncher Joined: Jan 20, 2006 Post Count: 986 Status: Offline |
I have been running a 0.6 day cache for years... By using WCG profiles to control how many of each task type I get sent, I can maintain a work load that doesn't quite fill that cache, so when a few units finish they get replenished quite quickly; indeed, when others were complaining about no work my MCM1 caches were all either full or just one or two tasks short... The same applied to OPN1/G and ARP1 when work [other than retries] was actually available.
I still get the odd Server Aborted, but (as pointed out) that's one of the penalties for getting lots of retries... And, of course, it shouldn't abort a user task if it has passed the first checkpoint unless the work unit is known to be bad, so Server Aborted is only bad news if there's a long time between checkpoints - waste of a download, yes; waste of CPU, probably not... The only BOINC project I participate in that has very long running work units (CPDN) seems to send them out based on the number of available "cores" rather than buffer capacity, so I don't have problems with that :-)

Seems that there are some who feel the need to have a 6-day cache. Why? I have no clue. I can see a day or two, but not a week.

I suspect a lot of those users happened to make their first contact with a BOINC project that had high default cache configurations (back in the day, 10+10!...), and that then propagates to their preferences on other projects -- if they don't realize it's a problem they won't do anything about it (more on this later...)

Also, I wonder what the Science United user default setup is at WCG - not every user will know that they can change caches and other settings via the client, so if those are high...

At the 6:00:00-day mark, the system does an automatic RESEND on that WU.

Before the pre-hiatus workload clear-out, most WCG projects used the BOINC grace-day option to give No Reply jobs an extra day's grace before sending a retry -- that meant that any "slow" users would either hit "Not Started by Deadline" and self-abort, or would have a bit of extra time to return. That would have stopped most of the Server Aborts I've had (though it's amazing how often I can get my retry sent back before the late returner sends their result in, especially for OPN1/OPNG!)

I'd be tempted to leave ARP1 as it is - I can put up with the odd aborted job in exchange for trying to keep the generations moving on...
However, I'd be inclined to shave a day off the 6-day deadlines on other projects and put the grace day back... I don't think any of the other projects (even HST1) need 5+ days to run a single task, and I wonder how many really slow machines are still running here anyway...

To anyone running a long queue, please make sure that all your WU’s are well clear of your system before the 6:00:00-day mark!

Unfortunately, I suspect significant numbers of the users with large caches are running in "fire and forget" mode and may not realize they are causing any issues (as I mentioned above). Such users are highly unlikely to see anything posted on here :-( And there will also be users whose attitude is likely to be "so what -- I never run out of work and that's what matters" (but how many of those tasks are actually returned in time and count for anything?)

I'm not sure what can be done about it in practical terms -- that's one of the reasons I'd quite like to see the grace period back for most WCG projects...

Cheers - Al. |
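To put rough numbers on the cache-versus-deadline point above, here is a minimal sketch. It assumes a deliberately crude model that is not from the thread: the task at the back of the queue starts only after a full cache's worth of work has run, and then needs its own run time on top. The function name and the example figures are hypothetical; the 6-day deadline and the one-day grace period are the values discussed in the post.

```python
def return_margin_days(cache_days, task_hours, deadline_days=6.0, grace_days=0.0):
    """Days of slack when the last queued task finishes (negative means late).

    Crude assumption: the last task waits for the whole cache to drain,
    then runs for task_hours. Real BOINC scheduling is more involved.
    """
    finish_day = cache_days + task_hours / 24.0
    return (deadline_days + grace_days) - finish_day


# Illustrative comparison: small cache vs. a cache as deep as the deadline.
for cache in (0.6, 3.0, 6.0):
    margin = return_margin_days(cache, task_hours=3.0)
    with_grace = return_margin_days(cache, task_hours=3.0, grace_days=1.0)
    print(f"cache={cache} days: margin={margin:.3f}, with grace day={with_grace:.3f}")
```

Under this model a 6-day cache misses a 6-day deadline by the run time of the final task, which is the situation a grace day would absorb.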
||
|
hchc
Veteran Cruncher USA Joined: Aug 15, 2006 Post Count: 812 Status: Offline |
At least anecdotally, I initially put 1 day on my quad core machine, and that wasn't enough to stay fed (especially over that 3-4 day dry spell). I bumped it to 3 days like my slower machine, but the BOINC calculation said "job cache full" when it had less than a day's worth of work, which was odd. So now at the 5 day setting, it has about 2 days' worth of MCM. I tend to babysit my BOINC boxes more than I probably should, but it's fun for me. I definitely return all work units within 1-3 days of being issued.
If/when things get better, I'll lower things back to 1-2 day caches. [Edit: Just changed the quad core machine back to 3 days and will see how that goes again.] In the past, I've only gotten "greedy" if I was super close to the next badge and the project was ending or going on hiatus e.g. Zika or Ebola or SCC pause.
[Edit 1 times, last edit by hchc at Dec 28, 2022 1:07:11 PM] |
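For anyone who would rather set the cache size locally than through the website, here is a minimal sketch that builds the contents of a BOINC preferences override file. The element names `work_buf_min_days` and `work_buf_additional_days` and the file name `global_prefs_override.xml` are my understanding of how the client stores these settings, so please check them against the BOINC client documentation before relying on this. The helper name and the values are hypothetical and purely illustrative.

```python
import xml.etree.ElementTree as ET


def build_cache_override(min_days, additional_days):
    """Return override XML text for the two work-buffer settings.

    Assumption: the BOINC client reads these element names from
    global_prefs_override.xml in its data directory.
    """
    root = ET.Element("global_preferences")
    ET.SubElement(root, "work_buf_min_days").text = str(min_days)
    ET.SubElement(root, "work_buf_additional_days").text = str(additional_days)
    return ET.tostring(root, encoding="unicode")


# Illustrative values only: a 1-day minimum buffer plus half a day extra.
override_xml = build_cache_override(1.0, 0.5)
print(override_xml)
```

The sketch only produces the text; saving it into the client's data directory and asking the client to re-read its local preferences is left to the reader.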
||
|
Blount
Senior Cruncher Joined: Aug 19, 2005 Post Count: 474 Status: Offline |
I run with a 2.0-day cache and have for years. Lately I have 4 hours or less of tasks waiting. In the recent past I would have a cache of almost 2 days. Something changed, and both the cache and the active jobs are seeing lots of dry spells.
MCM tasks run in 1 hr or less on the AMD 7950X, 1:4hr-ish on the AMD 5950X and 1:42 plus on the AMD 3950X. The deltas are larger when running ARP. If running over 8 ARPs on a machine, the run time extends significantly, so I limit the machines to 4 or 8 ARPs. |
||
|
Sgt.Joe
Ace Cruncher USA Joined: Jul 4, 2006 Post Count: 7697 Status: Offline |
I am really satisfied running the 3 day cache at the moment. It seems to keep my machines well fed without overdoing it. Almost all of the "server aborts" have no cpu time associated with them, so they did use some bandwidth on the download, but that is minimal. Once the supply has stabilized for a while I will cut back to a 1.5 day cache. That always seemed to work fairly well.
Just as a side note, with the MCM units the LOO type run much quicker on the Linux systems and NFCV type run quicker on the Windows systems. On the Windows I7-3770 the NFCV type run about 1.75 hours and the LOO type run 2.75 to 3 hours.
Cheers
Sgt. Joe
----------------------------------------
*Minnesota Crunchers*
[Edit 1 times, last edit by Sgt.Joe at Dec 28, 2022 5:27:20 PM] |
||
|
bfmorse
Senior Cruncher US Joined: Jul 26, 2009 Post Count: 303 Status: Offline |
You have me at a loss. Please explain what you mean about the types:
Just as a side note, with the MCM units the LOO type run much quicker on the Linux systems and NFCV type run quicker on the Windows systems. On the Windows I7-3770 the NFCV type run about 1.75 hours and the LOO type run 2.75 to 3 hours. |
||
|
nivrip
Senior Cruncher North Yorkshire Joined: Sep 13, 2007 Post Count: 264 Status: Offline |
You have me at a loss. Please explain what you mean about the types:
Just as a side note, with the MCM units the LOO type run much quicker on the Linux systems and NFCV type run quicker on the Windows systems. On the Windows I7-3770 the NFCV type run about 1.75 hours and the LOO type run 2.75 to 3 hours.
Yes, I need some education on this topic too.
YORKSHIRE CRUNCHER
|
||
|
Sgt.Joe
Ace Cruncher USA Joined: Jul 4, 2006 Post Count: 7697 Status: Offline |
This is from the results log of a completed work unit. The line VMethod = NFCV specifies the type to which I was referring.
<core_client_version>7.6.31</core_client_version>
<![CDATA[
<stderr_txt>
Commandline = ../../projects/www.worldcommunitygrid.org/wcgrid_mcm1_map_7.61_x86_64-pc-linux-gnu -SettingsFile MCM1_0193802_5599.txt -DatabaseFile dataset-sarc1.txt
Settings File
DateOfDesign = 20200218
Designer = Krembil/cubes
WorkOrderID = 0193802_5599
DatasetID = sarc1
RSeed = 334745600
StartingGeneSignatureAlgorithm = randomFixedLengthSearch
RunPermutationAlgorithm = 0
FitnessFn = 0
NumberOfGenesInStartingSignature = 20
NumberOfGenesInSignatureMin = 20
NumberOfGenesInSignatureMax = 20
SearchAlgorithmNumberToCreate = 12071
MinFitness = 0.497
VMethod = NFCV
NFolds = 20
SvmArgs = "-v 0 -t 0 -c 1000"
SvmLearnLimit = 250000
Cheers
Sgt. Joe
*Minnesota Crunchers* |
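If you want to check the type of a finished MCM work unit without reading the whole log by eye, here is a minimal sketch that pulls the `Key = value` pairs out of a result log like the one above. The function name is hypothetical, the sample string is a shortened copy of that log, and the parsing rule (a word, then ` = `, then either a quoted string or a run of non-space characters) is my own assumption about the layout, not a documented format.

```python
import re


def parse_settings(stderr_text):
    """Return {key: value} for 'Key = value' pairs found in the log text.

    Assumption: values are either double-quoted or contain no spaces.
    """
    pattern = re.compile(r'(\w+) = ("[^"]*"|\S+)')
    return {key: value.strip('"') for key, value in pattern.findall(stderr_text)}


# Shortened sample taken from the log posted above.
sample = (
    "Settings File DateOfDesign = 20200218 Designer = Krembil/cubes "
    'VMethod = NFCV NFolds = 20 SvmArgs = "-v 0 -t 0 -c 1000" '
    "SvmLearnLimit = 250000"
)

settings = parse_settings(sample)
print("Work unit type:", settings["VMethod"])
```

The same function should work whether the log is on one line or split across several, since it only looks for the ` = ` pattern.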
||
|
goben_2003
Advanced Cruncher Joined: Jun 16, 2006 Post Count: 146 Status: Offline |
The issues we are aware of but remain unresolved:
Hi Cyclops,

They are not showing up within 24 hours. I added a new device which returned its first result ~84 hours ago (3.5 days). Is there an update when this will be fixed? I saw you say it was marked as a high priority bug. When will this high priority bug be fixed? If it was 24 hours it would be understandable as a temporary workaround until the real fix is in. I waited (patiently?) until 3.5 days to come look at the forums and see what the problem was.

To you and bfmorse: from what the tech team was able to find, it seems like the devices were registered in the BOINC database but excluded from the website database. Many devices that were missing have been synchronized between the two using the procedure provided by IBM. However, there are still issues with displaying information about those devices that were synchronized between BOINC and DB2 as you identified, and also w.r.t. statistics in My Contribution or listings in My Contribution. The tech team will be working to fix this issue and have classified it as a high priority bug. In the meantime, I will add this issue to the Comprehensive Bug List. Any updates will be shared on that thread as well as this one. |
||
|
Igelwurst
Cruncher Germany Joined: Jun 29, 2015 Post Count: 23 Status: Offline |
Hi,
I have the same issue... the new device has been working for about 2 weeks and is not displayed in the device list.
br
Igelwurst |
||
|
bonami2
Cruncher Joined: Nov 8, 2021 Post Count: 1 Status: Offline |
Been running for more than 2 weeks and got no statistics update since the last work I did about 2 years ago. Results are uploading, but there is no device or stats update.
Going back to Folding@home
[Edit 1 times, last edit by bonami2 at Feb 7, 2023 1:07:02 PM] |
||