World Community Grid Forums
Thread Status: Active | Total posts in this thread: 118
thunder7
Senior Cruncher | Netherlands | Joined: Mar 6, 2013 | Post Count: 232 | Status: Offline
Still, I'm wondering what algorithm is used to hand out WUs (or not, as the case may be).

I have 3 Linux systems. Two always have work and full queues. The third usually has enough work, with a few WUs queued, but will, say twice a week, run out of work for a full night. When I notice that in the morning, I detach and re-attach the project, and suddenly it has work again within an hour. This has been happening for months now, so I'm starting to disbelieve the 'it's just the luck of the draw' theory of WU dispersal.
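(For reference, my morning detach/re-attach is just the equivalent of the following, assuming the stock boinccmd tool; <account_key> is a placeholder for your own account key:)

```
# detach from WCG (this throws away any queued work), then re-attach
boinccmd --project https://www.worldcommunitygrid.org/ detach
# <account_key> is a placeholder -- substitute your own key
boinccmd --project_attach https://www.worldcommunitygrid.org/ <account_key>
```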
Sgt.Joe
Ace Cruncher | USA | Joined: Jul 4, 2006 | Post Count: 7691 | Status: Offline
I have noticed something similar with one of my Linux systems. I have come to suspect the system itself encounters some condition (unknown at the moment) and creates some kind of internal blockage. If I initiate a manual update it takes right off and gets more workunits. As far as I can determine it happens randomly. It is a dedicated system which only runs WCG 24/7.
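(The manual update I do is the equivalent of this, assuming the stock boinccmd tool; the URL must match whatever boinccmd --get_project_status reports:)

```
# same as pressing "Update" in the BOINC Manager
boinccmd --project https://www.worldcommunitygrid.org/ update
```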
Cheers
Sgt. Joe
*Minnesota Crunchers*
Unixchick
Veteran Cruncher | Joined: Apr 16, 2020 | Post Count: 988 | Status: Offline
For me... I need to manually request work now and then. The gap between requests gets longer and longer the more often they are unsuccessful, so a manual request ups the frequency of asks again. I know some people have a script that keeps the asks coming every X minutes, and this helps them consistently get WUs. It might also increase the load on the machine, so I don't think all of us should do that.
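Something like this, I believe (an untested sketch, assuming the stock boinccmd tool on the same machine as the client; pick your own interval):

```
#!/bin/bash
# Nudge the BOINC client every 20 minutes so the request back-off
# never grows too long. PROJECT_URL should match what
# "boinccmd --get_project_status" reports.
PROJECT_URL="https://www.worldcommunitygrid.org/"

while true; do
    boinccmd --project "$PROJECT_URL" update
    sleep 1200   # 20 minutes
done
```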
I find it interesting that some of your machines do fine while one has issues. What is different about that machine? Back in the SETI days it seemed that the slower machines could get WUs more easily when work was scarce. That was probably just perception: a few SETI WUs could give a slow machine a nice queue, yet be done in a blip on a fast machine and never be noticed.
alanb1951
Veteran Cruncher | Joined: Jan 20, 2006 | Post Count: 977 | Status: Recently Active
Unixchick,
Nice summary of the client behaviour! It's worth noting that when the client reports completed work it is not guaranteed to ask for new work -- that can be quite annoying for a user with lots of threads who is depending on reports to double as requests. In my view, there is still room for improvement in client behaviour regarding requests :-)

Also (regarding "luck of the draw"), I suspect that when lots of simultaneous requests for work go in, smaller requests are more likely to be satisfied (fewer iterations in the work-issuing loop!) So folks who set up for small caches and fast turnaround may be less likely to run out of work, especially if they prod the client every 15, 20 or 30 minutes to wake up the request mechanism (a cron sketch is at the end of this post). However, large caches and prodding the client every 5 minutes or less may be counter-productive!

Cheers - Al.

P.S. I'd rather return work promptly and run the risk of running out than maintain the sort of cache size that some users seem to, judging by the 10 to 15% of my MCM1 tasks that see at least one wingman miss the deadline on a typical day... Big caches for huge machines, yes (though trying to fetch 1000 tasks at once may be doomed to failure); 5+ day caches for smaller machines, no :-)
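The cron sketch mentioned above -- assuming boinccmd lives in /usr/bin and the client runs on the same box (adjust the interval, path and URL to taste):

```
# crontab entry: ask WCG for work every 20 minutes
*/20 * * * * /usr/bin/boinccmd --project https://www.worldcommunitygrid.org/ update
```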
Mike.Gibson
Ace Cruncher | England | Joined: Aug 23, 2007 | Post Count: 12431 | Status: Offline
Yes, there are lots of quirks in the system, and they have mostly been reported many times. But I have been seeing one that I haven't noticed on the fora.

When I get a resend with a 3-day deadline, it doesn't get done in deadline order; it gets done when the equivalent normal units get done. That doesn't mean it misses the deadline, because I don't carry a large cache, but I would have thought resends would be done in deadline order.

Mike
Bryn Mawr
Senior Cruncher | Joined: Dec 26, 2018 | Post Count: 346 | Status: Offline
Mike.Gibson wrote:
> Yes, there are lots of quirks in the system. And they have mostly been reported many times. But I have been seeing one that I haven't noticed on the fora. When I get a resend with a 3-day deadline, it doesn't get done in deadline order. It gets done when the equivalent normal units get done. This doesn't mean it misses the deadline because I don't carry a large cache. I would have thought that they would be done in deadline order.

Resends only run in deadline order when they are put into high-priority mode, which only happens when the client sees them in danger of missing their deadline.
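Roughly like this, with made-up numbers (an illustration of the rule, not BOINC's actual code):

```
#!/bin/bash
# A resend only jumps the queue when the client predicts a deadline miss.
remaining_hours=6       # CPU time the resend still needs (made up)
queued_ahead_hours=40   # work scheduled to run before it (made up)
deadline_hours=72       # the 3-day resend deadline

if (( remaining_hours + queued_ahead_hours > deadline_hours )); then
    echo "predicted miss: run in high-priority (earliest deadline first)"
else
    echo "no predicted miss: run in normal queue order"
fi
```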
Unixchick
Veteran Cruncher | Joined: Apr 16, 2020 | Post Count: 988 | Status: Offline
I'm still stuck on something that was discussed on page 3 of this thread: why have they decreased the number of MCM WUs going out since Nov 27?

https://www.worldcommunitygrid.org/stat/viewP...&numRecordsPerPage=90

They used to send out many more WUs for MCM... then SOMETHING happened. The system got wonky, the website crashed a bit more, and I guess they cut way back on the WUs going out to keep the system stable. We used to return 1.3 million results in a day; now it is in the low 800k range. No wonder so many of us have empty caches. I worry that they won't be able to bring the other projects back; ARP has always taxed the current configuration. I'm preparing myself now for a bumpy ride.
thunder7
Senior Cruncher | Netherlands | Joined: Mar 6, 2013 | Post Count: 232 | Status: Offline
First of all, the hardware is different:

- the good one has two Xeon 2696 v2 CPUs (48 threads, 4023 Whetstone, 88393 Dhrystone) and 384 GB memory
- the bad one has two Xeon 2696 v4 CPUs (88 threads, 4923 Whetstone, 78313 Dhrystone) and 128 GB memory

Both are 100% available for WCG.

The good one consistently runs 48 WUs and has a buffer containing 950-1000 WUs. It completes 350-400 WUs per day (columns: date, runtime, points, results):

2023-12-27  0:047:22:19:00  203531  385
2023-12-26  0:044:03:29:52  188538  358
2023-12-25  0:048:10:42:13  202057  390
2023-12-24  0:048:03:43:20  210460  400
2023-12-23  0:045:11:33:05  196557  378
2023-12-22  0:045:11:53:00  193925  366
2023-12-21  0:047:21:24:56  203064  388

The bad one never has more than 200 WUs in the buffer, often only 30-50, and runs out of work frequently. Yesterday, for example, was a bad day:

2023-12-27  0:040:12:26:34  203124  387
2023-12-26  0:081:17:47:44  358064  711
2023-12-25  0:061:17:06:47  270355  512
2023-12-24  0:079:07:05:42  346041  649
2023-12-23  0:064:18:53:11  283429  545
2023-12-22  0:046:18:23:06  218234  429
2023-12-21  0:074:22:09:40  319908  607

There is no difference in global_prefs.xml or global_prefs_override.xml except for the max_cpus value, of course. Both have the same profile in the Device Manager (default, runs for maximum output). The only difference I can find is that, according to the Device Manager, the good one was installed in November 2022 and the bad one in November 2023 (well, it got a new motherboard then). All in all, I don't see a cause for the significant difference here.
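(If you want to average such rows yourself, paste them into a file and run something like this -- a quick sketch assuming the four columns shown above:)

```
# mean points/day and results/day from the pasted stats rows
awk '{points += $3; results += $4; n++}
     END { printf "avg points/day: %.0f, avg results/day: %.0f\n",
           points/n, results/n }' host_stats.txt
```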
Unixchick
Veteran Cruncher | Joined: Apr 16, 2020 | Post Count: 988 | Status: Offline
Thanks for the info, thunder7. I wish it had given us a clue, but it was still worth looking at in case we found a reason.
Sgt.Joe
Ace Cruncher | USA | Joined: Jul 4, 2006 | Post Count: 7691 | Status: Offline
Thunder 7,

In the profile you are using, have you set the project limits to "unlimited"? I have a 32-thread machine with a 2-day cache set that way. It works better than the setting of 64 and does not overload the machine.
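If you want to set that 2-day cache from the command line rather than the website, something like this should work (an untested sketch; the path is the Debian/Ubuntu default and may differ on your install, and the project limits themselves still live in the WCG device profile on the website):

```
# write a local preferences override: 1.0 + 1.0 is roughly a 2-day cache
cat > /var/lib/boinc-client/global_prefs_override.xml <<'EOF'
<global_preferences>
   <work_buf_min_days>1.0</work_buf_min_days>
   <work_buf_additional_days>1.0</work_buf_additional_days>
</global_preferences>
EOF
# tell the running client to pick up the new cache setting
boinccmd --read_global_prefs_override
```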
Cheers
Sgt. Joe
*Minnesota Crunchers*