World Community Grid Forums
Thread Status: Active
Total posts in this thread: 140
Grumpy Swede
Master Cruncher Svíþjóð Joined: Apr 10, 2020 Post Count: 2158 Status: Offline
And again, it's the Sahara when it comes to new MCM tasks: only one resend in over 2 hours.
[Edit 1 times, last edit by Grumpy Swede at Mar 26, 2025 7:52:38 PM]
Unixchick
Veteran Cruncher Joined: Apr 16, 2020 Post Count: 948 Status: Offline
I'm getting a good flow of MCM right now.
The reference number for ARP is rising (70ish right now). I haven't gotten any fresh ARP yet; has anyone? My wild guess (no proof) is that they have been working on getting the MCM sending queue to allow resends without having to stop the generation of new WUs. This process has been wonky recently. I did get some resends today, but I didn't have a long dry spell like I did over the last few days. What are your wild theories?
alanb1951
Veteran Cruncher Joined: Jan 20, 2006 Post Count: 951 Status: Offline
> I'm getting a good flow of MCM right now.
> The reference number for ARP is rising (70ish right now). I haven't gotten any fresh ARP yet, has anyone?

There are a couple of replies to your equivalent query in the ARP1 "New work" thread (I'd posted there before I saw this); Tony Ellis and I can account for a whole 9 new WUs between us :-)

> My wild guess (no proof) is that they have been working on getting the MCM sending queue to allow resends without having to stop the generation of new WUs. This process has been wonky recently. I did get some resends today, but I did not have a long dry spell as in the last few days. What are your wild theories?

I'd been wondering about that, because until recently the new-work droughts seemed to be less frequent...

I'll try to make time to look more closely at my returned-results timelines over the last few weeks, focusing on tasks with retries (either processed or awaited). The old pattern was that after a certain amount of work had gone out, missed-deadline tasks seemed to start backing up (suggesting that tasks were possibly going out in descending WU order?) until action was taken. More recently, though, there don't seem to be many retries held up, but a [complete] lack of new work while they're being dealt with has become a more frequent occurrence.

Unless there are huge numbers of retries waiting, simply sifting in ascending WU order wouldn't explain the lack of new work, whereas turning off some part of the path between creating the data files for a WU and letting the database provide it to the feeder would. So (as per your comment) I have also tended to assume that they'd been turning something off (which only seemed to push the problem out for another cyclic repeat!).

It will be interesting to see if they can solve this before MAM1 goes live. As I seem to recall some comments about them having problems quite some time ago when an uncommon host requirement blocked the feeders, I don't know that they'll be able to use a simple "ascending WU order" method. There's supposed to be an option that avoids server cache blockages by using slots for different HR classes, but that had probably been tried already, so it presumably wasn't helping!

Cheers - Al.

P.S. I actually hope that if MAM1 takes off well, it'll [slightly?] reduce the volume of MCM1 work out there at any given time, which, with luck, might make the sending of missed-deadline retries a lot easier! (Yes, I know - I'm repeating myself; it's a characteristic of old age, I'm told...)

[Edited to slightly rephrase the reference to HR classes and cache blocking.]
[Edit 1 times, last edit by alanb1951 at Mar 28, 2025 2:51:33 AM]
Unixchick
Veteran Cruncher Joined: Apr 16, 2020 Post Count: 948 Status: Offline
I'm stuck in the "Tasks are committed to other platforms" loop again. I have no MCM WUs. I did manage to get 1 ARP WU, though, so I feel lucky.
Sgt.Joe
Ace Cruncher USA Joined: Jul 4, 2006 Post Count: 7659 Status: Offline
> I actually hope that if MAM1 takes off well, it'll [slightly?] reduce the volume of MCM1 work out there at any given time which, with luck, might make the sending of missed-deadline retries a lot easier!

The missed-deadline work units seem to appear in batches. It kind of reminds me of the old batch-processing routines on mainframes. The order and processing of the various kinds of batches was a well-orchestrated dance, so that the subsystems (memory, I/O, CPU, etc.) did not get overloaded to the detriment of other batches.
Cheers
Sgt. Joe
*Minnesota Crunchers*
Grumpy Swede
Master Cruncher Svíþjóð Joined: Apr 10, 2020 Post Count: 2158 Status: Offline
I have upped my MCM cache from only 6 hours to 1 day, in order to avoid running out of MCM tasks. So far, that is working as intended.
Unixchick
Veteran Cruncher Joined: Apr 16, 2020 Post Count: 948 Status: Offline
I've got a good amount of MCM and ARP WUs. Hopefully we have a good weekend of nice flows.
Grumpy Swede
Master Cruncher Svíþjóð Joined: Apr 10, 2020 Post Count: 2158 Status: Offline
> I've got a good amount of MCM and ARP WUs. Hopefully we have a good weekend of nice flows.

I fear that the Sahara behaviour will come back during the weekend again. For now, though, I have enough MCM tasks to last for 24 hours.
Grumpy Swede
Master Cruncher Svíþjóð Joined: Apr 10, 2020 Post Count: 2158 Status: Offline
And it's all goodnight again:
30-Mar-2025 08:42:56 [World Community Grid] Scheduler request completed
alanb1951
Veteran Cruncher Joined: Jan 20, 2006 Post Count: 951 Status: Offline
It was still at "can't open database" at around 07:55 UTC but had graduated to "feeder not running" as of about 08:30 UTC. Last time there was a BOINC database issue (beginning of March), it followed the same sequence, and it took them a couple of hours to get it running again once they'd worked out what had happened. However, as this is a weekend, I'm not sure whether they'll be starting in on a fix quite as promptly.
At least uploads still seem to be working at present. There'll be some scheduler fun when it comes back and everyone piles on to try to report the tasks and get new work :-(

Cheers - Al.

P.S. I feel for the tech team -- one of my pre-retirement hats was as an O/S and database tech, and I developed an intense dislike for crashes that happened overnight Saturday-to-Sunday...