Thread Status: Active
Total posts in this thread: 140
Posts: 140   Pages: 14   [ Previous Page | 5 6 7 8 9 10 11 12 13 14 | Next Page ]
This topic has been viewed 6933 times and has 139 replies
Grumpy Swede
Master Cruncher
Svíþjóð
Joined: Apr 10, 2020
Post Count: 2158
Status: Offline
Re: Project Status (First Post Updated)

And again, Sahara when it comes to new MCM tasks. Only one resend in over 2 hours.
----------------------------------------
[Edit 1 times, last edit by Grumpy Swede at Mar 26, 2025 7:52:38 PM]
[Mar 26, 2025 7:52:04 PM]
Unixchick
Veteran Cruncher
Joined: Apr 16, 2020
Post Count: 948
Status: Offline
Re: Project Status (First Post Updated)

I'm getting a good flow of MCM right now.

The reference number for ARP is rising (70ish right now). I haven't gotten any fresh ARP yet, has anyone?

My wild guess (no proof) is that they have been working on getting the MCM sending queue to allow resends without having to stop the generation of new WUs. This process has been wonky recently. I did get some resends today, but I did not have a long dry spell as in the last few days. What are your wild theories?
[Mar 27, 2025 10:33:54 PM]
alanb1951
Veteran Cruncher
Joined: Jan 20, 2006
Post Count: 951
Status: Offline
Re: Project Status (First Post Updated)

Unixchick wrote: "I'm getting a good flow of MCM right now. The reference number for ARP is rising (70ish right now). I haven't gotten any fresh ARP yet, has anyone?"

There are a couple of replies to your equivalent query in the ARP1 "New work" thread -- I'd posted there before I saw this -- Tony Ellis and I can account for a whole 9 new WUs between us :-)

Unixchick wrote: "My wild guess (no proof) is that they have been working on getting the MCM sending queue to allow resends without having to stop the generation of new WUs. This process has been wonky recently. I did get some resends today, but I did not have a long dry spell as in the last few days. What are your wild theories?"

I'd been wondering about that, because until recently the new work droughts seemed to be less frequent...

I'll try to make time for a closer look at my returned-results timelines over the last few weeks, focusing on tasks with retries (either processed or awaited). The old pattern was that, after a certain amount of work had gone out, missed-deadline tasks seemed to back up until action was taken (suggesting that tasks were possibly going out in descending WU order?). More recently there don't seem to be many retries held up, but a [complete] lack of new work while they're being dealt with has become more frequent.

Unless there are huge numbers of retries waiting, simply sifting in ascending WU order wouldn't explain the lack of new work, whereas turning off some part of the path between creating the data files for a WU and letting the database provide it to the feeder would. So, as per your comment, I have also tended to assume that they'd been turning something off (which only seemed to push the problem out for another cyclic repeat!).

It will be interesting to see if they can solve this before MAM1 goes live. I seem to recall some comments about them having problems quite some time ago when an uncommon host requirement blocked the feeders, so I don't know that they'll be able to use a simple "ascending WU order" method. There's supposed to be an option that avoids server cache blockages by using slots for different HR classes, but that had probably been tried already, so it presumably wasn't helping!
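
To make the cache-blockage idea a bit more concrete, here is a toy model in Python. It is only a sketch under made-up assumptions, not BOINC's or WCG's actual feeder code: the function name `run`, the slot count, and the class labels "rare" and "common" are all hypothetical. It compares filling a small shared cache strictly in database order with reserving slots per HR class.

```python
from collections import deque


def run(jobs, requests, n_slots, reserved=None):
    """Toy feeder/scheduler loop (hypothetical, for illustration only).

    jobs      -- HR class label of each queued job, in database order
    requests  -- HR class of each host that asks for work, in arrival order
    n_slots   -- size of the shared cache the scheduler hands work out from
    reserved  -- optional list giving the HR class each slot is reserved for
    Returns the number of requests that received a job.
    """
    queue = deque(jobs)
    slots = [None] * n_slots
    served = 0
    for host_class in requests:
        # Feeder pass: refill every empty slot.
        for i in range(n_slots):
            if slots[i] is not None:
                continue
            if reserved is None:
                # Plain mode: take the next job in database order.
                if queue:
                    slots[i] = queue.popleft()
            elif reserved[i] in queue:
                # Reserved mode: only take a job of this slot's class.
                queue.remove(reserved[i])
                slots[i] = reserved[i]
        # Scheduler pass: hand out the first cached job the host can run.
        for i, job in enumerate(slots):
            if job == host_class:
                slots[i] = None
                served += 1
                break
    return served


jobs = ["rare"] * 4 + ["common"] * 20
requests = ["common"] * 10
print(run(jobs, requests, n_slots=4))  # prints 0: rare jobs fill every slot
print(run(jobs, requests, n_slots=4,
          reserved=["rare", "common", "common", "common"]))  # prints 10
```

In the first call the four "rare" jobs occupy every slot and no "rare" host ever turns up to take them, so nothing moves. In the second, one slot is set aside for "rare" and the rest keep flowing. Real servers have far more slots and classes, so treat this as a picture of the failure mode rather than a diagnosis.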

Cheers - Al.

P.S. I actually hope that if MAM1 takes off well, it'll [slightly?] reduce the volume of MCM1 work out there at any given time which, with luck, might make the sending of missed-deadline retries a lot easier! (Yes, I know - I'm repeating myself; it's a characteristic of old age, I'm told...)

[Edited to slightly rephrase the reference to HR classes and cache blocking.]
----------------------------------------
[Edit 1 times, last edit by alanb1951 at Mar 28, 2025 2:51:33 AM]
[Mar 28, 2025 2:23:28 AM]
Unixchick
Veteran Cruncher
Joined: Apr 16, 2020
Post Count: 948
Status: Offline
Re: Project Status (First Post Updated)

I'm stuck in the "Tasks are committed to other platforms" loop again. I have no MCM WUs. I did manage to get 1 ARP WU though, so I feel lucky.
[Mar 28, 2025 6:44:59 PM]
Sgt.Joe
Ace Cruncher
USA
Joined: Jul 4, 2006
Post Count: 7659
Status: Offline
Re: Project Status (First Post Updated)

alanb1951 wrote: "I actually hope that if MAM1 takes off well, it'll [slightly?] reduce the volume of MCM1 work out there at any given time which, with luck, might make the sending of missed-deadline retries a lot easier!"

I see the missed-deadline work units appear seemingly in batches. It reminds me of the old data processing routines on mainframes involving batch processing. The ordering and processing of the different kinds of batches was a well-orchestrated dance, so that the various subsystems (memory, I/O, CPU usage, etc.) did not get overloaded to the detriment of other batches.

Cheers
----------------------------------------
Sgt. Joe
*Minnesota Crunchers*
[Mar 29, 2025 12:37:20 AM]
Grumpy Swede
Master Cruncher
Svíþjóð
Joined: Apr 10, 2020
Post Count: 2158
Status: Offline
Re: Project Status (First Post Updated)

I have upped my MCM cache from only 6 hours to 1 day, in order to avoid running out of MCM tasks. So far, that is working as intended.
[Mar 29, 2025 1:30:14 AM]
Unixchick
Veteran Cruncher
Joined: Apr 16, 2020
Post Count: 948
Status: Offline
Re: Project Status (First Post Updated)

I've got a good amount of MCM and ARP WUs. Hopefully we have a good weekend of nice flows.
[Mar 29, 2025 1:30:15 AM]
Grumpy Swede
Master Cruncher
Svíþjóð
Joined: Apr 10, 2020
Post Count: 2158
Status: Offline
Re: Project Status (First Post Updated)

Unixchick wrote: "I've got a good amount of MCM and ARP WUs. Hopefully we have a good weekend of nice flows."

I fear that the Sahara behaviour will come back again during the weekend. For now though, I have enough MCM tasks to last for 24 hours.
[Mar 29, 2025 1:34:20 AM]
Grumpy Swede
Master Cruncher
Svíþjóð
Joined: Apr 10, 2020
Post Count: 2158
Status: Offline
Re: Project Status (First Post Updated)

And it's all goodnight again:

30-Mar-2025 08:42:56 [World Community Grid] Scheduler request completed
30-Mar-2025 08:42:56 [World Community Grid] Server can't open database
30-Mar-2025 08:42:56 [World Community Grid] Project requested delay of 3600 seconds
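
For anyone who wants to work out when the client is next allowed to try the scheduler, a rough Python sketch follows. It assumes the event-log layout shown above (timestamp, project name in square brackets, then the message). `next_contact` is a hypothetical helper name, not part of BOINC, and parsing the month with `%b` assumes an English locale.

```python
import re
from datetime import datetime, timedelta

# Assumed line layout: "30-Mar-2025 08:42:56 [Project Name] message text"
LINE = re.compile(r"^(\d{2}-\w{3}-\d{4} \d{2}:\d{2}:\d{2}) \[(.+?)\] (.*)$")
DELAY = re.compile(r"Project requested delay of (\d+) seconds")


def next_contact(log_text):
    """Return the time at which the most recent requested delay expires.

    Returns None if no "Project requested delay" line is found.
    """
    latest = None
    for raw in log_text.splitlines():
        line_match = LINE.match(raw.strip())
        if not line_match:
            continue  # skip lines that are not in the assumed layout
        delay_match = DELAY.search(line_match.group(3))
        if delay_match:
            stamp = datetime.strptime(line_match.group(1), "%d-%b-%Y %H:%M:%S")
            latest = stamp + timedelta(seconds=int(delay_match.group(1)))
    return latest


log = """\
30-Mar-2025 08:42:56 [World Community Grid] Scheduler request completed
30-Mar-2025 08:42:56 [World Community Grid] Server can't open database
30-Mar-2025 08:42:56 [World Community Grid] Project requested delay of 3600 seconds
"""
print(next_contact(log))  # prints 2025-03-30 09:42:56
```

The result is in whatever local time the log was written in. As far as I understand it, a manual project update can still be attempted earlier, so this is only the point at which the client would retry on its own.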

[Mar 30, 2025 6:45:34 AM]
alanb1951
Veteran Cruncher
Joined: Jan 20, 2006
Post Count: 951
Status: Offline
Re: Project Status (First Post Updated)

It was still at "can't open database" at around 07:55 UTC but had graduated to "feeder not running" by about 08:30 UTC. The last time there was a BOINC database issue (beginning of March) it followed the same sequence, and it took them a couple of hours to get it running again once they'd worked out what had happened. However, as this is a weekend, I'm not sure whether they'll start on a fix quite as promptly.

At least uploads still seem to be working at present. There'll be some scheduler fun when it comes back and everyone piles on to try to report the tasks and get new work :-(

Cheers - Al.

P.S. I feel for the tech team -- one of my pre-retirement hats was as an O/S and database tech, and I developed an intense dislike for crashes that happened overnight Saturday-to-Sunday...
[Mar 30, 2025 9:27:38 AM]