World Community Grid Forums
Thread Status: Active Total posts in this thread: 88
bfmorse
Senior Cruncher US Joined: Jul 26, 2009 Post Count: 296 Status: Offline |
My VALID MCM WUs that are still visible:
My oldest VALID WU has a return time of "2023-10-24 01:13:34 UTC". I still show 35326 items in the current listing, down from the initial 36038 items in my posting on Nov 10, 2023 2:23:17 PM. New WUs are still slow to reach me, so I have not yet re-energized the systems I previously powered off. My queue is typically set for zero additional days, as I tend to get a lot (relatively speaking) of resends. |
||
|
Speedy51
Veteran Cruncher New Zealand Joined: Nov 4, 2005 Post Count: 1288 Status: Recently Active |
I have over 1700 valid results but I don't see any issue with this. I process about 23 an hour. |
||
|
adriverhoef
Master Cruncher The Netherlands Joined: Apr 3, 2009 Post Count: 2155 Status: Offline |
Speedy51 wrote: "I have over 1700 valid results but I don't see any issue with this. I process about 23 an hour."
So, Speedy51, when you open your Results page and go to the last page of all your selected Valid results (in your case that would be page 68: 1700 divided by 25, if you have 25 items per page), you would see nothing out of the ordinary.
My oldest Valid result (currently from page 669) is:
Result name Status Sent time Due / Return time CPUtime/Elapsed Claimed/Granted
Adri
[Edit 1 times, last edit by adriverhoef at Nov 14, 2023 12:47:45 PM] |
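The page arithmetic in Adri's post can be checked with a short sketch. The 25 items per page and the page numbers 68 and 669 come from the post; the function name `last_page` is made up for illustration and is not part of any WCG tool.

```python
def last_page(total_results: int, per_page: int = 25) -> int:
    """Number of the last page when results are listed per_page at a time."""
    if total_results <= 0:
        return 0
    # Ceiling division using integers only
    return (total_results + per_page - 1) // per_page


# Speedy51: about 1700 Valid results at 25 per page
speedy_last = last_page(1700)

# Going the other way: a last page of 669 means the Valid list holds
# somewhere between these two counts (inclusive).
low, high = (669 - 1) * 25 + 1, 669 * 25

print(speedy_last, low, high)
```

So a list whose last page is 669 holds roughly ten times as many Valid results as Speedy51's, which is why the backlog is much more visible on some accounts than on others.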
||
|
Mike.Gibson
Ace Cruncher England Joined: Aug 23, 2007 Post Count: 12359 Status: Offline |
My oldest is:
MCM1_0206667_9530_1 Mike-PC3 Valid 2023-10-23 23:00:24 UTC 2023-10-29 23:00:24 UTC 2023-10-24 06:21:49 UTC 2.35 / 2.36 61.1 / 67.6
Same date as Adri but a bit earlier in the day.
Mike |
||
|
alanb1951
Veteran Cruncher Joined: Jan 20, 2006 Post Count: 952 Status: Offline |
Further to Adri and Mike's observations:
My oldest is MCM1_0206667_9817_0 (WU 407998880) -- it was returned on 2023-10-24 at 02:42:57 (UTC). If the data in the ModTime field returned by the old API can be believed, it was validated at 02:43:16 the same day.
I see no sign that assimilation is happening at present, so there's an associated worry. According to the Global Statistics History, MCM1 is seeing over a million returned (and credited) results a day at present, so that's over 500,000 workunits a day. With 20 or more days where almost no WUs have been assimilated and purged, that's an enormous backlog to clear!
A standard assimilator does up to 1000 WUs at a time (but how long that might take is an unknown)... Given how far it got clearing out stuff a few days ago, I suspect it might take a week or more of uninterrupted running!
I leave any further calculations regarding this as an exercise for the reader :-)
Cheers - Al |
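One way to take up that exercise is the rough sketch below. It uses only the figures quoted in the post (over a million results a day, halved to get workunits, 20 or more stalled days, up to 1000 WUs per assimilator pass) and treats them as exact, which makes the backlog a lower bound. Two things are assumptions rather than facts from the thread: that new work keeps arriving at the same rate while the backlog is cleared, and the 7-day target, taken from "a week or more". Since the time per pass is unknown, the sketch works out the rate that would be needed instead of predicting a duration.

```python
RESULTS_PER_DAY = 1_000_000   # "over a million" per the post, so a lower bound
RESULTS_PER_WU = 2            # implied by the post halving results to get WUs
STALLED_DAYS = 20             # "20 or more days"
BATCH_SIZE = 1000             # "up to 1000 WUs at a time"

wus_per_day = RESULTS_PER_DAY // RESULTS_PER_WU
backlog_wus = wus_per_day * STALLED_DAYS
batches = -(-backlog_wus // BATCH_SIZE)   # ceiling division


def required_rate(backlog: float, inflow_per_day: float, days: float) -> float:
    """WUs/day to assimilate so the backlog is gone after `days`,
    assuming new WUs keep arriving at inflow_per_day (an assumption)."""
    return inflow_per_day + backlog / days


rate = required_rate(backlog_wus, wus_per_day, 7)
print(f"backlog: {backlog_wus:,} WUs, i.e. {batches:,} full batches")
print(f"to clear in 7 days: {rate:,.0f} WUs/day ({rate / 86_400:.1f} WUs/s)")
```

Under these assumptions the backlog is at least 10 million WUs, and clearing it in a week would need a sustained rate of about 1.9 million WUs a day, close to four times the daily inflow.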
||
|
Unixchick
Veteran Cruncher Joined: Apr 16, 2020 Post Count: 949 Status: Recently Active |
The best we can do is to keep screaming on the boards that this is an issue. I've seen this on numerous BOINC sites. When the database gets too large, it all just grinds to a halt.
I take comfort in that there is a group of us seeing the same thing. I do like to see the data. The next piece of data is how long can it run like this before it stops?? |
||
|
alanb1951
Veteran Cruncher Joined: Jan 20, 2006 Post Count: 952 Status: Offline |
Unixchick,
Unixchick wrote: "The best we can do is to keep screaming on the boards that this is an issue. I've seen this on numerous BOINC sites. When the database gets too large, it all just grinds to a halt."
Yup -- that's one of the possibilities I considered as "requires drastic action" in my post in the Chat Room thread on 14th November. And it is having an effect -- collecting or refreshing statistics that come direct from the BOINC database can be quite sluggish (whether via API or web pages...)
Unixchick wrote: "I take comfort in that there is a group of us seeing the same thing. I do like to see the data. The next piece of data is how long can it run like this before it stops??"
The "Statistics History" for MCM1 reports nearly 30 million results returned since 24th October (inclusive of that day); that suggests somewhere in the vicinity of 15 million WUs stuck waiting for assimilation! That's a lot of items to scan looking for WUs ready for assimilation[*1].
Unfortunately, the solution is likely to involve either shutting off the work-flow to and from users (to avoid further build-up) or effectively taking the whole service off-line for long enough to do something (such as multiple MCM1 assimilators if not already in use?) to clear some of the backlog[*2] with (hopefully) reduced stress on the database.
For all we know, the techs may already be juggling various "scheduler" processes in an effort to sort things out, but there doesn't seem to be much progress :-(
Cheers - Al.
*1 The much smaller number of OPNG tasks live on the database at any time potentially makes for a much quicker scan -- it appears that OPNG tasks still clear within a couple of days of validation, so the database can still cope at present if circumstances are favourable!
*2 However, imagine the outcry from a fair proportion of the user base if anything interrupts work flow, even if it's an essential action! (Totally unlike the ARP1 situation, where it's obvious that the blame should not be laid on WCG...)
[Edited regarding sluggish statistical data delivery, and on "blame"]
[Edit 2 times, last edit by alanb1951 at Nov 15, 2023 6:50:49 PM] |
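As a quick consistency check on those figures, the sketch below turns "nearly 30 million results since 24th October" into a daily average and a stuck-WU count. Two inputs are assumptions and not stated in the post: the end date (taken here as 15 November 2023, the date of the post's last edit) and two results per workunit, which is what the post's own halving implies.

```python
from datetime import date

RESULTS_SINCE_STALL = 30_000_000   # "nearly 30 million", so an upper bound
RESULTS_PER_WU = 2                 # assumption implied by 30M results -> ~15M WUs
first_day = date(2023, 10, 24)
last_day = date(2023, 11, 15)      # assumed: date of the post's last edit

days = (last_day - first_day).days + 1      # inclusive of both end days
stuck_wus = RESULTS_SINCE_STALL // RESULTS_PER_WU
avg_results_per_day = RESULTS_SINCE_STALL / days

print(f"{days} days, about {stuck_wus:,} WUs awaiting assimilation")
print(f"average of about {avg_results_per_day:,.0f} results/day")
```

The average comes out at about 1.3 million results a day over 23 days, which agrees with the "over a million a day" figure quoted earlier in the thread.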
||
|
TigerLily
Senior Cruncher Joined: May 26, 2023 Post Count: 280 Status: Offline |
As many of you have noticed, we are having an issue with purging MCM1 workunits and recently we applied a fix that we believed would also address the symptom, dead MCM1 assimilators that would quit again once restarted. We are investigating why the MCM1 assimilator services are failing again now, and will update everyone when we are able to bring the MCM1 assimilators back up in a useful state.
Regarding the results backlog, we should be able to cope with it once we can get the MCM1 assimilators back up and running. |
||
|
alanb1951
Veteran Cruncher Joined: Jan 20, 2006 Post Count: 952 Status: Offline |
TigerLily,
Thanks for the confirmation of the issue and the efforts to resolve it.
I have visions of many megabytes of assimilator log files and the accompanying headaches...
Cheers - Al.
[Edit 2 times, last edit by alanb1951 at Nov 15, 2023 10:28:27 PM] |
||
|
Unixchick
Veteran Cruncher Joined: Apr 16, 2020 Post Count: 949 Status: Recently Active |
Thank you TigerLily. It always makes me feel better to know that the problem is acknowledged and recognized.
|
||