World Community Grid Forums
Thread Status: Active
Total posts in this thread: 144
Speedy51
Veteran Cruncher · New Zealand · Joined: Nov 4, 2005 · Post Count: 1293 · Status: Offline
Quote: "Speedy51, pls read again my original input."
Yes, I have reread your original post. I was purely pointing out what I said in my post above.
Sgt.Joe
Ace Cruncher · USA · Joined: Jul 4, 2006 · Post Count: 7670 · Status: Offline
The last "Waiting to be sent" I had from April 28 was finally sent on August 29. It was crunched and became valid on August 30.
MCM1_0216199_5325_1: Linux 3.13.0-143-generic; Valid; sent 2024-08-29 20:12:47 UTC; returned 2024-08-30 05:47:08 UTC; CPU/elapsed 3.78 / 3.78 h; claimed/granted credit 75.1 / 75.3
MCM1_0216199_5325_2: Linux Linuxmint LMDE 4 (debbie) [4.19.0-8-amd64|libc 2.28 (Debian GLIBC 2.28-10)]; Valid; sent 2024-04-28 09:35:11 UTC; returned 2024-04-28 17:48:45 UTC; CPU/elapsed 3.12 / 3.12 h
Cheers
Sgt. Joe
*Minnesota Crunchers*
alanb1951
Veteran Cruncher · Joined: Jan 20, 2006 · Post Count: 971 · Status: Offline
Quoting Sgt.Joe: "The last "Waiting to be sent" I had from April 28 was finally sent on August 29. It was crunched and became valid on August 30. Finally!!!"

I find it interesting that it seems to have needed that outage on the 29th to sort this out... It looks as if the stalled task was picked up almost as soon as contact with the scheduler resumed :-) -- coincidence???

It would be interesting to know why the system was off for those 5 hours, but I doubt we'll ever find out...

Cheers - Al.
cz50975
Advanced Cruncher · Joined: Dec 9, 2004 · Post Count: 95 · Status: Offline
There is a problem with some MCM validators, starting at approximately 2024-11-17 14:51:31 UTC.
Here is an example: https://www.worldcommunitygrid.org/contribution/workunit/628669472
MCM1_0227764_9064_0: Pending Validation; sent 2024-11-12 17:40:16 UTC; returned 2024-11-16 20:38:46 UTC; CPU/elapsed 4.84 / 4.86 h; claimed/granted credit 83.1 / 0
MCM1_0227764_9064_1: Pending Validation; sent 2024-11-12 17:38:32 UTC; returned 2024-11-17 14:51:31 UTC; CPU/elapsed 1.89 / 2.12 h; claimed/granted credit 84.3 / 0
Some WUs pass validation within seconds, while others wait for hours; the example above has been waiting for more than 15 hours. Together with the "Waiting to be sent" problem, this has rapidly increased my validation queue.
====================================
WU finally validated.
[Edited once; last edit by cz50975 at Nov 18, 2024 1:59:30 PM]
cz50975
Advanced Cruncher · Joined: Dec 9, 2004 · Post Count: 95 · Status: Offline
I still have a bulk of MCM WUs that were received in December 2024, returned in early January 2025 and affected by the "returned in 2018" problem, and that are now waiting in the validation queue because of the "Waiting to be sent" issue.
Here are examples from different devices:
https://www.worldcommunitygrid.org/contribution/workunit/642156202
https://www.worldcommunitygrid.org/contribution/workunit/642770207
https://www.worldcommunitygrid.org/contribution/workunit/642003685
https://www.worldcommunitygrid.org/contribution/workunit/643057568
adriverhoef
Master Cruncher · The Netherlands · Joined: Apr 3, 2009 · Post Count: 2167 · Status: Recently Active
Quoting cz50975: "Still have bulk of MCM WUs received in DEC 2024, returned early JAN 2025, affected by "returned in 2018" problem and now waiting in validation queue because of "Waiting to be sent" issue."

Lots of "Waiting to be sent" here, too: 197 tasks from December 2024, although none "returned in 2018". Each of these 197 tasks has exactly one wingman who didn't return their task in time.

Adri
alanb1951
Veteran Cruncher · Joined: Jan 20, 2006 · Post Count: 971 · Status: Offline
Quoting adriverhoef: "Lots of "Waiting to be sent" here, too: 197 tasks from December 2024; however, none "returned in 2018". All of these 197 tasks have exactly one wingman that didn't return their task in time."

A build-up can happen even without some sort of system outage; it seems to recur at regular intervals (deadline-related?) after the previous blockage has cleared. The algorithm they use to control the release of new work seems to put out enough work that some of the tasks which miss their deadlines after the previous unblocking end up with delayed retries anyway (if enough of them miss their deadlines together, as often seems to happen), so another blockage starts...

For what it's worth, I'm getting a [very small] number of retries to process, but most of them were more or less instant turn-around cases, and the most delayed one this time round was held up for just over a day. That's not really surprising, as the older the work unit, the longer it seems to have to wait to get a retry :-(

Ah, well, either something they've already built into their automation will clear some of it out, or their Tech Team person will eventually get a chance to look into it.

Cheers - Al.

P.S. When I raised this in our "Project status" thread earlier in the week, I also mentioned the one-person Tech Team and the likelihood that he has multiple equally urgent tasks pending. Though it may not make us very happy, I think we need to bear that in mind :-)

[Edited the P.S. (regarding tasks)] [Edited once; last edit by alanb1951 at Jan 20, 2025 4:20:28 PM]
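To make the feedback loop concrete, here is a minimal Python sketch of a new-work throttle that either ignores or counts the retries already waiting to be sent. It is purely hypothetical: the `QueueState` fields, the target figure and both functions are illustrative assumptions, not WCG's actual work-generation control.

```python
from dataclasses import dataclass


@dataclass
class QueueState:
    """Hypothetical snapshot of one project's work pipeline."""
    in_field: int        # tasks sent to volunteers and not yet returned
    unsent_new: int      # newly generated tasks waiting to be sent
    unsent_retries: int  # retries (after errors or missed deadlines) waiting to be sent


def naive_new_work(state: QueueState, target_outstanding: int) -> int:
    """New work units to generate when waiting retries are ignored."""
    outstanding = state.in_field + state.unsent_new
    return max(0, target_outstanding - outstanding)


def retry_aware_new_work(state: QueueState, target_outstanding: int) -> int:
    """New work units to generate when waiting retries count against the target.

    A backlog of retries then holds back new generation instead of
    being added on top of it.
    """
    outstanding = state.in_field + state.unsent_new + state.unsent_retries
    return max(0, target_outstanding - outstanding)


if __name__ == "__main__":
    state = QueueState(in_field=800, unsent_new=50, unsent_retries=300)
    print(naive_new_work(state, 1000))        # 150
    print(retry_aware_new_work(state, 1000))  # 0
```

With a sizeable retry backlog, the naive version still releases new work while the retry-aware version holds off until the backlog drains.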
Sgt.Joe
Ace Cruncher · USA · Joined: Jul 4, 2006 · Post Count: 7670 · Status: Offline
Yes, quite a build-up of "Pending Validation." My oldest one was returned on Dec 2, 2024. There was a "No Reply" result, and the third one is in "Waiting to be sent" status.
Cheers
Sgt. Joe
*Minnesota Crunchers*
[Edited once; last edit by Sgt.Joe at Jan 20, 2025 10:25:59 PM]
cz50975
Advanced Cruncher · Joined: Dec 9, 2004 · Post Count: 95 · Status: Offline
There is a problem with MCM validations again. Some WUs pass validation within seconds, while others wait for hours; the example below has been waiting for more than 20 hours.
https://www.worldcommunitygrid.org/contribution/workunit/655985720
[Edited once; last edit by cz50975 at Jan 31, 2025 12:50:00 PM]
alanb1951
Veteran Cruncher · Joined: Jan 20, 2006 · Post Count: 971 · Status: Offline
It isn't really surprising that the validators might struggle when very large numbers of the returned results are retries or second results (as will be the case whenever there's a mass clear-out of delayed retries, no matter how that is achieved...).

The problem could be resolved by temporarily restarting the validators and running twice as many, each one looking at a smaller subset of work units! However, the long-term solution would be to avoid the build-up of waiting retries in the first place. I don't know whether their existing control over the amount of new work generated looks at the number of retries waiting as well as the amount of work currently out in the field; if it doesn't, it probably should [if possible] :-)

However, there are two other factors that are beyond WCG's control. One is things like hardware outages (planned or otherwise); the other is the large number of results that come back No Reply or get killed off by the client (Not started by deadline) because users have accumulated too much work for one reason or another[*1]. In this case, we are still seeing the knock-on effects of the data centre outage :-( but some delayed retries are also coming back No Reply...

It would be nice to see the work cycle for MCM1 settle down, but (given the number of other calls on the Tech Team member's time) I suspect we may be stuck with these cyclic events until MAM1 comes on stream (which, with luck, will somewhat reduce the general need for new MCM1 tasks!).

Cheers - Al.

*1 -- there are likely to be late returns for reasons other than over-large buffers; all it takes is a serious mis-estimate of expected run time for some project (here or elsewhere) to fill up smaller buffers with unneeded work (ask Einstein@home and MilkyWay@home how I know that one!)
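Running more validators, each looking at a smaller subset of work units, amounts to sharding the validation backlog. Here is a minimal Python sketch, assuming work units are assigned to validator instances by ID modulo the number of instances. This is an assumption about how the work could be partitioned; `shard_for` and `split_backlog` are hypothetical names, not WCG or BOINC code.

```python
from typing import Dict, Iterable, List


def shard_for(wu_id: int, num_validators: int) -> int:
    """Index (0-based) of the validator instance that owns this work unit."""
    return wu_id % num_validators


def split_backlog(wu_ids: Iterable[int], num_validators: int) -> Dict[int, List[int]]:
    """Group pending work-unit IDs by the validator instance responsible for them."""
    shards: Dict[int, List[int]] = {i: [] for i in range(num_validators)}
    for wu_id in wu_ids:
        shards[shard_for(wu_id, num_validators)].append(wu_id)
    return shards


if __name__ == "__main__":
    # Work-unit IDs quoted earlier in this thread, used only as sample input.
    pending = [628669472, 642156202, 642770207, 642003685, 643057568, 655985720]
    for instance, ids in split_backlog(pending, 4).items():
        print(instance, ids)
```

Because each ID maps to exactly one instance, no two validators touch the same work unit, and doubling `num_validators` roughly halves each instance's share of a large backlog.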