Total posts in this thread: 125
Posts: 125   Pages: 13   [ Previous Page | 3 4 5 6 7 8 9 10 11 12 | Next Page ]
This topic has been viewed 9382 times and has 124 replies
Crystal Pellet
Veteran Cruncher
Joined: May 21, 2008
Post Count: 1294
Status: Offline
Re: No work?

I can confirm that (at least) several of them stalled. Just two days ago, I deliberately held on to 3 tasks so that they would pass their deadlines, and their duplicates (or 'retries', maybe the preferable term, which I've seen Al use) are still Waiting to be sent:

I have 4 SCC tasks pending validation whose retries are waiting to be sent:
SCC1_0004203_MyoD1-C_38453_2 Waiting to be sent since 2023-08-16 06:34:31 UTC
SCC1_0004200_MyoD1-C_37868_2 Waiting to be sent since 2023-08-16 06:34:31 UTC
SCC1_0004198_MyoD1-C_13309_2 Waiting to be sent since 2023-08-17 10:47:12 UTC
SCC1_0004205_MyoD1-C_74267_2 Waiting to be sent since 2023-08-22 07:09:39 UTC

----------------------------------------

[Aug 25, 2023 9:52:34 AM]
KerSamson
Master Cruncher
Switzerland
Joined: Jan 29, 2007
Post Count: 1664
Status: Offline
Re: No work?

Hi Sgt.Joe,
I may know the reason for so many resends at once.
My two best crunching machines (16 threads each) suffered a severe power outage at my office, each with a full 3-day buffer.
Unfortunately for me, the electricity connection was restored only after 6 days.
All waiting WUs were cancelled by the WCG servers when the machines restarted last Wednesday evening.
Cheers,
Yves
----------------------------------------
[Aug 25, 2023 1:44:47 PM]
Sgt.Joe
Ace Cruncher
USA
Joined: Jul 4, 2006
Post Count: 7236
Status: Recently Active
Re: No work?

Well, maybe so. But now somebody has kick-started the SCC feeder and I am back to a full supply again. We will just have to see if the supply holds out for the weekend.
Cheers
----------------------------------------
Sgt. Joe
*Minnesota Crunchers*
[Aug 26, 2023 2:28:23 AM]
alanb1951
Veteran Cruncher
Joined: Jan 20, 2006
Post Count: 738
Status: Offline
Re: No work?

Regarding new work: there appears to be a new target (FLI1-B) -- lowest batch seen so far seems to be 4262 and the highest so far is 4271. (I've seen these boundaries, and so has Adri's periodic sampler script.) Hopefully, that should keep us in full swing over the weekend :-)

Regarding Yves and the power outage - that may well have been a contributory factor but I suspect there were tens of thousands of retries queued at one stage, so I don't think it's all one person's fault :-)

Cheers - Al.
----------------------------------------
[Edit 1 times, last edit by alanb1951 at Aug 26, 2023 5:38:07 AM]
[Aug 26, 2023 5:36:23 AM]
Sgt.Joe
Ace Cruncher
USA
Joined: Jul 4, 2006
Post Count: 7236
Status: Recently Active
Re: No work?

This series must have a small target like the "A" series because they are just ripping through, most in less than an hour, some in as little as 30 minutes.
Cheers
----------------------------------------
Sgt. Joe
*Minnesota Crunchers*
[Aug 26, 2023 1:21:37 PM]
alanb1951
Veteran Cruncher
Joined: Jan 20, 2006
Post Count: 738
Status: Offline
Re: No work?

I think most of the difference you're seeing is probably down to the different sizes [and complexity] of the ligands -- smaller ligands seem to be getting out first for FLI1-B (and I think that happened for FLI1-A too...) Run time for a given receptor goes up with ligand size and complexity (though I've not yet had time to try to work out an estimator for given sizes to match the one used to size OPN1/OPNG tasks.) The two FLI1 targets both have the same number of atoms.

If you're interested you can check this fairly easily -- the results log for an SCC1 task should have a line identifying the two files used, each name being followed by a size = n b item, where n is the number of atoms and b the number of branches. Both FLI1 receptor files are size = 899 0...
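
For anyone who would rather script that check, here is a minimal Python sketch. It assumes only that each file name is followed by a "size = n b" item as described above; the sample line, its file names and the ligand figures are made up for illustration, and the real log layout may differ.

```python
import re

# Assumed shape of the "name size = n b" item; the real log line may differ.
SIZE_ITEM = re.compile(r"(\S+)\s+size\s*=\s*(\d+)\s+(\d+)")

def parse_sizes(line):
    """Return [(file_name, atoms, branches), ...] for each 'name size = n b' item."""
    return [(name, int(atoms), int(branches))
            for name, atoms, branches in SIZE_ITEM.findall(line)]

# Hypothetical log line: file names and the ligand figures are invented.
sample = "files: receptor.pdbqt size = 899 0, ligand.pdbqt size = 24 5"

for name, atoms, branches in parse_sizes(sample):
    print(f"{name}: {atoms} atoms, {branches} branches")
```

Running this over the result logs of several tasks would give the ligand size to set against each run time.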

I looked at data I collected for the recent FLI1 tasks on one of my Ryzens, and the newer FLI1-A tasks all had ligands over 20 atoms [and quite a few had 30 or more atoms], whilst FLI1-B tasks don't seem to be getting into the mid-to-high 20s yet; I dug back far enough to find FLI1-A tasks with similarly small ligands and those tasks tended to take about the same time as similarly sized FLI1-B ones.

With a mix of FLI1-B and the appearance of yet more FLI1-A, I think the run times are likely to be all over the place for a while :-)

Cheers - Al.

P.S. MyoD1-C receptor has 1268 atoms (about 1.4 times as large) and tasks for a given ligand size seem to take nearly twice as long to run (that part of the code seems to be order n-squared...)
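
As a rough check of that n-squared guess, the arithmetic can be worked through in a few lines of Python. This assumes run time scales purely with the square of the receptor atom count, which is a simplification and not something measured here.

```python
fli1_atoms = 899      # both FLI1 receptor files (size = 899 0)
myod1c_atoms = 1268   # MyoD1-C receptor

size_ratio = myod1c_atoms / fli1_atoms
quadratic_ratio = size_ratio ** 2  # run-time ratio if cost grows as n squared

print(f"size ratio: {size_ratio:.2f}")                      # size ratio: 1.41
print(f"n-squared run-time ratio: {quadratic_ratio:.2f}")   # n-squared run-time ratio: 1.99
```

A receptor about 1.41 times as large gives a factor of about 1.99 under that assumption, which fits the "nearly twice as long" observation.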
[Aug 26, 2023 4:36:03 PM]
Sgt.Joe
Ace Cruncher
USA
Joined: Jul 4, 2006
Post Count: 7236
Status: Recently Active
Re: No work?

Yes, those MyoD1-C do take longer, so I guessed they were bigger.
Cheers
----------------------------------------
Sgt. Joe
*Minnesota Crunchers*
[Aug 26, 2023 6:20:15 PM]
adriverhoef
Master Cruncher
The Netherlands
Joined: Apr 3, 2009
Post Count: 1983
Status: Offline
Re: No work?

Al, your inspirational remark about targets and my sampler script did indeed send me back to said script where, after some small modifications, I found a nice way to inject the accompanying targets, so that each batch number is coupled with its target from now on.

See the new result here. (Type <End> to take you to the end of the list.)
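
The general idea of coupling a batch number with its target can be sketched in Python as follows. This is not the sampler script itself, only an illustration that assumes task names follow the layout PROJECT_BATCH_TARGET_... seen in the names posted earlier in this thread; the helper name `batch_and_target` is hypothetical.

```python
def batch_and_target(task_name):
    """Split an SCC1 task name into (batch_number, target).

    Assumes the layout PROJECT_BATCH_TARGET_..., read off the task names
    posted earlier in this thread; the trailing fields are ignored.
    """
    _project, batch, target, *_rest = task_name.split("_")
    return int(batch), target

# Task names copied from the list of stalled retries earlier in the thread.
names = [
    "SCC1_0004203_MyoD1-C_38453_2",
    "SCC1_0004200_MyoD1-C_37868_2",
    "SCC1_0004198_MyoD1-C_13309_2",
    "SCC1_0004205_MyoD1-C_74267_2",
]

targets = dict(batch_and_target(name) for name in names)
for batch in sorted(targets):
    print(batch, targets[batch])
```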

Adri
[Aug 27, 2023 3:41:11 PM]
hchc
Veteran Cruncher
USA
Joined: Aug 15, 2006
Post Count: 735
Status: Offline
Re: No work?

Hi hchc, Sgt.Joe, Taurus Oldbull, and GB033533,

Thanks for bringing the issue to our attention. This has been forwarded to the tech team to investigate and I will provide updates as they become available.

Thanks TigerLily. When you posted this message, I did notice a dozen or so really old work units get sent to other people so the tech team must've kicked a few off, but it still (as of Sunday the 27th) seems to be an issue. Lotta old work units in "waiting to be sent" state.

Just letting you know! Not being impatient, just an update.
----------------------------------------
  • i3-8100 (Coffee Lake, 4C/4T) @ 3.6 GHz
  • i5-4590 (Haswell, 4C/4T) @ 3.3 GHz
  • E5800 (Wolfdale, 2C/2T) @ 3.2 GHz

----------------------------------------
[Edit 1 times, last edit by hchc at Aug 28, 2023 2:26:11 AM]
[Aug 28, 2023 1:11:29 AM]
alanb1951
Veteran Cruncher
Joined: Jan 20, 2006
Post Count: 738
Status: Offline
Re: No work?

Just a thought, given the large number of different ways feeders can be configured...

It's behaving as if new work has somehow been explicitly given priority to try to avoid the log-jam caused by large numbers of retries :-) -- if so, there's a high chance that most (or all?) long-standing retries won't clear while there's a lot of new work hitting the feeder. (Most retries triggered by the validator to resolve Pending Verification seem to be immune to the problem [at present...])
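
To illustrate why strict priority would produce exactly this symptom, here is a toy Python model. It is not the BOINC or WCG feeder: the function name, the slot counts and the arrival rates are all invented, and the only assumption is that new work always takes a free slot ahead of any retry.

```python
from collections import deque

def simulate(ticks, new_per_tick, slots_per_tick, retries_waiting):
    """Toy strict-priority feeder: new work always fills slots before retries.

    Not the real feeder; every parameter is a made-up illustration.
    Returns the number of retries still waiting after `ticks` rounds.
    """
    new_queue = deque()
    retry_queue = deque(range(retries_waiting))
    for _tick in range(ticks):
        new_queue.extend(range(new_per_tick))   # fresh work arrives
        for _slot in range(slots_per_tick):     # fill this round's slots
            if new_queue:
                new_queue.popleft()             # new work goes first
            elif retry_queue:
                retry_queue.popleft()           # retries only get leftover slots
    return len(retry_queue)

# New work arrives as fast as slots open up: the retries never move.
print(simulate(ticks=100, new_per_tick=10, slots_per_tick=10, retries_waiting=50))
# New work dries up: the retry backlog drains.
print(simulate(ticks=100, new_per_tick=2, slots_per_tick=10, retries_waiting=50))
```

In this model the retries only clear when new work falls short of the available slots, which is consistent with what we have been seeing.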

As hchc noted above, some delayed retries got out a few days ago, at a time when there was precious little new work (around 24th/25th August) -- coincidence? I'd love to know how many SCC1 WUs are stuck waiting for retries (and how long it might take to get them distributed given the "same platform" requirement), but such data-dives can't be high priority for WCG under the circumstances; I suspect the number might be high enough to surprise many folks...

Probably, the only long-term solution to the stop/start flow problem for SCC1 is to find a way of reducing the number of simultaneous retries in the system -- unfortunately, the best way to do that is to educate users to maintain as small a cache as is necessary for normal running [yeah, right!...] so there's a bigger supply of available work for those who can turn it around quickly, and any site-based alternative (such as shortening the deadline for SCC1 whilst adding a compensatory grace period, or forcibly capping the maximum SCC1 task count) is likely to lead to a barrage of complaints :-(

Cheers - Al.

P.S. If the priority mechanism isn't the cause, there must be an exotic problem in the server code that only seems to bite SCC1 [at present] -- MCM1 retries seem to get out in a timely fashion. Who knows whether it's a configuration choice or a server bug...
[Aug 29, 2023 1:30:03 PM]