Thread Status: Active
Total posts in this thread: 137
This topic has been viewed 22003 times and has 136 replies
uplinger
Former World Community Grid Tech
Joined: May 23, 2005
Post Count: 3952
Status: Offline
Re: Run dry already?

Lots of FAH2 tasks have been downloading again recently.


And now another issue appears. Several of my machines set with only a 1/2 day cache downloaded over 100 tasks each. At 12 hours per task, with 8 cores and a 4-day deadline, at best 64 tasks can finish within the deadline. The rest? Makes no sense to me.


ADMINS
How much time is needed before a machine is removed from the "unreliable device" list?

Many of our devices will be listed as unreliable because too much work was downloaded...


I believe it is set to 9 successful results in a row on a host against a specific app version.
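
As a rough illustration of the rule described above, here is a minimal sketch of a streak counter. The class and method names are hypothetical and the threshold of 9 is simply the figure quoted in this post; this is not the actual server code.

```python
from collections import defaultdict

# Threshold quoted in the post; treated here as an assumption.
REQUIRED_STREAK = 9


class ReliabilityTracker:
    """Hypothetical tracker of consecutive valid results per (host, app version)."""

    def __init__(self, required_streak=REQUIRED_STREAK):
        self.required_streak = required_streak
        self.streaks = defaultdict(int)

    def record(self, host_id, app_version, valid):
        key = (host_id, app_version)
        # A valid result extends the streak; anything else resets it to zero.
        self.streaks[key] = self.streaks[key] + 1 if valid else 0

    def is_reliable(self, host_id, app_version):
        return self.streaks[(host_id, app_version)] >= self.required_streak
```

Under this reading, a single failed result on a host puts it back at zero for that app version, so it would need 9 more good results in a row.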

Thanks,
-Uplinger
[Oct 8, 2015 2:55:16 AM]
Crystal Pellet
Veteran Cruncher
Joined: May 21, 2008
Post Count: 1320
Status: Offline
Re: Run dry already?

From a quick look at results returned, less than 5% of the results are not returning 100k steps.

Exceptions prove the rule - 7 out of 56 tasks on my machine have the wcgfahb000X0000 addition (12.5%)

One original task did not make it further than the first trickle.
The 'to continue' task I got: FAH2_avx17257-ls_000053_0003_001_wcgfahb00010000_0
[Oct 8, 2015 7:48:59 AM]
SekeRob
Master Cruncher
Joined: Jan 7, 2013
Post Count: 2741
Status: Offline
Re: Run dry already?

Is there a reliable rule applying for these(?)... then you would see a larger percentage. I decided that 'however WCG wants it' is fine by me, as long as what is crunched is valid. [Certainly 95% will only reinforce the idea of 'we're fine, no need to go the extra length to get to 97-98'.] For example, there was a report of some whole units crunched offline without trickling in the first 72 hours on the clock, which were then marked 'invalid' on report, whilst those that were trickling before going offline and completed during that time were fully valid [probably because the server was waiting on the soft-stop feedback from the client].
[Oct 8, 2015 8:55:00 AM]
SekeRob
Master Cruncher
Joined: Jan 7, 2013
Post Count: 2741
Status: Offline
Re: Run dry already?

On those that are not likely to make it in time: I ran the API extract for In Progress several times and noted that 3 of the 58 cached tasks do not have the ServerState = 4 indicator: 2 FAHB and 1 CEP2. No 5, no 0, nothing. Others with trickle show ServerState 4:
fahb	3113135	3113135	1444292543	FAH2_avx101122_000068_0027_003_0	11-10-2015 1:55	7-10-2015 1:55	4	0	6,81
fahb	3113135	3113135	1444295348	FAH2_avx101118-ls_000010_0018_005_1	11-10-2015 1:07	7-10-2015 1:07	4	0	8,12
cep2	2372334	2372334	1444153231	E234030_292_S.290.C26H16N10O2S2.UZWXOROETZJITM-UHFFFAOYSA-N.1_s1_14_2	16-10-2015 17:40	6-10-2015 17:40		0	0,00
fahb	2372334	2372334	1444291340	FAH2_avx38781-ls_000085_0001_001_0	9-10-2015 21:01	5-10-2015 21:01		0	2,55
fahb	2372334	2372334	1444294430	FAH2_avx38781-ls_000063_0015_001_0	9-10-2015 20:58	5-10-2015 20:58		0	5,10
Oh, and I noticed that my remote machine, when TeamViewing in, has 17 FAHB tasks, 4 of which carry the wcgfahbnnnn sub (23.5%).
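
For anyone who wants to run the same check on their own extract, here is a minimal sketch. It assumes a tab-separated export with ten fields per row, as in the first row quoted above, with the ServerState value in the eighth field. The post shows no header row, so those column positions and the function name are assumptions.

```python
# Assumed zero-based field positions (the post shows no header row).
APP = 0
NAME = 4
SERVER_STATE = 7


def tasks_missing_server_state(rows):
    """Return (app, task name) for every row whose ServerState field is empty."""
    missing = []
    for row in rows:
        fields = row.split("\t")
        if fields[SERVER_STATE].strip() == "":
            missing.append((fields[APP], fields[NAME]))
    return missing


# Two rows taken from the post, written with explicit tab separators.
SAMPLE_ROWS = [
    "fahb\t3113135\t3113135\t1444292543\tFAH2_avx101122_000068_0027_003_0"
    "\t11-10-2015 1:55\t7-10-2015 1:55\t4\t0\t6,81",
    "cep2\t2372334\t2372334\t1444153231"
    "\tE234030_292_S.290.C26H16N10O2S2.UZWXOROETZJITM-UHFFFAOYSA-N.1_s1_14_2"
    "\t16-10-2015 17:40\t6-10-2015 17:40\t\t0\t0,00",
]

for app, name in tasks_missing_server_state(SAMPLE_ROWS):
    print(app, name)
```

With the two sample rows, only the CEP2 task is reported, since its ServerState field is empty.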
[Oct 8, 2015 9:28:11 AM]
nanoprobe
Master Cruncher
Classified
Joined: Aug 29, 2008
Post Count: 2998
Status: Offline
Re: Run dry already?

Lots of FAH2 tasks have been downloading again recently.


And now another issue appears. Several of my machines set with only a 1/2 day cache downloaded over 100 tasks each. At 12 hours per task, with 8 cores and a 4-day deadline, at best 64 tasks can finish within the deadline. The rest? Makes no sense to me.


Now I have 4 machines set with a 1/2 day cache with over 120 tasks downloaded. Can one of the techs come up with an explanation for this? I'm setting them all to no new tasks until this issue can be resolved.
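
The arithmetic behind the 64-task figure can be checked with a short sketch. The function names are hypothetical, and the cache estimate is only a rough approximation; the BOINC client's actual work-fetch logic is more involved than this.

```python
def max_tasks_within_deadline(hours_per_task, cores, deadline_days):
    """Upper bound on tasks that can finish before the deadline."""
    tasks_per_core = (deadline_days * 24) // hours_per_task
    return tasks_per_core * cores


def tasks_to_fill_cache(hours_per_task, cores, cache_days):
    """Rough estimate of how many tasks a cache of this size should hold."""
    return cores * cache_days * 24 / hours_per_task


# Figures from the post: 12 hours per task, 8 cores, 4-day deadline.
print(max_tasks_within_deadline(12, 8, 4))  # 64
# A 1/2 day cache on the same machine, under the rough estimate above.
print(tasks_to_fill_cache(12, 8, 0.5))  # 8.0
```

By this estimate a 1/2 day cache should hold about 8 tasks, so 100 to 120 downloaded tasks is far above both the cache setting and what can finish in time.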
----------------------------------------
In 1969 I took an oath to defend and protect the U S Constitution against all enemies, both foreign and Domestic. There was no expiration date.


[Oct 8, 2015 10:30:10 AM]
uplinger
Former World Community Grid Tech
Joined: May 23, 2005
Post Count: 3952
Status: Offline
Re: Run dry already?

Those values with wcgfahb....are higher than what I saw in one batch. Like I said, it was a quick scan and it was only of returned work.

Note, the server is not doing soft stops at the moment. It is only issuing hard stops for production work units.

Your result could have gone invalid if you sent back a trickle message but did not upload the intermediate files within 3 hours. At that point, you would get a hard stop and the result would be marked for validation on the back end. If zero trickle messages were completed, it would mark your result invalid and send another copy to another computer from step 0.
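
To make the cases above easier to follow, here is a minimal sketch of one reading of that logic. The names, the enum, and the exact order of the checks are assumptions for illustration only, not the actual server code.

```python
from enum import Enum

# Upload window quoted in the post.
UPLOAD_WINDOW_HOURS = 3


class Outcome(Enum):
    CONTINUE = "continue"
    HARD_STOP_VALIDATE = "hard stop, mark for validation"
    HARD_STOP_INVALID_RESEND = "hard stop, mark invalid, resend from step 0"


def server_decision(hours_since_trickle, intermediate_uploaded, completed_trickles):
    """One possible reading of the hard-stop rule described in the post."""
    # Intermediate files arrived, or the upload window is still open.
    if intermediate_uploaded or hours_since_trickle <= UPLOAD_WINDOW_HOURS:
        return Outcome.CONTINUE
    # Window expired with nothing completed: the work restarts elsewhere.
    if completed_trickles == 0:
        return Outcome.HARD_STOP_INVALID_RESEND
    # Window expired, but earlier trickles completed: validate what exists.
    return Outcome.HARD_STOP_VALIDATE
```

For example, `server_decision(4.0, False, 0)` gives the "invalid, resend from step 0" outcome, which matches the case of a task that never completed a trickle.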

Thanks,
-Uplinger
[Oct 8, 2015 10:36:36 AM]
TPCBF
Master Cruncher
USA
Joined: Jan 2, 2011
Post Count: 1951
Status: Offline
Re: Run dry already?

Hello everyone,

The dry spell should be over. I have increased the weight of the project, so we should start to see an upturn in runtime per day on the project as well.
Well, around here, the opposite seems to happen. Over the last night, I got mostly non-FAH2 WUs, with FAH2 the only project selected. And runtime per day dropped from around 50 CPU days/calendar day to less than 40 by now...

EDIT: I just checked: none of my approx. 30 active hosts has gotten a single FAH2 WU in the last 9h...

Ralf
----------------------------------------
[Edit 1 times, last edit by TPCBF at Oct 8, 2015 6:30:04 PM]
[Oct 8, 2015 5:48:59 PM]
uplinger
Former World Community Grid Tech
Joined: May 23, 2005
Post Count: 3952
Status: Offline
Re: Run dry already?

Ok, it got stuck again. I have cleared it manually, I have also added some monitoring to make sure I get alerted the next time it happens. They are flowing again as I type.

Thanks,
-Uplinger
[Oct 8, 2015 6:30:41 PM]
TPCBF
Master Cruncher
USA
Joined: Jan 2, 2011
Post Count: 1951
Status: Offline
Re: Run dry already?

Ok, it got stuck again. I have cleared it manually, I have also added some monitoring to make sure I get alerted the next time it happens. They are flowing again as I type.

Thanks,
-Uplinger
Thanks!
By now I have again received 14 FAH2 WUs across 11 different hosts.

Looks like there is quite a bit of tuning work to do until this project runs as smoothly as pretty much all the others do (at least since I joined up).

Ralf
[Oct 8, 2015 8:34:57 PM]
Former Member
Cruncher
Joined: May 22, 2018
Post Count: 0
Status: Offline
Re: Run dry already?

But yet.... This is an hour and a half after your "cleared it" message.

10/9/2015 12:07:46 PM | World Community Grid | Sending scheduler request: To fetch work.
10/9/2015 12:07:46 PM | World Community Grid | Requesting new tasks for CPU and intel_gpu
10/9/2015 12:07:48 PM | World Community Grid | Scheduler request completed: got 0 new tasks
10/9/2015 12:07:48 PM | World Community Grid | No tasks sent
10/9/2015 12:07:48 PM | World Community Grid | No tasks are available for FightAIDS@Home - Phase 2
10/9/2015 12:07:48 PM | World Community Grid | No tasks are available for Uncovering Genome Mysteries
10/9/2015 12:07:48 PM | World Community Grid | No tasks are available for the applications you have selected.


I've been wondering about this. I have a 3 day cache set and only this project selected, but only have 3 extra WUs downloaded (only one extra set, with 3 cores on the laptop running).
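
For checking a longer event log for the same pattern, here is a minimal sketch that pulls the "got N new tasks" counts out of lines shaped like the ones quoted above. The function name is hypothetical, and the pipe-separated layout is assumed from the pasted lines only.

```python
import re

# Matches the scheduler reply line as it appears in the quoted log.
GOT_TASKS = re.compile(r"Scheduler request completed: got (\d+) new tasks")


def new_task_counts(log_lines):
    """Return (timestamp, project, task count) for each scheduler reply line."""
    counts = []
    for line in log_lines:
        parts = [part.strip() for part in line.split("|", 2)]
        if len(parts) != 3:
            continue  # Not in the "time | project | message" layout.
        timestamp, project, message = parts
        match = GOT_TASKS.search(message)
        if match:
            counts.append((timestamp, project, int(match.group(1))))
    return counts


# Lines copied from the log in this post.
SAMPLE_LOG = [
    "10/9/2015 12:07:46 PM | World Community Grid | Sending scheduler request: To fetch work.",
    "10/9/2015 12:07:48 PM | World Community Grid | Scheduler request completed: got 0 new tasks",
    "10/9/2015 12:07:48 PM | World Community Grid | No tasks sent",
]

for entry in new_task_counts(SAMPLE_LOG):
    print(entry)
```

Run over a whole log, this gives a quick list of when requests came back empty and when tasks started flowing again.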
[Oct 9, 2015 8:50:38 PM]