World Community Grid Forums
Thread Status: Active Total posts in this thread: 137
uplinger
Former World Community Grid Tech Joined: May 23, 2005 Post Count: 3952 Status: Offline
"Lots of FAH2 tasks have been downloading again recently. And now another issue appears. Several of my machines set with only a 1/2 day cache downloaded over 100 tasks each. At 12 hours per task, with 8 cores and a 4 day deadline, 64 tasks at best will finish within the deadline. The rest? Makes no sense to me."

"ADMINS, how much time is needed before a machine is taken off the 'unreliable device' list? Many of our devices will be listed as unreliable, 'cause of too much work downloaded..."

I believe it is set to 9 successful results in a row on a host against a specific app version.
Thanks, -Uplinger
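For anyone wondering how a "9 successful results in a row" rule might play out per host and app version, here is a minimal sketch. It is not the actual server code: the class name, the host and app-version labels, and the assumption that any failed result resets the streak to zero are all made up for illustration.

```python
from collections import defaultdict

REQUIRED_STREAK = 9  # "9 successful results in a row", as stated in the post


class ReliabilityTracker:
    """Hypothetical tracker, keyed by (host, app version)."""

    def __init__(self, required_streak=REQUIRED_STREAK):
        self.required_streak = required_streak
        self._streaks = defaultdict(int)

    def record(self, host_id, app_version, success):
        key = (host_id, app_version)
        # Assumption: one unsuccessful result resets the streak to zero.
        self._streaks[key] = self._streaks[key] + 1 if success else 0

    def is_reliable(self, host_id, app_version):
        return self._streaks[(host_id, app_version)] >= self.required_streak


tracker = ReliabilityTracker()
for _ in range(9):
    tracker.record("host-1", "app-v1", True)  # labels are placeholders
print(tracker.is_reliable("host-1", "app-v1"))
```

Under this reading, a host that misses deadlines on over-fetched work would need nine clean results on that app version to get back in good standing.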
||
|
Crystal Pellet
Veteran Cruncher Joined: May 21, 2008 Post Count: 1320 Status: Offline
"From a quick look at results returned, less than 5% of the results are not returning 100k steps."

Exceptions prove the rule - 7 out of 56 tasks on my machine have the wcgfahb000X0000 addition (12.5%). One original task did not make it further than the first trickle. The 'to continue' task I got: FAH2_avx17257-ls_000053_0003_001_wcgfahb00010000_0
||
|
SekeRob
Master Cruncher Joined: Jan 7, 2013 Post Count: 2741 Status: Offline |
Is there a reliable rule applying for these(?)... then you would see a larger percent. Decided that 'however WCG wants it' is fine by me, as long as what is crunched is valid. [Certainly 95% will only reinforce the idea of 'we're fine, no need to go the extra length to get to 97-98'], e.g. a report of some whole units crunched offline without trickling in the first 72 hours on the clock, then on report 'invalid', whilst those that were trickling before going offline and completed during that time were fully valid [probably because the server was waiting on the soft stop feedback from the client].
|
||
|
SekeRob
Master Cruncher Joined: Jan 7, 2013 Post Count: 2741 Status: Offline |
On those who are not likely making it in time, ran the API extract for In Progress several times and noted that 3 of total cached 58 tasks do not have the ServerState = 4 indicator. 2 FAHB and 1 CEP2. No 5, no 0, nothing. Others with trickle show ServerState 4:
fahb 3113135 3113135 1444292543 FAH2_avx101122_000068_0027_003_0 11-10-2015 1:55 7-10-2015 1:55 4 0 6,81

Oh, and noticed my remote, when TeamViewing in, has 17 FAHB, 4 of which are with the wcgfahbnnnn sub, 23.5%
||
|
nanoprobe
Master Cruncher Classified Joined: Aug 29, 2008 Post Count: 2998 Status: Offline
"Lots of FAH2 tasks have been downloading again recently. And now another issue appears. Several of my machines set with only a 1/2 day cache downloaded over 100 tasks each. At 12 hours per task, with 8 cores and a 4 day deadline, 64 tasks at best will finish within the deadline. The rest? Makes no sense to me."

Now I have 4 machines set with a 1/2 day cache with over 120 tasks downloaded. Can one of the techs come up with an explanation for this? I'm setting them all to no new tasks until this issue can be resolved.
In 1969 I took an oath to defend and protect the U S Constitution against all enemies, both foreign and Domestic. There was no expiration date.
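The 64-task figure in the post above can be checked with a few lines. This is only a rough sketch: the function names are made up, it assumes every core crunches around the clock at 12 hours per task, and the cache estimate assumes the client's runtime estimate matches those 12 hours.

```python
def max_tasks_within_deadline(hours_per_task, cores, deadline_days):
    """Upper bound on tasks a host can finish before the deadline."""
    tasks_per_core = int(deadline_days * 24 // hours_per_task)
    return tasks_per_core * cores


def tasks_to_fill_cache(hours_per_task, cores, cache_days):
    """Rough number of tasks needed to keep all cores busy for cache_days."""
    return int(cache_days * 24 * cores // hours_per_task)


capacity = max_tasks_within_deadline(12, 8, 4)  # 8 tasks per core * 8 cores = 64
buffered = tasks_to_fill_cache(12, 8, 0.5)      # 96 core-hours / 12 hours = 8

for downloaded in (100, 120):
    print(downloaded, "downloaded,", downloaded - capacity, "cannot finish in time")
```

With these numbers a 1/2 day cache should hold about 8 tasks, so 100 or 120 downloads leave 36 or 56 tasks that cannot make the deadline even in the best case.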
||
|
uplinger
Former World Community Grid Tech Joined: May 23, 2005 Post Count: 3952 Status: Offline
Those values with wcgfahb....are higher than what I saw in one batch. Like I said, it was a quick scan and it was only of returned work.
Note, the server is not doing soft stops at the moment. It is only issuing hard stops for production work units. Your result could have gone invalid if you sent back a trickle message but did not upload the intermediate files within 3 hours. At that point, you would get a hard stop and the result would be marked for validation on the back end. If zero trickle messages were completed, it would mark your result invalid and send another copy to another computer from step 0.
Thanks, -Uplinger
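One way to read the hard-stop rule described in the post above is the decision sketch below. This is an interpretation, not the server's code: the names, the action strings, and treating "within 3 hours" as inclusive are assumptions, as is counting a trickle as "completed" only once its intermediate files have been uploaded.

```python
from dataclasses import dataclass
from typing import Optional

UPLOAD_WINDOW_HOURS = 3.0  # "within 3 hours", from the post


@dataclass
class TaskState:
    # Trickles whose intermediate files were uploaded (assumed meaning of "completed").
    completed_trickles: int
    # Hours since a trickle arrived without its intermediate files; None if nothing is outstanding.
    hours_since_unmatched_trickle: Optional[float]


def server_action(state: TaskState) -> str:
    waiting = state.hours_since_unmatched_trickle
    if waiting is None or waiting <= UPLOAD_WINDOW_HOURS:
        return "continue"
    # The upload window was missed, so the server issues a hard stop.
    if state.completed_trickles == 0:
        # Nothing usable came back: invalid, and a fresh copy starts from step 0.
        return "hard_stop_invalid_resend_from_step_0"
    return "hard_stop_validate"


print(server_action(TaskState(completed_trickles=0, hours_since_unmatched_trickle=5.0)))
```

This would fit the earlier report of units crunched fully offline coming back invalid, while units that had trickled before going offline validated.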
||
|
TPCBF
Master Cruncher USA Joined: Jan 2, 2011 Post Count: 1951 Status: Offline
Hello everyone,

Well, around here, the opposite seems to happen. Over the last night, I got mostly non-FAH2 WUs, with FAH2 the only project selected. And runtime per day dropped from around 50 CPU days/calendar day to less than 40 by now...

The dry spell should be over. I have increased the weight of the project, so we should start to see an upturn in runtime per day on the project as well.

EDIT: I just checked, none of my approx. 30 active hosts has gotten a single FAH2 WU in the last 9h...
Ralf
[Edit 1 times, last edit by TPCBF at Oct 8, 2015 6:30:04 PM]
||
|
uplinger
Former World Community Grid Tech Joined: May 23, 2005 Post Count: 3952 Status: Offline
Ok, it got stuck again. I have cleared it manually, I have also added some monitoring to make sure I get alerted the next time it happens. They are flowing again as I type.
Thanks, -Uplinger |
||
|
TPCBF
Master Cruncher USA Joined: Jan 2, 2011 Post Count: 1951 Status: Offline
"Ok, it got stuck again. I have cleared it manually, I have also added some monitoring to make sure I get alerted the next time it happens. They are flowing again as I type."

Thanks!

"Thanks, -Uplinger"

I got 14 FAH2 WUs among 11 different hosts by now again. Looks like there is quite a bit of tuning work to do until this project runs as smoothly as pretty much all the others do (at least since I joined up).
Ralf
||
|
Former Member
Cruncher Joined: May 22, 2018 Post Count: 0 Status: Offline |
But yet.... This is an hour and a half after your "cleared it" message.
10/9/2015 12:07:46 PM | World Community Grid | Sending scheduler request: To fetch work.
10/9/2015 12:07:46 PM | World Community Grid | Requesting new tasks for CPU and intel_gpu
10/9/2015 12:07:48 PM | World Community Grid | Scheduler request completed: got 0 new tasks
10/9/2015 12:07:48 PM | World Community Grid | No tasks sent
10/9/2015 12:07:48 PM | World Community Grid | No tasks are available for FightAIDS@Home - Phase 2
10/9/2015 12:07:48 PM | World Community Grid | No tasks are available for Uncovering Genome Mysteries
10/9/2015 12:07:48 PM | World Community Grid | No tasks are available for the applications you have selected.

I've been wondering about this. I have a 3 day cache set and only this project selected, but only have 3 extra WUs downloaded (only one extra set, with 3 cores on the laptop running).
||
|
|