World Community Grid - View Thread

World Community Grid Forums

Category: Completed Research

Forum: FightAIDS@Home Phase 2

Thread: Run dry already?

Thread Status: Active
Total posts in this thread: 137

This topic has been viewed 22003 times and has 136 replies

uplinger
Former World Community Grid Tech
Joined: May 23, 2005
Post Count: 3952
Status: Offline

Re: Run dry already?

Lots of FAH2 tasks have been downloading again recently. :)

And now another issue appears. Several of my machines set with only a 1/2 day cache downloaded over 100 tasks each. At 12 hours per task, 8 cores and a 4 day deadline allow at best 64 tasks to finish within the deadline. The rest? Makes no sense to me.
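The 64-task figure in the post above can be checked with a short sketch. The function name and parameters are illustrative only, not part of BOINC or WCG; the inputs (12 hours per task, 8 cores, 4 day deadline) are the ones quoted in the post, and the sketch assumes every core crunches one task at a time around the clock.

```python
def max_tasks_within_deadline(hours_per_task, cores, deadline_days):
    """Upper bound on tasks a host can finish before the deadline.

    Assumes each core runs one task at a time, 24 hours a day
    (an idealisation, not a WCG rule).
    """
    tasks_per_core = int(deadline_days * 24 // hours_per_task)
    return tasks_per_core * cores


limit = max_tasks_within_deadline(hours_per_task=12, cores=8, deadline_days=4)
print(limit)        # 64
print(100 - limit)  # 36 tasks out of 100 that cannot finish in time
```

With more than 100 tasks in the cache, at least 36 of them would miss the deadline under these assumptions.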

ADMINS
How much time is needed before a machine is removed from the "unreliable device" list?

Many of our devices will be listed as unreliable because of too much work downloaded...

I believe it is set to 9 successful results in a row on a host against a specific app version.

Thanks,
-Uplinger

[Oct 8, 2015 2:55:16 AM]

Crystal Pellet
Veteran Cruncher
Joined: May 21, 2008
Post Count: 1320
Status: Offline

Re: Run dry already?

From a quick look at results returned, less than 5% of the results are not returning 100k steps.

Exceptions prove the rule - 7 out of 56 tasks on my machine have the wcgfahb000X0000 addition (12.5%)

One original task did not make it further than the first trickle.
The 'to continue' task I got: FAH2_avx17257-ls_000053_0003_001_wcgfahb00010000_0

[Oct 8, 2015 7:48:59 AM]

SekeRob
Master Cruncher
Joined: Jan 7, 2013
Post Count: 2741
Status: Offline


Re: Run dry already?

Is there a reliable rule applying to these(?)... then you would see a larger percentage. I decided that 'however WCG wants it' is fine by me, as long as what is crunched is valid. [Certainly 95% will only reinforce the idea of 'we're fine, no need to go the extra length to get to 97-98'.] For example, there was a report of some units crunched whole offline, without trickling in the first 72 hours on the clock, which were then 'invalid' on report, whilst those that were trickling before going offline and completed during that time were fully valid [probably because the server was waiting on the soft stop feedback from the client].

[Oct 8, 2015 8:55:00 AM]

SekeRob
Master Cruncher
Joined: Jan 7, 2013
Post Count: 2741
Status: Offline


Re: Run dry already?

On those that are not likely to make it in time: I ran the API extract for In Progress several times and noted that 3 of the 58 cached tasks do not have the ServerState = 4 indicator, 2 FAHB and 1 CEP2. No 5, no 0, nothing. Others with trickle show ServerState 4:

fahb	3113135	3113135	1444292543	FAH2_avx101122_000068_0027_003_0	11-10-2015 1:55	7-10-2015 1:55	4	0	6,81
fahb	3113135	3113135	1444295348	FAH2_avx101118-ls_000010_0018_005_1	11-10-2015 1:07	7-10-2015 1:07	4	0	8,12
cep2	2372334	2372334	1444153231	E234030_292_S.290.C26H16N10O2S2.UZWXOROETZJITM-UHFFFAOYSA-N.1_s1_14_2	16-10-2015 17:40	6-10-2015 17:40		0	0,00
fahb	2372334	2372334	1444291340	FAH2_avx38781-ls_000085_0001_001_0	9-10-2015 21:01	5-10-2015 21:01		0	2,55
fahb	2372334	2372334	1444294430	FAH2_avx38781-ls_000063_0015_001_0	9-10-2015 20:58	5-10-2015 20:58		0	5,10
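A minimal sketch of how the rows above could be scanned for a missing ServerState. It assumes the extract is tab-separated, that the eighth field (index 7) is ServerState and the fifth field (index 4) is the task name; those column positions are inferred from the rows in this post, not taken from any API documentation, and the function name is made up for illustration.

```python
from typing import List

# The five rows from the post, with tabs written out explicitly.
SAMPLE = (
    "fahb\t3113135\t3113135\t1444292543\tFAH2_avx101122_000068_0027_003_0\t11-10-2015 1:55\t7-10-2015 1:55\t4\t0\t6,81\n"
    "fahb\t3113135\t3113135\t1444295348\tFAH2_avx101118-ls_000010_0018_005_1\t11-10-2015 1:07\t7-10-2015 1:07\t4\t0\t8,12\n"
    "cep2\t2372334\t2372334\t1444153231\tE234030_292_S.290.C26H16N10O2S2.UZWXOROETZJITM-UHFFFAOYSA-N.1_s1_14_2\t16-10-2015 17:40\t6-10-2015 17:40\t\t0\t0,00\n"
    "fahb\t2372334\t2372334\t1444291340\tFAH2_avx38781-ls_000085_0001_001_0\t9-10-2015 21:01\t5-10-2015 21:01\t\t0\t2,55\n"
    "fahb\t2372334\t2372334\t1444294430\tFAH2_avx38781-ls_000063_0015_001_0\t9-10-2015 20:58\t5-10-2015 20:58\t\t0\t5,10\n"
)


def tasks_without_server_state(extract: str, state_col: int = 7, name_col: int = 4) -> List[str]:
    """Return the names of tasks whose (assumed) ServerState field is empty."""
    missing = []
    for line in extract.splitlines():
        fields = line.split("\t")
        if len(fields) <= state_col:
            continue  # skip short or malformed rows
        if fields[state_col].strip() == "":
            missing.append(fields[name_col])
    return missing


for name in tasks_without_server_state(SAMPLE):
    print(name)
```

On these five rows it lists the one CEP2 task and the two FAH2_avx38781-ls tasks, i.e. the rows with no ServerState value.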

Oh, and I noticed that my remote machine, when TeamViewing in, has 17 FAHB tasks, 4 of which are with the wcgfahbnnnn sub: 23.5%!

[Oct 8, 2015 9:28:11 AM]

nanoprobe
Master Cruncher
Classified
Joined: Aug 29, 2008
Post Count: 2998
Status: Offline

Re: Run dry already?

Lots of FAH2 tasks have been downloading again recently. :)

Now I have 4 machines set with a 1/2 day cache with over 120 tasks downloaded. Can one of the techs come up with an explanation for this? I'm setting them all to no new tasks until this issue can be resolved.

----------------------------------------

In 1969 I took an oath to defend and protect the U S Constitution against all enemies, both foreign and Domestic. There was no expiration date.

[Oct 8, 2015 10:30:10 AM]

uplinger
Former World Community Grid Tech
Joined: May 23, 2005
Post Count: 3952
Status: Offline

Re: Run dry already?

Those values with wcgfahb... are higher than what I saw in one batch. Like I said, it was a quick scan and it was only of returned work.

Note, the server is not doing soft stops at the moment. It is only issuing hard stops for production work units.

Your result could have gone invalid if you sent back a trickle message but did not upload the intermediate upload files within 3 hours. At that point, you would get a hard stop and the result would have been marked for validation on the back end. If zero trickle messages were completed, it would mark your result invalid and send another copy to another computer from step 0.
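One possible reading of the rule described above, written as a small sketch. This is not the actual WCG server code: the names, the enum and the exact conditions are assumptions made for illustration. In particular it assumes that a "completed" trickle is one whose intermediate files were uploaded, and that the 3 hour window is inclusive.

```python
from enum import Enum


class Outcome(Enum):
    KEEP_RUNNING = "keep running"
    HARD_STOP_VALIDATE = "hard stop, result marked for validation"
    INVALID_RESEND = "invalid, new copy sent from step 0"


UPLOAD_WINDOW_HOURS = 3  # figure taken from the post


def server_decision(completed_trickles: int, hours_waiting_for_upload: float) -> Outcome:
    """Hypothetical decision for a task that sent a trickle message.

    completed_trickles: trickles whose intermediate files were uploaded (assumed meaning).
    hours_waiting_for_upload: time since the latest trickle without its upload.
    """
    if hours_waiting_for_upload <= UPLOAD_WINDOW_HOURS:
        return Outcome.KEEP_RUNNING
    if completed_trickles == 0:
        return Outcome.INVALID_RESEND
    return Outcome.HARD_STOP_VALIDATE


print(server_decision(completed_trickles=0, hours_waiting_for_upload=5).value)
print(server_decision(completed_trickles=4, hours_waiting_for_upload=5).value)
```

Under this reading, a task that went offline before finishing even one trickle is the only case that gets restarted from step 0.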

Thanks,
-Uplinger

[Oct 8, 2015 10:36:36 AM]

TPCBF
Master Cruncher
USA
Joined: Jan 2, 2011
Post Count: 1951
Status: Offline

Re: Run dry already?

Hello everyone,

The dry spell should be over. I have increased the weight of the project, so we should start to see an upturn in runtime per day on the project as well.

Well, around here, the opposite seems to happen. Over the last night, I got mostly non-FAH2 WUs, with FAH2 the only project selected. And runtime per day dropped from around 50 CPU days/calendar day to less than 40 by now...

EDIT: I just checked, and none of my approx. 30 active hosts has gotten a single FAH2 WU in the last 9h...

Ralf

----------------------------------------

----------------------------------------
[Edit 1 times, last edit by TPCBF at Oct 8, 2015 6:30:04 PM]

[Oct 8, 2015 5:48:59 PM]

uplinger
Former World Community Grid Tech
Joined: May 23, 2005
Post Count: 3952
Status: Offline

Re: Run dry already?

Ok, it got stuck again. I have cleared it manually, I have also added some monitoring to make sure I get alerted the next time it happens. They are flowing again as I type.

Thanks,
-Uplinger

[Oct 8, 2015 6:30:41 PM]

TPCBF
Master Cruncher
USA
Joined: Jan 2, 2011
Post Count: 1951
Status: Offline

Re: Run dry already?

Ok, it got stuck again. I have cleared it manually, I have also added some monitoring to make sure I get alerted the next time it happens. They are flowing again as I type.

Thanks,
-Uplinger

Thanks!
I got 14 FAH2 WUs among 11 different hosts by now again.

Looks like there is quite a bit of tuning work to do until this project runs as smoothly as pretty much all the others do (at least since I joined up ;) )

Ralf

----------------------------------------

[Oct 8, 2015 8:34:57 PM]

Former Member
Cruncher
Joined: May 22, 2018
Post Count: 0
Status: Offline


Re: Run dry already?

But yet.... This is an hour and a half after your "cleared it" message.

10/9/2015 12:07:46 PM | World Community Grid | Sending scheduler request: To fetch work.
10/9/2015 12:07:46 PM | World Community Grid | Requesting new tasks for CPU and intel_gpu
10/9/2015 12:07:48 PM | World Community Grid | Scheduler request completed: got 0 new tasks
10/9/2015 12:07:48 PM | World Community Grid | No tasks sent
10/9/2015 12:07:48 PM | World Community Grid | No tasks are available for FightAIDS@Home - Phase 2
10/9/2015 12:07:48 PM | World Community Grid | No tasks are available for Uncovering Genome Mysteries
10/9/2015 12:07:48 PM | World Community Grid | No tasks are available for the applications you have selected.
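For anyone wanting to pull this out of a longer event log, here is a rough sketch that lists the projects the scheduler reported as having no work. It assumes the "timestamp | project | message" layout seen in the lines above; the function name and the regular expression are made up for illustration and are not part of the BOINC client.

```python
import re

# Sample taken from the log lines in this post.
LOG = """\
10/9/2015 12:07:46 PM | World Community Grid | Sending scheduler request: To fetch work.
10/9/2015 12:07:46 PM | World Community Grid | Requesting new tasks for CPU and intel_gpu
10/9/2015 12:07:48 PM | World Community Grid | Scheduler request completed: got 0 new tasks
10/9/2015 12:07:48 PM | World Community Grid | No tasks sent
10/9/2015 12:07:48 PM | World Community Grid | No tasks are available for FightAIDS@Home - Phase 2
10/9/2015 12:07:48 PM | World Community Grid | No tasks are available for Uncovering Genome Mysteries
10/9/2015 12:07:48 PM | World Community Grid | No tasks are available for the applications you have selected.
"""

NO_WORK = re.compile(r"No tasks are available for (.+?)\.?$")


def projects_without_work(log_text):
    """Return the sub-project names named in 'No tasks are available for ...' lines."""
    projects = []
    for line in log_text.splitlines():
        parts = [p.strip() for p in line.split("|", 2)]
        if len(parts) != 3:
            continue  # not a 'time | project | message' line
        match = NO_WORK.match(parts[2])
        # Skip the generic summary line, which names no project.
        if match and match.group(1) != "the applications you have selected":
            projects.append(match.group(1))
    return projects


print(projects_without_work(LOG))
```

On the lines above it returns FightAIDS@Home - Phase 2 and Uncovering Genome Mysteries.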

I've been wondering about this. I have a 3 day cache set and only this project selected, but only have 3 extra WUs downloaded (only one extra set, with 3 cores on the laptop running).

[Oct 9, 2015 8:50:38 PM]
