World Community Grid - View Thread - New thread for It's raining Dengue, Hallelujah!...

Category: Completed Research

Forum: Discovering Dengue Drugs - Together - Phase 2 Forum

Thread: New thread for It's raining Dengue, Hallelujah!...

Thread Status: Active
Total posts in this thread: 2370

This topic has been viewed 1338931 times and has 2369 replies

nanoprobe
Master Cruncher
Classified
Joined: Aug 29, 2008
Post Count: 2998
Status: Offline

Re: New thread for It's raining Dengue, Hallelujah!...

As for hyperthreading, IMHO it will make each unit run slightly longer, but you will get more done overall.

Enabling HT will also make it run up to 12°C hotter.

----------------------------------------

In 1969 I took an oath to defend and protect the U S Constitution against all enemies, both foreign and Domestic. There was no expiration date.

[Jan 23, 2011 2:21:59 PM]

Ingleside
Veteran Cruncher
Norway
Joined: Nov 19, 2005
Post Count: 974
Status: Offline

Re: New thread for It's raining Dengue, Hallelujah!...

Well ... I think I'm on the safe side. The application has a 12-second "redial" limit; one can see this as "Communication deferred 00:00:11" when pressing the "Update" button in the BOINC Manager. I use a "security factor" of 5, so my script asks for updates every 60 seconds. Of course I know that this will put some stress on the WCG servers, but ....

Yes, I can use any other interval between update requests, but it would be nice if SekeRob or anyone else could give a hint. The BOINC application also asks for work every 60 seconds ... at least for the first tries. Then the time between the updates seems to get longer and longer ... and then my crunchers will miss the raindrops from DDDT2. So the question is: how is it done right?
Stephan

Well, SekeRob has already given his "hint"; for this part I will only add that with 500k results/day, one can on average expect 2-3 scheduler-requests/day per computer, while your suggested method is 1440 scheduler-requests/day per computer, or roughly 500x higher load. To handle this higher load, WCG would probably have to increase the enforced delay from 11 seconds up to 1 hour or something...
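
The arithmetic behind the "roughly 500x" figure can be checked with a few lines of Python. This is only a rough sketch: the function name is made up for illustration, and the 2-3 requests/day figure is taken from the post as-is rather than measured.

```python
# Back-of-the-envelope check of the per-computer request rates quoted above.
SECONDS_PER_DAY = 24 * 60 * 60


def requests_per_day(poll_interval_s: float) -> float:
    """Scheduler requests per day for a client polling at a fixed interval."""
    return SECONDS_PER_DAY / poll_interval_s


script_rate = requests_per_day(60)   # the 60-second update script
typical_low, typical_high = 2, 3     # average requests/day quoted in the post

print(f"script: {script_rate:.0f} requests/day")
print(f"excess load: {script_rate / typical_high:.0f}x "
      f"to {script_rate / typical_low:.0f}x")
```

Depending on whether the typical client makes 2 or 3 requests a day, the script comes out several hundred times more demanding, which is the same order of magnitude as the estimate above.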

Looking at the BOINC client: with v6.2.xx and earlier clients, the exponential backoff was between 1 minute and 4 hours, and for scheduler-requests the backoff was reset after 10 tries. This is much too aggressive, and the BOINC client is basically DDoS-ing the project servers after any outage.

v6.6.xx and v6.10.xx did improve things somewhat, with the project-wide backoff on uploads or downloads in case of multiple upload/download errors, and the new 1-minute to 24-hour deferral on failed work-requests. These changes were still not enough, and for this reason v6.12.xx, "soon" to be released, includes some more changes. The most notable is that on connection failures the exponential deferral is between 10 minutes and 12 hours, while for failed work-requests the max starts at 10 minutes. Also, if I'm not mistaken, for work-requests the actual deferral is now between 0.5x max and 2x max...

So, if we assume a computer uses 4 hours/task and runs v6.12.xx, you'll have:
Dual-core: Up to 4 deferrals under the "'max' 10 minutes, doubling on successive errors" rule; at minimum they will be 5 minutes, 10 minutes, 20 minutes and 40 minutes. After 2 hours the deferrals are reset, because a task has finished. This means the BOINC client will make at most 5 scheduler-requests per task in this scenario.
Your "trick", on the other hand, with 1 scheduler-request per minute, will make... 120 scheduler-requests per task on the dual-core...
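
The dual-core count can be reproduced with a small Python sketch. It is a simplification, not the BOINC client's actual logic: it assumes one request when a task finishes, then retries after deferrals that double each time, starting from the shortest case of 5 minutes, and it assumes a task finishes every 120 minutes on average (4 hours/task over 2 cores). The function name and parameters are hypothetical.

```python
def count_requests(window_min: float, first_deferral_min: float) -> int:
    """Count scheduler requests in one window between task completions.

    Assumed model (not the real client code): one request is made when a
    task finishes, and every failed request is retried after a deferral
    that doubles each time.
    """
    requests = 1          # the request made when the task finishes
    elapsed = 0.0
    deferral = first_deferral_min
    while elapsed + deferral < window_min:
        elapsed += deferral
        requests += 1     # a retry fits inside the window
        deferral *= 2     # double on successive failures
    return requests


# Dual-core at 4 hours/task: a task finishes every 120 minutes on average.
window = 4 * 60 / 2
backoff_requests = count_requests(window, first_deferral_min=5)
polling_requests = int(window)   # one request per minute

print(backoff_requests, polling_requests)
```

With these assumptions the retries fall 5, 15, 35 and 75 minutes into the window, giving 5 requests in total against 120 for the once-per-minute script.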

Using the same 4 hours/task, and depending on the number of cores, one can make this small table of scheduler-requests per task:

#cores   Normal   Once per minute   Excess load
 2       5        120               24x
 4       3         60               20x
 6       3         40               13.3x
 8       3         30               10x
12       2         20               10x

So, depending on how many cores your computer(s) have, your "trick" gives an unnecessary increase in server load of between 10x and 24x...
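
For anyone who wants to rerun the table with other task lengths, here is a short Python sketch. The "Normal" request counts are copied from the table rather than derived, and the names are made up for illustration; only the once-per-minute column and the ratio are computed.

```python
TASK_MINUTES = 4 * 60   # 4 hours/task, as assumed in the table

# "Normal" scheduler-requests per task, copied from the table (not derived).
normal_requests = {2: 5, 4: 3, 6: 3, 8: 3, 12: 2}


def polling_requests_per_task(cores: int) -> float:
    """Requests per finished task when asking once per minute.

    With `cores` tasks running in parallel, one task finishes on average
    every TASK_MINUTES / cores minutes.
    """
    return TASK_MINUTES / cores


rows = []
for cores, normal in normal_requests.items():
    per_minute = polling_requests_per_task(cores)
    rows.append((cores, normal, per_minute, per_minute / normal))

for cores, normal, per_minute, excess in rows:
    print(f"{cores:>2} cores: {normal} vs {per_minute:.0f} "
          f"requests/task -> {excess:.1f}x")
```

Changing TASK_MINUTES shows how quickly the ratio grows for slower machines, provided the "Normal" counts are adjusted to match.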

----------------------------------------

"I make so many mistakes. But then just think of all the mistakes I don't make, although I might."

----------------------------------------
[Edit 1 times, last edit by Ingleside at Jan 23, 2011 3:20:26 PM]

[Jan 23, 2011 3:17:46 PM]

Former Member
Cruncher
Joined: May 22, 2018
Post Count: 0
Status: Offline


Re: New thread for It's raining Dengue, Hallelujah!...

I live in a loft (not super big), so I am using laptops so the place doesn't always have to look like a machine room (when entertaining the few non-nerdy friends I have :P). Also, I was under the impression that laptops would have a better points/watt-hr ratio (e.g. see this study from LBL), which is what I like to maximize...

cheers,
-j

[Jan 23, 2011 4:26:24 PM]

gb077492
Advanced Cruncher
Joined: Dec 24, 2004
Post Count: 96
Status: Offline


Re: New thread for It's raining Dengue, Hallelujah!...

So, if we assume a computer uses 4 hours/task

Not a good assumption, as this is DDDT2...

I just checked my result log and I have 4 validated WUs that each completed in less than 15 minutes. On an 8-way box that's a little higher request rate than you're allowing for.

Indeed, I have an 8-way server that I can only control through the web pages, and it ran out of work twice in the last few days while I had it set to run DDDT2 only with a 2-day cache (it couldn't keep the cache full as soon as anything went wrong, since the DDDT2 WUs trickle out so slowly).

With boxes which are not always-on I usually set the cache to less than one day. What you're suggesting says that I shouldn't do that.

Methinks you're going to run very short of "reliable" boxes if this is implemented.

Mike

[Jan 23, 2011 10:09:50 PM]

Ingleside
Veteran Cruncher
Norway
Joined: Nov 19, 2005
Post Count: 974
Status: Offline

Re: New thread for It's raining Dengue, Hallelujah!...

Well, I didn't expressly state it, but my assumption was for a client that didn't manage to fill up its cache with DDDT2, so instead of sitting idle part of the time it was filling up with a mix of other work, and 4 hours/task will at least be a good starting point...

But, if we're going to take the extremes: a slow dual-core that uses 20 hours/task and can't get any work will make 600 scheduler-requests per task if it asks once per minute, while with v6.12.xx the max goes like 10 minutes, 20 minutes, 40, 80, 160, 320... Exactly how long the deferral will be for each failed request is more or less random, but chances are there will only be 5 or 6 scheduler-requests before a task finishes, for a total of 6 or 7. So, asking once per minute will give 100x higher load.
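
Since the actual deferral is more or less random, the 20 hours/task case can also be sketched as a small simulation in Python. The model is an assumption, not the client's real code: each deferral is drawn uniformly between 0.5x and 1x of a max that starts at 10 minutes and doubles after every failure. That range is chosen because it reproduces the counts above; the exact bounds in v6.12.xx may differ, so `low` and `high` are left as parameters.

```python
import random


def simulate_requests(window_min, first_max_min=10.0, low=0.5, high=1.0,
                      rng=None):
    """Simulate scheduler requests in one window between task completions.

    Assumed model: one request when a task finishes, then retries after a
    random deferral of low..high times a max that doubles on each failure.
    """
    rng = rng or random.Random()
    requests = 1
    elapsed = 0.0
    max_deferral = first_max_min
    while True:
        deferral = rng.uniform(low, high) * max_deferral
        if elapsed + deferral >= window_min:
            return requests   # the next retry would land after the window
        elapsed += deferral
        requests += 1
        max_deferral *= 2


# Slow dual-core at 20 hours/task: a task finishes every 600 minutes.
window = 20 * 60 / 2
rng = random.Random(2011)   # fixed seed so runs are repeatable
totals = [simulate_requests(window, rng=rng) for _ in range(1000)]
polling = window            # one request per minute

print(sorted(set(totals)), polling / max(totals), polling / min(totals))
```

Under these assumptions every simulated window ends with 6 or 7 requests in total, so asking once per minute is somewhere between about 85x and 100x more load.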

For the 15-minute tasks, a dual-core will possibly get a single extra scheduler-request off, so 2 requests/task, while once per minute is 7 requests/task. This is only 3.5x more frequent.

For an 8-way box at 15 minutes/task, there'll be a scheduler-request every 1-2 minutes regardless, so using a script is a waste of time.

Indeed, I have an 8-way server that I can only control through the web pages, and it ran out of work twice in the last few days while I had it set to run DDDT2 only with a 2-day cache (it couldn't keep the cache full as soon as anything went wrong, since the DDDT2 WUs trickle out so slowly).

With boxes which are not always-on I usually set the cache to less than one day. What you're suggesting says that I shouldn't do that.

The cache size has little to do with this. Either you accept non-DDDT2 work, in which case you'll normally use on average 1 scheduler-request per task regardless of whether the cache size is 0.01 days or 5 days; or your computer will sit idle since it can't get enough DDDT2 work to keep it busy, and this again will to a large extent be regardless of cache size, since DDDT2 work isn't added fast enough to meet the demand.

Methinks you're going to run very short of "reliable" boxes if this is implemented.

A minimum deferral of 1 minute or 10 minutes on connection errors should have little or no effect on a computer's reliable rating...

[Jan 23, 2011 10:52:18 PM]

gb077492
Advanced Cruncher
Joined: Dec 24, 2004
Post Count: 96
Status: Offline


Re: New thread for It's raining Dengue, Hallelujah!...

A minimum deferral of 1 minute or 10 minutes on connection errors should have little or no effect on a computer's reliable rating...

True, but you're talking about a maximum deferral of 24 hours. There's the rub. I would be mad to run with a cache less than one day under those conditions. Since a cache of 2 days or more will mean that a machine is unlikely to be marked as reliable (effectively can't be, save for errors in the correction factor), that is why I made my statement.

As to your other comments, I'm talking about set-and-forget users, not finger pokers. (But even these folks like to collect badges :D)

Mike

----------------------------------------
[Edit 1 times, last edit by gb077492 at Jan 23, 2011 11:04:56 PM]

[Jan 23, 2011 11:03:26 PM]

deltavee
Ace Cruncher
Texas Hill Country
Joined: Nov 17, 2004
Post Count: 4891
Status: Offline

Re: New thread for It's raining Dengue, Hallelujah!...

I have started getting ts02_b's.

ts02_ b005_ pr56a0_ 1--

Does this mean we are 25% through this batch?

[Jan 24, 2011 2:38:30 AM]

My 2 cents worth
Cruncher
Joined: Sep 12, 2008
Post Count: 19
Status: Offline


Re: New thread for It's raining Dengue, Hallelujah!...

I am getting my fair share :D

And by the way, I just decided how to solve my issue with the random WU jumping on the Linux box: I will only allow it to crunch a day's worth at a time, by date, and suspend the rest until it has the others done.

----------------------------------------
[Edit 1 times, last edit by My 2 cents worth at Jan 24, 2011 6:14:19 AM]

[Jan 24, 2011 6:11:52 AM]

Former Member
Cruncher
Joined: May 22, 2018
Post Count: 0
Status: Offline


Re: New thread for It's raining Dengue, Hallelujah!...

Time for a new thread!!!!

[Jan 24, 2011 7:26:33 AM]

Former Member
Cruncher
Joined: May 22, 2018
Post Count: 0
Status: Offline


Re: New thread for It's raining Dengue, Hallelujah!...

I have started getting ts02_b's.

ts02_ b005_ pr56a0_ 1--

Does this mean we are 25% through this batch?

Probably / could be that we're beyond 25% in work having been sent out, which by no means is a sign of completion [the client cache stockpiles are expected to be huge]. The last "a" was g50 that I can see in the message logs. Per seippel the "d" goes up to g22 (the names of the downloaded files contain this indicator).

We had not processed everything from the previous rain when we started to receive this last set, so it's not accurate, but we've had 76,000 validate so far since the 18th, 18,781 yesterday per the project stats page, and probably thousands are waiting on a matching wingman. Against a total of 542,682 (the number before quorum was 271,341), that suggests we've got a long way to go.

I think I'll complete something like a gold equivalent by the end of this set... 21 days have validated since the start and 1.8 days' worth are in PV jail, which then gets the total about halfway to emerald.

--//--

[Jan 24, 2011 8:01:02 AM]
