World Community Grid - View Thread - Crunching efficiency

Thread Status: Active
Total posts in this thread: 9

This topic has been viewed 10023 times and has 8 replies

Steve WCG
Senior Cruncher
Joined: May 4, 2009
Post Count: 216
Status: Offline


Crunching efficiency

I am creating a testing methodology so I/we can easily test different theories on how to improve crunching efficiency. I see three basic approaches.

1. Run a project for a given amount of time and see what the averages come out to. I don't like this approach because there are so many variables that it would be next to impossible to weed them out.

2. Capture a set of WUs (copy/paste the BOINC data dir) and repeatedly use the same set to create easily comparable runtimes.

3. Use multiple instances of a single WU to create a set of WUs as described in option 2.

I like number two because it is the easiest to manage, but on the other hand it is not flexible when we consider the size of the set and how long it would take to process. Number 3 is the most intriguing to me because it provides complete flexibility and makes it easy to pass a single WU to different machines (yes, I know I have to find out the "compatibility" groups ... if WCG can share what those groups are, that would be great). The problem I have with number 3 is that I don't know which files in the BOINC data directory I would have to change so that, after I make 24 copies of the WU, I can start processing them. I understand that I will need to do this with network activity turned off so nothing gets uploaded, but if anyone can point me in the right direction for copying WUs I would greatly appreciate it.

The more exact the results the methodology produces, the better, because even a 1% increase on an i7 produces an extra WU every 2 days.
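As a rough check of that figure, here is a back-of-the-envelope sketch. The thread count (8, an i7 with HT) and the 4-hour average WU runtime are assumptions picked for illustration, not measured values; plug in your own numbers.

```python
# Back-of-the-envelope estimate: how long until a small efficiency gain
# adds up to one extra work unit (WU)?
# Assumptions (illustrative, not measured): 8 threads, 4 hours per WU.

def days_per_extra_wu(threads: int, wu_hours: float, gain: float) -> float:
    """Days of crunching needed before a fractional gain yields one extra WU."""
    wus_per_day = threads * 24.0 / wu_hours   # baseline throughput
    extra_wus_per_day = wus_per_day * gain    # additional WUs due to the gain
    return 1.0 / extra_wus_per_day


if __name__ == "__main__":
    days = days_per_extra_wu(threads=8, wu_hours=4.0, gain=0.01)
    print(f"One extra WU roughly every {days:.1f} days")
```

With these assumed inputs the baseline is 48 WUs per day, so a 1% gain is about half a WU per day, which is in line with "an extra WU every 2 days".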

Steve

[Oct 4, 2009 3:58:36 PM]

JmBoullier
Former Community Advisor
Normandy - France
Joined: Jan 26, 2007
Post Count: 3715
Status: Offline


Re: Crunching efficiency

Personally I use method 1 (a single project over several days), recording result numbers in spreadsheets and running various analyses on them.
I cannot use methods 2 and 3 because most of the time I am either comparing the performance of a given project between XP32 and Ubuntu 64 (so using the same WUs on each side is not possible), or comparing project yields on the same machine/OS to find out which project it runs most efficiently.
Method 1 also lets me keep doing useful crunching during the whole test.

Method 2 is probably the best (with the necessary operating precautions) for performance comparisons between operating systems of the same family, or between different machines in the same OS family. The crunching is useful only once, but since the comparison is stricter the set of tested WUs can be much smaller.

I don't really see the benefit of method 3 over method 2, so I must have missed something. On the other hand I clearly see the burden (if not the complexity) of having to create the correct corresponding control lines in the control files. Good luck!

I warmly support your effort because I am interested in what you will find (whichever method you use) when comparing different systems and processing modes (64 vs 32).

Cheers. Jean.

----------------------------------------

Team--> Decrypthon -->Statistics/Join -->Thread

[Oct 5, 2009 4:16:16 AM]

Steve WCG
Senior Cruncher
Joined: May 4, 2009
Post Count: 216
Status: Offline


Re: Crunching efficiency

Additional reasons I like #3

I won't be the only person testing, and this would make it easy to pass the base test file around (international team). It makes it easy to scale the test to an appropriate size for both overall runtime (12 hours) and thread count: single, duo, quad, quad w/HT, hex, and perhaps Sandy Bridge in the future. No matter the size of the actual test performed, the results would be directly comparable, with no concern for the variables that exist in 1 and possibly also in 2. All test WUs should complete within minutes of each other (if not, it points to the OS or other software interfering with crunching), plus it just *feels* right :-)
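The "complete within minutes of each other" check could be sketched roughly like this. The 5-minute tolerance and the sample runtimes are made up for illustration; the function names are hypothetical, not part of any BOINC tool.

```python
from statistics import mean


def runtime_spread(runtimes_sec):
    """Return (mean, spread) in seconds, where spread = slowest - fastest."""
    return mean(runtimes_sec), max(runtimes_sec) - min(runtimes_sec)


def looks_disturbed(runtimes_sec, tolerance_sec=300.0):
    """True if copies of the same WU finished further apart than the tolerance."""
    _, spread = runtime_spread(runtimes_sec)
    return spread > tolerance_sec


if __name__ == "__main__":
    # Hypothetical runtimes (seconds) for four copies of the same WU.
    sample = [14400.0, 14460.0, 14380.0, 15200.0]
    avg, spread = runtime_spread(sample)
    print(f"mean={avg:.0f}s spread={spread:.0f}s disturbed={looks_disturbed(sample)}")
```

A large spread would not say what interfered, only that something did, so it is a hint to rerun the test rather than a result in itself.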

Do you think I would get any help if I directly emailed support?

[Oct 5, 2009 11:56:06 AM]

JmBoullier
Former Community Advisor
Normandy - France
Joined: Jan 26, 2007
Post Count: 3715
Status: Offline


Re: Crunching efficiency

OK Steve,
I understand now.
For this purpose help and support from the techs would certainly be welcome.
Before I became a CA I had asked for a Test project and there were some positive reactions. Obviously the thread slipped off topic as usual, but you might like to review it and use it, or start another one of your own. The thread is here:
Would it be difficult to create a Test project?

Such a solution would eliminate the problem of creating and maintaining the associated control files, and it would make the test easily accessible to anybody, anywhere, anytime.

Cheers. Jean.


[Oct 6, 2009 4:39:00 AM]

Ingleside
Veteran Cruncher
Norway
Joined: Nov 19, 2005
Post Count: 974
Status: Offline


Re: Crunching efficiency

Additional reasons I like #3

To set up your tests, select "Activity, Suspended" and "Network activity Suspended", suspend any unstarted task you want to use for testing purposes, exit BOINC, make a backup copy, and start editing your client_state.xml.

Specifically, you'll need to duplicate <workunit> and <result>, making sure each WU and result has a different name (you can make it easy and call them test01, test02, test03 and so on).

Each WU and each result also has some corresponding <file_info>, so you'll also need to duplicate these and make sure each WU/result points to the correct <file_info>. For the <workunit> you can likely use the same input files for all WUs, but you'll need to test it. For the result you'll need to use unique file names for all...

Oh, and to be very sure nothing goes out even if you do screw something up, it's also a good idea to change the URLs to point to a non-existing URL, so that even if you make a mistake and enable the network connection, nothing will be transferred... (If you also change <master_url>, you'll also need to rename account_*.xml to match the new master_url.)

As for the testing procedure, make sure the client is set to "Activity, Suspended" and all tasks are also suspended. Then you just resume as many tasks as you want to run, for example 4 on an 8-way system, and afterwards start them all at once by enabling "Activity, Run Always".
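The duplication step could be scripted along these lines. This is only a sketch: SAMPLE is a heavily simplified, hypothetical stand-in for client_state.xml (the real file has many more fields, and whether element order matters to the client is untested here), and the duplicate() helper and the test01-style names are illustrative. Try it on a backup copy only.

```python
import copy
import xml.etree.ElementTree as ET

# Simplified, hypothetical stand-in for client_state.xml; not the full format.
SAMPLE = """
<client_state>
  <file_info><name>input_a</name></file_info>
  <file_info><name>orig_result_0</name></file_info>
  <workunit>
    <name>orig_wu</name>
    <file_ref><file_name>input_a</file_name></file_ref>
  </workunit>
  <result>
    <name>orig_result</name>
    <wu_name>orig_wu</wu_name>
    <file_ref><file_name>orig_result_0</file_name></file_ref>
  </result>
</client_state>
"""


def find_by_name(root, tag, name):
    """Return the first <tag> child whose <name> matches."""
    for elem in root.findall(tag):
        if elem.findtext("name") == name:
            return elem
    raise KeyError(f"no <{tag}> named {name!r}")


def duplicate(root, wu_name, result_name, copies):
    """Append `copies` renamed clones of one workunit/result pair to root."""
    wu = find_by_name(root, "workunit", wu_name)
    res = find_by_name(root, "result", result_name)
    out_files = [ref.findtext("file_name") for ref in res.findall("file_ref")]
    for i in range(1, copies + 1):
        tag = f"test{i:02d}"
        new_wu = copy.deepcopy(wu)          # input file refs stay shared
        new_wu.find("name").text = tag
        root.append(new_wu)
        new_res = copy.deepcopy(res)
        new_res.find("name").text = f"{tag}_0"
        new_res.find("wu_name").text = tag  # point the result at its own WU
        for j, ref in enumerate(new_res.findall("file_ref")):
            new_file = f"{tag}_0_{j}"       # unique output file per result
            info = copy.deepcopy(find_by_name(root, "file_info", out_files[j]))
            info.find("name").text = new_file
            root.append(info)
            ref.find("file_name").text = new_file
        root.append(new_res)


if __name__ == "__main__":
    root = ET.fromstring(SAMPLE)
    duplicate(root, "orig_wu", "orig_result", copies=3)
    print(ET.tostring(root, encoding="unicode"))
```

The clones are simply appended at the end of the document; if the client turns out to care about the order of <file_info>, <workunit> and <result>, the insert position would need adjusting.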

----------------------------------------

"I make so many mistakes. But then just think of all the mistakes I don't make, although I might."

----------------------------------------
[Edit 1 times, last edit by Ingleside at Oct 6, 2009 5:04:43 AM]

[Oct 6, 2009 5:02:47 AM]

Steve WCG
Senior Cruncher
Joined: May 4, 2009
Post Count: 216
Status: Offline


Re: Crunching efficiency

Thanks for the input ... Jean - your idea of having a *test* project sounds fantastic ... it would make it so easy to test the exact same WUs with different configurations, between different machines even ... swoon :-)

Ingleside ... thank you - sounds like I need to whip up a little script to fix everything up nice.

[Oct 6, 2009 12:16:49 PM]

Ingleside
Veteran Cruncher
Norway
Joined: Nov 19, 2005
Post Count: 974
Status: Offline


Re: Crunching efficiency

Well, with Notepad, a little cut-n-paste, a little editing to choose "easy" names, and Notepad's "replace", it shouldn't be a problem to generate 1000 WUs in an hour...

By using, for example, wu_0001 and result_0001 as the names in <file_info>, <workunit> and <result>, it's very easy to duplicate them to get *_0002, *_0003 and so on. If a task has multiple WU files and/or result files, just choose something like a_wu_0001, b_wu_0001, c_wu_0001 and so on for however many input files are necessary, and similarly for multiple result files.
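If you would rather script the naming than do it by hand, the scheme above could be generated like this. file_names() is a hypothetical helper, and the letter prefixes limit it to 26 files per task.

```python
from string import ascii_lowercase


def file_names(kind, index, how_many=1):
    """Build names like wu_0001, or a_wu_0001, b_wu_0001, ... for several files."""
    tag = f"{index:04d}"
    if how_many == 1:
        return [f"{kind}_{tag}"]
    if how_many > len(ascii_lowercase):
        raise ValueError("at most 26 files per task with single-letter prefixes")
    return [f"{ascii_lowercase[k]}_{kind}_{tag}" for k in range(how_many)]


if __name__ == "__main__":
    for n in range(1, 4):
        print(file_names("wu", n, how_many=3), file_names("result", n))
```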


[Oct 6, 2009 4:27:18 PM]

Steve WCG
Senior Cruncher
Joined: May 4, 2009
Post Count: 216
Status: Offline


Re: Crunching efficiency

Thanks for all your help ... I write software for a living so this will be easy ... I have been thinking about making it a utility where you can pick which project type you want to test and how many iterations, then have it make sure BOINC is stopped, back up your current files, create the "test" WUs, fix up the files and launch the test. When done, it would make sure BOINC stops, write the stats out to a CSV file, remove the test WUs, restore your original files and start crunching for real again.
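A minimal sketch of the backup/restore and CSV parts of such a utility might look like the following. Stopping and starting the BOINC client is deliberately left out, the demo runs against a throwaway temp directory rather than a real BOINC data directory, and backed_up() and write_stats() are hypothetical names.

```python
import csv
import shutil
import tempfile
from contextlib import contextmanager
from pathlib import Path


@contextmanager
def backed_up(data_dir: Path):
    """Copy data_dir aside, yield it for the test, then restore the original."""
    backup_root = Path(tempfile.mkdtemp(prefix="boinc_backup_"))
    backup = backup_root / "data"
    shutil.copytree(data_dir, backup)
    try:
        yield data_dir
    finally:
        shutil.rmtree(data_dir)            # drop the test WUs and edited files
        shutil.copytree(backup, data_dir)  # put the original files back
        shutil.rmtree(backup_root)


def write_stats(csv_path: Path, rows):
    """Write (task, runtime_sec) rows to a CSV file with a header line."""
    with csv_path.open("w", newline="") as fh:
        writer = csv.writer(fh)
        writer.writerow(["task", "runtime_sec"])
        writer.writerows(rows)


if __name__ == "__main__":
    work = Path(tempfile.mkdtemp(prefix="boinc_demo_"))
    data = work / "data"
    data.mkdir()
    (data / "client_state.xml").write_text("<client_state/>")
    with backed_up(data):
        # The test WUs would be created and run here (client assumed stopped).
        (data / "client_state.xml").write_text("<client_state><!-- test --></client_state>")
        write_stats(work / "stats.csv", [("test01", 14400.0)])
    print((data / "client_state.xml").read_text())  # original content again
    shutil.rmtree(work)
```

Wrapping the test in a context manager means the original files come back even if the test run raises partway through.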

----------------------------------------
[Edit 1 times, last edit by Steve WCG at Oct 6, 2009 4:38:35 PM]

[Oct 6, 2009 4:38:18 PM]

Former Member
Cruncher
Joined: May 22, 2018
Post Count: 0
Status: Offline


Re: Crunching efficiency

One of my team members just used "method #2" to compare an Intel i7 920 to an i7 860 running at equal clock speeds. It worked well but can be a little messy. An automated program to run a WCG benchmark would be sweet. Please keep us updated!

[Oct 7, 2009 4:24:39 AM]
