Former Member
Cruncher
Joined: May 22, 2018
Post Count: 0
Status: Offline
Re: To Project Scientists: Clean Energy Project Phase 2 - Limited Cruncher Participation

I have looked over the result page and it seems as though you are getting credit for the work completed, not for how long it took you to run it.
----------------------------------------
[Edit 1 times, last edit by Former Member at Nov 27, 2010 10:37:26 PM]
[Nov 27, 2010 10:36:52 PM]
Former Member
Cruncher
Joined: May 22, 2018
Post Count: 0
Status: Offline
Re: To Project Scientists: Clean Energy Project Phase 2 - Limited Cruncher Participation

Your calculation seems to be a little too optimistic, because for CEP2 the average efficiency is around 90%. If you crunch CEP2 only, you will "lose" 1 core per 10 cores. This observation is consistent with the decrease in daily credits earned by my machines (from 9'600 to 8'400 per day).
From the standpoint of the number of days contributed to the project, this is based on the run time for each machine. If I were running 7 or more threads on one machine, I would agree with you about the efficiency, but the largest number of cores I have on one machine is 4, with 4 active tasks. The processing on one machine does not interfere with another. The machine that reaches the 12-hour maximum most often is the dual-core laptop I am using right now.

And by the way, the reason my run time has degraded somewhat is the number of WUs in PV (pending validation) jail, which has increased over the Thanksgiving holiday. biggrin
----------------------------------------
[Edit 1 times, last edit by Former Member at Nov 28, 2010 2:39:29 AM]
[Nov 28, 2010 2:33:48 AM]
Former Member
Cruncher
Joined: May 22, 2018
Post Count: 0
Status: Offline
Re: To Project Scientists: Clean Energy Project Phase 2 - Limited Cruncher Participation

I am running fdupes to deduplicate CEP2's files, which among other things reduces the kernel buffer space it wastes and saves about 70% of the disk space.
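
For illustration only, here is a minimal Python sketch of the fdupes idea - hash the files under a directory and replace byte-identical duplicates with hard links; the BOINC data path in the comment is hypothetical, and the actual savings depend on how much CEP2 data is really duplicated on a given host:

import hashlib
import os
import sys
from collections import defaultdict

def digest(path):
    # SHA-256 of the whole file, read in 1 MB chunks
    h = hashlib.sha256()
    with open(path, "rb") as f:
        for block in iter(lambda: f.read(1 << 20), b""):
            h.update(block)
    return h.hexdigest()

def dedupe(root):
    groups = defaultdict(list)          # (size, sha256) -> list of paths
    for dirpath, _, names in os.walk(root):
        for name in names:
            p = os.path.join(dirpath, name)
            if os.path.isfile(p) and not os.path.islink(p):
                groups[(os.path.getsize(p), digest(p))].append(p)
    saved = 0
    for (size, _), paths in groups.items():
        keep, dupes = paths[0], paths[1:]
        for d in dupes:                 # replace each copy with a hard link
            os.remove(d)
            os.link(keep, d)
            saved += size
    return saved

if __name__ == "__main__":
    # e.g. python dedupe.py /var/lib/boinc-client/projects   (hypothetical path)
    print(f"reclaimed about {dedupe(sys.argv[1]) / 1e6:.1f} MB")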

Please see the other thread. CEP2 provides hyperthreading opportunities by being such a different animal, at least that's my theory, and there are a lot of i7 9xx cores out there that can help assess/refine the slots. I think there is some potential in reducing page faults for CEP2 and the rest of the projects on large core counts.


http://secure.worldcommunitygrid.org/forums/wcg/viewthread_thread,30419
[Nov 28, 2010 8:45:03 AM]
KerSamson
Master Cruncher
Switzerland
Joined: Jan 29, 2007
Post Count: 1673
Status: Offline
Re: To Project Scientists: Clean Energy Project Phase 2 - Limited Cruncher Participation

Just for information:
  • CEP2 is computed on real cores (no HT)
  • Hosts are running WinXP 32-bit or WinXP 64-bit
  • CPUs involved: Q6600, Q9450, Xeon E5345, AMD Phenom II X6
  • BOINC versions used: 5.10.45 and 6.10.58

From time to time there are invalid results, whatever CPU is involved.
I also have to correct my previous statement: in fact, I see a performance decrease of nearly 20% (not 10%) with CEP2.
It is not a complaint, only an observation!
However, whatever the importance of the project - because every project is important at WCG - it is good practice to aim for the best possible efficiency. Honestly, I don't have the feeling that this science reaches it.
Enjoy,
Yves
----------------------------------------
[Edit 1 times, last edit by KerSamson at Nov 28, 2010 2:47:41 PM]
[Nov 28, 2010 2:45:13 PM]
Former Member
Cruncher
Joined: May 22, 2018
Post Count: 0
Status: Offline
Re: To Project Scientists: Clean Energy Project Phase 2 - Limited Cruncher Participation

Dear Crunchers,

Thanks for your criticisms and suggestions (and of course for participating in our project!). We'd like to address some of the issues and concerns that were raised in this thread:

[...] CEP2 has less than 25% of the run time of the most active project (HFCC) and less than half that of the next least crunched project (Water). With all the hoopla about the Win/Mac release, I was expecting more.

At present this is not unexpected for the following reasons:
1) The CEP2 load share is currently throttled to 40% of other projects. It will be increased incrementally to 100% over the next week or two. We opted for this warm-up strategy to safely test the stability of our servers, as this is the first time a WCG project uses its own hardware setup to collect results.
2) CEP2 is an 'opt-in' project due to its demanding nature and will hence exclude a certain segment of users.
3) For the same reason, every client machine is by default limited to one CEP2 instance at a time (which can be increased manually), since too many concurrent jobs can impact the overall performance (i.e., running out of RAM or simultaneous I/O can clog up the system and degrade its efficiency; the sweet-spot for many setups is 2-5 CEP2 jobs, but never more than the number of cores).
We clearly aim for a bigger load share but at this time we are where we should be.

I have been running 30-40 threads over the past few weeks and there is no problem getting work, so I was wondering why there are so few crunchers.

We are actually preparing a big, multistage drive to rally new participants (in particular in our research community, which has a lot of computers at its disposal), but this is scheduled for after the throttled warm-up. We have already heard from people who cannot get as many CEP2 workunits as they'd like, so there is no point in increasing demand just now.

Is it just because this is such a horrible project to crunch? [...] worst project [...]

We prefer to think of it as 'challenging' cool .

From huge bandwidth requirements, large disk and memory impact, lengthy uploads, unpredictable run times, odd return codes, etc. almost every aspect of this project is cruncher unfriendly. [...] To me, this project fails to meet WCG’s otherwise high project standards. [...] the project scientists should rethink their approach and please consider the impact on the crunchers who do the work.

CEP2 is undoubtedly demanding on the host hardware (that's why it's 'opt-in'), and we take your criticism seriously - but gb009761's comment hits the mark. The high demand reflects the nature of the research we are conducting! First-principles quantum chemistry is hard because it is a sophisticated method capable of capturing intricate physical phenomena.
You may argue that such a project should then not be on WCG. We disagree on this since already at this early stage CEP2 has been a success for us, producing more data than we could have obtained using our cluster. If we want to do this nontrivial, high quality numerical study on a large scale, then the WCG is the only way. That's what makes the WCG exciting for researchers: it allows us to do science which could otherwise not be done. We clearly think CEP2 is worthwhile, otherwise we wouldn't invest our time and money on it.

A few more concrete points:
1) Our servers run stably, have plenty of capacity, and an analysis indicates that they are not a bottleneck for uploads.
2) While the uploads are unfortunately bigger than in other WCG projects, we currently average 22.7 MB per result, which may not exactly qualify as 'huge'.
3) The RAM use per job is limited to 512MB.
4) Instead of running many CEP2 jobs simultaneously, we recommend mixing CEP2 with jobs from a less hard-core project (this is incidentally a common strategy in high-performance computing). Never run more jobs than you have cores - this completely kills your computer's performance! (A rough worked sketch of these numbers follows this list.)
5) Our error rate is well below the limit set by IBM, but we understand the frustration about any problems.
6) While Q-Chem is a commercial code, we have the source, can modify it for CEP2, and are in close contact with the developers. Our IBM friends and we have been tweaking things for the WCG and continue to do so, but the inherent complexity of the methods will not go away. Program packages like Q-Chem are constantly improved and expanded by the professional developers and academic contributors, so it is already an optimized high-performance code - we merely tailor certain aspects of it for the specific situation of the grid.
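
To put those figures in perspective, here is a rough back-of-the-envelope sketch in Python, using only the numbers quoted above (512MB of RAM per CEP2 task, ~22.7 MB uploaded per result); the quad-core host and the 8 results per day are hypothetical examples, not measurements:

# Back-of-the-envelope only; the host parameters below are hypothetical.
RAM_PER_TASK_MB = 512         # stated RAM cap per CEP2 job
UPLOAD_PER_RESULT_MB = 22.7   # stated average upload per result

tasks = 4             # hypothetical: one CEP2 task per core on a quad
results_per_day = 8   # hypothetical daily throughput for that host

print(f"peak RAM for CEP2 alone: ~{tasks * RAM_PER_TASK_MB} MB")                     # ~2048 MB
print(f"daily upload volume:     ~{results_per_day * UPLOAD_PER_RESULT_MB:.0f} MB")  # ~182 MB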

I know this is a short project which will soon be completed even with limited participation

CEP2 is decidedly not a short project and will be around for a while! We do have estimates but as they depend on many variables it's not really worth setting in stone.


Again, thanks for all your feedback and input - the interaction with the WCG community is really great! We hope you find our explanations useful and decide to stick with CEP2. We are also glad that people like our research log wink .

Cheers,


The CEP Team
[Nov 30, 2010 5:17:11 AM]
Sekerob
Ace Cruncher
Joined: Jul 24, 2005
Post Count: 20043
Status: Offline
Re: To Project Scientists: Clean Energy Project Phase 2 - Limited Cruncher Participation

Snip quote:
"We hope you find our explanations useful and decide to stick with CEP2"

Thanks for your long exposition, but vis-à-vis the concurrent post by uplinger in the other thread about increasing the "overflow pool share" - for that IS the effect on any multi-science volunteer client - my hand was forced... I've just set the 1-per-host override option to NO. At 40%, left alone, my quad was almost exclusively running CEP2, and in the last few days I have had to abort some 30 CEP2 tasks ** to restrain it to 2 of 4 cores and maintain system efficiency, with that inefficiency NOT recorded. Rather than doing 7-8 CEP2 results per day on my quad, it's back to maybe 2-2.2 results per day. I'd not call that a success, if squeezing out a few more years at a price is what you'd mark it as... increased frustration for mixed-project crunchers (n / cores = 2 would have been good for many, myself included).

** 2 effects:

1. My quad is no longer rated reliable and has thus been disqualified from contributing a little DDDT2 work when it's around.
2. The repair quantity in WCG's priority queue got bigger, putting more load on devices that are rated "reliable".

--//--


edit: spelling.
----------------------------------------
WCG Global & Research > Make Proposal Help: Start Here!
Please help to make the Forums an enjoyable experience for All!
----------------------------------------
[Edit 2 times, last edit by Sekerob at Nov 30, 2010 11:35:21 AM]
[Nov 30, 2010 9:22:33 AM]
Hypernova
Master Cruncher
Audaces Fortuna Juvat ! Vaud - Switzerland
Joined: Dec 16, 2008
Post Count: 1908
Status: Offline
Re: To Project Scientists: Clean Energy Project Phase 2 - Limited Cruncher Participation

The credit issue surely has an impact on the attractiveness. But in my case it is again the fact that on many of my devices CEP2 just fails and all WUs get errored. I lose too much time testing and selecting. It can also happen that after some successful WUs, they all start erroring out. Too much hassle in the end. So I do crunch it on a few of them, but I surely could have done more. I am sure that many crunchers who get errors and have only one or two devices would simply avoid it.
HFCC, FAAH, HCC and HCMD2 run on all my devices without problems.
But for whatever reason CEP2, C4CW and HPF2 fail on many of my devices.
So to avoid the Sapphire badge runtime limit syndrome, I decided to crunch the problematic projects to a minimum of 5 years and all the others up to 25 if they run long enough.
DDDT2 is a special case. I get one WU per month biggrin so it is not an issue.
[Nov 30, 2010 11:20:58 AM]
Former Member
Cruncher
Joined: May 22, 2018
Post Count: 0
Status: Offline
Re: To Project Scientists: Clean Energy Project Phase 2 - Limited Cruncher Participation

From my perspective, the CEP2 error issues appear to be machine-based, not science-based. I have had about a 0.05% error rate over the last 3 weeks. That is a 30x improvement on my farm over what I get with HPF2, where the error rate is about 1.5%. biggrin

As to the lost-time issue, we have to work through the WUs. If this is the best that CEP2 and WCG can do for us, then we might as well just put our heads down and plow through the WUs until they are done. The Linux/Mac systems were finishing 3% per month. With the addition of Windows, we have done another 14%+ this month. cool It appears that no improvements to the science are forthcoming before we finish CEP2, as the collective paralysis-by-analysis on CEP2 has not yielded any benefits. I expect none will appear in the remaining 4-5 months. crying crying crying
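
A quick sanity check of the figures in this post, all taken from the text above (none measured independently):

# All inputs are the rates quoted in the post; nothing here is new data.
cep2_error_rate = 0.05        # percent, over the last 3 weeks on this farm
hpf2_error_rate = 1.5         # percent, on HPF2
print(f"HPF2 vs CEP2 error rate: {hpf2_error_rate / cep2_error_rate:.0f}x higher")       # ~30x

linux_mac_progress = 3.0      # percent of CEP2 finished per month, Linux/Mac only
with_windows_progress = 14.0  # percent finished this month after the Windows release
print(f"monthly progress increase: ~{with_windows_progress / linux_mac_progress:.1f}x")  # ~4.7x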
[Nov 30, 2010 2:05:25 PM]
anhhai
Veteran Cruncher
Joined: Mar 22, 2005
Post Count: 839
Status: Offline
Re: To Project Scientists: Clean Energy Project Phase 2 - Limited Cruncher Participation

CEP2 would normally be my favorite project because of what it represents. However, I have pretty much decided to limit it as much as possible. I don't totally mind that it is not 100% efficient and that some CPU is wasted; that is the nature of the project. However, I can't get over the fact that I continue to get upload problems (on multiple systems), so that is just a big waste of CPU time/electricity.
Having the per-host limit be either 1 WU or effectively unlimited isn't very good either, but that can be worked on as time permits (it is the holidays; techs go on vacation).

edit: True, the error rate is below the requirement. However, does WCG track the time wasted due to errors? For most other projects, WUs error out right away. For CEP2, however, they can error out in the middle, or even after a WU has completed successfully, wasting hours of crunch time.
----------------------------------------
[Edit 1 times, last edit by anhhai at Nov 30, 2010 2:26:42 PM]
[Nov 30, 2010 2:24:15 PM]
Dataman
Ace Cruncher
Joined: Nov 16, 2004
Post Count: 4865
Status: Offline
Re: To Project Scientists: Clean Energy Project Phase 2 - Limited Cruncher Participation

First, thanks to the Harvard team for responding, albeit defensively, to this thread.
Second and more important, many thanks to the crunchers who have responded with their thoughts and experiences with this project!!! It may not always seem so but I really do value the comments of other crunchers and especially the “farmers” who dedicate large amounts of time and money to support Distributed Computing worldwide. Each and every cruncher, at every level, is a true “hero” for the advancement of science.
Last, kudos to WCG and IBM who have spent considerable effort to accommodate this project. I was pleasantly surprised that WCG modified their own project preferences for CEP2 to accommodate the needs of their crunchers. Of all the projects I crunch, WCG sets the standard for listening and responding to their user community.

1) The CEP2 load share is currently throttled to 40% of other projects. It will be increased incrementally to 100% over the next week or two. We opted for this warm-up strategy to safely test the stability of our servers, as this is the first time a WCG project uses its own hardware setup to collect results.

I am not certain your user community views the additional complexity of third-party servers as a benefit to them. WCG's servers maintain world-class standards of availability. You have yet to prove yours.
2) CEP2 is an 'opt-in' project due to its demanding nature and will hence exclude a certain segment of users.

Looking at it from a business point of view, I am not sure excluding one segment of your “customers” from your product and irritating another segment of your customers is a winning strategy but that is for you to decide. Our decision is whether or not to buy the “product”.
3) For the same reason, every client machine is by default limited to one CEP2 instance at a time (which can be increased manually), since too many concurrent jobs can impact the overall performance (i.e., running out of RAM or simultaneous I/O can clog up the system and degrade its efficiency; the sweet-spot for many setups is 2-5 CEP2 jobs, but never more than the number of cores).
We clearly aim for a bigger load share but at this time we are where we should be.

No news there … that has been well documented by WCG.
We have already heard from people who cannot get as many CEP2 workunits as they'd like, so there is no point in increasing demand just now.

Ha ha ha … did you enjoy your experience with MrKermit’s farm? biggrin
We prefer to think of it as 'challenging'.

You may prefer to think of it as you like. Others may prefer to think of it as not worth the extra effort as compared to other DC projects of equal value.
You may argue that such a project should then not be on WCG. We disagree on this since already at this early stage CEP2 has been a success for us, producing more data than we could have obtained using our cluster. If we want to do this nontrivial, high quality numerical study on a large scale, then the WCG is the only way. That's what makes the WCG exciting for researchers: it allows us to do science which could otherwise not be done. We clearly think CEP2 is worthwhile, otherwise we wouldn't invest our time and money on it.

I make no such argument! WCG can handle any DC project. All projects allow the completion of science that could not otherwise be attempted. That, my friend, is the nature of Distributed Computing.
1) Our servers run stably, have plenty of capacity, and an analysis indicates that they are not a bottleneck for uploads.

Good for you. It is a considerable bottleneck on my end.
2) While the uploads are unfortunately bigger than in other WCG projects, we currently average 22.7 MB per result, which may not exactly qualify as 'huge'.

CEP2 has the largest upload files of any project I have crunched. Even GPUGrid’s files (~16 MB) behave much better, as they are not throttled.
3) The RAM use per job is limited to 512MB.

Then multiply that by the number of jobs.
4) Instead of running many CEP2 jobs simultaneously, we recommend mixing CEP2 with jobs from a less hard-core project (this is incidentally a common strategy in high-performance computing). Never run more jobs than you have cores - this completely kills your computer's performance!

This comment speaks volumes. With all due respect, to lecture crunchers who have run tens of thousands, if not hundreds of thousands, of cruncher hours is at a minimum naïve and at a maximum just plain arrogant. So now I will lecture. I dare say there are crunchers here who, over the last decade, have gained far more experience in Distributed Computing than your small project has. You not only compete with other WCG projects but also with 50+ other Distributed Computing projects that have equally good science objectives. There are so many projects and so few people willing to dedicate their time and money to the effort. This is not 2004 … there are many crunching alternatives now … and you are not the only game in town! The easier you make it for us, the more willing crunchers will be to spend their time and money on you.
5) Our error rate is well below the limit set by IBM, but we understand the frustration about any problems.

I’ll pass on a comment.
6) While Q-Chem is a commercial code, we have the source, can modify it for CEP2, and are in close contact with the developers. Our IBM friends and we have been tweaking things for the WCG and continue to do so, but the inherent complexity of the methods will not go away. Program packages like Q-Chem are constantly improved and expanded by the professional developers and academic contributors, so it is already an optimized high-performance code - we merely tailor certain aspects of it for the specific situation of the grid.

Then by all means improve it in the future. The science is worthy of doing so. That was a fundamental point of my original post.
CEP2 is decidedly not a short project and will be around for a while! We do have estimates but as they depend on many variables it's not really worth setting in stone.

I will retract my original comment. As the state of the project stands right now, it will indeed be around for a long time. I have contributed a year to the cause and will keep an eye on this project for the future. peace
*****
Thanks again to all who have commented in this thread. I really do appreciate it (for what it’s worth). Cheers and crunch on … coffee

cowboy
----------------------------------------
[Edit 1 times, last edit by Dataman at Nov 30, 2010 6:42:58 PM]
[Nov 30, 2010 6:23:36 PM]