Thread Status: Active
Total posts in this thread: 84
Posts: 84   Pages: 9
This topic has been viewed 861291 times and has 83 replies
RaymondFO
Veteran Cruncher
USA
Joined: Nov 30, 2004
Post Count: 561
Status: Offline
Re: To Project Scientists: Clean Energy Project Phase 2 - Limited Cruncher Participation

Shortly after I started crunching CEP2, I realized this was a challenging and problematic project. The number of errors I saw from repair work units, the CEP2 uploads that froze out internet access for all of my other computers, and the hardware requirements made this a challenging process. I quickly came to realize that crunching CEP2 was not for the "casual" or "faint of heart" cruncher, but for those who truly believe in what the Harvard research scientists are trying to accomplish.

I could materially improve my daily performance results by mixing CEP2 with other WUs or by not crunching CEP2 at all. I chose this project for other, altruistic reasons, and that choice included converting my computers from Windows to Linux. I did this because there are times when one hopes a greater good can result from the sacrifice of one's personal benefit. That is one of the reasons why we are here at the WCG.

I have made a decision to continue crunching CEP2 hoping that I can now reach the 10 year runtime mark. As of today, I now have 6 years and just under 67 days and counting.

It is your decision how you will use your computers. Decide for yourself.
----------------------------------------
[Edit 2 times, last edit by RaymondFO at Dec 1, 2010 1:43:27 PM]
[Nov 30, 2010 8:46:06 PM]
Sgt.Joe
Ace Cruncher
USA
Joined: Jul 4, 2006
Post Count: 7660
Status: Offline
Re: To Project Scientists: Clean Energy Project Phase 2 - Limited Cruncher Participation

Interesting discussion. I have only crunched a few of the CEP2 units, about 30 so far, but have yet to see an error. Only my quads and a dual processor Xeon have gotten the units. They run Vista, Win7 and Linux Mint. They run in a mixed environment. The uploads are 24 megs. Tough project, but I will let it run.

Cheers
----------------------------------------
Sgt. Joe
*Minnesota Crunchers*
[Dec 1, 2010 1:33:40 AM]
Bearcat
Master Cruncher
USA
Joined: Jan 6, 2007
Post Count: 2803
Status: Offline
Re: To Project Scientists: Clean Energy Project Phase 2 - Limited Cruncher Participation

On my 8-core box, I've been seeing anywhere from 5 to 20 minutes' difference between time crunching and time actually on a WU. Presumably, with 8 cores dedicated to this, it takes time when multiple WUs want to write to disk at the same time. I've since added Clean Water in the hope that mixing projects will lower disk writes. I have less than 100 days to go to reach 1 year of crunching, and then I will change to running with restrictions. I couldn't imagine 16 threads trying this project. It would probably kill a hard drive real quick.
----------------------------------------
Crunching for humanity since 2007!

[Dec 1, 2010 4:20:28 AM]
Former Member
Cruncher
Joined: May 22, 2018
Post Count: 0
Status: Offline
Re: To Project Scientists: Clean Energy Project Phase 2 - Limited Cruncher Participation

Dear Dataman,

[...] kudos to WCG and IBM who have spent considerable effort to accommodate this project.

I wholeheartedly support this - the collaboration with and support by the IBM team is awesome - both for crunchers and us scientists. There is A LOT of work going into a project like CEP2. Our thanks to all crunchers goes without saying.

I am not certain your user community views the additional complexity of third-party servers as a benefit to them. WCG's servers maintain world class standards in the percent of availability.

Yes, I guess we haven't explained this topic properly: The decision to buy our own servers and storage arrays was due to the limited capacity of the WCG servers. Since our results are bigger than usual, WCG could only have accommodated an upload of 400 results per day, which would have rendered the whole project pointless. Our setup is at least in part modeled on WCG's, and our research computing department is in close contact with our IBM friends on this. To give you some numbers: our servers have a load capacity of 200MB/sec (with some more in reserve) and the actual load is currently at about 10MB/sec (with the 40% throttle). So we are wide open for business :D
A positive side effect of uploading directly here to Harvard is that we save the additional traffic of the detour via IBM.
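As a rough back-of-the-envelope check of these figures, here is a minimal Python sketch. It assumes a result size of about 24 MB per upload (the figure Sgt.Joe reports earlier in this thread, not an official number) and a perfectly steady load; the function name `results_per_day` is made up for illustration.

```python
SECONDS_PER_DAY = 24 * 60 * 60

def results_per_day(load_mb_per_sec, result_size_mb=24.0):
    """Uploads per day that a sustained load corresponds to.

    Assumes a constant load and a fixed result size (24 MB is an
    assumption taken from a cruncher's post, not a project figure).
    """
    return load_mb_per_sec * SECONDS_PER_DAY / result_size_mb

capacity = results_per_day(200.0)  # quoted server load capacity
current = results_per_day(10.0)    # quoted load with the 40% throttle
utilization = 10.0 / 200.0         # fraction of capacity in use

print(f"capacity:    ~{capacity:,.0f} results/day")
print(f"current:     ~{current:,.0f} results/day")
print(f"utilization: {utilization:.0%}")
```

Under these assumptions the current load is about 5% of the quoted capacity, and both numbers are far above the 400 results per day mentioned above.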

Looking at it from a business point of view, I am not sure excluding one segment of your customers from your product and irritating another segment of your customers is a winning strategy but that is for you to decide. [...] You not only compete with other WCG projects but also with 50+ other Distributed Computing projects that have equally good science objectives.

I am not quite sure whether this analogy is useful but this is what I would reply: We have a project of a certain nature which we offer for your kind consideration. We hope to attract as many users as possible, but if a user finds our project too demanding and prefers to crunch on a different one, then that's ok, too. We don't compete with the other projects as they are equally worthwhile. If people like the challenge and the goal of CEP2, they will hopefully keep supporting us. We obviously don't want to upset anybody and are hence committed to fixing all problems that are in our hands - but we cannot do anything about the inherent cost of these calculations. The 'opt-in' is a tool to spare users with low-end machines a predictably frustrating CEP2 experience. We don't want to sell our project to people who will be unhappy with it.

No news there ... that has been well documented by WCG. [...] With all due respect, to lecture crunchers who have run tens of thousands if not hundreds of thousands of cruncher hours is at a minimum naive and at a maximum just plain arrogant.

I apologize if you found this point offensive. I do computational science for a living and have done so for a while now - and I learn new stuff on a daily basis. Considering that this forum is also used by many newbies, some important general comments didn't seem out of place.

I dare say there are crunchers here who, over the last decade, have exceedingly more experience in Distributed Computing than your small project has.

Most definitely - I do however doubt that you will find many crunchers that have more experience as computational quantum chemists than our team. And it requires this knowledge of the code and of the science to put together a useful project.

Ha ha ha ... did you enjoy your experience with MrKermit's farm?

I was referring more to people like my labmate who couldn't get CEP2 workunits. But yes, MrKermit's sudden data-flood (which brought our original server to its knees one night) helped us to significantly improve our system, which is great. We live and learn ;)

WCG can handle any DC project. All projects allow the completion of science that could not otherwise be attempted. That, my friend, is the nature of Distributed Computing.

Excellent - we are in agreement then. No need to bury CEP2 after all :D

CEP2 has the largest upload files of any project I have crunched. Even GPUGrid's files (~16 meg) behave much better as they do not throttle them.

Yes, the result size is indeed unfortunate. We went to some lengths to cut it as much as possible (including reducing the main binary array from double to single precision). What is left is pretty much bare bones. If we were to dump the remaining key results there would be no point in generating them in the first place. The throttle actually has little to do with the upload behaviour...
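To illustrate the double-to-single-precision trade-off mentioned above, here is a minimal Python sketch using only the standard library. It is not Q-Chem's actual output format; the element count `n_elements` is a hypothetical number chosen purely for the arithmetic.

```python
import struct

value = 0.1

# Standard IEEE 754 encodings: "<d" is an 8-byte double, "<f" a 4-byte single.
as_double = struct.pack("<d", value)
as_single = struct.pack("<f", value)

size_double = len(as_double)
size_single = len(as_single)

# Reading the single-precision bytes back shows the small rounding error
# that is the price of halving the storage.
roundtrip = struct.unpack("<f", as_single)[0]

n_elements = 1_000_000  # hypothetical array length, for illustration only
print(f"double: {n_elements * size_double / 1e6:.0f} MB")
print(f"single: {n_elements * size_single / 1e6:.0f} MB")
print(f"rounding error for {value}: {abs(roundtrip - value):.2e}")
```

Storing each element in single precision halves the size of the array while keeping roughly seven significant decimal digits per value.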

Then multiply by the number of jobs.

Correct - hence the default setting and 'opt-in'.

Then by all means improve it in the future.

Well, we had been tailoring Q-Chem for nearly a year before the launch to make things as smooth as possible on the grid. Q-Chem by itself is a high-performance program package with 135MB of source code which has been developed, expanded, and optimized for 20 years by professional developers and legions of academic contributors from the finest universities around the world... Believe it or not, many smart people have put some serious thought into this code ;)

The bottom line from a results-oriented perspective (which one could argue is a relevant one in WCG) is that CEP2 works! It produces many results every day, and we make progress with our research - despite all the bumps in the road! And for that we are very grateful to the WCG crunchers!

Best wishes

The CEP Team
----------------------------------------
[Edit 1 times, last edit by Former Member at Dec 1, 2010 1:30:55 PM]
[Dec 1, 2010 6:00:55 AM]
anhhai
Veteran Cruncher
Joined: Mar 22, 2005
Post Count: 839
Status: Offline
Re: To Project Scientists: Clean Energy Project Phase 2 - Limited Cruncher Participation

It is wonderful when the scientists read our posts and give us feedback like this. I personally love the idea of solar power (I have solar panels and a solar water heater at my house -- I live in the desert, so there is a lot of sun). Hopefully, between us crunchers giving constructive criticism and the wonderful job the scientists and WCG do, we can make this project more user-friendly and make it so that I and everyone else can power our whole homes with solar panels.
----------------------------------------

[Dec 1, 2010 6:19:42 AM]
Dataman
Ace Cruncher
Joined: Nov 16, 2004
Post Count: 4865
Status: Offline
Re: To Project Scientists: Clean Energy Project Phase 2 - Limited Cruncher Participation

Thank you very much for your timely, honest and informative response, cleanenergy.

OK, you have made a convert. ;) When BOINC@Australia returns from its current crunching assault at Spinhenge, I will put CEP2 back on my crunch list and run it at a minimum level in rotation with the other projects I support.

You really should work on the file size/bandwidth problem though. It is a killer. :D

Cheers and good luck ... peace

cowboy
----------------------------------------


[Dec 1, 2010 8:57:40 PM]
Former Member
Cruncher
Joined: May 22, 2018
Post Count: 0
Status: Offline
Re: To Project Scientists: Clean Energy Project Phase 2 - Limited Cruncher Participation

I have been running CEP2 without restrictions since it became available for Windows, and to date I have yet to encounter an error. I think it's appropriate here to thank the Harvard and WCG teams for their hard work in successfully implementing this resource-intensive project.

I do have some questions about my results though. Sometimes I fail to complete all 16 jobs in a work unit due to:

1. a timeout after 12 hours, or
2. a message of "Application exited with RC = 0x1" in Job #12

Are these results still useful to the researchers? What is the meaning of "Application exited with RC = 0x1"?
[Dec 1, 2010 9:22:35 PM]
travisebert
Cruncher
Joined: Apr 27, 2007
Post Count: 4
Status: Offline
Re: To Project Scientists: Clean Energy Project Phase 2 - Limited Cruncher Participation

I have two servers with 8 cores each. When more than about 5 cores are running and a new CEP2 work unit starts, it creates some sort of issue and all 8 currently running processes are restarted from their last checkpoint. This costs only a few seconds or minutes on most projects, but the CEP2 project can apparently lose several hours.

This was happening so often that I was sometimes losing 20 CPU hours per day, so I had to turn on the restriction to run only one CEP2 work unit at a time. It didn't seem right to waste so much valuable CPU time when it could be used on the other projects.

This is a shame. I like the idea behind this project.
[Dec 1, 2010 9:59:32 PM]
Former Member
Cruncher
Joined: May 22, 2018
Post Count: 0
Status: Offline
Re: To Project Scientists: Clean Energy Project Phase 2 - Limited Cruncher Participation

Dear Dataman and others,

We really appreciate that you express all your criticisms of CEP2. Despite all the preparation and thought we put into this project, not everything always works out as anticipated. Your comments help us to identify weak spots and possible improvements.

The fact that CEP2 (both in terms of upload times and errors) runs perfectly smoothly for some users and causes a lot of trouble for others suggests platform-dependent issues. Getting a handle on these is very tough because of the extreme heterogeneity of WCG.

So, please keep the descriptions of problems coming in the appropriate forum threads, with as much detail as possible. The IBM crew and we are doing our best to figure out the issues and improve on them.

Thanks for crunching CEP2 – we know that our project is a tough cookie!

Best wishes from Cambridge, MA

Your CEP team
[Dec 2, 2010 4:26:51 AM]
Former Member
Cruncher
Joined: May 22, 2018
Post Count: 0
Status: Offline
Re: To Project Scientists: Clean Energy Project Phase 2 - Limited Cruncher Participation

Dear steveleg,

Absolutely - all uploads are useful to us and our research.

In the first case, the timeout prevents you from having to crunch on a workunit for too long and cuts the more time-intensive parts of the calculations short - the results up to that point, however, are perfectly fine.

The second case indicates a problem in the numerics, which is very common in computational chemistry. Everything before that is perfectly fine and it also tells us that the molecule in that workunit is a toughie.
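For anyone scanning their own result logs for this message, here is a minimal Python sketch that extracts the return code from the quoted text. The helper name `parse_return_code` and the assumption that the message always has the form "RC = 0x..." are ours for illustration; this says nothing about how the WCG wrapper reports codes internally.

```python
import re

def parse_return_code(message):
    """Return the integer after 'RC = 0x...' in a log line, or None.

    The message format is assumed from the single example quoted in
    this thread; other formats are not handled.
    """
    match = re.search(r"RC = (0x[0-9a-fA-F]+)", message)
    if match is None:
        return None
    # int() with base 16 accepts the "0x" prefix.
    return int(match.group(1), 16)

rc = parse_return_code("Application exited with RC = 0x1")
print(rc)
```

The value 0x1 is simply the integer 1 written in hexadecimal. By common convention a nonzero return code means the program did not finish normally, which fits the numerics problem described above.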

Best

Your CEP team
[Dec 2, 2010 4:39:34 AM]