World Community Grid Forums
Thread Status: Active | Total posts in this thread: 84
RaymondFO
Veteran Cruncher | USA | Joined: Nov 30, 2004 | Post Count: 561
Shortly after I started crunching CEP2, I realized this was a challenging and problematic project. The number of errors I saw from repair work units, the way uploading CEP2 WUs froze out internet access for all of my other computers, and the hardware requirements made this a challenging process. I quickly came to realize that crunching CEP2 was not for the "casual" or "faint of heart" cruncher, but for those who truly believe in what the Harvard research scientists were trying to accomplish.

I could materially improve my daily performance results by mixing CEP2 with other WUs or by not crunching CEP2 at all. I chose this project for other altruistic reasons, which included converting my computers from Windows to Linux. I did this because there are times when one hopes a greater good can result from the sacrifice of one's personal benefit. That is one of the reasons why we are here at WCG. I have decided to continue crunching CEP2, hoping that I can now reach the 10-year runtime mark. As of today, I have 6 years and just under 67 days and counting. It is your decision how you will use your computers. Decide for yourself. [Edit 2 times, last edit by RaymondFO at Dec 1, 2010 1:43:27 PM]
Sgt.Joe
Ace Cruncher | USA | Joined: Jul 4, 2006 | Post Count: 7660
Interesting discussion. I have only crunched a few CEP2 units, about 30 so far, but have yet to see an error. Only my quads and a dual-processor Xeon have gotten the units. They run Vista, Win7, and Linux Mint, in a mixed environment. The uploads are 24 MB. Tough project, but I will let it run.
Cheers
Sgt. Joe
*Minnesota Crunchers*
Bearcat
Master Cruncher | USA | Joined: Jan 6, 2007 | Post Count: 2803
On my 8-core box, I've been seeing anywhere from a 5 to 20 minute difference between the time spent crunching and the time actually spent on a WU. I presume that with all 8 cores dedicated to this, time is lost when multiple WUs want to write to disk at the same time. I've since added Clean Water, hoping it will mix crunching projects and lower disk writes. I have less than 100 days to go to reach 1 year of crunching, and then I'll switch to running it with restrictions. I couldn't imagine 16 threads running this project. It would probably kill a hard drive real quick.
Crunching for humanity since 2007!
Former Member
Cruncher | Joined: May 22, 2018 | Post Count: 0
Dear Dataman,
> [...] kudos to WCG and IBM who have spent considerable effort to accommodate this project.

I wholeheartedly support this. The collaboration with and support by the IBM team is awesome, both for crunchers and for us scientists. There is A LOT of work going into a project like CEP2. Our thanks to all crunchers goes without saying.

> I am not certain your user community views the additional complexity of third-party servers as a benefit to them. WCG's servers maintain world class standards in the percent of availability.

Yes, I guess we haven't explained this topic properly: the decision to buy our own servers and storage arrays was due to the limited capacity of the WCG servers. Since our results are bigger than usual, WCG could only have accommodated an upload of 400 results per day, which would have rendered the whole project pointless. Our setup is at least in part modeled after WCG's, and our research computing department is in close contact with our IBM friends on this. To give you some numbers: our servers have a load capacity of 200 MB/sec (with some more in reserve), and the actual load is currently about 10 MB/sec (with the 40% throttle). So we are wide open for business. A positive side effect of uploading directly here to Harvard is that we save the additional traffic of the detour via IBM.

> Looking at it from a business point of view, I am not sure excluding one segment of your customers from your product and irritating another segment of your customers is a winning strategy but that is for you to decide. [...] You not only compete with other WCG projects but also with 50+ other Distributed Computing projects that have equally good science objectives.

I am not quite sure whether this analogy is useful, but this is what I would reply: we have a project of a certain nature which we offer for your kind consideration. We hope to attract as many users as possible, but if a user finds our project too demanding and prefers to crunch on a different one, then that's OK, too. We don't compete with the other projects, as they are equally worthwhile. If people like the challenge and the goal of CEP2, they will hopefully keep supporting us. We obviously don't want to upset anybody and are hence committed to fixing all problems that are in our hands, but we cannot do anything about the inherent cost of these calculations. The 'opt-in' is a tool to spare users with low-end machines a predictably frustrating CEP2 experience. We don't want to sell our project to people who will be unhappy with it.

> No news there ... that has been well documented by WCG. [...] With all due respect, to lecture crunchers who have run tens of thousands if not hundreds of thousands of cruncher hours is at a minimum naive and at a maximum just plain arrogant.

I apologize if you found this point offensive. I do computational science for a living and have done so for a while now, and I learn new stuff on a daily basis. Considering that this forum is also used by many newbies, some important general comments didn't seem out of place.

> I dare say there are crunchers here who, over the last decade, have exceedingly more experience in Distributed Computing than your small project has.

Most definitely. I do, however, doubt that you will find many crunchers who have more experience as computational quantum chemists than our team. And it requires this knowledge of the code and of the science to put together a useful project.

> Ha ha ha ... did you enjoy your experience with MrKermit's farm?

I was referring more to people like my labmate, who couldn't get CEP2 workunits. But yes, MrKermit's sudden data flood (which brought our original server to its knees one night) helped us to significantly improve our system, which is great. We live and learn.

> WCG can handle any DC project. All projects allow the completion of science that could not otherwise be attempted. That, my friend, is the nature of Distributed Computing.

Excellent, we are in agreement then. No need to bury CEP2 after all.

> CEP2 has the largest upload files of any project I have crunched. Even GPUGrid's files (~16 meg) behave much better as they do not throttle them.

Yes, the result size is indeed unfortunate. We went to some lengths to cut it as much as possible (including reducing the main binary array from double to single precision). What is left is pretty much bare bones. If we were to dump the remaining key results, there would be no point in generating them in the first place. The throttle actually has little to do with the upload behaviour...

> Then multiply by the number jobs.

Correct, hence the default setting and 'opt-in'.

> Then by all means improve it in the future.

Well, we spent nearly a year tailoring Q-Chem before the launch to make things as smooth as possible on the grid. Q-Chem by itself is a high-performance program package with 135 MB of source code which has been developed, expanded, and optimized for 20 years by professional developers and legions of academic contributors from the finest universities around the world... Believe it or not, many smart people have put some serious thought into this code.

Bottom line: from a result-oriented perspective (which one could argue is a relevant one in WCG), CEP2 works! It produces many results every day, and we make progress with our research, despite all the bumps in the road! And for that we are very grateful to the WCG crunchers!

Best wishes
The CEP Team

[Edit 1 times, last edit by Former Member at Dec 1, 2010 1:30:55 PM]
anhhai
Veteran Cruncher | Joined: Mar 22, 2005 | Post Count: 839
It is wonderful when the scientists read our posts and give us feedback like this. I personally love the idea of solar power (I have solar panels and a solar water heater at my house; I live in the desert, so there's a lot of sun). Hopefully, between us crunchers giving constructive criticism and the wonderful job the scientists and WCG are doing, we can make this project more user-friendly and make it possible for me and everyone else to power our whole homes with solar panels.
Dataman
Ace Cruncher | Joined: Nov 16, 2004 | Post Count: 4865
Thank you very much for your timely, honest and informative response, cleanenergy.
OK, you have made a convert. You really should work on the file size/bandwidth problem, though. It is a killer. Cheers and good luck ...
Former Member
Cruncher | Joined: May 22, 2018 | Post Count: 0
I have been running CEP2 without restrictions since it became available on Windows, and to date I have yet to encounter an error. I think it's appropriate here to thank the Harvard and WCG teams for their hard work in successfully implementing this resource-intensive project.
I do have some questions about my results, though. Sometimes I fail to complete all 16 jobs in a work unit due to:

1. a timeout after 12 hours, or
2. a message of "Application exited with RC = 0x1" in Job #12.

Are these results still useful to the researchers? What is the meaning of "Application exited with RC = 0x1"?
travisebert
Cruncher | Joined: Apr 27, 2007 | Post Count: 4
I have two servers with 8 cores each. When more than about 5 cores are running and a new CEP2 work unit starts, it creates some sort of issue, and all 8 currently running processes are restarted from their last checkpoint. This costs only a few seconds or minutes on most projects, but on CEP2 it can apparently cost several hours.
This was happening so often that I was sometimes losing 20 CPU hours per day, so I had to turn on the restriction that allows only one CEP2 work unit at a time. It didn't seem right to waste so much valuable CPU time when it could be used on other projects. This is a shame. I like the idea behind this project.
Former Member
Cruncher | Joined: May 22, 2018 | Post Count: 0
Dear Dataman and others,
We really appreciate that you express all your criticisms of CEP2. Despite all the preparation and thought we put into this project, not everything always works out as anticipated. Your comments help us to identify weak spots and possible improvements. The fact that CEP2 (in terms of both upload times and errors) runs perfectly smoothly for some users and causes a lot of trouble for others suggests platform-dependent issues. Getting a hold of these is very tough because of the extreme heterogeneity of WCG. So please keep the descriptions of problems coming in the appropriate forum threads, with as much detail as possible. The IBM crew and we are doing our best to figure out the issues and improve on them.

Thanks for crunching CEP2; we know that our project is a tough cookie!

Best wishes from Cambridge, MA
Your CEP team
Former Member
Cruncher | Joined: May 22, 2018 | Post Count: 0
Dear steveleg,
Absolutely, all uploads are useful to us and our research. In the first case, the timeout keeps you from having to crunch on a workunit for too long and cuts the more time-intensive parts of the calculations short; the results up to that point, however, are perfectly fine. The second case indicates a problem in the numerics, which is very common in computational chemistry. Everything before that point is perfectly fine, and it also tells us that the molecule in that workunit is a toughie.

Best
Your CEP team