World Community Grid Forums
Thread Status: Active Total posts in this thread: 7
Former Member
Cruncher Joined: May 22, 2018 Post Count: 0 Status: Offline
Hi Crunchers!
We are putting an exciting new library on the grid in the next couple of days, focusing on molecules that are more synthetically accessible. Our initial molecules (which we think should provide somewhere in the range of a week's worth of crunching on the grid) will range from very small (i.e. quick for you to complete - yay!) to a little larger than we have tried before. This means that there may be a few more failed WUs than you are used to seeing, but don't worry: we will be keeping an eye on the jobs that come back and trying to stay within the 'sweet spot' with our subsequent batches of molecules, so you guys are crunching as efficiently as possible. Thanks for your continued support - you guys are awesome :D
Your Harvard CEP Team
||
|
AgrFan
Senior Cruncher USA Joined: Apr 17, 2008 Post Count: 376 Status: Offline
Did the new library start hitting the grid today? I am now seeing runtimes between 1 and 18 hours. Several work units failed in Job #2 and were still marked valid. Isn't Job #2 the most important job to complete? Does this mean I wasted hours of crunching time? I run my boxes 24x7, so checkpointing is not an issue for me, but what about members who do not? I would expect errors and resends to increase with these long runtimes. I may need to turn CEP2 off until the runtimes stabilize. The work units have been running in the 6-8 hour range for quite some time. Can't the work unit sizing be checked first with a beta test before unleashing them into the wild?
| Result name | Device | Status | Sent | Returned | CPU / elapsed (h) | Claimed / granted credit | Note |
|---|---|---|---|---|---|---|---|
| E224168_856_I.63.C50H26N10O2S.00321331.0.set1d06_0 | XXXXX-XX | Valid | 7/30/14 12:48:41 | 7/30/14 13:53:39 | 1.02 / 1.04 | 13.9 / 13.9 | failed 0xb in Job #1 |
| E224148_945_I.64.C48H22N8O8.00226840.3.set1d06_0 | XXXXX-XX | Valid | 7/30/14 03:16:19 | 7/30/14 15:31:37 | 12.08 / 12.21 | 166.8 / 166.8 | failed 0xb in Job #2 |
| E224137_687_I.64.C51F6H21N7.00390094.4.set1d06_0 | XXXXX-XX | Valid | 7/29/14 22:46:38 | 7/30/14 17:28:07 | 18.00 / 18.65 | 214.4 / 214.4 | time limit reached in Job #6 |
| E224141_471_I.62.C51F4H27N5O2.00421925.4.set1d06_0 | XXXXX-XX | Valid | 7/29/14 22:28:52 | 7/30/14 12:07:30 | 13.48 / 13.60 | 256.2 / 256.2 | failed 0xb in Job #2 |
| E224136_716_I.63.C45H23N9O8S.00121530.2.set1d06_0 | XXXXX-XX | Valid | 7/29/14 18:35:31 | 7/30/14 12:48:41 | 18.00 / 18.18 | 287.1 / 287.1 | time limit reached in Job #2 |
| E224135_139_I.64.C47F6H21N7O4.00325289.0.set1d06_0 | XXXXX-XX | Valid | 7/29/14 17:45:00 | 7/30/14 03:16:19 | 9.39 / 9.48 | 204.5 / 204.5 | failed 0xb in Job #2 |
||
|
Former Member
Cruncher Joined: May 22, 2018 Post Count: 0 Status: Offline
Hi,
Those batch numbers do not correspond to the new batches, which were put in this evening; they probably won't hit the grid until about Friday afternoon. The new batches are also ordered by number of electrons, the quantity against which the computational cost of a job scales. This means the initial jobs will be easier, and the cost of a work unit will be much easier for you guys to predict (which is also why I put this notice up before the units hit the grid). I have to admit I am unsure whether the job numbers shown here are 0-indexed or 1-indexed, but the most important job in the old library is the third one to be run (a geometry optimization). This will change with the new batches, where we have placed the optimization as the first job. With regard to beta testing, we do perform some and have a good idea of the size limits of the grid. However, given the diverse nature of machines on the grid, I wanted to give all the crunchers a heads-up, so they could put the failure, or otherwise, of work units into the context of the project.
Your Harvard CEP Team
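For readers wondering what "ordered by number of electrons" looks like in practice, here is a minimal sketch, not the project's actual code. It assumes neutral molecules (electron count = sum of atomic numbers) and parses formula fragments like the ones embedded in the work-unit names above (e.g. C50H26N10O2S). The names `electron_count` and `order_by_cost` are hypothetical, and only the elements seen in this thread are covered.

```python
import re

# Hypothetical helper, not the CEP2 pipeline: atomic numbers for the elements
# that appear in the work-unit names in this thread only.
ATOMIC_NUMBER = {"H": 1, "C": 6, "N": 7, "O": 8, "F": 9, "S": 16}


def electron_count(formula: str) -> int:
    """Electrons in a neutral molecule, e.g. 'C50H26N10O2S' -> 428.

    Assumes a charge of zero, so the count is just the sum of atomic numbers.
    Raises KeyError for elements missing from ATOMIC_NUMBER.
    """
    total = 0
    # Each match is (element symbol, optional count); a missing count means 1.
    for symbol, count in re.findall(r"([A-Z][a-z]?)(\d*)", formula):
        total += ATOMIC_NUMBER[symbol] * (int(count) if count else 1)
    return total


def order_by_cost(formulas):
    """Sort molecules so the fewest-electron (assumed cheapest) ones come first."""
    return sorted(formulas, key=electron_count)


if __name__ == "__main__":
    batch = ["C47F6H21N7O4", "C50H26N10O2S", "C51F4H27N5O2"]
    for f in order_by_cost(batch):
        print(f, electron_count(f))
```

Run on three of the formulas from the results table, this prints C51F4H27N5O2 (420 electrons), then C50H26N10O2S (428), then C47F6H21N7O4 (438). Sorting on electron count alone is a simplification: actual runtime also depends on the method, basis set and how quickly the geometry optimization converges.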
||
|
AgrFan
Senior Cruncher USA Joined: Apr 17, 2008 Post Count: 376 Status: Offline
Thanks for the update. I'm guessing these long units are cleanup tasks for the current work being crunched. I will watch for the new work units on Friday.
||
|
ca05065
Senior Cruncher Joined: Dec 4, 2007 Post Count: 325 Status: Offline
Until recently, CEP2 work units had consistent run times of between 4 and 6 hours. Since 29 July they have dropped to 40 to 55 minutes. All tasks end in job 1:
[06:36:28] Finished Job #0
[06:36:28] Starting job 1,CPU time has been restored to 415.243462.
Application exited with RC = 0xc0000005
[07:06:24] Finished Job #1
[07:06:24] Starting job 2,CPU time has been restored to 2192.157252.
[07:06:24] Skipping Job #2
All are valid in the results status section except one error and a few in PVal (pending validation) and PVer (pending verification).
||
|
Former Member
Cruncher Joined: May 22, 2018 Post Count: 0 Status: Offline
Hi ca05065,
Thanks for the info. We are reaching the end of the old library, so the jobs are getting tougher. Finishing on job 1 here means that the optimisation of the molecular geometry failed. My feeling about the new library is that the jobs towards the end of next week will run longer than you have been seeing recently, but the jobs at the start will complete pretty quickly. Since we are now starting with the geometry optimisation, if you get a job in which it does not converge, you will spend less time on it before getting a new job to replace it.
Your Harvard CEP Team
||
|
cjslman
Master Cruncher Mexico Joined: Nov 23, 2004 Post Count: 2082 Status: Offline
I'm used to seeing CEP2 WUs last 6-8 hours on my machine (I only crunch CEP2 on weekends), but now they're lasting a little over an hour.
CJSL Crunching for a brighter future...
||
|