World Community Grid Forums
Thread Status: Active Total posts in this thread: 109
Sekerob
Ace Cruncher Joined: Jul 24, 2005 Post Count: 20043 Status: Offline |
Your perception of your stats still does not take the PV jail numbers into account. By about day 4 you should reach a fairly constant level, but HPF2 is extra special because it is highly dependent on office Monday-Friday crunching, and on top of that comes the quorum 15 mechanism. Just look at this roller coaster:
http://i137.photobucket.com/albums/q210/Sekerob/WCGHPF2ProdChart.png
and compare that to the project continuity of the others.
WCG
Please help to make the Forums an enjoyable experience for All! [Edit 1 times, last edit by Sekerob at Mar 4, 2010 6:32:52 PM] |
||
|
Hypernova
Master Cruncher Audaces Fortuna Juvat ! Vaud - Switzerland Joined: Dec 16, 2008 Post Count: 1908 Status: Offline |
a
----------------------------------------aa aaa aaaa aaaaa aaaaaa aaaaaaa aaaaaaaa aaaaaaaaa aaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaa............................... ![]() |
||
|
rilian
Veteran Cruncher Ukraine - we rule! Joined: Jun 17, 2007 Post Count: 1453 Status: Offline |
One of my hosts started getting random errors today/yesterday.

Result Name: ne416_ 00037_ 6--
<core_client_version>6.10.32</core_client_version>
<![CDATA[
<message>
Maximum elapsed time exceeded
</message>
<stderr_txt>
Unhandled Exception Detected...
- Unhandled Exception Record -
Reason: Breakpoint Encountered (0x80000003) at address 0x7C81A3E1
Engaging BOINC Windows Runtime Debugger...

++ there is a long debug info below

The WU quit after 60 hours of crunching.

Besides this WU, some others quit after anywhere from 0.02 hours up to 7.xx hours with the same error:

<core_client_version>6.10.32</core_client_version>
<![CDATA[
<message>
Incorrect function. (0x1) - exit code 1 (0x1)
</message>
<stderr_txt>
</stderr_txt>
]]>

ne751_ 00038_ 3-- computername Error 3/5/10 11:12:52 3/5/10 11:15:15 0.02 0.1 / 0.0
ne752_ 00050_ 9-- computername Error 3/5/10 11:12:25 3/5/10 11:15:15 0.02 0.2 / 0.0
ne753_ 00044_ 7-- computername Error 3/5/10 11:12:25 3/5/10 11:15:15 0.02 0.1 / 0.0
ne741_ 00042_ 13-- computername Error 3/5/10 11:12:25 3/5/10 11:15:15 0.02 0.2 / 0.0
ne735_ 00006_ 3-- computername Error 3/5/10 07:00:19 3/5/10 11:12:24 3.84 32.0 / 0.0
ne727_ 00029_ 17-- computername Error 3/5/10 07:00:19 3/5/10 11:12:24 3.37 28.0 / 0.0
ne691_ 00073_ 18-- computername Pending Validation 3/4/10 15:39:23 3/5/10 07:00:19 13.28 110.5 / 0.0
ne691_ 00043_ 2-- computername Error 3/4/10 15:39:23 3/5/10 11:12:24 7.37 61.3 / 0.0
ne691_ 00041_ 10-- computername Error 3/4/10 15:39:23 3/5/10 11:12:24 5.82 48.4 / 0.0

I can get the messages log from this machine later...
[Edit 2 times, last edit by rilian at Mar 5, 2010 3:24:00 PM] |
||
|
Sekerob
Ace Cruncher Joined: Jul 24, 2005 Post Count: 20043 Status: Offline |
rilian,
Please see my HPF2 forum post of today... BOINCTasks is getting an alert system to warn about tasks stuck in a loop. HPF2 is the only project I know of at WCG that does that, and only rarely. For now I'm using RosettaView (no longer available on the intertube).
WCG
[Edit 1 times, last edit by Sekerob at Mar 5, 2010 3:56:09 PM] |
||
|
rilian
Veteran Cruncher Ukraine - we rule! Joined: Jun 17, 2007 Post Count: 1453 Status: Offline |
Sekerob, thanks, I've seen this post ( http://www.worldcommunitygrid.org/forums/wcg/...24380_lastpage,yes#270466 ) about the BOINCTasks tool.
Unfortunately my machines are quite remote, so even if it warns me about some WU, I could not do anything in time.
Is there anything I can do, except not running HPF2 on that machine? It is:
GenuineIntel Intel(R) Xeon(TM) CPU 3.00GHz [x86 Family 15 Model 4 Stepping 10]
Microsoft Windows Server 2003 Enterprise Server x86 Edition, Service Pack 2, (05.02.3790.00) |
||
|
robertmiles
Senior Cruncher US Joined: Apr 16, 2008 Post Count: 443 Status: Offline |
I got somewhat similar errors on my laptop for a while, before I decided I had credit for enough workunits of this type for now and switched all three of my computers to another WCG subproject.

A few details about that computer:
64-bit Windows Vista SP2
BOINC 6.10.18
several other BOINC projects, including the GPU type and the full CPU and full GPU type (Einstein)
8 GB memory for 2 CPU cores; BOINC allowed to use only 40% of it due to problems on my other two computers if more is allowed
Keep workunits in memory when suspended turned off, again due to problems on my other two computers
BOINC allowed to use 60% of the CPU time, compared to 100% on the two computers with better results on this subproject

Errors generally occur well after the workunit is started, when it's trying to resume from a checkpoint.

The GPU and Einstein workunits tend to suspend themselves whenever I use the keyboard or the touchpad. For Einstein workunits, at least, this lets a CPU-only workunit get a much shorter than usual piece of a timeslot; I suspect that could cause problems for CPU workunits with infrequent checkpoints if BOINC counts those pieces the same as a full timeslot. The GPU workunits resume within minutes after I stop using the keyboard and the touchpad; so do Einstein workunits, even when that requires an early suspension of a CPU-only workunit, much as when some other workunit goes into high-priority mode.

I've never been interested in overclocking enough to find instructions on how to do it, but that laptop is rather hot to put on my lap even with the current settings, and tends to use the high speed of its fan much more often now that I've found some GPU projects compatible with its GPU board (a G105M).
[Edit 1 times, last edit by robertmiles at Mar 7, 2010 5:17:05 AM] |
||
|
Sekerob
Ace Cruncher Joined: Jul 24, 2005 Post Count: 20043 Status: Offline |
I may have mentioned this before, but on my quad W7 64-bit with the 64-bit client (6.10.36) I was observing a pattern of HPF2 fails, but exclusively when running in combination with the AutoDock sciences. I first saw a number of 2-minute error-outs, and one 50 minutes into the job, all with the same lines in the result log ending in /401, whilst HFCC was running. So I deselected that project, and by the time none were left in the mix, all ran happily with RICE, HCC and HCMD2. Then yesterday a few FAAH came in and I forced 1 to start. Sure enough, whilst it was running several HPF2 jobs failed in the familiar 2-minute style. The FAAH finished, and 4 more have since been returned without issue; 2 are still in progress with 2 hours under the belt.
Thus, is anyone else having this particular experience, or can anyone reconstruct this having happened by listing out the Result Status pages or BOINCTasks history (v 0.45)? A sample:

First set, when a FAAH ran:
World Community Grid 6.03 hpf2 nf439_00014 06:35:57 (06:13:36) 17-03-2010 10:10 17-03-2010 10:10 Reported: Ok
World Community Grid 6.06 hcc1 X0000090400045200707131445 04:56:10 (04:46:55) 17-03-2010 09:57 17-03-2010 09:57 Reported: Ok
World Community Grid 6.06 hcc1 X0000090400129200708021915 04:53:47 (04:44:07) 17-03-2010 09:24 17-03-2010 09:24 Reported: Ok
World Community Grid 6.06 hcc1 X0000090400276200707121410 04:30:44 (04:28:19) 17-03-2010 06:31 17-03-2010 06:31 Reported: Ok
World Community Grid 6.06 hcc1 X0000090400316200707121409 04:34:58 (04:32:13) 17-03-2010 05:01 17-03-2010 05:01 Reported: Ok
World Community Grid 6.06 hcc1 X0000090400589200707121404 04:38:46 (04:36:22) 17-03-2010 04:30 17-03-2010 04:31 Reported: Ok
World Community Grid 6.03 hpf2 nf439_00010 05:32:54 (05:31:14) 17-03-2010 03:34 17-03-2010 03:34 Reported: Ok
World Community Grid 6.03 hpf2 nf439_00015 05:48:11 (05:44:52) 17-03-2010 02:00 17-03-2010 02:00 Reported: Ok
World Community Grid 6.03 hpf2 nf406_00064 04:29:17 (04:24:05) 16-03-2010 23:51 16-03-2010 23:52 Reported: Ok
World Community Grid 6.07 faah faah11385_ZINC11800521_xMut_md21780_02 06:24:48 (06:14:08) 16-03-2010 23:42 16-03-2010 23:43 Reported: Ok
World Community Grid 6.03 hpf2 nf439_00011 00:01:16 (00:01:15) 16-03-2010 22:01 16-03-2010 22:02 Reported: Computation error (1,)
World Community Grid 6.06 hcc1 X0000090370235200708021803 04:45:19 (04:38:38) 16-03-2010 22:00 16-03-2010 22:00 Reported: Ok
World Community Grid 6.03 hpf2 nf406_00058 05:25:20 (04:50:20) 16-03-2010 20:12 16-03-2010 20:12 Reported: Ok
World Community Grid 6.03 hpf2 nf405_00046 06:10:32 (05:41:34) 16-03-2010 19:48 16-03-2010 19:48 Reported: Ok
World Community Grid 6.03 hpf2 nf389_00078 05:25:14 (05:11:54) 16-03-2010 17:55 16-03-2010 17:56 Reported: Ok
World Community Grid 6.03 hpf2 nf439_00023 00:01:20 (00:01:12) 16-03-2010 17:18 16-03-2010 17:20 Reported: Computation error (1,)
World Community Grid 6.03 hpf2 nf382_00032 05:33:06 (05:25:04) 16-03-2010 16:36 16-03-2010 16:36 Reported: Ok
World Community Grid 6.06 hcc1 X0000090281140200708021314 04:58:47 (04:42:34) 16-03-2010 13:40 16-03-2010 13:40 Reported: Ok
World Community Grid 6.03 hpf2 nf380_00030 05:12:20 (04:50:29) 16-03-2010 13:36 16-03-2010 13:37 Reported: Ok

Second set, when several HFCC ran:
World Community Grid 6.06 hcc1 X0000084650807200703070838 04:35:01 (04:33:19) 10-03-2010 05:54 10-03-2010 05:55 Reported: Ok
World Community Grid 6.03 hpf2 ne861_00046 04:53:04 (04:50:50) 10-03-2010 03:52 10-03-2010 03:52 Reported: Ok
World Community Grid 6.03 hpf2 ne863_00000 05:51:41 (05:49:11) 10-03-2010 01:19 10-03-2010 01:20 Reported: Ok
World Community Grid 6.03 hpf2 ne858_00011 05:07:29 (05:04:20) 09-03-2010 23:00 09-03-2010 23:06 Reported: Ok
World Community Grid 6.03 hpf2 ne858_00042 05:06:34 (05:03:57) 09-03-2010 22:59 09-03-2010 23:06 Reported: Ok
World Community Grid 6.03 hpf2 ne843_00019 05:22:19 (05:16:21) 09-03-2010 22:27 09-03-2010 23:06 Reported: Ok
World Community Grid 6.03 hpf2 ne859_00044 05:47:49 (05:14:02) 09-03-2010 19:28 09-03-2010 19:28 Reported: Ok
World Community Grid 6.06 hcc1 X0000084630459200703161915 05:17:39 (05:01:45) 09-03-2010 17:52 09-03-2010 17:53 Reported: Ok
World Community Grid 6.03 hpf2 ne853_00105 06:44:52 (06:27:13) 09-03-2010 17:52 09-03-2010 17:53 Reported: Ok
World Community Grid 6.06 hcc1 X0000084640008200703021829 05:34:07 (05:19:53) 09-03-2010 16:55 09-03-2010 16:55 Reported: Ok
World Community Grid 6.03 hpf2 ne820_00038 06:26:45 (06:11:06) 09-03-2010 13:07 09-03-2010 13:07 Reported: Ok
World Community Grid 6.03 hpf2 ne852_00027 00:01:11 (00:01:02) 09-03-2010 12:35 09-03-2010 12:36 Reported: Computation error (1,)
World Community Grid 6.10 hfcc HFCC_s2_00419591_s2_0001 09:32:57 (09:20:57) 09-03-2010 12:33 09-03-2010 12:34 Reported: Ok
World Community Grid 6.10 hfcc HFCC_s2_00418320_s2_0001 10:22:15 (10:10:21) 09-03-2010 11:53 09-03-2010 11:54 Reported: Ok
World Community Grid 6.03 Human Proteome Folding - Phase 2 ne853_00092 00:50:50 (00:48:20) 09-03-2010 10:34 09-03-2010 10:36 Reported: Computation error (1,)
World Community Grid 6.03 Human Proteome Folding - Phase 2 ne825_00040 00:01:26 (00:01:12) 09-03-2010 09:38 09-03-2010 09:39 Reported: Computation error (1,)
World Community Grid 6.06 Help Conquer Cancer X0000084600343200703161822 05:08:41 (05:00:26) 09-03-2010 09:37 09-03-2010 09:37 Reported: Ok
World Community Grid 6.03 Human Proteome Folding - Phase 2 ne820_00036 00:01:19 (00:01:15) 09-03-2010 05:53 09-03-2010 05:54 Reported: Computation error (1,)
World Community Grid 6.10 Help Fight Childhood Cancer HFCC_s2_00418006_s2_0001 07:54:03 (07:50:56) 09-03-2010 05:52 09-03-2010 05:52 Reported: Ok
World Community Grid 6.03 Human Proteome Folding - Phase 2 ne816_00007 05:46:25 (05:42:25) 09-03-2010 04:28 09-03-2010 04:28 Reported: Ok

To emphasize, when no AutoDock jobs ran concurrently, there was a 100% hpf2 success rate, including during the periodic preemptive scheduling-in of a 300 hour CPDN model.
edit: italics on jobs of interest.
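For anyone wanting to check their own history for the same pattern, here is a rough Python sketch of the comparison described above. It assumes the columns in the rows are app version, app short name, task name, elapsed time, CPU time in brackets, completion time and report time; that reading of the columns is a guess, not something BOINCTasks documents here. The function names and the `AUTODOCK_APPS` set are made up for illustration, and rows that use the long project names (as in part of the second set) are simply skipped by this pattern.

```python
import re
from datetime import datetime, timedelta

# One history row, e.g.
# "World Community Grid 6.03 hpf2 nf439_00011 00:01:16 (00:01:15)
#  16-03-2010 22:01 16-03-2010 22:02 Reported: Computation error (1,)"
# Column meanings are an assumption (see the note above the code).
ROW = re.compile(
    r"^World Community Grid (?P<ver>\S+) (?P<app>\S+) (?P<name>\S+) "
    r"(?P<elapsed>\d\d:\d\d:\d\d) \((?P<cpu>\d\d:\d\d:\d\d)\) "
    r"(?P<done>\d\d-\d\d-\d{4} \d\d:\d\d) "
    r"(?P<reported>\d\d-\d\d-\d{4} \d\d:\d\d) "
    r"Reported: (?P<status>.+)$"
)

# Hypothetical grouping: short names of the AutoDock-based sciences.
AUTODOCK_APPS = {"faah", "hfcc"}


def parse_row(line):
    """Return a dict with app, name, start, end and ok, or None if no match."""
    m = ROW.match(line.strip())
    if m is None:
        return None
    hours, minutes, seconds = (int(part) for part in m["elapsed"].split(":"))
    end = datetime.strptime(m["done"], "%d-%m-%Y %H:%M")
    # Approximate the start time as completion time minus elapsed time.
    start = end - timedelta(hours=hours, minutes=minutes, seconds=seconds)
    return {"app": m["app"], "name": m["name"], "start": start,
            "end": end, "ok": m["status"] == "Ok"}


def hpf2_failures_vs_autodock(lines):
    """Map each failed hpf2 task to True if an AutoDock task overlapped it."""
    tasks = [t for t in (parse_row(line) for line in lines) if t is not None]
    autodock = [t for t in tasks if t["app"] in AUTODOCK_APPS]
    result = {}
    for task in tasks:
        if task["app"] == "hpf2" and not task["ok"]:
            result[task["name"]] = any(
                task["start"] < other["end"] and other["start"] < task["end"]
                for other in autodock
            )
    return result


SAMPLE = [
    "World Community Grid 6.07 faah faah11385_ZINC11800521_xMut_md21780_02 06:24:48 (06:14:08) 16-03-2010 23:42 16-03-2010 23:43 Reported: Ok",
    "World Community Grid 6.03 hpf2 nf439_00011 00:01:16 (00:01:15) 16-03-2010 22:01 16-03-2010 22:02 Reported: Computation error (1,)",
    "World Community Grid 6.06 hcc1 X0000090370235200708021803 04:45:19 (04:38:38) 16-03-2010 22:00 16-03-2010 22:00 Reported: Ok",
    "World Community Grid 6.03 hpf2 nf406_00058 05:25:20 (04:50:20) 16-03-2010 20:12 16-03-2010 20:12 Reported: Ok",
    "World Community Grid 6.03 hpf2 nf439_00023 00:01:20 (00:01:12) 16-03-2010 17:18 16-03-2010 17:20 Reported: Computation error (1,)",
]

if __name__ == "__main__":
    for name, overlapped in hpf2_failures_vs_autodock(SAMPLE).items():
        print(name, "overlapped an AutoDock task:", overlapped)
```

Since the history only records completion times to the minute, the start times are approximate, so a task that overlaps by only a few seconds could be misjudged either way.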
WCG
[Edit 1 times, last edit by Sekerob at Mar 17, 2010 10:49:08 AM] |
||
|
Former Member
Cruncher Joined: May 22, 2018 Post Count: 0 Status: Offline |
I will not be able to try out your success formula on my Win7-64 for another 8-9 hours, but I will certainly be testing this tonight!
|
||
|
Hypernova
Master Cruncher Audaces Fortuna Juvat ! Vaud - Switzerland Joined: Dec 16, 2008 Post Count: 1908 Status: Offline |
Now that I have stopped HPF2, I wanted to take a more detailed look at the thousands of errors I got, and unfortunately it does not fail only at the one or two minute mark.
Below is a list of errors with the highest crunch time values before failing. I do not care about the loss of points, but I am certainly not very happy with the loss of time.

ne580_ 00022_ 13-- Ceres Error 03.03.10 02:35:43 04.03.10 21:06:48 31.03 705.9 / 0.0
nf185_ 00033_ 5-- Uranus Error 12.03.10 01:22:10 13.03.10 23:49:13 30.49 681.8 / 0.0
ne998_ 00018_ 18-- Ceres Error 09.03.10 09:26:57 11.03.10 11:47:13 30.09 695.0 / 0.0
nf023_ 00025_ 11-- Pluto Error 09.03.10 16:50:42 11.03.10 12:41:16 29.80 695.0 / 0.0
ne867_ 00077_ 13-- Pluto Error 07.03.10 08:13:40 09.03.10 03:58:41 29.49 770.0 / 0.0
ne870_ 00038_ 14-- Ceres Error 07.03.10 10:27:16 09.03.10 03:58:16 29.26 658.5 / 0.0
ne684_ 00019_ 7-- Saturn Error 04.03.10 13:15:55 06.03.10 01:44:05 28.86 723.5 / 0.0
nf225_ 00030_ 10-- Ceres Error 12.03.10 17:22:58 14.03.10 10:36:41 28.57 641.8 / 0.0
ne956_ 00041_ 1-- Mercury Error 08.03.10 18:35:22 10.03.10 09:11:37 28.20 700.6 / 0.0
ne762_ 00044_ 1-- Saturn Error 05.03.10 14:15:05 07.03.10 11:24:50 26.73 670.0 / 0.0
nf049_ 00036_ 2-- Mars Error 10.03.10 01:28:50 10.03.10 23:17:40 5.00 120.7 / 0.0
nf086_ 00088_ 1-- Mars Error 10.03.10 13:14:57 11.03.10 11:31:40 4.32 101.3 / 0.0
ne859_ 00088_ 20-- Terra Error 07.03.10 09:20:54 08.03.10 05:26:53 3.40 79.4 / 0.0
ne845_ 00043_ 3-- Ceres Error 06.03.10 21:48:37 07.03.10 12:22:30 3.24 76.0 / 0.0
nf116_ 00029_ 4-- Mars Error 10.03.10 23:17:42 11.03.10 15:08:24 3.21 76.0 / 0.0
ne768_ 00051_ 4-- Pluto Error 05.03.10 16:41:01 06.03.10 04:47:06 2.29 56.3 / 0.0
ne963_ 00031_ 14-- Jupiter Error 08.03.10 20:47:25 09.03.10 23:52:13 2.26 55.4 / 0.0
ne851_ 00028_ 10-- Mars Error 07.03.10 00:23:02 07.03.10 13:03:27 2.10 50.6 / 0.0
nf030_ 00005_ 4-- Ceres Error 09.03.10 19:03:08 10.03.10 10:01:26 0.77 17.5 / 0.0 |
||
|
Sekerob
Ace Cruncher Joined: Jul 24, 2005 Post Count: 20043 Status: Offline |
Hypernova,
Suggest you look in the log detail. The ones up to, say, 5 hours are all the /401 fails or an absent output file type **. Those with 27-31 hours are probably time-out loopers, when they've computed like 10x the fpops amount that was given in the task headers. I'll drop a note in the back room to see if the lord of the wrench can do something about the time part. ttyl
** I was collecting the different error messages from my own and all the wingmen errors, then lost it. There were like 6, of which at least 3 surely are device issues, such as "too many exits" and a time exceed.
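The split described above can be roughed out in a few lines of Python for anyone sorting a long Result Status list. This is only a sketch: it assumes the last three numbers on each row are CPU hours, claimed credit and granted credit, the 5 hour cut-off comes from the post, and the 25 hour lower bound for loopers is a guess chosen to loosely bracket the "27-31 hours" group. The names `classify`, `SHORT_MAX_HOURS` and `LOOPER_MIN_HOURS` are made up for illustration, and the log detail remains the only way to confirm what actually happened to a task.

```python
import re

# Trailing "<CPU hours> <claimed credit> / <granted credit>" on a result row.
# That reading of the three numbers is an assumption.
TAIL = re.compile(
    r"(?P<hours>\d+\.\d+) (?P<claimed>\d+\.\d+) / (?P<granted>\d+\.\d+)\s*$"
)

SHORT_MAX_HOURS = 5.0    # "say up to 5 hours", from the post
LOOPER_MIN_HOURS = 25.0  # assumed lower bound for the "27-31 hours" group


def classify(row):
    """Guess the failure type of one errored result row from its CPU hours."""
    m = TAIL.search(row)
    if m is None:
        return "unparsed"
    hours = float(m["hours"])
    if hours <= SHORT_MAX_HOURS:
        return "short fail"
    if hours >= LOOPER_MIN_HOURS:
        return "probable time-out looper"
    return "unclassified"


if __name__ == "__main__":
    rows = [
        "ne580_ 00022_ 13-- Ceres Error 03.03.10 02:35:43 04.03.10 21:06:48 31.03 705.9 / 0.0",
        "nf049_ 00036_ 2-- Mars Error 10.03.10 01:28:50 10.03.10 23:17:40 5.00 120.7 / 0.0",
        "ne691_ 00043_ 2-- computername Error 3/4/10 15:39:23 3/5/10 11:12:24 7.37 61.3 / 0.0",
    ]
    for row in rows:
        print(classify(row), "<-", row)
```

Anything that lands in "unclassified" (like the 7.37 hour task earlier in the thread) fits neither group and needs a look at its result log.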
WCG
||