World Community Grid Forums
Thread Status: Active Total posts in this thread: 90
Author |
|
Sgt.Joe
Ace Cruncher USA Joined: Jul 4, 2006 Post Count: 7675 Status: Offline |
Got a couple of these. Both errored along with all the others. It was interesting to see the same unit go to both Windows and Linux, not that it made any difference in the outcome.
----------------------------------------
Cheers
Sgt. Joe
*Minnesota Crunchers* |
||
|
Former Member
Cruncher Joined: May 22, 2018 Post Count: 0 Status: Offline |
I had one that reports Too Late, which doesn't make sense. Looking at the WU, copies _0 through _3 all errored out. I got almost 16 hours of computation time into it before it was marked Too Late.

Project Name: Beta - The Clean Energy Project - Phase 2
Created: 03/17/2016 00:49:24
Name: BETA_E236441_534_S.482.C52H18N8S6.KCGJROWFHRSJHJ-UHFFFAOYSA-N.2_s1_14
Minimum Quorum: 2
Replication: 2

<core_client_version>7.6.22</core_client_version>
<![CDATA[
<stderr_txt>
INFO: No state to restore. Start from the beginning.
[06:50:58] Number of jobs = 5
[06:50:58] Starting job 0,CPU time has been restored to 0.000000.
[19:48:22] Finished Job #0
[19:48:22] Starting job 1,CPU time has been restored to 43371.437500.
[21:03:47] Finished Job #1
[21:03:47] Starting job 2,CPU time has been restored to 47807.406250.
[22:00:14] Finished Job #2
[22:00:14] Starting job 3,CPU time has been restored to 51019.734375.
Application exited with RC = 0x1
[23:16:03] Finished Job #3
[23:16:03] Starting job 4,CPU time has been restored to 55460.468750.
[23:16:03] Skipping Job #4
23:16:12 (8964): called boinc_finish
</stderr_txt>
]]>

| Result Name | OS type and version | App Version Number | Status | Sent Time | Time Due / Return Time | CPU Time / Elapsed Time (hours) | Claimed / Granted BOINC Credit |
|---|---|---|---|---|---|---|---|
| BETA_E236441_534_S.482.C52H18N8S6.KCGJROWFHRSJHJ-UHFFFAOYSA-N.2_s1_14_4-- | Microsoft Windows Server 2003 "R2" Enterprise Server x86 Edition, Service Pack 2, (05.02.3790.00) | 700 | Too Late | 3/19/16 11:50:20 | 3/20/16 04:18:20 | 15.41 | 375.5 / 0.0 |
| BETA_E236441_534_S.482.C52H18N8S6.KCGJROWFHRSJHJ-UHFFFAOYSA-N.2_s1_14_3-- | Microsoft Windows 7 x64 Edition, Service Pack 1, (06.01.7601.00) | 700 | Error | 3/18/16 16:50:44 | 3/19/16 05:47:39 | 8.09 | 215.8 / 0.0 |
| BETA_E236441_534_S.482.C52H18N8S6.KCGJROWFHRSJHJ-UHFFFAOYSA-N.2_s1_14_2-- | Microsoft Windows 7 Professional x64 Edition, Service Pack 1, (06.01.7601.00) | 700 | Error | 3/18/16 16:50:43 | 3/19/16 00:30:04 | 7.09 | 226.7 / 0.0 |
| BETA_E236441_534_S.482.C52H18N8S6.KCGJROWFHRSJHJ-UHFFFAOYSA-N.2_s1_14_1-- | Microsoft Windows 8.1 Professional x64 Edition, (06.03.9600.00) | 700 | Error | 3/17/16 01:05:41 | 3/17/16 09:56:55 | 4.75 | 134.6 / 0.0 |
| BETA_E236441_534_S.482.C52H18N8S6.KCGJROWFHRSJHJ-UHFFFAOYSA-N.2_s1_14_0-- | Microsoft Windows 7 Ultimate x64 Edition, Service Pack 1, (06.01.7601.00) | 700 | Error | 3/17/16 01:04:37 | 3/18/16 16:50:36 | 9.39 | 232.5 / 0.0 |

I don't see why a WU that I received at 11:50 is too late not even a day later. I only had the WU for about 16.5 hours. CEP2 WUs can run long, and I didn't even hit the 16-hour CPU time limit. |
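For anyone who wants to check the arithmetic, here is a minimal Python sketch that works out how long copy _4 was held, using its Sent Time and Return Time from the result listing above. The helper name `hours_between` and the timestamp format string are my own assumptions for illustration, not part of any WCG or BOINC tool.

```python
from datetime import datetime

# Assumed format of the timestamps on the result status page, e.g. "3/19/16 11:50:20".
FMT = "%m/%d/%y %H:%M:%S"


def hours_between(sent: str, returned: str) -> float:
    """Return the wall-clock hours between two result-status timestamps."""
    delta = datetime.strptime(returned, FMT) - datetime.strptime(sent, FMT)
    return delta.total_seconds() / 3600


# Copy _4: sent 3/19/16 11:50:20, returned 3/20/16 04:18:20.
held = hours_between("3/19/16 11:50:20", "3/20/16 04:18:20")
print(f"Copy _4 was held for {held:.2f} hours")
```

That comes out a little under 16.5 hours of wall-clock time, which matches the 15.41 CPU hours reported for the task.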
||
|
PecosRiverM
Veteran Cruncher The Great State of Texas Joined: Apr 27, 2007 Post Count: 1054 Status: Offline |
It seems that if you have 4 errors, the 5th WU goes to "Too Late". I think this is because there is no 6th WU to verify against. But what do I know. |
||
|
nanoprobe
Master Cruncher Classified Joined: Aug 29, 2008 Post Count: 2998 Status: Offline |
IMHO it looks like checkpointing is still an issue. Of the tasks I received, I have not found one that checkpointed before it was at least 7 hours in. I'm still checking on resends, but so far I have 2 that are 10 and 13.5 hours in without a single checkpoint. These long or missing checkpoints now cause another problem. I had a power failure last night that lasted longer than my UPS could cover, and about 8 tasks started over when the power came back on. Since they are due today, there is no way they will finish before the deadline. Many of them were 7 to 10-plus hours in before the failure.
In 1969 I took an oath to defend and protect the U S Constitution against all enemies, both foreign and Domestic. There was no expiration date.
||
|
danogian
Cruncher United Kingdom Joined: Apr 27, 2007 Post Count: 36 Status: Offline |
I've just had to abort BETA_E236438_841_S.400.C47H25N3O5S2.SRRCWTQQAVCZJT-UHFFFAOYSA-N.12_s1_14_1 as it aborted and restarted itself from 0% at least 6 times, usually after running for about 5 hours with no checkpoint. One time it also took the whole BOINC service with it (BOINC 7.2.47 on Windows 7 Ultimate x64).

Also, I am currently running BETA_E236440_399_S.448.C45F2H14N6O5S4.DSBOADLGQJABJM-UHFFFAOYSA-N.13_s1_14_1, which at the moment is at 55% after 10 hours of CPU time and checkpointed for the first time about 20 minutes ago.
----------------------------------------
Core i9-10900K (Comet Lake) - 10C/20T
Core i7-2600 (Sandy Bridge) - 4C/8T
Ryzen 7 Pro 3700U (Zen+ Picasso) - 4C/8T
Core i5-3320M (Ivy Bridge) - 2C/4T
Core2 Duo E8400 (Wolfdale) - 2C/2T |
||
|
Former Member
Cruncher Joined: May 22, 2018 Post Count: 0 Status: Offline |
For everyone mystified by a Too Late status, please read the FAQ entitled Results Status page - What does xyz status mean?, in particular the brief description of the alternative meaning that is nothing to do with it being too late.
|
||
|
[SG-FC]Hammerburg
Senior Cruncher Joined: Apr 26, 2007 Post Count: 251 Status: Offline |
Error:

Result Name: BETA_E236441_367_S.484.C53H20N6O1S6.MULQIFJOYQQCJB-UHFFFAOYSA-N.6_s1_14_3--

<core_client_version>7.6.9</core_client_version>
<![CDATA[
<stderr_txt>
INFO: No state to restore. Start from the beginning.
[22:32:57] Number of jobs = 5
[22:32:57] Starting job 0,CPU time has been restored to 0.000000.
[02:05:49] Finished Job #0
[02:05:49] Starting job 1,CPU time has been restored to 12488.129652.
[02:23:59] Finished Job #1
[02:23:59] Starting job 2,CPU time has been restored to 13565.300956.
[02:41:03] Finished Job #2
[02:41:03] Starting job 3,CPU time has been restored to 14570.072197.
Application exited with RC = 0x1
[03:01:39] Finished Job #3
[03:01:39] Starting job 4,CPU time has been restored to 15791.903230.
[03:01:39] Skipping Job #4
03:01:43 (5428): called boinc_finish
</stderr_txt>
]]> |
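If you have several of these logs to compare, a small script can pull out which job hit the non-zero return code and roughly how much CPU time each job used. This is only a sketch: the function name `summarize` and both regular expressions are my own, written against the log lines shown above, and it assumes that the "CPU time has been restored to" value is the cumulative CPU seconds at the start of each job.

```python
import re

# Log lines copied from the stderr output above.
STDERR = """\
[22:32:57] Number of jobs = 5
[22:32:57] Starting job 0,CPU time has been restored to 0.000000.
[02:05:49] Finished Job #0
[02:05:49] Starting job 1,CPU time has been restored to 12488.129652.
[02:23:59] Finished Job #1
[02:23:59] Starting job 2,CPU time has been restored to 13565.300956.
[02:41:03] Finished Job #2
[02:41:03] Starting job 3,CPU time has been restored to 14570.072197.
Application exited with RC = 0x1
[03:01:39] Finished Job #3
[03:01:39] Starting job 4,CPU time has been restored to 15791.903230.
[03:01:39] Skipping Job #4
"""

START = re.compile(r"Starting job (\d+),CPU time has been restored to ([\d.]+)\.")
EXIT = re.compile(r"Application exited with RC = (0x[0-9a-fA-F]+)")


def summarize(stderr: str):
    """Return (CPU seconds per job, (failed job, return code) or None)."""
    starts = {}     # job number -> cumulative CPU seconds when the job started
    failed = None   # first job that reported a non-zero return code
    current = None  # job number most recently started
    for line in stderr.splitlines():
        match = START.search(line)
        if match:
            current = int(match.group(1))
            starts[current] = float(match.group(2))
            continue
        match = EXIT.search(line)
        if match and failed is None and int(match.group(1), 16) != 0:
            failed = (current, match.group(1))
    # Assumption: a job's CPU time is the difference between consecutive start values.
    cpu_per_job = {j: starts[j + 1] - starts[j] for j in starts if j + 1 in starts}
    return cpu_per_job, failed


cpu_per_job, failed = summarize(STDERR)
print("Failed job and return code:", failed)
for job, seconds in sorted(cpu_per_job.items()):
    print(f"Job {job}: {seconds / 3600:.2f} CPU hours")
```

On this log it points at job 3, the same job that failed in the other error reports in this thread, with job 0 accounting for most of the CPU time.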
||
|
Rickjb
Veteran Cruncher Australia Joined: Sep 17, 2006 Post Count: 666 Status: Offline |
Some very strange goings-on with these betas ...

BETA_E236441_567_S.486.C52H22N8S6.YYKMRHKMMJCQNU-UHFFFAOYSA-N.16_s1_14: 4 copies sent out so far, all returned. I got copy _2. All exited with RC = 0x100. Copy _0 quit early in Job 0, after Qink name = scfman. Copies _1, _2 and _3 all quit in Job 3, after Qink name = fldman. Status values for the 4 copies are currently P Ver, Error, Error, P Val, respectively. So copies _1, _2 and _3 all quit after the same Qink marker but have different result status values. Explanation?

And yes, I confirm the very infrequent checkpointing of these WUs. I had one that ran for about 17h total and checkpointed only once, well into the 16th hour, plus others that didn't checkpoint at all before they finished.

Members are reminded that if they want to power off their machines without losing the crunching time spent since startup or the last checkpoint of a WU, hibernating the O/S before powering off, if it works, will preserve the non-checkpointed run time.

I'm expecting another round or several of betas for this CEP2 update before it hits production ... |
||
|
yoro42
Ace Cruncher United States Joined: Feb 19, 2011 Post Count: 8979 Status: Offline |
BETA_E236441_60_S.482.C54H18N4O2S6.XEMJZKHTWWDGER-UHFFFAOYSA-N.3_s1_14_0-- Microsoft Windows 8.1 Core x64 Edition, (06.03.9600.00) - No Reply 3/17/16 01:05:38 3/21/16 01:05:38 0.00 0.0 / 0.0

Computer: Simone_
Project: World Community Grid
Name: BETA_E236441_60_S.482.C54H18N4O2S6.XEMJZKHTWWDGER-UHFFFAOYSA-N.3_s1_14_0
Application: beta11 7.00
Workunit name: BETA_E236441_60_S.482.C54H18N4O2S6.XEMJZKHTWWDGER-UHFFFAOYSA-N.3_s1_14
State: Running High P.
Received: 3/16/2016 6:05:34 PM
Report deadline: 3/20/2016 6:05:38 PM
Estimated app speed: 2.03 GFLOPs/sec
Estimated task size: 86,121 GFLOPs
CPU time at last checkpoint: 00:00:00
CPU time: 08:26:29
Elapsed time: 08:38:03
Estimated time remaining: 04:35:35
Fraction done: 46.897%
Virtual memory size: 304.83 MB
Working set size: 147.64 MB
Directory: slots/7
Process ID: 2996 |
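To put a number on what a restart would cost here, this short Python sketch subtracts "CPU time at last checkpoint" from "CPU time" as shown in the task properties above. The helper `hms_to_seconds` is a made-up name for illustration, and the sketch assumes that everything since the last checkpoint is lost if the task restarts.

```python
def hms_to_seconds(hms: str) -> int:
    """Convert an HH:MM:SS string from the task properties into seconds."""
    hours, minutes, seconds = (int(part) for part in hms.split(":"))
    return hours * 3600 + minutes * 60 + seconds


# Values copied from the task properties above.
cpu_time = hms_to_seconds("08:26:29")
last_checkpoint = hms_to_seconds("00:00:00")

# Assumption: all CPU time since the last checkpoint is redone after a restart.
at_risk_hours = (cpu_time - last_checkpoint) / 3600
print(f"{at_risk_hours:.2f} CPU hours would be lost on a restart")
```

With no checkpoint written yet, the whole 8.4 hours or so is at risk, which is the situation several people in this thread ran into.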
||
|
Former Member
Cruncher Joined: May 22, 2018 Post Count: 0 Status: Offline |
"For everyone mystified by a Too Late status, please read the FAQ entitled Results Status page - What does xyz status mean?, in particular the brief description of the alternative meaning that is nothing to do with it being too late."

But the alternative meaning doesn't fit either.

"Too Late - The result was returned to the server a longer time after it was due. Occasionally a result previously marked Pending Validation has the distribution stopped due to too many errors, without a complete quorum [max errors varies per science]. The non-error results are then converted to the status Too Late. Credit is granted as claimed [with delay]. Internally these task results are moved to a take-out list, for later review. Also see the Pending Validation status."

_0 through _3 all were errors.

_0 was returned at 3/18/16 16:50:36
_1 was returned at 3/17/16 09:56:55
_2 was returned at 3/19/16 00:30:04
_3 was returned at 3/19/16 05:47:39
_4 was sent at 3/19/16 11:50:20

So I was given that WU about 6 hours after the last error (_3) was returned, unless those results were not validated until long after it was sent to me. Either way, it just wasted my CPU time, and I received no credit for the nearly 16 hours of processing time that went into it. |
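To make the FAQ wording a bit more concrete, here is a rough Python sketch of the rule as I read it: once a workunit has too many errors and no complete quorum, any non-error results still pending are relabelled Too Late. This is not the server's actual code. The function name `final_statuses`, the status strings, and the `max_errors=4` value are assumptions for illustration only (the FAQ says the maximum varies per science).

```python
def final_statuses(statuses, max_errors, min_quorum):
    """Apply the FAQ's alternative Too Late rule to a list of result statuses."""
    errors = sum(status == "Error" for status in statuses)
    pending = sum(status == "Pending Validation" for status in statuses)
    # Distribution stops: too many errors and not enough results for a quorum.
    if errors >= max_errors and pending < min_quorum:
        return ["Too Late" if status == "Pending Validation" else status
                for status in statuses]
    return list(statuses)


# Copies _0 to _3 errored; _4 finished and would normally wait for a wingman.
before = ["Error", "Error", "Error", "Error", "Pending Validation"]
after = final_statuses(before, max_errors=4, min_quorum=2)
print(after)
```

Under those assumptions copy _4 ends up as Too Late even though it was returned well before its deadline, which is the outcome reported in this thread.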
||
|