World Community Grid Forums
Thread Status: Active Total posts in this thread: 23
Former Member
Cruncher Joined: May 22, 2018 Post Count: 0 Status: Offline |
Hello!
My machine has finished 10 ACH units so far (one is still in progress), and I noticed a lot of invalid results from other members, which caused many work units to be redistributed. Only a minority of tasks seem to finish and validate with just 10 WUs in the quorum; most of the quorums I had a work unit in needed 20 or more. Now I have an invalid result too (ach1_20_82). The only difference I found in how this work unit was processed is that I had to reboot my machine for other reasons, so the WU was restarted from a checkpoint. Could it be as simple as that: restarting from a checkpoint produces an invalid result, which would explain the high number of invalid results?
I'm using BOINC 5.10.30 on Windows XP Pro.
Greetings
Thorsten |
||
|
Sekerob
Ace Cruncher Joined: Jul 24, 2005 Post Count: 20043 Status: Offline |
There is a thread on the topic of failing ACAH jobs. I have had invalids for clean runs, invalids for single restarts, and valids for multiple restarts. The few that ran on Vista never moaned and went through without a hiccup; I just see lots of page faulting on the XP/P4 combo. At any rate, it's on the alert list.
cheers
WCG
Please help to make the Forums an enjoyable experience for All! |
||
|
Protego
Cruncher Joined: Apr 26, 2007 Post Count: 33 Status: Offline |
Hi!
I have run mostly on an old CAD machine, a PIII 884 MHz, a rather slow computer. This one doesn't normally miscalculate. It got the majority of the ACAH WUs, even though I also have two faster machines that can take an ACAH job. I got two ACAH results in error on the PIII machine, and the results/statistics look like a lottery to me. Possibly there is some problem.
A fun thing with the project badges: I got my AC@H one now.
Protego normally runs FAAH WUs. |
||
|
Former Member
Cruncher Joined: May 22, 2018 Post Count: 0 Status: Offline |
Now I have had some more ACH work units. None of my ACH WUs ever produced an error.
For my machine I can say that WUs which run undisturbed from start to end always produce valid results. WUs that are interrupted by a reboot of the machine, and therefore restart from a checkpoint, produce invalid results. This happens only with ACH.
Besides the previously mentioned WU, I had one (ach1_22_2_4--) where I had to reboot several times during its runtime. In that quorum there were 2 errors by others, and mine was the only invalid.
During another one (ach1_22_68_8--) I had only one restart. This WU is not yet validated (Inconclusive now, with 10 new distributions), but I expect it to be invalid as well. I noticed that the effects of the reboot were even visible in the graphics of the WU.
[Image: screenshot of the WU graphics]
Please notice the bump in the wave. This was the point when the restart occurred.
This is the log of that restart:
21-Feb-2008 10:13:06 [---] Starting BOINC client version 5.10.30 for windows_intelx86
21-Feb-2008 10:13:06 [---] log flags: task, file_xfer, sched_ops, checkpoint_debug
21-Feb-2008 10:13:06 [---] Libraries: libcurl/7.17.1 OpenSSL/0.9.8e zlib/1.2.3
21-Feb-2008 10:13:06 [---] Executing as a daemon
21-Feb-2008 10:13:06 [---] Data directory: C:\Program Files\BOINC
21-Feb-2008 10:13:07 [---] Processor: 1 GenuineIntel Genuine Intel(R) CPU T1300 @ 1.66GHz [x86 Family 6 Model 14 Stepping 8]
21-Feb-2008 10:13:07 [---] Processor features: fpu tsc sse sse2 mmx
21-Feb-2008 10:13:07 [---] OS: Microsoft Windows XP: Professional Edition, Service Pack 2, (05.01.2600.00)
21-Feb-2008 10:13:07 [---] Memory: 1.99 GB physical, 4.83 GB virtual
21-Feb-2008 10:13:07 [---] Disk: 55.89 GB total, 21.20 GB free
21-Feb-2008 10:13:07 [---] Local time is UTC +1 hours
21-Feb-2008 10:13:07 [World Community Grid] URL: http://www.worldcommunitygrid.org/; Computer ID: 328797; location: work; project prefs: work
21-Feb-2008 10:13:07 [---] General prefs: from World Community Grid (last modified 19-Feb-2008 10:44:44)
21-Feb-2008 10:13:07 [---] Host location: work
21-Feb-2008 10:13:07 [---] General prefs: using separate prefs for work
21-Feb-2008 10:13:07 [---] Preferences limit memory usage when active to 1834.52MB
21-Feb-2008 10:13:07 [---] Preferences limit memory usage when idle to 2038.36MB
21-Feb-2008 10:13:07 [---] Preferences limit disk usage to 19.80GB
21-Feb-2008 10:13:08 [---] Contacting account manager at http://bam.boincstats.com/
21-Feb-2008 10:13:44 [World Community Grid] Restarting task ach1_22_68_8 using acah version 514
21-Feb-2008 10:13:46 [---] Account manager: BAM Host-ID: 66469
21-Feb-2008 10:13:46 [---] Account manager contact succeeded
21-Feb-2008 10:30:49 [World Community Grid] [checkpoint_debug] result ach1_22_68_8 checkpointed
...and the log of that WU:
<core_client_version>5.10.30</core_client_version>
<![CDATA[
<stderr_txt>
Failed to get VersionInfo size: 1812
World Community Grid ACAH (projects/www.worldcommunitygrid.org/wcg_acah_wrf_5.14_windows_intelx86) version
INFO: No state to restore from. Starting from beginning
Start_year/Start_Month/Start_Day::Start_Hour:Start_Minute:Start_Second Restart2003/10/8::0:0:0 1
Failed to get VersionInfo size: 1812
World Community Grid ACAH (projects/www.worldcommunitygrid.org/wcg_acah_wrf_5.14_windows_intelx86) version
Restarting WRF
Start_year/Start_Month/Start_Day::Start_Hour:Start_Minute:Start_Second Restart2003/10/9::6:0:0 1
</stderr_txt>
]]>
I hope this helps to find the reason for this behaviour. For now I would recommend avoiding any restarts of ACH work units; otherwise there is a high risk of producing an invalid result (at least on my machine).
[Edit 1 times, last edit by Former Member at Feb 28, 2008 12:09:58 PM] |
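For anyone who wants to check their own client log for restarts like the one above, here is a minimal Python sketch. It assumes the log lines look exactly like the "Restarting task ... using ... version ..." line in the posted log; the function name `find_restarted_tasks` and the regular expression are illustrative, not part of BOINC.

```python
import re

# Pattern modelled on the line posted in this thread, e.g.
# 21-Feb-2008 10:13:44 [World Community Grid] Restarting task ach1_22_68_8 using acah version 514
# Assumption: other client versions use the same wording and timestamp layout.
RESTART_RE = re.compile(
    r"^(?P<stamp>\d{2}-[A-Za-z]{3}-\d{4} \d{2}:\d{2}:\d{2}) "
    r"\[(?P<project>[^\]]*)\] "
    r"Restarting task (?P<task>\S+) using (?P<app>\S+) version (?P<version>\d+)\s*$"
)


def find_restarted_tasks(log_text):
    """Return (timestamp, task, app, version) for each 'Restarting task' line."""
    restarts = []
    for line in log_text.splitlines():
        match = RESTART_RE.match(line.strip())
        if match:
            restarts.append(
                (match["stamp"], match["task"], match["app"], int(match["version"]))
            )
    return restarts


if __name__ == "__main__":
    sample = (
        "21-Feb-2008 10:13:08 [---] Contacting account manager at http://bam.boincstats.com/\n"
        "21-Feb-2008 10:13:44 [World Community Grid] Restarting task ach1_22_68_8 using acah version 514\n"
        "21-Feb-2008 10:30:49 [World Community Grid] [checkpoint_debug] result ach1_22_68_8 checkpointed\n"
    )
    for stamp, task, app, version in find_restarted_tasks(sample):
        print(stamp, task, app, version)
```

The list of restarted task names can then be compared by hand with the validation status shown on the results page.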
||
|
JmBoullier
Former Community Advisor Normandy - France Joined: Jan 26, 2007 Post Count: 3715 Status: Offline |
Can you please tell us a little more about the nature of your reboots?
E.g. were they necessary because of HW or SW problems, or simply because you had installed or upgraded some SW? I am just asking myself "is it the reboot which is killing the AC@H WU, or is it the cause of the reboot?".
Also, if you switch off your computer when going to bed, does the WU restart nicely when you start it in the morning, or does it fail the same way?
Cheers. Jean. |
||
|
Former Member
Cruncher Joined: May 22, 2018 Post Count: 0 Status: Offline |
Hello Jean,
usually I let my machine run 24/7. But it is my work laptop, so I have to take it with me when I am 'on the road'. In those cases I don't shut down completely; instead I use standby mode. I only reboot my machine when it is absolutely necessary. In the cases mentioned I installed and updated some SW. I have already thought about the possible impact of those SW updates and will verify my theory with a plain reboot at the next opportunity.
Greetings
Thorsten |
||
|
Former Member
Cruncher Joined: May 22, 2018 Post Count: 0 Status: Offline |
"The few that ran on Vista never moaned and went thru without hick-up, just have lots of page faulting on XP/P4 combo."
I got an invalid result for ACH from a Vista machine. |
||
|
Former Member
Cruncher Joined: May 22, 2018 Post Count: 0 Status: Offline |
"There is a thread on the topic of failing ACAH jobs. [snip]"
Where? I'm not finding it (which doesn't say much, I've been blind before). I had to reboot a box with an unfinished ACH WU, and though the unit is about to finish now, this sounds like it will not validate. Anyway I'd like to read more about it... |
||
|
darth_vader
Veteran Cruncher A galaxy far, far away... Joined: Jul 13, 2005 Post Count: 514 Status: Offline |
"There is a thread on the topic of failing ACAH jobs. [snip]"
"Where? I'm not finding it (which doesn't say much, I've been blind before). [snip] Anyway I'd like to read more about it..."
Look at the thread "ERROR: exit code 95 (0x5f)" at: http://www.worldcommunitygrid.org/forums/wcg/viewthread?thread=16101
I've never had an error or invalid result except for ACH. What makes it all the more annoying is that the "computation finished" message preceded the error message.
-D |
||
|
Former Member
Cruncher Joined: May 22, 2018 Post Count: 0 Status: Offline |
Update to my observations:
ach1_22_68_8-- has been validated now, and my result was, as expected, invalid (12 invalid, 1 error, 14 valid = 27 replications!!!).
With WU ach1_23_48_9-- I checked again to exclude other influences. After a checkpoint save I stopped the World Community Grid - BOINC Agent service, waited a few seconds, and restarted the service. I took no other actions between stopping and starting the service. When I checked the graphics a few moments later, this "bump" in the wave was visible:
[Image: screenshot of the WU graphics]
Now I am waiting for the validation of this WU, but I expect an invalid result.
To emphasize it again:
- There are no problems with other projects after a restart from checkpoint.
- All ACH units that ran undisturbed from start to end were valid.
- I have had no errors (as described in other threads) for ACH units so far.
- Restarts from checkpoint end up in invalid results for ACH.
For machine specifications, see the log provided in my earlier post in this thread. |
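To check whether a given result was restarted from a checkpoint, one option is to look at its stderr output. The sketch below is only a rough Python illustration: the marker strings "Restarting WRF" and "No state to restore from" are copied from the stderr posted earlier in this thread, and whether every ACAH build prints them is an assumption. The function name `summarize_acah_stderr` is hypothetical.

```python
# Marker strings copied from the stderr_txt posted earlier in this thread.
# Assumption: they appear once per fresh start / per restart from checkpoint.
FRESH_START_MARKER = "No state to restore from"
RESTART_MARKER = "Restarting WRF"


def summarize_acah_stderr(stderr_text):
    """Count fresh starts and checkpoint restarts recorded in a result's stderr."""
    fresh_starts = stderr_text.count(FRESH_START_MARKER)
    restarts = stderr_text.count(RESTART_MARKER)
    return {
        "fresh_starts": fresh_starts,
        "restarts": restarts,
        "ran_undisturbed": restarts == 0,
    }


if __name__ == "__main__":
    sample = (
        "INFO: No state to restore from. Starting from beginning\n"
        "Restarting WRF\n"
    )
    print(summarize_acah_stderr(sample))
```

A result whose summary shows at least one restart would, going by the observations above, be a candidate for an invalid outcome; a result with zero restarts would be expected to validate.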
||
|