Thread Status: Active
Total posts in this thread: 23
This topic has been viewed 4994 times and has 22 replies
Former Member
Cruncher
Joined: May 22, 2018
Post Count: 0
Status: Offline
Invalid results

Hello!

My machine has finished 10 ACH units so far (one is still in progress), and I noticed a lot of invalid results from others, which caused a lot of redistributions of work units.
Only a minority of tasks seem to finish and validate with just 10 WUs in the quorum. Most of the quorums I had a work unit in needed 20 or more.

Now I got an invalid result too (ach1_20_82). The only difference I found in my processing of this work unit is that I had to reboot my machine for other reasons, so the WU was restarted from a checkpoint.

Could it be as simple as this: restarting from a checkpoint produces an invalid result? That would explain the high number of invalid results.

I'm using BOINC 5.10.30 on Windows XP pro.

Greetings

Thorsten
[Feb 16, 2008 12:48:51 PM]
Sekerob
Ace Cruncher
Joined: Jul 24, 2005
Post Count: 20043
Status: Offline
Re: Invalid results

There is a thread on the topic of failing ACAH jobs. I have had invalids for clean runs, invalids for single restarts, and valids for multiple restarts. The few that ran on Vista never complained and went through without a hiccup; they just show lots of page faulting on the XP/P4 combo. At any rate, it's on the alert list.

cheers
----------------------------------------
WCG Global & Research > Make Proposal Help: Start Here!
Please help to make the Forums an enjoyable experience for All!
[Feb 16, 2008 2:00:26 PM]
Protego
Cruncher
Joined: Apr 26, 2007
Post Count: 33
Status: Offline
Re: Invalid results

Hi!
I have run mostly on an old CAD machine, a PIII 884 MHz, which is a rather slow computer. This one normally doesn't calculate wrong. It got the majority of my ACAH WUs, even though I also have 2 faster machines that can take an ACAH job. I got two ACAH results in error on the PIII machine, and the results/statistics look like a lottery to me. Possibly there is some problem.

Fun thing, these project badges: I got my AC@H one now!
Protego normally runs FAAH WUs.
----------------------------------------

[Feb 16, 2008 10:55:20 PM]
Former Member
Cruncher
Joined: May 22, 2018
Post Count: 0
Status: Offline
Re: Invalid results

I have now had some more ACH work units. None of my ACH WUs has ever produced an error.

For my machine I can say that WUs which run undisturbed from start to end always produce valid results. WUs that are interrupted by a reboot of the machine, and therefore restart from a checkpoint, produce invalid results. This happens only with ACH.

Besides the previously mentioned WU, I had one (ach1_22_2_4--) where I had to reboot several times during its runtime. In that quorum there were 2 errors by others, and mine was the only invalid result.

During another one (ach1_22_68_8--) I had only one restart. This WU is not yet validated (Inconclusive now, with 10 new distributions), but I expect it to be invalid as well. I noticed that the effects of the reboot were even visible in the graphics of the WU.

Please notice the bump in the wave. This was the point when the restart occurred.

This is the log of that restart:
21-Feb-2008 10:13:06 [---] Starting BOINC client version 5.10.30 for windows_intelx86
21-Feb-2008 10:13:06 [---] log flags: task, file_xfer, sched_ops, checkpoint_debug
21-Feb-2008 10:13:06 [---] Libraries: libcurl/7.17.1 OpenSSL/0.9.8e zlib/1.2.3
21-Feb-2008 10:13:06 [---] Executing as a daemon
21-Feb-2008 10:13:06 [---] Data directory: C:\Program Files\BOINC
21-Feb-2008 10:13:07 [---] Processor: 1 GenuineIntel Genuine Intel(R) CPU T1300 @ 1.66GHz [x86 Family 6 Model 14 Stepping 8]
21-Feb-2008 10:13:07 [---] Processor features: fpu tsc sse sse2 mmx
21-Feb-2008 10:13:07 [---] OS: Microsoft Windows XP: Professional Edition, Service Pack 2, (05.01.2600.00)
21-Feb-2008 10:13:07 [---] Memory: 1.99 GB physical, 4.83 GB virtual
21-Feb-2008 10:13:07 [---] Disk: 55.89 GB total, 21.20 GB free
21-Feb-2008 10:13:07 [---] Local time is UTC +1 hours
21-Feb-2008 10:13:07 [World Community Grid] URL: http://www.worldcommunitygrid.org/; Computer ID: 328797; location: work; project prefs: work
21-Feb-2008 10:13:07 [---] General prefs: from World Community Grid (last modified 19-Feb-2008 10:44:44)
21-Feb-2008 10:13:07 [---] Host location: work
21-Feb-2008 10:13:07 [---] General prefs: using separate prefs for work
21-Feb-2008 10:13:07 [---] Preferences limit memory usage when active to 1834.52MB
21-Feb-2008 10:13:07 [---] Preferences limit memory usage when idle to 2038.36MB
21-Feb-2008 10:13:07 [---] Preferences limit disk usage to 19.80GB
21-Feb-2008 10:13:08 [---] Contacting account manager at http://bam.boincstats.com/
21-Feb-2008 10:13:44 [World Community Grid] Restarting task ach1_22_68_8 using acah version 514
21-Feb-2008 10:13:46 [---] Account manager: BAM Host-ID: 66469
21-Feb-2008 10:13:46 [---] Account manager contact succeeded
21-Feb-2008 10:30:49 [World Community Grid] [checkpoint_debug] result ach1_22_68_8 checkpointed
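
A minimal sketch, assuming the line layout seen in the excerpt above (timestamp, bracketed source, message), of how the restart and the first checkpoint after it could be pulled out of such a client log. The regular expressions and the name parse_events are illustrative assumptions and not part of BOINC; only the two log lines are copied from the post.

```python
import re
from datetime import datetime

# Two lines copied from the log above. The layout is assumed from this excerpt only.
LOG = """\
21-Feb-2008 10:13:44 [World Community Grid] Restarting task ach1_22_68_8 using acah version 514
21-Feb-2008 10:30:49 [World Community Grid] [checkpoint_debug] result ach1_22_68_8 checkpointed
"""

# Hypothetical pattern: "<date> <time> [<source>] <message>"
LINE_RE = re.compile(r"^(\d{2}-\w{3}-\d{4} \d{2}:\d{2}:\d{2}) \[([^\]]*)\] (.*)$")


def parse_events(text):
    """Return (timestamp, kind, task) tuples for restart and checkpoint messages."""
    events = []
    for line in text.splitlines():
        m = LINE_RE.match(line.strip())
        if not m:
            continue  # skip anything that does not look like a log line
        when = datetime.strptime(m.group(1), "%d-%b-%Y %H:%M:%S")
        msg = m.group(3)
        restart = re.match(r"Restarting task (\S+)", msg)
        checkpoint = re.match(r"\[checkpoint_debug\] result (\S+) checkpointed", msg)
        if restart:
            events.append((when, "restart", restart.group(1)))
        elif checkpoint:
            events.append((when, "checkpoint", checkpoint.group(1)))
    return events


events = parse_events(LOG)
gap = events[1][0] - events[0][0]  # time from restart to the next checkpoint
print(events[0][2], events[0][1], "->", events[1][1], "after", gap)
```

For the excerpt above this gives a gap of 17 minutes and 5 seconds between the restart and the next recorded checkpoint. Month names are parsed with %b, which assumes an English locale.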


...and the log of that WU:
<core_client_version>5.10.30</core_client_version>
<![CDATA[
<stderr_txt>
Failed to get VersionInfo size: 1812
World Community Grid ACAH (projects/www.worldcommunitygrid.org/wcg_acah_wrf_5.14_windows_intelx86) version
INFO: No state to restore from. Starting from beginning
Start_year/Start_Month/Start_Day::Start_Hour:Start_Minute:Start_Second Restart2003/10/8::0:0:0 1
Failed to get VersionInfo size: 1812
World Community Grid ACAH (projects/www.worldcommunitygrid.org/wcg_acah_wrf_5.14_windows_intelx86) version
Restarting WRF
Start_year/Start_Month/Start_Day::Start_Hour:Start_Minute:Start_Second Restart2003/10/9::6:0:0 1

</stderr_txt>
]]>
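
A rough sketch of how restarts could be counted in a stderr text like the one above, for anyone who wants to compare restarted and undisturbed WUs. It assumes that each line reading "Restarting WRF" marks one restart from checkpoint and that the "Restart<date>::<time>" field is the model date at which the run (re)started; both are readings of this one excerpt, not documented behaviour of the application. The name summarize is hypothetical.

```python
import re
from datetime import datetime

# Lines copied from the stderr above (version banner lines omitted).
STDERR = """\
INFO: No state to restore from. Starting from beginning
Start_year/Start_Month/Start_Day::Start_Hour:Start_Minute:Start_Second Restart2003/10/8::0:0:0 1
Restarting WRF
Start_year/Start_Month/Start_Day::Start_Hour:Start_Minute:Start_Second Restart2003/10/9::6:0:0 1
"""

# Assumed field layout: Restart<year>/<month>/<day>::<hour>:<minute>:<second>
START_RE = re.compile(r"Restart(\d+)/(\d+)/(\d+)::(\d+):(\d+):(\d+)")


def summarize(stderr_text):
    """Return (number of restarts, list of start datetimes) found in the text."""
    restarts = sum(
        1 for line in stderr_text.splitlines() if line.strip() == "Restarting WRF"
    )
    starts = [
        datetime(*(int(g) for g in m.groups()))
        for m in START_RE.finditer(stderr_text)
    ]
    return restarts, starts


restarts, starts = summarize(STDERR)
print("restarts:", restarts)
print("model time covered before the restart:", starts[1] - starts[0])
```

Under these assumptions the excerpt shows one restart, with the second start date 30 hours after the first.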


I hope this helps to find the reason for this behaviour. For now I would recommend avoiding any restarts of ACH work units. Otherwise there is a high risk of producing an invalid result (at least on my machine).
----------------------------------------
[Edit 1 times, last edit by Former Member at Feb 28, 2008 12:09:58 PM]
[Feb 28, 2008 12:08:24 PM]
JmBoullier
Former Community Advisor
Normandy - France
Joined: Jan 26, 2007
Post Count: 3715
Status: Offline
Re: Invalid results

Can you please tell us a little more about the nature of your reboots?
E.g. were they necessary because of HW or SW problems, or simply because you had installed or upgraded some SW?

I am just asking myself: "Is it the reboot which is killing the AC@H WU, or is it the cause of the reboot?"

Also, if you switch off your computer before going to bed, does the WU restart nicely when you start it in the morning, or does it fail the same way?

Cheers. Jean.
----------------------------------------
Team--> Decrypthon -->Statistics/Join -->Thread
[Feb 29, 2008 5:35:14 AM]
Former Member
Cruncher
Joined: May 22, 2018
Post Count: 0
Status: Offline
Re: Invalid results

Hello Jean,

Usually I let my machine run 24/7. But it is my work laptop, so I have to take it with me when I am 'on the road'. In those cases I don't shut down completely; instead I use standby mode.

I only reboot my machine in cases when it is absolutely necessary. In the mentioned cases I installed and updated some SW. I already thought about the possible impact of those SW updates and will verify my theory with a plain reboot when I get the next opportunity.

Greetings

Thorsten
[Feb 29, 2008 7:31:16 AM]
Former Member
Cruncher
Joined: May 22, 2018
Post Count: 0
Status: Offline
Re: Invalid results

The few that ran on Vista never moaned and went thru without hick-up, just have lots of page faulting on XP/P4 combo.

I got an invalid result for ACH from a Vista machine.
[Feb 29, 2008 11:36:58 AM]
Former Member
Cruncher
Joined: May 22, 2018
Post Count: 0
Status: Offline
Re: Invalid results

There is a thread on the topic of failing ACAH jobs. [snip]


Where? I'm not finding it (which doesn't say much, I've been blind before).
I had to reboot a box with an unfinished ACH WU, and although the unit is about to finish now, it sounds like it will not validate. Anyway, I'd like to read more about it...
[Mar 1, 2008 3:37:04 AM]
darth_vader
Veteran Cruncher
A galaxy far, far away...
Joined: Jul 13, 2005
Post Count: 514
Status: Offline
Re: Invalid results

There is a thread on the topic of failing ACAH jobs. [snip]


Where? I'm not finding it (which doesn't say much, I've been blind before).
[snip] Anyway I'd like to read more about it...


Look at the thread: ERROR: exit code 95 (0x5f)

at: http://www.worldcommunitygrid.org/forums/wcg/viewthread?thread=16101

I've never had an error or invalid result except for ACH. All the more annoying is that the "computation finished" message preceded the error message.

-D
[Mar 3, 2008 12:35:47 AM]
Former Member
Cruncher
Joined: May 22, 2018
Post Count: 0
Status: Offline
Re: Invalid results

Update to my observations:

ach1_22_68_8-- has been validated now, and my result was, as expected, invalid. (12 invalid, 1 error, 14 valid = 27 replications!!!)
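
As a quick check of the arithmetic, the replication counts quoted above can be tallied like this. The dictionary simply restates the three numbers from the post; the variable names are illustrative.

```python
# Outcome counts for ach1_22_68_8 as quoted in the post.
counts = {"invalid": 12, "error": 1, "valid": 14}

total = sum(counts.values())
# Share of each outcome in percent, rounded to one decimal place.
shares = {kind: round(100 * n / total, 1) for kind, n in counts.items()}

print("replications:", total)
for kind, pct in shares.items():
    print(f"{kind}: {pct}%")
```

The total comes to 27 replications, of which roughly 44% were invalid and roughly 52% valid.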

With WU ach1_23_48_9-- I checked again to exclude other influences.
After a checkpoint save I stopped the World Community Grid - BOINC Agent service, waited a few seconds and restarted the service. No other actions between stopping and starting the service.

When I checked the graphics a few moments later, this "bump" in the wave was visible:


Now I am waiting for the validation of this WU, but I expect an invalid result.
To emphasize it again:
- There are no problems with other projects after a restart from checkpoint.
- All ACH units that ran undisturbed from start to end were valid.
- I have had no errors (as described in other threads) for ACH units so far.
- Restarts from checkpoint result in invalid results for ACH.

For machine specifications see log provided in my earlier post in this thread.
[Mar 3, 2008 9:45:09 AM]