Thread Status: Active
Total posts in this thread: 59
This topic has been viewed 6039 times and has 58 replies
branjo
Master Cruncher
Slovakia
Joined: Jun 29, 2012
Post Count: 1892
Status: Offline
Re: New Beta Test starting Nov 5, 2013 v7.22 [ Issues Thread ]

Running a full load of 8 on Win7 SP1 64-bit i7-3770, BOINC 7.2.26 (7 CEP2 WUs left in memory, MS Security Essentials) + 4 3 on Mac OS X 10.9 (Mavericks) i5-2500S, BOINC 7.0.65 (ESET Cybersecurity). No problems so far: checkpointing about every 10-12 mins, suspended/resumed w/o issue (LAIM on).

ETA1: one of 4 on Mac is 7.21 resend


Got 24 total (22 binaries + 2 resends), all proceeded w/o problems.
----------------------------------------

Crunching@Home since January 13 2000. Shrubbing@Home since January 5 2006

[Nov 6, 2013 6:41:28 AM]
mvoicu
Cruncher
Romania
Joined: Feb 27, 2013
Post Count: 12
Status: Offline
Re: New Beta Test starting Nov 5, 2013 v7.22 [ Issues Thread ]

I received one of the new WUs (BETA_ BETA_ 9999988_ 0232b_ 0) on one of my machines running Windows 7 x64 SP1 with an i3, but it went from 0 to 100% extremely fast (only 2 minutes).

Now it is pending validation, and I don't expect this to be valid work.



Result Log

Result Name: BETA_ BETA_ 9999988_ 0232b_ 0--



<core_client_version>6.10.58</core_client_version>
<![CDATA[
<stderr_txt>
Commandline = projects/www.worldcommunitygrid.org/wcgrid_beta17_7.22_windows_x86_64 -SettingsFile BETA_9999988_0232b.txt -DatabaseFile dataset-GDS2771-v1.txt
Initializing
wcg_learn_limit = 100000
Running
Result.out = 10479077.000000
Run complete, CPU time: 125.222003
08:37:29 (7780): called boinc_finish

</stderr_txt>
]]>
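
For comparing short and long runs across result logs, the summary lines can be pulled out with a few lines of Python. This is only a sketch: it assumes the `Result.out = ...`, `Run complete, CPU time: ...` and `called boinc_finish` lines always look as they do in the log above, and the function name `parse_run_summary` is made up for illustration.

```python
import re

def parse_run_summary(stderr_text):
    """Extract Result.out and CPU time (seconds) from a pasted stderr log.

    Assumes the line formats seen in the log above; fields that are
    missing come back as None.
    """
    result = re.search(r"Result\.out = ([0-9.]+)", stderr_text)
    cpu = re.search(r"Run complete, CPU time: ([0-9.]+)", stderr_text)
    return {
        "result_out": float(result.group(1)) if result else None,
        "cpu_seconds": float(cpu.group(1)) if cpu else None,
        "finished": "called boinc_finish" in stderr_text,
    }

# Lines copied from the result log above.
log = """Initializing
wcg_learn_limit = 100000
Running
Result.out = 10479077.000000
Run complete, CPU time: 125.222003
08:37:29 (7780): called boinc_finish
"""

summary = parse_run_summary(log)
print(summary)
```

A CPU time of about 125 seconds matches the "only 2 minutes" observation; whether such a short run is valid is decided by validation against the wingmen, not by this script.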
----------------------------------------

[Nov 6, 2013 6:53:04 AM]
Rickjb
Veteran Cruncher
Australia
Joined: Sep 17, 2006
Post Count: 666
Status: Offline
Re: New Beta Test starting Nov 5, 2013 v7.22 [ Issues Thread ]

OK, have had a look around the farm.
* All machines are Win7-x64 Pro EXCEPT for 1 x 2600K with XP-x64, BOINC 7.0.64 x64, service install, LAIM ON.
* I am logged in continuously on all machines.
* All machines are overclocked but have run for many months without any errors related to that.
* [Edit] The 2600K with XP-x64 has a flaky CPU that gives occasional nasty errors if fed CEP2 WUs - last one required a Windows repair-re-install. This machine does not now run CEP2, but all others are always running CEP2 WUs. [/Edit]
* All machines have mechanical hard drives that are in IDE mode, not AHCI.
* All v7.22 WUs have returned Error, with "too many exit(0)s", except for [Edit] 1 WU that ran OK and is PV on the 2600K (XP-x64 and no CEP2 running!)[/Edit], plus the 2 WUs I still have suspended on the '970 as per my post above.
The 2 v7.22 WUs on the 970 were also looping with exit(0)s and restarts when I suspended them.
I will hold them suspended a bit longer in case someone has a suggestion of something to try with them.
Otherwise I will resume them, and expect they will error out when they reach the limit on the number of exit(0)s allowed.

I don't think these exit(0)s are a startup problem like the ones that can happen when too many troublesome WUs like CEP2 ones are started together.
These exit(0)s happen when only 1 is started, I have not noticed the system freeze, and there are no "no heartbeat" messages.

Now, looking around my Results Status pages has shown a few things that could be of use to the techs:

1. The result logs of my Error WUs have the following group of lines that repeats every 10 sec until the limit on the number of exit(0)s is reached:

> Commandline = projects/www.worldcommunitygrid.org/wcgrid_beta17_7.22_windows_x86_64 -SettingsFile BETA_9999988_0347b.txt -DatabaseFile dataset-GDS2771-v1.txt
> Initializing
> wcg_learn_limit = 100000
> Running
> INFO: WcgLearnLimit(100000) reached. 0.0015534841631961 0.0015534841631961
> 11:49:07 (2924): BOINC client no longer exists - exiting
> 11:49:07 (2924): timer handler: client dead, exiting

Relevant info:
* It's a 64-bit program
* WcgLearnLimit is reached very quickly
* Claims that the BOINC clients are dead are completely false. They are all still very much alive and kicking!
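
To see how many times a task went round this loop, the repeating group can be counted in a saved copy of the result log. A rough Python sketch follows; it assumes every restart writes a new `Commandline =` line and every false "client dead" exit writes a `BOINC client no longer exists` line, as in the excerpt above. The function name `count_restart_cycles` is hypothetical.

```python
def count_restart_cycles(stderr_text):
    """Count application starts and 'client no longer exists' exits.

    Works on raw log text or on text quoted with a leading '> '.
    """
    starts = 0
    client_dead = 0
    for line in stderr_text.splitlines():
        if "Commandline =" in line:
            starts += 1
        elif "BOINC client no longer exists" in line:
            client_dead += 1
    return starts, client_dead

# One group copied from the log above; it is repeated three times here
# only to imitate the looping (real logs would show different timestamps).
group = """Commandline = projects/www.worldcommunitygrid.org/wcgrid_beta17_7.22_windows_x86_64 -SettingsFile BETA_9999988_0347b.txt -DatabaseFile dataset-GDS2771-v1.txt
Initializing
wcg_learn_limit = 100000
Running
INFO: WcgLearnLimit(100000) reached. 0.0015534841631961 0.0015534841631961
11:49:07 (2924): BOINC client no longer exists - exiting
11:49:07 (2924): timer handler: client dead, exiting
"""

starts, client_dead = count_restart_cycles(group * 3)
print(starts, client_dead)
```

If the website truncates the front of long logs, the counts are only a lower bound.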

2. In many cases, wingmen have completed the WU successfully and their copies are PV or have validated.
But there are variations:
Example 1: BETA_BETA_9999988_0347b (Mine has too many exit(0)s, the 2 other wingmen have validated)
Wingman _0's log file (_2's is similar):
> 0000) reached. 0.0022961953632290 0.0022961953632290
> INFO: WcgLearnLimit(100000) reached. 0.0015725667553852 0.0015725667553852
> INFO: WcgLearnLimit(100000) reached. 0.0204744401353310 0.0204744401353310

... then many similar lines down to ...
> INFO: WcgLearnLimit(100000) reached. 0.0025786576999280 0.0025786576999280
> Result.out = 41760.000000
> Run complete, CPU time: 5070.079300
> 19:15:39 (10084): called boinc_finish


No records of their commandlines. Maybe the first parts of their logfiles have been truncated for display on the website.

Example 2: BETA_BETA_9999987_0122b (Mine has too many exit(0)s, the 2 other wingmen have validated)
Other wingmen's log files are similar to:
> <core_client_version>7.0.28</core_client_version> ((the other wingman has 6.10.58))
> <![CDATA[
> <stderr_txt>
> Commandline = projects/www.worldcommunitygrid.org/wcgrid_beta17_7.22_windows_x86_64 -SettingsFile BETA_9999987_0122b.txt -DatabaseFile dataset-GDS2771-v1.txt
> Initializing
> wcg_learn_limit = 250000
> Running
> [22:33:41]: Computing pass 0
> [23:28:52]: Computing pass 1
> Result.out = 10036995.000000
> Run complete, CPU time: 5668.545937
> 00:08:39 (532208): called boinc_finish


So all 3 of us were running the 64-bit Windows beta program.
Earlier, I found some apparently successful runs where the wingmen ran 32-bit programs and thought it might be a 64-bit-only problem. Not so.
No info on wingmen's service- vs user-installs.

[Edit] NEW INFORMATION - See "[Edit]"s above.
The WU that ran OK on the flaky 2600K was BETA_BETA_9999986_0879b.
It was a rather short WU, the last lines of its log file being
> [11:27:35]: Computing pass 472
> Result.out = 9062541.000000
> Run complete, CPU time: 819.265625

Maybe the problem relates to running CEP2 WUs at the same time as these betas ... Comment, anybody??

[Update]: No CEP2 WUs are now running on the 970 (1 is suspended), so I resumed 1 of the 2 suspended v7.22 betas. It still exits and restarts, so I don't think CEP2 is the problem.
I think it's Windows 7, while XP-x64 is OK.
[/Edit]
----------------------------------------
[Edit 5 times, last edit by Rickjb at Nov 6, 2013 9:18:39 AM]
[Nov 6, 2013 6:58:42 AM]
Former Member
Cruncher
Joined: May 22, 2018
Post Count: 0
Status: Offline
Re: New Beta Test starting Nov 5, 2013 v7.22 [ Issues Thread ]

For those of you experiencing the too many exits issue, can you tell us how you have your software installed? Is it a service installation?

Also - does the problem occur immediately or only after you log out and then back in? Have you noticed any patterns like that?

The W7-32, which runs the latest test client 7.2.27 [release candidate], happened to be reinstalled as user, to test conhost-related memory usage. No problem.

The W7-64 runs as a service and displayed the permanent "If.... zero status" cycles. Note that it did not show this phenomenon in the 7.19 and 7.21 tests. It's instantaneous for every new BETA received/started. I could not vouch for logged in or out [Is it relevant when installed as a service?]. I remote into this device with TeamViewer on and off.

Both Windows devices run Avast AV and MS Defender.

Something entirely different, and just a cosmic quirk. BOINC was restarted on the Linux box after the repo sent a new build to update. There was still 1 Beta 7.22 with suffix _0 on there [original], 10 hours in at 99.7% efficiency. On restart, the task dropped from 97% progress to 0.500%. It stayed there for 5 minutes [with raised heartbeat and perspiration up], then jumped back up to 97% complete. I waited to let it finish before posting this; it has been reported and sits in PV waiting on a wingman. The log does -not- show a restart cycle, but a monster it is with 106K passes. As was observed previously, it only stores the last few thousand passes. Maybe, because of the large number of passes, the restart cycle just rolled off?

Result Name: BETA_ BETA_ 9999979_ 0448b_ 0--
<core_client_version>7.2.22</core_client_version>
<![CDATA[
<stderr_txt>
pass 103987
....
[08:54:30]: Computing pass 105878
[08:54:30]: Computing pass 105879
[08:54:31]: Computing pass 105880
[08:54:31]: Computing pass 105881
Result.out = 2957655.000000
Run complete, CPU time: 37462.036167
08:54:32 (14232): called boinc_finish

</stderr_txt>
]]>
[Nov 6, 2013 8:09:06 AM]
Former Member
Cruncher
Joined: May 22, 2018
Post Count: 0
Status: Offline
Re: New Beta Test starting Nov 5, 2013 v7.22 [ Issues Thread ]

Same problem as Sekerob on my W7-64 bit, continuous restarting until they errored out. The Linux systems are all doing just fine.

If I'm not mistaken [I have a history of misremembering this item **], the [BOINC] limit is 100 restarts before the plug is pulled on a task, ending in the very message of "too many exits...".

** So I checked, and yes it is 100 times, to quote from another project "I think climateprediction fell victim to the "100x Task exited with zero status but no 'finished' file" bug"

@nanoprobe, if you're reading: IIRC you install as user to enable the GPU part of the client, which otherwise would not work on Windows.

edit: errata
----------------------------------------
[Edit 1 times, last edit by Former Member at Nov 6, 2013 8:17:42 AM]
[Nov 6, 2013 8:16:07 AM]
Rickjb
Veteran Cruncher
Australia
Joined: Sep 17, 2006
Post Count: 666
Status: Offline
Re: New Beta Test starting Nov 5, 2013 v7.22 [ Issues Thread ]

Whoops, sorry - just remembered that NOT all of my machines are running Win7-x64 - the flaky 2600K has XP-x64, and it's the only one that ran a v7.22 WU OK.
I was too busy examining the error WUs earlier to notice & remember.
Have re-edited my last post above.

So I think the problem relates to a difference between XP-x64 and W7-x64.
Happy bug-hunting.
[Nov 6, 2013 9:27:13 AM]
CandymanWCG
Senior Cruncher
Romania
Joined: Dec 20, 2010
Post Count: 421
Status: Offline
Re: New Beta Test starting Nov 5, 2013 v7.22 [ Issues Thread ]

Although I still have some ongoing WUs plus some in the PV state, all seem to run as they should. I didn't get a chance to test them a lot though, just suspending a WU or putting the laptop to sleep (LAIM on), and there were no apparent problems there. I did spot a few "if this happens repeatedly..." messages in one of the logs, but those WUs eventually completed successfully and were uploaded (maybe even validated already). Either way, it looks like a much better run than the last one.

Hope this helps. Cheers!
----------------------------------------
Knowledge is limited. Imagination encircles the world! - Albert Einstein



[Nov 6, 2013 10:13:29 AM]
rbotterb
Senior Cruncher
United States
Joined: Jul 21, 2005
Post Count: 401
Status: Offline
Re: New Beta Test starting Nov 5, 2013 v7.22 [ Issues Thread ]

Just checked my Beta 7.22 WUs which were running through the night. 3 additional ones have completed OK. Each took 5-6 hours to run (versus 13 hours original estimates). One more Beta 7.22 WU is still running, but seems to be in progress OK and it should be completed in about another 4 hours. No restarts in this batch so far.

Only had one that ran fast (about 0.55 hours), but I noticed my wingman had a similar fast turnaround on it and both tested valid.

This batch of Beta WUs looks much better than the 7.21 run.
[Nov 6, 2013 10:21:06 AM]
ca05065
Senior Cruncher
Joined: Dec 4, 2007
Post Count: 325
Status: Offline
Re: New Beta Test starting Nov 5, 2013 v7.22 [ Issues Thread ]

Version 7.22 seems to be running OK.
BETA_ BETA_ 9999982_ 0617b_ 1-- restarted correctly after an overnight shutdown. The stderr on the results status page is incomplete: the last recorded 'Computing pass' message is at 07:47, but the work unit completed at 10:09.
I also had the impression that, on two other work units, the elapsed and CPU times decreased when the progress moved on from the static 0.5%. On the fourth work unit I missed the move from 0.5% after about 40 minutes.
[Nov 6, 2013 10:27:45 AM]
Thyme Lawn
Cruncher
Joined: Dec 9, 2008
Post Count: 46
Status: Offline
Re: New Beta Test starting Nov 5, 2013 v7.22 [ Issues Thread ]

After rebooting my Q6600 XP BOINC 6.12.43 (service install) system my v7.22 task initially appeared to have restarted. Progress went from around 83% with an estimated 36 minutes runtime to completion back to 0.5% with more than 16 hours left.

Checkpoint prior to the reboot was:
06-Nov-2013 09:41:20 [World Community Grid] [checkpoint] result E216781_273_J.40.C31H15N3O2S3Se.00075474.4.set1d06_0 checkpointed
The slot stderr.txt has the following around the reboot:
[09:36:00]: Computing pass 103
[09:40:20]: Computing pass 104
[09:43:18]: Computing pass 105
Commandline = projects/www.worldcommunitygrid.org/wcgrid_beta17_7.22_windows_intelx86 -SettingsFile BETA_9999979_0238b.txt -DatabaseFile dataset-GDS2771-v1.txt
Initializing
wcg_learn_limit = 1000000
Running
[10:09:10]: Computing pass 0
[10:11:11]: Computing pass 103
[10:15:03]: Computing pass 104
[10:17:36]: Computing pass 105
Given the time of the checkpoint I'm surprised that pass 103 had to be repeated.
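
Repeated passes like these can be picked out of a slot stderr.txt automatically. The Python below is a sketch only: it assumes pass lines always have the `[HH:MM:SS]: Computing pass N` form shown above, and the name `recomputed_passes` is invented for the example. It simply lists pass numbers that appear more than once; it makes no claim about where the application actually checkpoints.

```python
import re

# Matches lines such as "[09:36:00]: Computing pass 103".
PASS_RE = re.compile(r"\[(\d{2}:\d{2}:\d{2})\]: Computing pass (\d+)")

def recomputed_passes(stderr_text):
    """Return pass numbers that are logged more than once, in log order."""
    seen = set()
    repeated = []
    for line in stderr_text.splitlines():
        match = PASS_RE.search(line)
        if not match:
            continue
        number = int(match.group(2))
        if number in seen and number not in repeated:
            repeated.append(number)
        seen.add(number)
    return repeated

# Excerpt copied from the slot stderr.txt above.
excerpt = """[09:36:00]: Computing pass 103
[09:40:20]: Computing pass 104
[09:43:18]: Computing pass 105
Commandline = projects/www.worldcommunitygrid.org/wcgrid_beta17_7.22_windows_intelx86 -SettingsFile BETA_9999979_0238b.txt -DatabaseFile dataset-GDS2771-v1.txt
Initializing
wcg_learn_limit = 1000000
Running
[10:09:10]: Computing pass 0
[10:11:11]: Computing pass 103
[10:15:03]: Computing pass 104
[10:17:36]: Computing pass 105
"""

repeats = recomputed_passes(excerpt)
print(repeats)
```

For this excerpt the repeated passes are 103, 104 and 105; pass 0 shows up only once because the first start of the task is not part of the excerpt.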

Edit: the task was successfully completed (last pass 114) and is pending validation.
----------------------------------------
"The ultimate test of a moral society is the kind of world that it leaves to its children." - Dietrich Bonhoeffer
----------------------------------------
[Edit 4 times, last edit by Thyme Lawn at Nov 6, 2013 10:49:36 AM]
[Nov 6, 2013 10:29:19 AM]