Total posts in this thread: 90
This topic has been viewed 9548 times and has 89 replies.
Former Member
Cruncher
Joined: May 22, 2018
Post Count: 0
Re: Clean Energy Project - Phase 2 Beta March 15, 2016 [Issues Thread]

And some units are ending really badly. All 5 copies exited in Job #3 with RC = 0x1 or 0x100, and all of them ended as Error (or Too Late).

Result name | OS | App version | Status | Sent (dd/mm/yy) | Returned (dd/mm/yy) | CPU time (hours) | Claimed / granted credit
BETA_E236439_34_S.430.C54H26S5.PPZCAYPFBDNNFH-UHFFFAOYSA-N.15_s1_14_4-- | Microsoft Windows 10 x64 Edition, (10.00.10586.00) | 700 | Too Late | 18/03/16 01:58:24 | 18/03/16 10:38:19 | 8.31 | 263.8 / 0.0
BETA_E236439_34_S.430.C54H26S5.PPZCAYPFBDNNFH-UHFFFAOYSA-N.15_s1_14_2-- | Linux 3.13.0-32-generic | 700 | Error | 17/03/16 10:22:48 | 18/03/16 01:58:16 | 14.90 | 251.5 / 0.0
BETA_E236439_34_S.430.C54H26S5.PPZCAYPFBDNNFH-UHFFFAOYSA-N.15_s1_14_3-- | Linux 4.5.0-rc5 | 700 | Error | 17/03/16 10:22:42 | 18/03/16 00:28:11 | 13.45 | 268.0 / 0.0
BETA_E236439_34_S.430.C54H26S5.PPZCAYPFBDNNFH-UHFFFAOYSA-N.15_s1_14_1-- | Microsoft x64 Edition, (10.00.10586.00) | 700 | Error | 16/03/16 23:08:10 | 17/03/16 10:09:58 | 9.16 | 266.0 / 0.0
BETA_E236439_34_S.430.C54H26S5.PPZCAYPFBDNNFH-UHFFFAOYSA-N.15_s1_14_0-- | Microsoft Windows 8.1 Enterprise x64 Edition, (06.03.9600.00) | 700 | Error | 16/03/16 23:06:30 | 17/03/16 10:22:33 | 5.31 | 148.9 / 0.0
[Mar 18, 2016 11:57:47 AM]
minus56bits
Cruncher
Joined: Jan 14, 2016
Post Count: 3

BETA_E236440_732_S.454.C52H20N6O2S4.UVQUMKKFZKZIPZ-UHFFFAOYS was stuck looping just below 1% (which takes about 11 minutes to reach on my machine) for more than 2 hours. The estimated time to completion was 50:26 hours.

STDERR.TXT showed several "missing heartbeat" messages. Unfortunately I can't post the file: I somehow killed the task while playing around, and after that it went to "Computing Error". :-( Sorry.

I think the error message is misleading, since 7 UGM tasks were (and still are) running in parallel without any issues.

Client is version 7.2.47 on Windows 10 Pro, Intel i7-6700K.
---------
Frank
----------------------------------------
[Edit 1 times, last edit by minus56bits at Mar 18, 2016 12:27:04 PM]
[Mar 18, 2016 12:26:10 PM]
SekeRob
Master Cruncher
Joined: Jan 7, 2013
Post Count: 2741

Has your system run CEP2 before? [The Beta uses the same app as production, just with tasks in different configurations.] The reason I'm asking is that the missing heartbeat error hints at performance issues [the system is too heavily loaded for the task to keep in touch with the client and tell it that it's still running... a 30-second interruption will reset the task or kill it].
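To make the timeout idea concrete, here is a minimal sketch of a heartbeat watchdog. It is not BOINC's actual code: the class and method names are hypothetical, and the only figure taken from the post is the 30-second limit. It shows why a machine stalled by heavy disk I/O can trip the check even though nothing is "wrong" with the task itself.

```python
"""Hypothetical heartbeat watchdog sketch (not BOINC's real implementation).

Assumption: the 30-second limit comes from the post above; all names here
are invented for illustration.
"""
from dataclasses import dataclass

HEARTBEAT_TIMEOUT_S = 30.0  # interruption after which the task is reset or killed


@dataclass
class HeartbeatWatchdog:
    timeout_s: float = HEARTBEAT_TIMEOUT_S
    last_beat: float = 0.0  # time (seconds) of the last heartbeat that got through

    def beat(self, now: float) -> None:
        """Record that a heartbeat message arrived at time `now`."""
        self.last_beat = now

    def missing(self, now: float) -> bool:
        """True if no heartbeat has been seen for longer than the timeout."""
        return now - self.last_beat > self.timeout_s


if __name__ == "__main__":
    wd = HeartbeatWatchdog()
    wd.beat(now=0.0)
    # A lightly loaded system keeps the messages flowing.
    print(wd.missing(now=5.0))   # False
    # Heavy disk I/O stalls the messages for 45 s, so the watchdog trips.
    print(wd.missing(now=45.0))  # True
```

In this model the task is not judged by its progress at all, only by whether the periodic message got through in time, which matches the point that a "missing heartbeat" can reflect system load rather than a broken task.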

Edit: To add, the application goes through a setup phase that can last quite a while and is so heavy on disk I/O that actual computation does not start until it has finished. Depending on overall system load, I occasionally see 5-7 minutes pass where only Elapsed time is counting; once the calculations start, the CPU time counter begins ticking too. This is not easy to monitor in the official BOINC Manager, but if you select a starting CEP2 task and hit the Properties button on the left [only in the BOINC Manager advanced view], you'll see many more task progress details.
----------------------------------------
[Edit 2 times, last edit by SekeRob* at Mar 18, 2016 1:05:09 PM]
[Mar 18, 2016 12:49:12 PM]
KerSamson
Master Cruncher
Switzerland
Joined: Jan 29, 2007
Post Count: 1672

It so happens that I have asked for a feature that makes CEP2 never show a TTC (time to completion) higher than the 18-hour cap. Your 1.5-day deadline is plenty of time given what we know, but as far as I know the client currently can't be made to take that into account.

Finally, WU BETA_E236439_388_S.420.C44F2H16N6S5.SIQVRHDHDDANTL-UHFFFAOYSA-N.10_s1_14 was sent back this morning after less than 6 hours of computation, but it was already "too late". :-(

Yves
----------------------------------------
[Mar 18, 2016 1:04:16 PM]
SekeRob
Master Cruncher
Joined: Jan 7, 2013
Post Count: 2741

Are you sure it is a real 'too late', or is it just marked 'too late' as an escape clause because the 5 copies could not form a quorum?
[Mar 18, 2016 1:07:35 PM]
KerSamson
Master Cruncher
Switzerland
Joined: Jan 29, 2007
Post Count: 1672

I am the only one marked "too late"; the 4 other wingmen are listed as "error".
----------------------------------------
[Mar 18, 2016 1:12:39 PM]
SekeRob
Master Cruncher
Joined: Jan 7, 2013
Post Count: 2741

@knreed, could you find a different, additional label for these results that fail to converge on a quorum [but did not themselves fail]? To reuse an old status that was abolished in favour of Pending Verification because it caused confusion, call these "Final inconclusive", or maybe "Unverifiable" :O).
[Mar 18, 2016 1:12:54 PM]
SekeRob
Master Cruncher
Joined: Jan 7, 2013
Post Count: 2741

KerSamson wrote: I am the only one marked "too late", the 4 other wingmen are stated as "error".

Exactly what I said: your copy, the 5th, could not find a wingman because the other four all errored out, so the 5th gets the misleading 'Too Late' status [I think this is explained in the community-maintained FAQs]. Also see my previous post directed at knreed.
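The labelling described above can be modelled roughly as follows. This is a simplified, hypothetical model, not the server's real validator logic: the quorum size of 2 is an assumption, "Returned" is a placeholder name for a successfully returned result, and only the limit of 5 copies comes from this thread.

```python
"""Simplified, hypothetical model of the 'Too Late' escape clause.

Assumptions: a quorum of 2 matching results (not stated in the thread) and
'Returned' as a placeholder status for a successfully returned result.
The limit of 5 copies per workunit is taken from the thread.
"""
from typing import List

MAX_COPIES = 5  # copies sent before the workunit is given up on
QUORUM = 2      # assumed number of agreeing results needed for validation


def final_labels(statuses: List[str]) -> List[str]:
    """Relabel lone returned results once no wingman can ever be found."""
    returned = statuses.count("Returned")
    copies_exhausted = len(statuses) >= MAX_COPIES
    if copies_exhausted and returned < QUORUM:
        # No quorum is possible any more, so a valid-looking result is shown
        # as 'Too Late' instead of something like 'Unverifiable'.
        return ["Too Late" if s == "Returned" else s for s in statuses]
    return list(statuses)


print(final_labels(["Error", "Error", "Error", "Error", "Returned"]))
# ['Error', 'Error', 'Error', 'Error', 'Too Late']
```

The point of the model is that 'Too Late' here is a by-product of the quorum never forming, not of the result missing its deadline, which is why a separate label was suggested.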
[Mar 18, 2016 1:16:40 PM]
SekeRob
Master Cruncher
Joined: Jan 7, 2013
Post Count: 2741

As for the 'Community maintained FAQs', they are -not- [maintained]: I have a 'watch' on that forum and have yet to get a single mail saying there was any activity in there [only knreed added an OP before the rename].
[Mar 18, 2016 1:19:19 PM]
Former Member
Cruncher
Joined: May 22, 2018
Post Count: 0

On Linux, Beta WUs seem to be running at a higher priority than the other work units, and OET units are being starved of CPU. Other WUs on the same machine as the Betas are running at 75% to 80% CPU utilization; suspend the Betas and the other WUs climb back to 99% to 100%. This never happened in the previous testing.


This has been resolved as a BOINC user error. *blush*

I forgot that ncpus had been changed: BOINC had been told there were more processors than the machine actually has, so 24 processes were running on a 16-CPU machine. Once the number of running WUs matched the number of processors, the Beta WUs ran as expected.
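The arithmetic behind this fix can be sketched as below, assuming the OS splits CPU time evenly between equally niced processes (a simplification; real schedulers are not perfectly fair). The function names are hypothetical; `ncpus` is the BOINC client option mentioned in the post, and `os.cpu_count()` is only used here as a stand-in for the real processor count.

```python
"""Sketch of the CPU oversubscription arithmetic (hypothetical helpers).

Assumption: CPU time is shared evenly between the running processes.
"""
import os
from typing import Optional


def fair_cpu_share(running_tasks: int, cpus: int) -> float:
    """Fraction of one CPU each task gets under an even split."""
    if running_tasks <= cpus:
        return 1.0
    return cpus / running_tasks


def oversubscribed(configured_ncpus: int, actual_cpus: Optional[int] = None) -> bool:
    """True if the client was told about more CPUs than the machine has."""
    actual = actual_cpus if actual_cpus is not None else (os.cpu_count() or 1)
    return configured_ncpus > actual


print(f"{fair_cpu_share(24, 16):.1%}")  # 66.7%
print(oversubscribed(24, 16))           # True
print(fair_cpu_share(16, 16))           # 1.0
```

Under an even split, 24 tasks on 16 CPUs would each get about two thirds of a CPU, which is consistent with the reported drop below 100% utilization; matching the number of running tasks to the real processor count restores a full CPU per task.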
[Mar 18, 2016 1:53:16 PM]