Total posts in this thread: 41
ca05065
Senior Cruncher
Joined: Dec 4, 2007
Post Count: 328
Re: New Beta Test - March 10, 2016 [ Issues Thread ]

I am also seeing the same behaviour as Crystal Pellet on 6 work units on Windows.
Two with overnight shutdown:
11-Mar-2016 00:48:02 [World Community Grid] [cpu_sched] Preempting BETA_AC0002_T000_F00093_S00001n_1 (removed from memory)
11-Mar-2016 00:48:02 [World Community Grid] [cpu_sched] Preempting BETA_AC0002_T000_F00031_S00001b_1 (removed from memory)
11-Mar-2016 00:48:02 [---] Suspending network activity - requested by operating system
.
.
11-Mar-2016 07:23:49 [World Community Grid] [cpu_sched] Restarting task BETA_AC0002_T000_F00093_S00001n_1 using beta22 version 712 in slot 8
11-Mar-2016 07:23:49 [World Community Grid] [cpu_sched] Restarting task BETA_AC0002_T000_F00031_S00001b_1 using beta22 version 712 in slot 9

Four with suspend resume:
11-Mar-2016 08:18:28 [World Community Grid] [cpu_sched] Preempting BETA_AC0002_T000_F00030_S00001r_1 (removed from memory)
11-Mar-2016 08:18:28 [World Community Grid] [cpu_sched] Preempting BETA_AC0002_T000_F00029_S00001l_1 (removed from memory)
11-Mar-2016 08:18:28 [World Community Grid] [cpu_sched] Preempting BETA_AC0002_T000_F00055_S00001e_0 (removed from memory)
11-Mar-2016 08:18:28 [World Community Grid] [cpu_sched] Preempting BETA_AC0002_T000_F00054_S00001k_1 (removed from memory)
.
.
.
11-Mar-2016 08:18:48 [World Community Grid] task BETA_AC0002_T000_F00030_S00001r_1 resumed by user
11-Mar-2016 08:18:48 [World Community Grid] task BETA_AC0002_T000_F00055_S00001e_0 resumed by user
11-Mar-2016 08:18:48 [World Community Grid] task BETA_AC0002_T000_F00054_S00001k_1 resumed by user
11-Mar-2016 08:18:48 [World Community Grid] task BETA_AC0002_T000_F00029_S00001l_1 resumed by user
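
For anyone who wants to check their own event log for the same pattern, here is a minimal sketch that pairs each "removed from memory" line with the later restart or resume of the same task. The regular expressions are inferred only from the lines quoted above, and the function name is made up for illustration; other client versions may word these messages differently.

```python
import re
from datetime import datetime

# Patterns inferred from the log lines quoted in this post (assumption:
# other BOINC client versions may phrase these messages differently).
PREEMPT = re.compile(
    r"^(\S+ \S+) \[.*?\] \[cpu_sched\] Preempting (\S+) \(removed from memory\)"
)
RESTART = re.compile(
    r"^(\S+ \S+) \[.*?\] "
    r"(?:\[cpu_sched\] Restarting task (\S+)|task (\S+) resumed by user)"
)


def parse_time(stamp):
    # "%b" is locale dependent; this assumes English month abbreviations.
    return datetime.strptime(stamp, "%d-%b-%Y %H:%M:%S")


def removed_from_memory_gaps(lines):
    """Return {task name: seconds between removal from memory and restart}."""
    preempted = {}
    gaps = {}
    for line in lines:
        m = PREEMPT.match(line)
        if m:
            preempted[m.group(2)] = parse_time(m.group(1))
            continue
        m = RESTART.match(line)
        if m:
            task = m.group(2) or m.group(3)
            if task in preempted:
                delta = parse_time(m.group(1)) - preempted.pop(task)
                gaps[task] = delta.total_seconds()
    return gaps
```

Fed the four suspend/resume tasks above, it would report a 20 second gap for each. Any task listed here was restarted from its checkpoint rather than continued in memory, so these are the ones worth watching for stalled progress.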
[Mar 11, 2016 8:43:55 AM]
Falconet
Master Cruncher
Portugal
Joined: Mar 9, 2009
Post Count: 3295
Re: New Beta Test - March 10, 2016 [ Issues Thread ]

I turned LAIM off and suspended my 2 BETA tasks:

Progress went from 17.X% to 15.X% and is currently at 20% and checkpointing just happened.
Seems fine on Linux 64-bit.
----------------------------------------


AMD Ryzen 5 1600AF 6C/12T 3.2 GHz - 85W
AMD Ryzen 5 2500U 4C/8T 2.0 GHz - 28W
AMD Ryzen 7 7730U 8C/16T 3.0 GHz
[Mar 11, 2016 8:52:12 AM]
ca05065
Senior Cruncher
Joined: Dec 4, 2007
Post Count: 328
Re: New Beta Test - March 10, 2016 [ Issues Thread ]

I decided to abort the 6 work units mentioned above.
BoincTasks showed the status as 'user aborted' and started new tasks.
Process explorer showed BOINC running the aborted tasks as well as the newly started tasks.
I tried to stop the BOINC service, which eventually succeeded. The BOINC process disappeared from Process Explorer, but the science tasks were left running. Restarting the BOINC service did not correct the situation, so I had to reboot the PC.

The stderr from Results Status only shows:
Result Name: BETA_ AC0002_ T000_ F00055_ S00001e_ 0--
<core_client_version>7.6.22</core_client_version>
<![CDATA[
<message>
aborted by user
</message>
]]>

The useful information from stderr on the PC has been lost.
----------------------------------------
[Edit 1 times, last edit by ca05065 at Mar 11, 2016 9:59:56 AM]
[Mar 11, 2016 9:44:22 AM]
SekeRob
Master Cruncher
Joined: Jan 7, 2013
Post Count: 2741
Re: New Beta Test - March 10, 2016 [ Issues Thread ]

Ah, orphaned/zombie processes: the "hello, am I still running?" check is failing.

You can kill orphaned processes as an administrator (this works in 999 out of 1000 cases); no reboot is needed.
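
A rough sketch of what that looks like on Windows, assuming you have already found the PID of the leftover science process in Process Explorer or Task Manager. `taskkill` with `/PID`, `/F` (force) and `/T` (include child processes) is the standard Windows tool; the helper names and the PID in the usage note are placeholders, and the script has to be run from an elevated (administrator) prompt to actually kill anything.

```python
import subprocess
import sys


def taskkill_command(pid, include_children=True):
    """Build a Windows taskkill command line that force-terminates a PID."""
    cmd = ["taskkill", "/PID", str(pid), "/F"]
    if include_children:
        cmd.append("/T")  # also terminate any child processes
    return cmd


def kill_orphan(pid, dry_run=True):
    """Return the command; only execute it on Windows when dry_run is False."""
    cmd = taskkill_command(pid)
    if dry_run or sys.platform != "win32":
        return cmd
    subprocess.run(cmd, check=True)  # raises CalledProcessError on failure
    return cmd
```

Calling `kill_orphan(1234)` only returns the command so you can inspect it first; pass `dry_run=False` once you are sure the PID belongs to the orphaned task and not to something else.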

(This test was starting while watching a movie projection on the inside of the eyelids, full 3D, one about REM... no betas came through :| )
----------------------------------------
[Edit 2 times, last edit by SekeRob* at Mar 11, 2016 9:52:14 AM]
[Mar 11, 2016 9:49:11 AM]
Former Member
Cruncher
Joined: May 22, 2018
Post Count: 0
Re: New Beta Test - March 10, 2016 [ Issues Thread ]

The Result Log for these units is truncated to give only the final 20 minutes or so of log data, e.g.

Result Name: BETA_ AC0002_ T000_ F00054_ S00001d_ 1--
<core_client_version>7.2.47</core_client_version>
<![CDATA[
<stderr_txt>
NFO: Completed step 3981000 of initial simulation
[09:15:52] INFO: Completed step 3982000 of initial simulation
[09:15:53] INFO: Completed step 3983000 of initial simulation
[09:15:54] INFO: Completed step 3984000 of initial simulation
... (log snipped)
[09:35:24] INFO: Completed step 4998000 of initial simulation
[09:35:25] INFO: Completed step 4999000 of initial simulation
[09:35:27] INFO: Completed step 5000000 of initial simulation
[09:35:27] INFO: Finished initial simulation.
[09:35:27] INFO: Running secondary simulation
[09:35:28] INFO: Run complete, CPU time: 5572.402520
09:35:28 (8264): called boinc_finish(0)
</stderr_txt>

Note the "NFO:" line, where the truncation occurred mid-word in this case.

Is writing log data every second a bit excessive, or is it really needed?
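
To put a number on it, here is a small sketch that checks whether a result log starts mid-line and estimates how often a progress line is written. It assumes every complete line starts with an "[hh:mm:ss]" stamp and that progress lines are 1000 steps apart, both read off the excerpt above; the function name is made up for illustration.

```python
import re

# Matches the progress lines quoted above; the 1000-step spacing between
# lines is read off this excerpt and is an assumption for other tasks.
PROGRESS = re.compile(r"^\[(\d{2}):(\d{2}):(\d{2})\] INFO: Completed step (\d+) of")


def summarize_stderr(lines, steps_per_line=1000):
    """Return (starts_mid_line, seconds_per_logged_line or None)."""
    nonblank = [ln for ln in lines if ln.strip()]
    # Heuristic: every complete line in this log starts with "[hh:mm:ss]".
    starts_mid_line = bool(nonblank) and not nonblank[0].startswith("[")
    samples = []
    for ln in nonblank:
        m = PROGRESS.match(ln)
        if m:
            h, mi, s, step = (int(g) for g in m.groups())
            samples.append((h * 3600 + mi * 60 + s, step))
    if len(samples) < 2:
        return starts_mid_line, None
    (t0, step0), (t1, step1) = samples[0], samples[-1]
    logged_lines = (step1 - step0) / steps_per_line
    if logged_lines <= 0:
        return starts_mid_line, None
    # Does not handle a run that crosses midnight.
    return starts_mid_line, (t1 - t0) / logged_lines
```

On the excerpt above (step 3982000 at 09:15:52 to step 5000000 at 09:35:27) that works out to 1175 seconds for 1018 lines, or roughly 1.15 seconds per line.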
[Mar 11, 2016 10:14:44 AM]
Rickjb
Veteran Cruncher
Australia
Joined: Sep 17, 2006
Post Count: 666
Re: New Beta Test - March 10, 2016 [ Issues Thread ]

I got these Betas only on Linux-x64 VMs and they have all run happily so far.
As reported by others, the initial estimates of run time to completion were all far too high.
The progress indicators were fairly accurate, e.g. on a 3770K @ 4.3GHz, progress increased steadily at about 1%/min and the CPU times in the Results Status pages were about 1.6 - 1.7 hrs.

I tested checkpointing on only 1 machine, and everything appeared to work as it should. I did a few tests just suspending & resuming the WUs with LAIM off, and also shut down Linux and power-cycled the machine. All good.
HTH
----------------------------------------
[Edit 1 times, last edit by Rickjb at Mar 12, 2016 5:42:06 AM]
[Mar 11, 2016 10:25:08 AM]
Falconet
Master Cruncher
Portugal
Joined: Mar 9, 2009
Post Count: 3295
Re: New Beta Test - March 10, 2016 [ Issues Thread ]

Quoting my earlier post:
"I turned LAIM off and suspended my 2 BETA tasks:

Progress went from 17.X% to 15.X% and is currently at 20% and checkpointing just happened.
Seems fine on Linux 64-bit."

They just finished. 2.09 and 2.07 hours.
No problems whatsoever. One of them has turned valid.
----------------------------------------
[Edit 1 times, last edit by Falconet at Mar 11, 2016 11:09:25 AM]
[Mar 11, 2016 10:46:27 AM]
Crystal Pellet
Veteran Cruncher
Joined: May 21, 2008
Post Count: 1323
Re: New Beta Test - March 10, 2016 [ Issues Thread ]

Progress increases until the task writes a new checkpoint at 7.8%. The last 2 lines in stderr.txt:
[08:43:21] INFO: Completed step 390000 of initial simulation
Writing checkpoint at step 390151.

and afterwards nothing at all.
The process is running and using a full core, but no new checkpoints are made and progress stays the same.
The other BETAs were processed in about 3 hours and finished.
The restarted one is still running after 4.5 hours and it looks like it will never end.
[Mar 11, 2016 11:56:54 AM]
Former Member
Cruncher
Joined: May 22, 2018
Post Count: 0
Re: New Beta Test - March 10, 2016 [ Issues Thread ]

12 units completed; all took between 1.6 and 1.9 hours (Windows 7 & 10). 8 are already Valid, the rest are PVal (pending validation).
[Mar 11, 2016 2:05:22 PM]
adriverhoef
Master Cruncher
The Netherlands
Joined: Apr 3, 2009
Post Count: 2172
Re: New Beta Test - March 10, 2016 [ Issues Thread ]

Crystal Pellet wrote: "The restarted one is still running after 4.5 hours and it looks like it will never end."

Do you see any activity in the designated "slots/" directory (files updated, timestamp updates), Crystal Pellet?
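
If it helps, a quick way to check is a few lines of Python that list which files under the slot directory were modified recently. This is only a sketch: the function name, the 10 minute window and the path in the usage note are placeholders, and the real slot number and BOINC data directory depend on your installation.

```python
import os
import time


def recent_activity(slot_dir, window_seconds=600, now=None):
    """Return (path, age_seconds) for files modified within the window,
    most recently modified first."""
    now = time.time() if now is None else now
    hits = []
    for root, _dirs, files in os.walk(slot_dir):
        for name in files:
            path = os.path.join(root, name)
            try:
                age = now - os.path.getmtime(path)
            except OSError:
                continue  # file vanished between listing and stat
            if age <= window_seconds:
                hits.append((path, age))
    return sorted(hits, key=lambda item: item[1])
```

For example, `recent_activity("slots/8")` run from the BOINC data directory (placeholder path) returns an empty list if nothing in that slot has been touched in the last 10 minutes, which would suggest the task is spinning without writing output or checkpoints.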
----------------------------------------
[Edit 1 times, last edit by adriverhoef at Mar 11, 2016 2:26:14 PM]
[Mar 11, 2016 2:25:08 PM]