World Community Grid Forums
Thread Status: Active Total posts in this thread: 41
ca05065
Senior Cruncher Joined: Dec 4, 2007 Post Count: 328 Status: Offline
I am also seeing the same behaviour as Crystal Pellet on 6 work units on Windows.
Two with overnight shutdown:
11-Mar-2016 00:48:02 [World Community Grid] [cpu_sched] Preempting BETA_AC0002_T000_F00093_S00001n_1 (removed from memory)
11-Mar-2016 00:48:02 [World Community Grid] [cpu_sched] Preempting BETA_AC0002_T000_F00031_S00001b_1 (removed from memory)
11-Mar-2016 00:48:02 [---] Suspending network activity - requested by operating system
. .
11-Mar-2016 07:23:49 [World Community Grid] [cpu_sched] Restarting task BETA_AC0002_T000_F00093_S00001n_1 using beta22 version 712 in slot 8
11-Mar-2016 07:23:49 [World Community Grid] [cpu_sched] Restarting task BETA_AC0002_T000_F00031_S00001b_1 using beta22 version 712 in slot 9
Four with suspend/resume:
11-Mar-2016 08:18:28 [World Community Grid] [cpu_sched] Preempting BETA_AC0002_T000_F00030_S00001r_1 (removed from memory)
11-Mar-2016 08:18:28 [World Community Grid] [cpu_sched] Preempting BETA_AC0002_T000_F00029_S00001l_1 (removed from memory)
11-Mar-2016 08:18:28 [World Community Grid] [cpu_sched] Preempting BETA_AC0002_T000_F00055_S00001e_0 (removed from memory)
11-Mar-2016 08:18:28 [World Community Grid] [cpu_sched] Preempting BETA_AC0002_T000_F00054_S00001k_1 (removed from memory)
. . .
11-Mar-2016 08:18:48 [World Community Grid] task BETA_AC0002_T000_F00030_S00001r_1 resumed by user
11-Mar-2016 08:18:48 [World Community Grid] task BETA_AC0002_T000_F00055_S00001e_0 resumed by user
11-Mar-2016 08:18:48 [World Community Grid] task BETA_AC0002_T000_F00054_S00001k_1 resumed by user
11-Mar-2016 08:18:48 [World Community Grid] task BETA_AC0002_T000_F00029_S00001l_1 resumed by user
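For anyone who wants to check their own event log for the same preempt/restart pattern, here is a minimal Python sketch. It assumes the log lines look exactly like the `[cpu_sched]` lines quoted above. The names `LINE_RE` and `preempt_restart_gaps` are made up for illustration and are not part of BOINC. It only pairs "Preempting" with "Restarting task" lines, so tasks that were merely "resumed by user" will not appear in the result.

```python
import re
from datetime import datetime

# Matches only the [cpu_sched] Preempting / Restarting lines shown in the post.
# Any other event-log line is skipped.
LINE_RE = re.compile(
    r"^(?P<ts>\d{2}-\w{3}-\d{4} \d{2}:\d{2}:\d{2}) \[(?P<project>[^\]]*)\] "
    r"\[cpu_sched\] (?P<action>Preempting|Restarting task) (?P<task>\S+)"
)


def preempt_restart_gaps(lines):
    """Return {task_name: seconds between a preempt and the next restart}."""
    preempted = {}  # task name -> time it was preempted
    gaps = {}
    for line in lines:
        m = LINE_RE.match(line.strip())
        if not m:
            continue
        ts = datetime.strptime(m["ts"], "%d-%b-%Y %H:%M:%S")
        task = m["task"]
        if m["action"] == "Preempting":
            preempted[task] = ts
        elif task in preempted:
            gaps[task] = (ts - preempted.pop(task)).total_seconds()
    return gaps


# Three lines copied from the log excerpt above.
sample = [
    "11-Mar-2016 00:48:02 [World Community Grid] [cpu_sched] Preempting "
    "BETA_AC0002_T000_F00093_S00001n_1 (removed from memory)",
    "11-Mar-2016 00:48:02 [---] Suspending network activity - requested by operating system",
    "11-Mar-2016 07:23:49 [World Community Grid] [cpu_sched] Restarting task "
    "BETA_AC0002_T000_F00093_S00001n_1 using beta22 version 712 in slot 8",
]
gaps = preempt_restart_gaps(sample)
```

On the sample, the one task comes back with a gap of roughly six and a half hours, which matches the overnight shutdown. A task that shows "Preempting" but never a matching restart is simply absent from the result, which may be worth a closer look.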
||
|
Falconet
Master Cruncher Portugal Joined: Mar 9, 2009 Post Count: 3295 Status: Offline
I turned LAIM off and suspended my 2 BETA tasks:
Progress went from 17.X% to 15.X% and is currently at 20%, and checkpointing just happened. Seems fine on Linux 64-bit.
----------------------------------------
AMD Ryzen 5 1600AF 6C/12T 3.2 GHz - 85W
AMD Ryzen 5 2500U 4C/8T 2.0 GHz - 28W
AMD Ryzen 7 7730U 8C/16T 3.0 GHz
||
|
ca05065
Senior Cruncher Joined: Dec 4, 2007 Post Count: 328 Status: Offline
I decided to abort the 6 work units mentioned above.
BoincTasks showed the status as 'user aborted' and started new tasks. Process Explorer showed BOINC running the aborted tasks as well as the newly started tasks. I tried to stop the BOINC service, which did eventually happen. The BOINC process disappeared from Process Explorer but left the science tasks still running. Starting the BOINC service again did not correct the situation, so I had to reboot the PC.
The stderr from Results Status only shows:
Result Name: BETA_ AC0002_ T000_ F00055_ S00001e_ 0--
<core_client_version>7.6.22</core_client_version>
<![CDATA[
<message>
aborted by user
</message>
]]>
The useful information from stderr on the PC has been lost.
----------------------------------------
[Edit 1 times, last edit by ca05065 at Mar 11, 2016 9:59:56 AM]
||
|
SekeRob
Master Cruncher Joined: Jan 7, 2013 Post Count: 2741 Status: Offline |
Ah, orphan/zombie, "hello I'm still/not running" check failing.
You can kill orphaned processes as admin [999 out of 1000 cases]... no reboot needed.
(This test started while I was watching a movie projected on the inside of my eyelids, full 3D, one about REM... no betas came through :| )
----------------------------------------
[Edit 2 times, last edit by SekeRob* at Mar 11, 2016 9:52:14 AM]
||
|
Former Member
Cruncher Joined: May 22, 2018 Post Count: 0 Status: Offline |
The Result Log for these units is truncated to give only the final 20 minutes or so of log data, e.g.
Result Name: BETA_ AC0002_ T000_ F00054_ S00001d_ 1--
<core_client_version>7.2.47</core_client_version>
<![CDATA[
<stderr_txt>
NFO: Completed step 3981000 of initial simulation
[09:15:52] INFO: Completed step 3982000 of initial simulation
[09:15:53] INFO: Completed step 3983000 of initial simulation
[09:15:54] INFO: Completed step 3984000 of initial simulation
... (log snipped)
[09:35:24] INFO: Completed step 4998000 of initial simulation
[09:35:25] INFO: Completed step 4999000 of initial simulation
[09:35:27] INFO: Completed step 5000000 of initial simulation
[09:35:27] INFO: Finished initial simulation.
[09:35:27] INFO: Running secondary simulation
[09:35:28] INFO: Run complete, CPU time: 5572.402520
09:35:28 (8264): called boinc_finish(0)
</stderr_txt>
Note the "NFO:" line, where the truncation occurred mid-word in this case. Is writing log data every second a bit excessive, or is it really needed?
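To illustrate how a cut like that "NFO:" line can arise, here is a small Python sketch of keeping only the tail of a log by byte count. This is only a guess at the general mechanism, not how the server or client actually trims stderr. Both function names are hypothetical. The second function shows one way to avoid a mid-word start by skipping ahead to the next full line.

```python
def keep_tail(text, max_bytes):
    """Keep only the last max_bytes bytes of a log, cutting wherever that lands."""
    data = text.encode("utf-8")
    start = max(0, len(data) - max_bytes)
    # errors="ignore" drops a multi-byte character that was split by the cut.
    return data[start:].decode("utf-8", errors="ignore")


def keep_tail_whole_lines(text, max_bytes):
    """Like keep_tail, but never start in the middle of a line."""
    data = text.encode("utf-8")
    if len(data) <= max_bytes:
        return text
    cut = len(data) - max_bytes
    # If the cut is not right after a newline, move it to the next line start.
    if data[cut - 1:cut] != b"\n":
        nl = data.find(b"\n", cut)
        cut = len(data) if nl == -1 else nl + 1
    return data[cut:].decode("utf-8", errors="ignore")


# Two lines taken from the excerpt above.
log = (
    "[09:15:52] INFO: Completed step 3982000 of initial simulation\n"
    "[09:15:53] INFO: Completed step 3983000 of initial simulation\n"
)
# Pick a size that makes the cut land just after the "I" of the first "INFO".
size = len(log) - log.index("NFO")
raw_tail = keep_tail(log, size)                 # starts with "NFO: Completed ..."
clean_tail = keep_tail_whole_lines(log, size)   # starts at the second line
```

The whole-line version loses up to one extra line of output, which seems a fair trade for an excerpt that starts cleanly.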
||
|
Rickjb
Veteran Cruncher Australia Joined: Sep 17, 2006 Post Count: 666 Status: Offline
I got these Betas only on Linux-x64 VMs and they have all run happily so far.
As reported by others, the initial values of estimated run-time to completion were all way too high. The progress indicators were fairly accurate, e.g. on a 3770K @ 4.3GHz, progress increased steadily at about 1%/min and the CPU times in the Results Status pages were about 1.6 - 1.7 hrs.
I tested checkpointing on only 1 machine, and everything appeared to work as it should. I did a few tests just suspending & resuming the WUs with LAIM off, and also shut down Linux and power-cycled the machine. All good.
HTH
----------------------------------------
[Edit 1 times, last edit by Rickjb at Mar 12, 2016 5:42:06 AM]
||
|
Falconet
Master Cruncher Portugal Joined: Mar 9, 2009 Post Count: 3295 Status: Offline
I turned LAIM off and suspended my 2 BETA tasks:
Progress went from 17.X% to 15.X% and is currently at 20%, and checkpointing just happened. Seems fine on Linux 64-bit.
They just finished. 2.09 and 2.07 hours. No problems whatsoever. One of them has turned valid.
----------------------------------------
AMD Ryzen 5 1600AF 6C/12T 3.2 GHz - 85W
AMD Ryzen 5 2500U 4C/8T 2.0 GHz - 28W
AMD Ryzen 7 7730U 8C/16T 3.0 GHz
[Edit 1 times, last edit by Falconet at Mar 11, 2016 11:09:25 AM]
||
|
Crystal Pellet
Veteran Cruncher Joined: May 21, 2008 Post Count: 1323 Status: Offline
Progress is increasing until the task makes a new checkpoint at 7.8%. Last 2 lines in stderr.txt:
[08:43:21] INFO: Completed step 390000 of initial simulation
Writing checkpoint at step 390151.
and afterwards nothing at all. The process is running using a full core, but no new checkpoints are made and progress stays the same.
The other BETAs were processed in about 3 hours and finished. The restarted one is still running after 4.5 hours and it looks like it will never end.
||
|
Former Member
Cruncher Joined: May 22, 2018 Post Count: 0 Status: Offline |
12 units completed, all took between 1.6 and 1.9 hours (Windows 7 & 10). 8 are already Valid, the rest PVal.
|
||
|
adriverhoef
Master Cruncher The Netherlands Joined: Apr 3, 2009 Post Count: 2172 Status: Recently Active
"The restarted one is still running after 4.5 hours and it looks like it will never end."
Do you see any activity in the designated "slots/" directory (files updated, timestamp updates), Crystal Pellet?
----------------------------------------
[Edit 1 times, last edit by adriverhoef at Mar 11, 2016 2:26:14 PM]
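One way to check for that kind of activity without watching the folder by hand is a short Python sketch like the one below. It walks a slot directory and reports how long ago each file was last modified. The function names and the 30-minute idle threshold are assumptions chosen for illustration. The path to the slot directory depends on the local BOINC installation and has to be supplied by the user.

```python
import os
import time


def recent_activity(slot_dir, now=None):
    """Return [(age_seconds, path), ...] for all files under slot_dir, newest first."""
    now = time.time() if now is None else now
    entries = []
    for root, _dirs, files in os.walk(slot_dir):
        for name in files:
            path = os.path.join(root, name)
            try:
                mtime = os.path.getmtime(path)
            except OSError:
                continue  # file disappeared or is not readable; skip it
            entries.append((now - mtime, path))
    return sorted(entries)


def looks_stalled(slot_dir, max_idle_seconds=1800, now=None):
    """True if no file under slot_dir was modified within max_idle_seconds.

    The 1800-second default is an arbitrary assumption, not a BOINC setting.
    """
    entries = recent_activity(slot_dir, now)
    return not entries or entries[0][0] > max_idle_seconds
```

Calling `looks_stalled()` with the path of the slot the task runs in gives a quick yes/no, and `recent_activity()` shows which files are still being touched. A task that uses a full core while nothing in its slot changes for a long time would fit the "running but never checkpointing" picture described above.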