World Community Grid Forums
Thread Status: Active Total posts in this thread: 12
AgrFan
Senior Cruncher USA Joined: Apr 17, 2008 Post Count: 376 Status: Offline |
I had this unit restart from the beginning after running 6.5 hours during the latest Windows update cycle.
Any ideas why this unit did not checkpoint during the first 6.5 hours? I would have expected 2-3 checkpoints before the first reboot. It took 9.5 hours to complete after the second reboot.
Result Name: OET1_ 0000333_ xMBGP-OM_ rig_ 11309_ 0--
<core_client_version>6.10.58</core_client_version>
<![CDATA[
<stderr_txt>
INFO: No state to restore. Start from the beginning.
[13:46:39] Number of tasks = 1 <=== START
[13:46:39] Running task 0,CPU time at start of task 0 was 0.000000
[13:46:39] ./ZINC84394807_1.pdbqt size = 38 10 ../../projects/www.worldcommunitygrid.org/oet1.xMBGP-OM_rig.pdbqt size = 1930 0
[20:10:33] Number of tasks = 1 <=== FIRST REBOOT
[20:10:33] Running task 0,CPU time at start of task 0 was 0.000000
[20:10:33] ./ZINC84394807_1.pdbqt size = 38 10 ../../projects/www.worldcommunitygrid.org/oet1.xMBGP-OM_rig.pdbqt size = 1930 0
[20:24:57] Number of tasks = 1 <=== SECOND REBOOT
[20:24:57] Running task 0,CPU time at start of task 0 was 0.000000
[20:24:57] ./ZINC84394807_1.pdbqt size = 38 10 ../../projects/www.worldcommunitygrid.org/oet1.xMBGP-OM_rig.pdbqt size = 1930 0
[06:06:41] Finished task #0 cpu time used 55020.131110 <=== FINISH
06:06:41 (2052): called boinc_finish(0)
</stderr_txt>
]]>
[Edit 1 times, last edit by AgrFan at Feb 13, 2015 3:20:12 AM] |
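For anyone who wants to pull the run segments out of a result log like the one above, here is a minimal Python sketch. It assumes every (re)start prints a `[HH:MM:SS] Number of tasks = N` line and the end prints a `[HH:MM:SS] Finished task` line, as this log does. The function name `segment_lengths` and the shortened `sample_log` are made up for illustration. The timestamps carry no date, so each gap is assumed to be shorter than 24 hours.

```python
import re

# Patterns taken from the stderr lines quoted in the post above.
START_RE = re.compile(r"\[(\d{2}):(\d{2}):(\d{2})\] Number of tasks = \d+")
FINISH_RE = re.compile(r"\[(\d{2}):(\d{2}):(\d{2})\] Finished task")


def _seconds(match):
    """Convert an HH:MM:SS regex match to seconds since midnight."""
    hours, minutes, secs = (int(g) for g in match.groups())
    return hours * 3600 + minutes * 60 + secs


def segment_lengths(stderr_text):
    """Return wall-clock seconds between consecutive start marks,
    with the finish mark (if present) closing the last segment."""
    marks = [_seconds(m) for m in START_RE.finditer(stderr_text)]
    finish = FINISH_RE.search(stderr_text)
    if finish:
        marks.append(_seconds(finish))
    # Modulo 86400 handles a segment that runs past midnight
    # (assumption: no single segment lasts 24 hours or more).
    return [(cur - prev) % 86400 for prev, cur in zip(marks, marks[1:])]


# Shortened copy of the log above (hypothetical variable name).
sample_log = """
[13:46:39] Number of tasks = 1
[20:10:33] Number of tasks = 1
[20:24:57] Number of tasks = 1
[06:06:41] Finished task #0 cpu time used 55020.131110
"""

for seconds in segment_lengths(sample_log):
    print(f"{seconds / 3600:.2f} h")
```

This only measures time between restarts. It says nothing by itself about whether a checkpoint was written or restored in between.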
||
|
Former Member
Cruncher Joined: May 22, 2018 Post Count: 0 Status: Offline |
I'm seeing some strange behavior in the latest WUs. Mine have not checkpointed since the last restart. The current WUs are at almost 12 hours. I'm not sure about checkpoints before the last restart, but seven hours without one suggests something is wrong.
2/12/2015 1:47:44 PM | | Starting BOINC client version 7.4.36 for windows_intelx86
2/12/2015 1:47:44 PM | | log flags: file_xfer, sched_ops, task
2/12/2015 1:47:44 PM | | Libraries: libcurl/7.39.0 OpenSSL/1.0.1j zlib/1.2.8
2/12/2015 1:47:44 PM | | Running as a daemon
2/12/2015 1:47:44 PM | | Data directory: C:\ProgramData\BOINC
2/12/2015 1:47:44 PM | | Running under account boinc_master
2/12/2015 1:47:44 PM | | No usable GPUs found
2/12/2015 1:47:44 PM | | Host name: plum
2/12/2015 1:47:44 PM | | Processor: 4 GenuineIntel Intel(R) Core(TM)2 Quad CPU Q9300 @ 2.50GHz [Family 6 Model 23 Stepping 7]
2/12/2015 1:47:44 PM | | Processor features: fpu vme de pse tsc msr pae mce cx8 apic sep mtrr pge mca cmov pat pse36 clflush dts acpi mmx fxsr sse sse2 ss htt tm pni ssse3 cx16 sse4_1 nx lm vmx smx tm2 pbe
2/12/2015 1:47:44 PM | | OS: Microsoft Windows 7: Home Premium x86 Edition, Service Pack 1, (06.01.7601.00)
2/12/2015 1:47:44 PM | | Memory: 3.24 GB physical, 6.48 GB virtual
2/12/2015 1:47:44 PM | | Disk: 455.71 GB total, 178.74 GB free
2/12/2015 1:47:44 PM | | Local time is UTC -8 hours
2/12/2015 1:47:44 PM | | VirtualBox version: 4.3.12
2/12/2015 1:47:44 PM | | Config: don't compute while SDMain.exe is running
2/12/2015 1:47:44 PM | World Community Grid | URL http://www.worldcommunitygrid.org/; Computer ID 1831796; resource share 900
2/12/2015 1:47:44 PM | World Community Grid | General prefs: from World Community Grid (last modified 12-May-2014 01:16:38)
2/12/2015 1:47:44 PM | World Community Grid | Host location: none
2/12/2015 1:47:44 PM | World Community Grid | General prefs: using your defaults
2/12/2015 1:47:44 PM | | Reading preferences override file
2/12/2015 1:47:44 PM | | Preferences:
2/12/2015 1:47:44 PM | | max memory usage when active: 3151.32MB
2/12/2015 1:47:44 PM | | max memory usage when idle: 3317.18MB
2/12/2015 1:47:44 PM | | max disk usage: 177.14GB
2/12/2015 1:47:44 PM | | (to change preferences, visit a project web site or select Preferences in the Manager)
2/12/2015 1:47:44 PM | | Not using a proxy
2/12/2015 8:53:49 PM | World Community Grid | update requested by user
2/12/2015 8:53:51 PM | World Community Grid | Sending scheduler request: Requested by user.
2/12/2015 8:53:51 PM | World Community Grid | Not requesting tasks: don't need (job cache full)
2/12/2015 8:53:53 PM | World Community Grid | Scheduler request completed
2/12/2015 9:00:45 PM | | Re-reading cc_config.xml
2/12/2015 9:00:45 PM | | Not using a proxy
2/12/2015 9:00:45 PM | | Config: don't compute while SDMain.exe is running
2/12/2015 9:00:45 PM | | log flags: file_xfer, sched_ops, task, checkpoint_debug |
||
|
Former Member
Cruncher Joined: May 22, 2018 Post Count: 0 Status: Offline |
AgrFan & Thomas,
What evidence are you using to state that the unit did not checkpoint? Your post doesn't show either way. Unfortunately, the log entry "CPU time at start of task 0 was 0.000000" is the same whether a checkpoint has occurred or not. This batch of workunits is indeed long-running. |
||
|
Former Member
Cruncher Joined: May 22, 2018 Post Count: 0 Status: Offline |
There are several ways to tell if and when a task has checkpointed, both in general and for OET specifically:
1) Select a task and hit the 'Properties' button in the BOINC Manager tasks view. It shows the last checkpoint time and how much time has passed since.
2) Set the <checkpoint_debug> log flag, so that each checkpoint prints a line in the event log.
3) For OET, watch the progress: checkpoints come exactly at 12.5% intervals (12.5, 25, 37.5, 50, 62.5, 75, 87.5 and 100).
Time per checkpoint is not necessarily linear; in fact it can vary quite a bit between sciences. The OET app and BOINC Manager in this case seem to be quite good at forecasting time remaining once the task has reached the first checkpoint. It's rather unpredictable until then: I have had 4-minute results and 23-hour results. Given that my client setting of "Write to disk at most" is 5 minutes, any checkpoint generated in a shorter interval is skipped and the next one after 5 minutes is written, so the event log sometimes records fewer than 8 checkpoints.
For more discussion on checkpointing visit the Start Here forum.
[Edit 2 times, last edit by Former Member at Feb 13, 2015 9:35:08 AM] |
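The skipping described above can be sketched in a few lines of Python. This is only a toy model of the behavior as described in this post, not BOINC's actual code. The function name `written_checkpoints` and the example times are invented. It assumes a checkpoint is written only if at least the "Write to disk at most" interval has passed since the previous write, counted from task start for the first one.

```python
def written_checkpoints(request_times, min_interval=300.0):
    """Return the times (seconds since task start) at which a
    checkpoint would actually be written under the toy model."""
    written = []
    last_write = 0.0  # assumption: the interval is counted from task start
    for t in request_times:
        if t - last_write >= min_interval:
            written.append(t)
            last_write = t
        # otherwise the checkpoint opportunity is skipped
    return written


# Eight hypothetical 12.5% marks of a short task, in seconds since start.
requests = [120, 260, 420, 500, 800, 900, 1250, 1300]
print(written_checkpoints(requests))
```

With these made-up times only three of the eight opportunities end up on disk, which is how a short task can show fewer than 8 checkpoint lines in the event log.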
||
|
AgrFan
Senior Cruncher USA Joined: Apr 17, 2008 Post Count: 376 Status: Offline |
I rebooted the client twice after the initial 6.5 hours. Both times it started from the beginning. It took 9.5 hours to complete after the second restart. If checkpointing is happening at 12.5% intervals (or somewhere close to that), then it should have restarted at least halfway through and run for 4-5 hours after the second reboot. I caught this the following day after the unit uploaded. Maybe MBGP units have a checkpoint bug?
[Edit 1 times, last edit by AgrFan at Feb 13, 2015 11:52:54 AM] |
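The expectation in the post above can be put into rough numbers. The Python sketch below assumes progress is linear in run time, which an earlier reply says is not guaranteed, and that a checkpoint exists at every 12.5% step. The function name `expected_rerun_hours` is hypothetical, and the 6.5 h and 9.5 h inputs are the figures from the post.

```python
import math


def expected_rerun_hours(elapsed_h, total_h, step_pct=12.5):
    """Estimate the last checkpoint reached and the hours left to run
    after a restart, assuming linear progress (a simplification)."""
    progress_pct = min(100.0, 100.0 * elapsed_h / total_h)
    # Round progress down to the most recent checkpoint step.
    last_checkpoint_pct = math.floor(progress_pct / step_pct) * step_pct
    remaining_h = total_h * (100.0 - last_checkpoint_pct) / 100.0
    return last_checkpoint_pct, remaining_h


pct, hours = expected_rerun_hours(6.5, 9.5)
print(f"resume from {pct}% with about {hours:.1f} h left")
```

Under those assumptions the task would resume from the 62.5% checkpoint with well under 4 hours to go, the same ballpark as the 4-5 hours expected in the post and far short of the 9.5 hours actually observed.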
||
|
Former Member
Cruncher Joined: May 22, 2018 Post Count: 0 Status: Offline |
Maybe. I've got one 333, MBGP-OM_1506_5 (sixth copy), sitting on the tablet at 100% since the 23rd hour, now at 26:07. Sometimes it can take a little while from reaching 100% to finishing, but 3 hours at 100% is new. On restart it regressed to 99.99%, suggesting the 8th checkpoint was taken and the wrap-up is not wrapping up. |
||
|
Thargor
Veteran Cruncher UK Joined: Feb 3, 2012 Post Count: 1291 Status: Offline |
I've got one (OET1_0000333_xMBGP-OM_rig_8678) that has just hit 13 hours of run time and is at 90% progress with 1.5 hours estimated remaining. It has been sitting at 90% with 1.5 hours remaining for at least the last 2 hours; unfortunately I didn't check earlier this morning to see where it was then.
Given the posts about checkpointing here, I'm not going to abort it just yet, but if it stays there much into this evening, I guess I'll have no choice... |
||
|
Former Member
Cruncher Joined: May 22, 2018 Post Count: 0 Status: Offline |
Hang in there... below is a copy of a 333 now at 34 hours, 1:44 hours after the 7th checkpoint, still running on the 4770K:
7.19 oet1 OET1_0000333_xMBGP-OM_rig_8180_1 01d,10:28:04 (01d,08:51:37) 95,34 92,200 02:54:57 07d,23:24:55 2/11/2015 5:21:30 PM [7] 01:43:59 Running 31.32 MB 44.96 MB
The excitement starts when it hits the 99.99% mark and then has to conclude for return. A work unit copy from the same batch on the tablet is now in its 31st hour, with 7 hours at 100%, after a restart which dropped it back to 99.99%. I have decided to let it run until it hits the magic 'exceeded maximum runtime', which is 40-fold the regular average. Memory is not the issue... over the whole run it never exceeded 45MB. |
||
|
Former Member
Cruncher Joined: May 22, 2018 Post Count: 0 Status: Offline |
Phew, having 'hung in there', after 8 hours at 100%, the task on Android concluded without an outright error:
OET1_ 0000333_ xMBGP-OM_ rig_ 1506_ 5-- android_2654196 a Pending Validation 2/11/15 15:35:05 2/13/15 17:42:55 30.04 / 31.63 315.3 / 0.0
The log does not make pleasant reading... lots of restarts. Now the wingman with copy _4 needs to keep it running, else it becomes the 5th error, with a take-out "Too Late" lurking in the rafters.
Result Log
Result Name: OET1_ 0000333_ xMBGP-OM_ rig_ 1506_ 5--
<core_client_version>7.4.41</core_client_version>
<![CDATA[
<stderr_txt>
stackdumps unavailable
INFO: No state to restore. Start from the beginning.
[18:14:59] Number of tasks = 1
[18:14:59] Running task 0,CPU time at start of task 0 was 0.000000
[18:14:59] ./ZINC00895034_2.pdbqt size = 10 4 ../../projects/www.worldcommunitygrid.org/oet1.xMBGP-OM_rig.pdbqt size = 1930 0
stackdumps unavailable
INFO: No state to restore. Start from the beginning.
[00:42:00] Number of tasks = 1
[00:42:00] Running task 0,CPU time at start of task 0 was 0.000000
[00:42:00] ./ZINC00895034_2.pdbqt size = 10 4 ../../projects/www.worldcommunitygrid.org/oet1.xMBGP-OM_rig.pdbqt size = 1930 0
stackdumps unavailable
stackdumps unavailable
INFO: No state to restore. Start from the beginning.
[01:09:46] Number of tasks = 1
[01:09:46] Running task 0,CPU time at start of task 0 was 0.000000
[01:09:46] ./ZINC00895034_2.pdbqt size = 10 4 ../../projects/www.worldcommunitygrid.org/oet1.xMBGP-OM_rig.pdbqt size = 1930 0
stackdumps unavailable
INFO: No state to restore. Start from the beginning.
[01:16:59] Number of tasks = 1
[01:16:59] Running task 0,CPU time at start of task 0 was 0.000000
[01:16:59] ./ZINC00895034_2.pdbqt size = 10 4 ../../projects/www.worldcommunitygrid.org/oet1.xMBGP-OM_rig.pdbqt size = 1930 0
stackdumps unavailable
[08:40:00] Number of tasks = 1
[08:40:00] Running task 0,CPU time at start of task 0 was 0.000000
[08:40:00] ./ZINC00895034_2.pdbqt size = 10 4 ../../projects/www.worldcommunitygrid.org/oet1.xMBGP-OM_rig.pdbqt size = 1930 0
stackdumps unavailable
[08:49:37] Number of tasks = 1
[08:49:37] Running task 0,CPU time at start of task 0 was 0.000000
[08:49:37] ./ZINC00895034_2.pdbqt size = 10 4 ../../projects/www.worldcommunitygrid.org/oet1.xMBGP-OM_rig.pdbqt size = 1930 0
stackdumps unavailable
[08:49:40] Number of tasks = 1
[08:49:40] Running task 0,CPU time at start of task 0 was 0.000000
[08:49:40] ./ZINC00895034_2.pdbqt size = 10 4 ../../projects/www.worldcommunitygrid.org/oet1.xMBGP-OM_rig.pdbqt size = 1930 0
stackdumps unavailable
[08:49:43] Number of tasks = 1
[08:49:43] Running task 0,CPU time at start of task 0 was 0.000000
[08:49:43] ./ZINC00895034_2.pdbqt size = 10 4 ../../projects/www.worldcommunitygrid.org/oet1.xMBGP-OM_rig.pdbqt size = 1930 0
stackdumps unavailable
[08:49:46] Number of tasks = 1
[08:49:46] Running task 0,CPU time at start of task 0 was 0.000000
[08:49:46] ./ZINC00895034_2.pdbqt size = 10 4 ../../projects/www.worldcommunitygrid.org/oet1.xMBGP-OM_rig.pdbqt size = 1930 0
stackdumps unavailable
[09:02:54] Number of tasks = 1
[09:02:54] Running task 0,CPU time at start of task 0 was 0.000000
[09:02:54] ./ZINC00895034_2.pdbqt size = 10 4 ../../projects/www.worldcommunitygrid.org/oet1.xMBGP-OM_rig.pdbqt size = 1930 0
stackdumps unavailable
[10:25:16] Number of tasks = 1
[10:25:16] Running task 0,CPU time at start of task 0 was 0.000000
[10:25:16] ./ZINC00895034_2.pdbqt size = 10 4 ../../projects/www.worldcommunitygrid.org/oet1.xMBGP-OM_rig.pdbqt size = 1930 0
stackdumps unavailable
[23:03:45] Number of tasks = 1
[23:03:45] Running task 0,CPU time at start of task 0 was 0.000000
[23:03:45] ./ZINC00895034_2.pdbqt size = 10 4 ../../projects/www.worldcommunitygrid.org/oet1.xMBGP-OM_rig.pdbqt size = 1930 0
stackdumps unavailable
[23:47:13] Number of tasks = 1
[23:47:13] Running task 0,CPU time at start of task 0 was 0.000000
[23:47:13] ./ZINC00895034_2.pdbqt size = 10 4 ../../projects/www.worldcommunitygrid.org/oet1.xMBGP-OM_rig.pdbqt size = 1930 0
stackdumps unavailable
[23:47:17] Number of tasks = 1
[23:47:17] Running task 0,CPU time at start of task 0 was 0.000000
[23:47:17] ./ZINC00895034_2.pdbqt size = 10 4 ../../projects/www.worldcommunitygrid.org/oet1.xMBGP-OM_rig.pdbqt size = 1930 0
stackdumps unavailable
[08:39:46] Number of tasks = 1
[08:39:46] Running task 0,CPU time at start of task 0 was 0.000000
[08:39:46] ./ZINC00895034_2.pdbqt size = 10 4 ../../projects/www.worldcommunitygrid.org/oet1.xMBGP-OM_rig.pdbqt size = 1930 0
stackdumps unavailable
[09:33:10] Number of tasks = 1
[09:33:10] Running task 0,CPU time at start of task 0 was 0.000000
[09:33:10] ./ZINC00895034_2.pdbqt size = 10 4 ../../projects/www.worldcommunitygrid.org/oet1.xMBGP-OM_rig.pdbqt size = 1930 0
stackdumps unavailable
[10:13:03] Number of tasks = 1
[10:13:03] Running task 0,CPU time at start of task 0 was 0.000000
[10:13:03] ./ZINC00895034_2.pdbqt size = 10 4 ../../projects/www.worldcommunitygrid.org/oet1.xMBGP-OM_rig.pdbqt size = 1930 0
[18:42:37] Finished task #0 cpu time used 108136.850000
18:42:37 (18379): called boinc_finish(0)
</stderr_txt>
]]>
It's as if there's no LAIM being used with BfA 7.4.41? I thought that was the default on the smartphone implementation? |
||
|
numbermaniac
Cruncher Australia Joined: Mar 28, 2014 Post Count: 46 Status: Offline |
If this is the case, that's a shame. I'm a bit surprised that I'm almost expected to keep my phone running BOINC for >10 hours at once to let it complete. Normally with FAAH even one or two hours every so often suffices.
|
||
|