Thread Status: Active
Total posts in this thread: 12
This topic has been viewed 2144 times and has 11 replies
AgrFan
Senior Cruncher
USA
Joined: Apr 17, 2008
Post Count: 376
Status: Offline
OET1_ 0000333_ xMBGP-OM_ rig_ 11309_ 0 not checkpointing

I had this unit restart from the beginning after running 6.5 hours during the latest Windows update cycle.

Any ideas why this unit did not checkpoint during the first 6.5 hours? I would have expected 2-3 checkpoints before the first reboot. It took 9.5 hours to complete after the second reboot.

Result Name: OET1_ 0000333_ xMBGP-OM_ rig_ 11309_ 0--

<core_client_version>6.10.58</core_client_version>
<![CDATA[
<stderr_txt>
INFO: No state to restore. Start from the beginning.
[13:46:39] Number of tasks = 1 <=== START
[13:46:39] Running task 0,CPU time at start of task 0 was 0.000000
[13:46:39] ./ZINC84394807_1.pdbqt size = 38 10 ../../projects/www.worldcommunitygrid.org/oet1.xMBGP-OM_rig.pdbqt size = 1930 0
[20:10:33] Number of tasks = 1 <=== FIRST REBOOT
[20:10:33] Running task 0,CPU time at start of task 0 was 0.000000
[20:10:33] ./ZINC84394807_1.pdbqt size = 38 10 ../../projects/www.worldcommunitygrid.org/oet1.xMBGP-OM_rig.pdbqt size = 1930 0
[20:24:57] Number of tasks = 1 <=== SECOND REBOOT
[20:24:57] Running task 0,CPU time at start of task 0 was 0.000000
[20:24:57] ./ZINC84394807_1.pdbqt size = 38 10 ../../projects/www.worldcommunitygrid.org/oet1.xMBGP-OM_rig.pdbqt size = 1930 0
[06:06:41] Finished task #0 cpu time used 55020.131110 <=== FINISH
06:06:41 (2052): called boinc_finish(0)

</stderr_txt>
]]>
----------------------------------------
[Edit 1 times, last edit by AgrFan at Feb 13, 2015 3:20:12 AM]
[Feb 13, 2015 3:19:33 AM]
Former Member
Cruncher
Joined: May 22, 2018
Post Count: 0
Status: Offline
Re: OET1_ 0000333_ xMBGP-OM_ rig_ 11309_ 0 not checkpointing

Some strange behavior in the latest WUs. Mine have not checkpointed since the last restart, and the current WUs are at almost 12 hours. I'm not sure about checkpoints before the last restart, but seven hours without one suggests something is wrong.

2/12/2015 1:47:44 PM | | Starting BOINC client version 7.4.36 for windows_intelx86
2/12/2015 1:47:44 PM | | log flags: file_xfer, sched_ops, task
2/12/2015 1:47:44 PM | | Libraries: libcurl/7.39.0 OpenSSL/1.0.1j zlib/1.2.8
2/12/2015 1:47:44 PM | | Running as a daemon
2/12/2015 1:47:44 PM | | Data directory: C:\ProgramData\BOINC
2/12/2015 1:47:44 PM | | Running under account boinc_master
2/12/2015 1:47:44 PM | | No usable GPUs found
2/12/2015 1:47:44 PM | | Host name: plum
2/12/2015 1:47:44 PM | | Processor: 4 GenuineIntel Intel(R) Core(TM)2 Quad CPU Q9300 @ 2.50GHz [Family 6 Model 23 Stepping 7]
2/12/2015 1:47:44 PM | | Processor features: fpu vme de pse tsc msr pae mce cx8 apic sep mtrr pge mca cmov pat pse36 clflush dts acpi mmx fxsr sse sse2 ss htt tm pni ssse3 cx16 sse4_1 nx lm vmx smx tm2 pbe
2/12/2015 1:47:44 PM | | OS: Microsoft Windows 7: Home Premium x86 Edition, Service Pack 1, (06.01.7601.00)
2/12/2015 1:47:44 PM | | Memory: 3.24 GB physical, 6.48 GB virtual
2/12/2015 1:47:44 PM | | Disk: 455.71 GB total, 178.74 GB free
2/12/2015 1:47:44 PM | | Local time is UTC -8 hours
2/12/2015 1:47:44 PM | | VirtualBox version: 4.3.12
2/12/2015 1:47:44 PM | | Config: don't compute while SDMain.exe is running
2/12/2015 1:47:44 PM | World Community Grid | URL http://www.worldcommunitygrid.org/; Computer ID 1831796; resource share 900
2/12/2015 1:47:44 PM | World Community Grid | General prefs: from World Community Grid (last modified 12-May-2014 01:16:38)
2/12/2015 1:47:44 PM | World Community Grid | Host location: none
2/12/2015 1:47:44 PM | World Community Grid | General prefs: using your defaults
2/12/2015 1:47:44 PM | | Reading preferences override file
2/12/2015 1:47:44 PM | | Preferences:
2/12/2015 1:47:44 PM | | max memory usage when active: 3151.32MB
2/12/2015 1:47:44 PM | | max memory usage when idle: 3317.18MB
2/12/2015 1:47:44 PM | | max disk usage: 177.14GB
2/12/2015 1:47:44 PM | | (to change preferences, visit a project web site or select Preferences in the Manager)
2/12/2015 1:47:44 PM | | Not using a proxy
2/12/2015 8:53:49 PM | World Community Grid | update requested by user
2/12/2015 8:53:51 PM | World Community Grid | Sending scheduler request: Requested by user.
2/12/2015 8:53:51 PM | World Community Grid | Not requesting tasks: don't need (job cache full)
2/12/2015 8:53:53 PM | World Community Grid | Scheduler request completed
2/12/2015 9:00:45 PM | | Re-reading cc_config.xml
2/12/2015 9:00:45 PM | | Not using a proxy
2/12/2015 9:00:45 PM | | Config: don't compute while SDMain.exe is running
2/12/2015 9:00:45 PM | | log flags: file_xfer, sched_ops, task, checkpoint_debug
[Feb 13, 2015 5:09:05 AM]
Former Member
Cruncher
Joined: May 22, 2018
Post Count: 0
Status: Offline
Re: OET1_ 0000333_ xMBGP-OM_ rig_ 11309_ 0 not checkpointing

AgrFan & Thomas,
What evidence are you using to state that the unit did not checkpoint? Your post doesn't show either way. Unfortunately, the log entry "CPU time at start of task 0 was 0.000000" is the same whether a checkpoint has occurred or not. This batch of workunits is indeed long-running.
[Feb 13, 2015 8:08:39 AM]
Former Member
Cruncher
Joined: May 22, 2018
Post Count: 0
Status: Offline
Re: OET1_ 0000333_ xMBGP-OM_ rig_ 11309_ 0 not checkpointing

There are several ways to tell if and when a task has checkpointed, both in general and for OET specifically:

1) Select a task and hit the 'Properties' button in the BOINC Manager tasks view. It shows the last checkpoint time and how much time has passed since.
2) Set the <checkpoint_debug> log flag, so the client prints a line in the event log for each checkpoint.
3) For OET, checkpoints fall at exact 12.5% progress intervals (12.5, 25, 37.5, 50, 62.5, 75, 87.5 and 100). Time per checkpoint is not necessarily linear; in fact it can vary quite a bit between sciences.
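For option 2, the flag goes in the log_flags section of cc_config.xml in the BOINC data directory. Here is a minimal Python sketch that builds such a fragment. The helper name build_cc_config is made up for illustration, and you would normally merge the flag into your existing file by hand rather than overwrite it.

```python
import xml.etree.ElementTree as ET


def build_cc_config(flags):
    """Build a cc_config fragment that enables the given log flags.

    build_cc_config is a hypothetical helper for illustration only;
    it is not part of BOINC.
    """
    root = ET.Element("cc_config")
    log_flags = ET.SubElement(root, "log_flags")
    for name in flags:
        # Each enabled flag is an element with the text "1".
        ET.SubElement(log_flags, name).text = "1"
    return ET.tostring(root, encoding="unicode")


config_xml = build_cc_config(["task", "checkpoint_debug"])
print(config_xml)
```

After saving the file, have the client re-read its config files (or restart it) so the new flag takes effect.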

The OET app and BOINC Manager seem to be quite good at forecasting time remaining once the task has reached the first checkpoint. It's rather unpredictable until then: I have had 4-minute results and I have had 23-hour results. Given that my client setting "Write to disk at most" is every 5 minutes, any checkpoint generated within a shorter interval is skipped and the next one after 5 minutes is written, so the event log sometimes records fewer than 8 checkpoints.
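To illustrate the skipping, here is a small Python sketch. The offered checkpoint times are invented, and counting the 5 minutes from task start or from the last write is my assumption about how the client applies the setting, not something taken from the BOINC source.

```python
def written_checkpoints(offered_minutes, min_interval=5.0):
    """Return the offered checkpoint times that actually get written.

    Assumption: a checkpoint is written only if at least min_interval
    minutes have passed since the last write (or since task start).
    """
    written = []
    last_write = 0.0
    for t in offered_minutes:
        if t - last_write >= min_interval:
            written.append(t)
            last_write = t
    return written


# Hypothetical run times (in minutes) at which the 8 checkpoints are offered.
offered = [2, 4, 9, 11, 30, 33, 60, 90]
written = written_checkpoints(offered)
print(len(offered), "offered,", len(written), "written:", written)
```

With these made-up times only 4 of the 8 offered checkpoints end up on disk, which is how a fast-moving task can show fewer than 8 entries in the event log.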

For more discussion on checkpointing visit the Start Here forum.
----------------------------------------
[Edit 2 times, last edit by Former Member at Feb 13, 2015 9:35:08 AM]
[Feb 13, 2015 9:26:38 AM]
AgrFan
Senior Cruncher
USA
Joined: Apr 17, 2008
Post Count: 376
Status: Offline
Re: OET1_ 0000333_ xMBGP-OM_ rig_ 11309_ 0 not checkpointing

I rebooted the client twice after the initial 6.5 hours. Both times it started from the beginning. It took 9.5 hours to complete after the second restart. If checkpointing is happening at 12.5% intervals (or somewhere close to that), then it should have restarted at least halfway through and run for 4-5 hours after the second reboot. I caught this the following day after the unit uploaded. Maybe MBGP units have a checkpoint bug?
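The arithmetic behind that expectation, as a rough Python sketch. It assumes progress is linear in run time, which an earlier reply warns is not necessarily true, so treat the result as a ballpark only. The function names are mine.

```python
import math

CHECKPOINT_STEP = 12.5  # percent, per the OET checkpoint intervals quoted above


def last_checkpoint(progress_pct, step=CHECKPOINT_STEP):
    """Round progress down to the most recent checkpoint boundary."""
    return math.floor(progress_pct / step) * step


def hours_remaining_after_resume(hours_done, total_hours):
    """Estimate the resume point and remaining hours, assuming linear progress."""
    progress = 100.0 * hours_done / total_hours
    resume_at = last_checkpoint(progress)
    remaining = total_hours * (100.0 - resume_at) / 100.0
    return resume_at, remaining


# 6.5 hours done before the first reboot; 9.5 hours for a full run from scratch.
resume_at, remaining = hours_remaining_after_resume(6.5, 9.5)
print(resume_at, remaining)
```

Under the linear assumption the task would have resumed from the 62.5% checkpoint with roughly 3.6 hours left, rather than the 9.5 hours it actually took.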
----------------------------------------
[Edit 1 times, last edit by AgrFan at Feb 13, 2015 11:52:54 AM]
[Feb 13, 2015 11:52:12 AM]
Former Member
Cruncher
Joined: May 22, 2018
Post Count: 0
Status: Offline
Re: OET1_ 0000333_ xMBGP-OM_ rig_ 11309_ 0 not checkpointing

Maybe. Got one 333 MBGP-OM_1506_5 (sixth copy) sitting on a tablet at 100% since the 23rd hour, now at 26:07. Going from 100% to finished can sometimes take a little while, but 3 hours at 100% is new. On restart it regressed to 99.99%, suggesting the 8th checkpoint was taken and the wrap-up is not wrapping up.
[Feb 13, 2015 12:31:41 PM]
Thargor
Veteran Cruncher
UK
Joined: Feb 3, 2012
Post Count: 1291
Status: Offline
Re: OET1_ 0000333_ xMBGP-OM_ rig_ 11309_ 0 not checkpointing

I've got one (OET1_0000333_xMBGP-OM_rig_8678) that has just hit 13 hours of run time and is at 90% progress with 1.5 hours estimated remaining. It's been sitting at 90% with 1.5 hours remaining for at least the last 2 hours. Unfortunately I didn't check earlier this morning to see what it was on then.

Given the posts about checkpointing here, I'm not going to abort it just yet, but if it stays there much into this evening, I guess I'll have no choice...
----------------------------------------

[Feb 13, 2015 4:44:51 PM]
Former Member
Cruncher
Joined: May 22, 2018
Post Count: 0
Status: Offline
Re: OET1_ 0000333_ xMBGP-OM_ rig_ 11309_ 0 not checkpointing

Hang in there... below is a copy of a 333 task now at 34 hours, 1:44 after its 7th checkpoint, still running on the 4770K:

7.19 oet1 OET1_0000333_xMBGP-OM_rig_8180_1 01d,10:28:04 (01d,08:51:37) 95,34 92,200 02:54:57 07d,23:24:55 2/11/2015 5:21:30 PM [7] 01:43:59 Running 31.32 MB 44.96 MB

The excitement starts when hitting the 99.99% mark, then concluding for return. A work unit copy of same batch on tablet is now in the 31st hour with 7 hours at 100%, after a restart which dropped it back to 99.99%. Decided to let it run until whenever it hits the magic 'exceeded maximum runtime', which is 40 fold the regular average.

Memory is not the issue... in all of the run never exceeded 45MB.
[Feb 13, 2015 5:03:26 PM]
Former Member
Cruncher
Joined: May 22, 2018
Post Count: 0
Status: Offline
Re: OET1_ 0000333_ xMBGP-OM_ rig_ 11309_ 0 not checkpointing

Phew, having 'hung in there', after 8 hours at 100%, the task on Android concluded without outright error:

OET1_ 0000333_ xMBGP-OM_ rig_ 1506_ 5-- android_2654196
a Pending Validation 2/11/15 15:35:05 2/13/15 17:42:55 30.04 / 31.63 315.3 / 0.0

The log does not make pleasant reading... lots of restarts. Now the wingman copy _4 needs to keep running, or else it becomes the 5th error, with a take-out 'Too Late' lurking in the rafters.

Result Log

Result Name: OET1_ 0000333_ xMBGP-OM_ rig_ 1506_ 5--
<core_client_version>7.4.41</core_client_version>
<![CDATA[
<stderr_txt>
stackdumps unavailable
INFO: No state to restore. Start from the beginning.
[18:14:59] Number of tasks = 1
[18:14:59] Running task 0,CPU time at start of task 0 was 0.000000
[18:14:59] ./ZINC00895034_2.pdbqt size = 10 4 ../../projects/www.worldcommunitygrid.org/oet1.xMBGP-OM_rig.pdbqt size = 1930 0
stackdumps unavailable
INFO: No state to restore. Start from the beginning.
[00:42:00] Number of tasks = 1
[00:42:00] Running task 0,CPU time at start of task 0 was 0.000000
[00:42:00] ./ZINC00895034_2.pdbqt size = 10 4 ../../projects/www.worldcommunitygrid.org/oet1.xMBGP-OM_rig.pdbqt size = 1930 0
stackdumps unavailable
stackdumps unavailable
INFO: No state to restore. Start from the beginning.
[01:09:46] Number of tasks = 1
[01:09:46] Running task 0,CPU time at start of task 0 was 0.000000
[01:09:46] ./ZINC00895034_2.pdbqt size = 10 4 ../../projects/www.worldcommunitygrid.org/oet1.xMBGP-OM_rig.pdbqt size = 1930 0
stackdumps unavailable
INFO: No state to restore. Start from the beginning.
[01:16:59] Number of tasks = 1
[01:16:59] Running task 0,CPU time at start of task 0 was 0.000000
[01:16:59] ./ZINC00895034_2.pdbqt size = 10 4 ../../projects/www.worldcommunitygrid.org/oet1.xMBGP-OM_rig.pdbqt size = 1930 0
stackdumps unavailable
[08:40:00] Number of tasks = 1
[08:40:00] Running task 0,CPU time at start of task 0 was 0.000000
[08:40:00] ./ZINC00895034_2.pdbqt size = 10 4 ../../projects/www.worldcommunitygrid.org/oet1.xMBGP-OM_rig.pdbqt size = 1930 0
stackdumps unavailable
[08:49:37] Number of tasks = 1
[08:49:37] Running task 0,CPU time at start of task 0 was 0.000000
[08:49:37] ./ZINC00895034_2.pdbqt size = 10 4 ../../projects/www.worldcommunitygrid.org/oet1.xMBGP-OM_rig.pdbqt size = 1930 0
stackdumps unavailable
[08:49:40] Number of tasks = 1
[08:49:40] Running task 0,CPU time at start of task 0 was 0.000000
[08:49:40] ./ZINC00895034_2.pdbqt size = 10 4 ../../projects/www.worldcommunitygrid.org/oet1.xMBGP-OM_rig.pdbqt size = 1930 0
stackdumps unavailable
[08:49:43] Number of tasks = 1
[08:49:43] Running task 0,CPU time at start of task 0 was 0.000000
[08:49:43] ./ZINC00895034_2.pdbqt size = 10 4 ../../projects/www.worldcommunitygrid.org/oet1.xMBGP-OM_rig.pdbqt size = 1930 0
stackdumps unavailable
[08:49:46] Number of tasks = 1
[08:49:46] Running task 0,CPU time at start of task 0 was 0.000000
[08:49:46] ./ZINC00895034_2.pdbqt size = 10 4 ../../projects/www.worldcommunitygrid.org/oet1.xMBGP-OM_rig.pdbqt size = 1930 0
stackdumps unavailable
[09:02:54] Number of tasks = 1
[09:02:54] Running task 0,CPU time at start of task 0 was 0.000000
[09:02:54] ./ZINC00895034_2.pdbqt size = 10 4 ../../projects/www.worldcommunitygrid.org/oet1.xMBGP-OM_rig.pdbqt size = 1930 0
stackdumps unavailable
[10:25:16] Number of tasks = 1
[10:25:16] Running task 0,CPU time at start of task 0 was 0.000000
[10:25:16] ./ZINC00895034_2.pdbqt size = 10 4 ../../projects/www.worldcommunitygrid.org/oet1.xMBGP-OM_rig.pdbqt size = 1930 0
stackdumps unavailable
[23:03:45] Number of tasks = 1
[23:03:45] Running task 0,CPU time at start of task 0 was 0.000000
[23:03:45] ./ZINC00895034_2.pdbqt size = 10 4 ../../projects/www.worldcommunitygrid.org/oet1.xMBGP-OM_rig.pdbqt size = 1930 0
stackdumps unavailable
[23:47:13] Number of tasks = 1
[23:47:13] Running task 0,CPU time at start of task 0 was 0.000000
[23:47:13] ./ZINC00895034_2.pdbqt size = 10 4 ../../projects/www.worldcommunitygrid.org/oet1.xMBGP-OM_rig.pdbqt size = 1930 0
stackdumps unavailable
[23:47:17] Number of tasks = 1
[23:47:17] Running task 0,CPU time at start of task 0 was 0.000000
[23:47:17] ./ZINC00895034_2.pdbqt size = 10 4 ../../projects/www.worldcommunitygrid.org/oet1.xMBGP-OM_rig.pdbqt size = 1930 0
stackdumps unavailable
[08:39:46] Number of tasks = 1
[08:39:46] Running task 0,CPU time at start of task 0 was 0.000000
[08:39:46] ./ZINC00895034_2.pdbqt size = 10 4 ../../projects/www.worldcommunitygrid.org/oet1.xMBGP-OM_rig.pdbqt size = 1930 0
stackdumps unavailable
[09:33:10] Number of tasks = 1
[09:33:10] Running task 0,CPU time at start of task 0 was 0.000000
[09:33:10] ./ZINC00895034_2.pdbqt size = 10 4 ../../projects/www.worldcommunitygrid.org/oet1.xMBGP-OM_rig.pdbqt size = 1930 0
stackdumps unavailable
[10:13:03] Number of tasks = 1
[10:13:03] Running task 0,CPU time at start of task 0 was 0.000000
[10:13:03] ./ZINC00895034_2.pdbqt size = 10 4 ../../projects/www.worldcommunitygrid.org/oet1.xMBGP-OM_rig.pdbqt size = 1930 0
[18:42:37] Finished task #0 cpu time used 108136.850000
18:42:37 (18379): called boinc_finish(0)

</stderr_txt>
]]>

It's as if there's no LAIM being used with BfA 7.4.41? Thought that was default on the smartphone implementation?
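For anyone who wants to tally restarts in a result log like the one above, here is a rough Python sketch. The sample is a short excerpt of that log. Whether a start without the "No state to restore" line really means a checkpoint was restored is only a guess on my part and has not been confirmed by the techs.

```python
import re

# Short excerpt of the result log above (two starts plus the finish line).
SAMPLE_LOG = """\
stackdumps unavailable
INFO: No state to restore. Start from the beginning.
[01:16:59] Number of tasks = 1
[01:16:59] Running task 0,CPU time at start of task 0 was 0.000000
stackdumps unavailable
[08:40:00] Number of tasks = 1
[08:40:00] Running task 0,CPU time at start of task 0 was 0.000000
[18:42:37] Finished task #0 cpu time used 108136.850000
"""

START_RE = re.compile(r"^\[(\d{2}:\d{2}:\d{2})\] Number of tasks = \d+")
FINISH_RE = re.compile(r"cpu time used ([\d.]+)")


def summarize(log_text):
    """Return ([(start_time, began_from_scratch), ...], cpu_seconds)."""
    starts = []
    cpu_seconds = None
    saw_no_state = False
    for line in log_text.splitlines():
        if line.startswith("INFO: No state to restore"):
            # Remember this so the next start is marked as a fresh start.
            saw_no_state = True
            continue
        match = START_RE.match(line)
        if match:
            starts.append((match.group(1), saw_no_state))
            saw_no_state = False
            continue
        match = FINISH_RE.search(line)
        if match:
            cpu_seconds = float(match.group(1))
    return starts, cpu_seconds


starts, cpu_seconds = summarize(SAMPLE_LOG)
print(len(starts), "starts:", starts)
print("CPU hours:", round(cpu_seconds / 3600, 2))
```

The CPU seconds on the finish line convert to about 30.04 hours, which agrees with the 30.04 shown in the result status row.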
[Feb 13, 2015 6:58:44 PM]
numbermaniac
Cruncher
Australia
Joined: Mar 28, 2014
Post Count: 46
Status: Offline
Re: OET1_ 0000333_ xMBGP-OM_ rig_ 11309_ 0 not checkpointing

If this is the case, that's a shame. I'm a bit surprised that I'm almost expected to keep my phone running BOINC for >10 hours at a stretch to let it complete. Normally with FAAH, even one or two hours every so often suffices.
[Feb 14, 2015 6:59:09 AM]