World Community Grid Forums
Thread Status: Active Total posts in this thread: 21
Author |
|
uplinger
Former World Community Grid Tech Joined: May 23, 2005 Post Count: 3952 Status: Offline |
Please post all issues with this beta test: https://secure.worldcommunitygrid.org/forums/wcg/viewthread_thread,37395
Thanks, -Uplinger |
||
|
Former Member
Cruncher Joined: May 22, 2018 Post Count: 0 Status: Offline |
To test the checkpointing issue, do you want us to provoke checkpoint-restores, e.g. by suspending a Beta task with LAIM off?
|
||
|
PMH_UK
Veteran Cruncher UK Joined: Apr 26, 2007 Post Count: 771 Status: Offline |
Please server-abort the WU below from the previous beta. It is still stuck on a PC I cannot reach at the moment, along with others from the same set.
----------------------------------------
BETA_ betaugm1_ ugm1_ 00011_ 0328_ 2--
Edit: Tried again and was able to remote in and abort. The bad news was that it had rebooted, probably after losing power, so only 44 hours.
Paul.
----------------------------------------
[Edit 1 times, last edit by PMH_UK at Oct 31, 2014 9:25:19 PM] |
||
|
Crystal Pellet
Veteran Cruncher Joined: May 21, 2008 Post Count: 1321 Status: Offline |
uplinger wrote: ". . 2. Shared memory issues Thanks, -Uplinger"
Fixed. I ran the shared graphics memory test by starting several tasks simultaneously under 2 BOINC clients on the same host. No crashes or errors. Suspending with LAIM off and resuming, and also restarting BOINC, works fine. We have to wait and see whether these tasks, or other tasks with backwards checkpointing, will be valid.
CP
[Edit 1 times, last edit by Crystal Pellet at Oct 30, 2014 10:29:56 PM] |
||
|
uplinger
Former World Community Grid Tech Joined: May 23, 2005 Post Count: 3952 Status: Offline |
PMH, I will see what I can do.
-Uplinger |
||
|
uplinger
Former World Community Grid Tech Joined: May 23, 2005 Post Count: 3952 Status: Offline |
Yes, please test checkpointing by stopping and starting the workunit more than once.
Thanks, -Uplinger |
||
|
Yarensc
Advanced Cruncher USA Joined: Sep 24, 2011 Post Count: 136 Status: Offline |
Suspended my one with LAIM off and it seems fine. The estimated run time is about double what this computer normally does, but that might just be random chance.
|
||
|
Former Member
Cruncher Joined: May 22, 2018 Post Count: 0 Status: Offline |
Yarensc wrote: "Suspended my one with LAIM off and it seems fine."
The critical test will be whether it ends up Valid or Invalid after comparison with wingman/wingmen. |
||
|
Former Member
Cruncher Joined: May 22, 2018 Post Count: 0 Status: Offline |
uplinger wrote: "Yes, please test checkpointing by stopping and starting the workunit more than once."
OK, all my Beta tasks have been hammered with several suspend/resume cycles, LAIM off. Just be warned that LAIM off will affect all other running tasks, so any previously-running UGM1 v7.22 tasks may end up Invalid, because that's the checkpoint issue that led to this Beta. I think we can't totally avoid that risk, but suspending only one Beta at a time may constrain the multiple checkpoint restores to only one v7.22 task. |
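For anyone who would rather script the suspend/resume cycles than click through the manager, here is a minimal sketch that only builds (and by default just prints) `boinccmd --task <url> <task_name> suspend|resume` commands for one task at a time. The project URL, the task name, and the pause length are assumptions for illustration, not values from this thread, and LAIM still has to be switched off separately in your computing preferences for a suspend to force a checkpoint restore.

```python
import subprocess
import time

# Assumption: the project URL as registered in the local BOINC client.
# Check the output of `boinccmd --get_project_status` for the exact value.
PROJECT_URL = "http://www.worldcommunitygrid.org/"


def build_cycle_commands(task_name, cycles=3, project_url=PROJECT_URL):
    """Return boinccmd argument lists for `cycles` suspend/resume rounds."""
    commands = []
    for _ in range(cycles):
        for operation in ("suspend", "resume"):
            commands.append(
                ["boinccmd", "--task", project_url, task_name, operation]
            )
    return commands


def run_cycles(task_name, cycles=3, pause_seconds=120, dry_run=True):
    """Print the commands (dry run) or execute them with a pause after each."""
    for argv in build_cycle_commands(task_name, cycles):
        if dry_run:
            print(" ".join(argv))
        else:
            subprocess.run(argv, check=True)
            # Hypothetical pause; pick one long enough for the task to
            # write a new checkpoint before the next suspend.
            time.sleep(pause_seconds)


if __name__ == "__main__":
    # Hypothetical task name; copy the real one from `boinccmd --get_tasks`.
    run_cycles("HYPOTHETICAL_BETA_TASK_NAME_0")
```

Running it on a single Beta task at a time matches the suggestion above of limiting the checkpoint restores to as few other running tasks as possible.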
||
|
PMH_UK
Veteran Cruncher UK Joined: Apr 26, 2007 Post Count: 771 Status: Offline |
I have done multiple re-boots/stop/starts on these, appear to have resumed from checkpoint OK.
----------------------------------------
BETA_ ugm1_ ugm1_ 00478_ 0629_ 1-- USB3-A Valid 30/10/14 21:56:46 31/10/14 04:14:46 6.06 / 6.11 110.4 / 129.4
BETA_ ugm1_ ugm1_ 00478_ 0648_ 0-- USB3-A Pending Validation 30/10/14 21:56:46 31/10/14 04:17:56 6.11 / 6.16 111.4 / 0.0
BETA_ ugm1_ ugm1_ 00478_ 0722_ 0-- 8NUC Valid 30/10/14 21:55:33 31/10/14 05:42:08 7.68 / 7.71 171.7 / 157.3
BETA_ ugm1_ ugm1_ 00478_ 0083_ 0-- 8NUC Valid 30/10/14 21:55:33 31/10/14 05:37:07 7.59 / 7.63 169.8 / 152.3
BETA_ ugm1_ ugm1_ 00313_ 0467_ 0-- 8NUC Valid 30/10/14 21:53:23 31/10/14 05:34:37 7.56 / 7.59 169.0 / 126.2
BETA_ ugm1_ ugm1_ 00313_ 0286_ 1-- 8NUC Pending Validation 30/10/14 21:53:23 31/10/14 05:39:37 7.64 / 7.67 170.8 / 0.0
Edit: Updated list now returned and looking good so far...
Paul.
----------------------------------------
[Edit 1 times, last edit by PMH_UK at Oct 31, 2014 2:01:36 PM] |
||
|