Thread Status: Active
Total posts in this thread: 10
This topic has been viewed 893 times and has 9 replies
Dayle Diamond
Senior Cruncher
Joined: Jan 31, 2013
Post Count: 440
Status: Offline
PC Stays on, LAIM On, Project Still Restarts

Hi folks, sorry if this may have been asked before.

One of my computers is pretty strong and stays on all the time, so I'm experimenting with contributing to CEP-2. I have it running eleven copies, with one core supporting the GPU tasks for GPU Grid.

There doesn't seem to be a bottleneck: there's plenty of hard disk space for the project, and the tasks are using less RAM than the system requirements warned (6.5 GB in use out of 16 GB).

But for the last few days, whenever I check on them, most are progressing steadily but are still in their first HOUR of work. The records show a handful completing with no errors, but most tasks are the same ones I started with.

Any advice?
[Jan 1, 2015 3:53:00 PM]
Former Member
Cruncher
Joined: May 22, 2018
Post Count: 0
Status: Offline
Re: PC Stays on, LAIM On, Project Still Restarts

CEP2 does a lot of I/O in bursts, so the bottleneck might be the HDD, especially as you're running so many in parallel. You may notice heartbeat errors in the logs, particularly if they're all (re)starting at the same time.

My advice would be to run fewer at once. You can tweak BOINC to set a maximum. See elsewhere in this forum for how-to.
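
For anyone searching later, one way to set such a maximum is an app_config.xml file with a max_concurrent entry, placed in the project's folder inside the BOINC data directory. Below is a minimal sketch that generates that file's contents. The application short name "cep2" and the limit of 4 are assumptions for illustration; check the name your own client reports before using it.

```python
import xml.etree.ElementTree as ET


def build_app_config(app_name: str, max_concurrent: int) -> str:
    """Return app_config.xml text that caps how many tasks of one app run at once."""
    root = ET.Element("app_config")
    app = ET.SubElement(root, "app")
    ET.SubElement(app, "name").text = app_name
    ET.SubElement(app, "max_concurrent").text = str(max_concurrent)
    return ET.tostring(root, encoding="unicode")


# "cep2" is an assumed short name and 4 is an arbitrary example limit.
config_text = build_app_config("cep2", 4)
print(config_text)
```

Save the printed text as app_config.xml in the project folder, then have the client re-read its config files (or restart it) for the limit to take effect.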
[Jan 1, 2015 5:03:12 PM]
keithhenry
Ace Cruncher
Senile old farts of the world ....uh.....uh..... nevermind
Joined: Nov 18, 2004
Post Count: 18665
Status: Offline
Re: PC Stays on, LAIM On, Project Still Restarts

CEP2 also checkpoints very infrequently, so the slightest hiccup will send a WU back to its last checkpoint. That many concurrent WUs could be maxing out your memory, making LAIM pointless. The first thing to do would be to review the BOINC event log for unusual messages. Also, if you follow the directory tree down, you may be able to find the log for one of the active WUs (stderr.txt, I *think*) and see what's been written to it so far. Post what you find here.
[Jan 1, 2015 9:56:11 PM]
Dayle Diamond
Senior Cruncher
Joined: Jan 31, 2013
Post Count: 440
Status: Offline
Re: PC Stays on, LAIM On, Project Still Restarts

Oho. Exited with zero status but no 'finished' file. Finally, an error message!

I'd be happy to decrease the number of work units, if that's correlated with this error message.

1/1/2015 1:16:11 PM | World Community Grid | Task E227433_692_S.254.C34H22N4.QRZDCTKWRXZCIL-UHFFFAOYSA-N.7_s1_14_0 exited with zero status but no 'finished' file
1/1/2015 1:16:11 PM | World Community Grid | If this happens repeatedly you may need to reset the project.
1/1/2015 1:16:11 PM | World Community Grid | Task E227435_398_S.254.C32H20N6.KMBASDFPNPOUPR-UHFFFAOYSA-N.9_s1_14_0 exited with zero status but no 'finished' file
1/1/2015 1:16:11 PM | World Community Grid | If this happens repeatedly you may need to reset the project.
1/1/2015 1:16:11 PM | World Community Grid | Task E227449_173_S.254.C34H22N4.CQMPUMPMOALEFI-UHFFFAOYSA-N.6_s1_14_0 exited with zero status but no 'finished' file
1/1/2015 1:16:11 PM | World Community Grid | If this happens repeatedly you may need to reset the project.
1/1/2015 1:16:11 PM | World Community Grid | Task E227451_930_S.256.C32H20N4S1.DNPKIYWXPSOUDJ-UHFFFAOYSA-N.3_s1_14_0 exited with zero status but no 'finished' file
1/1/2015 1:16:11 PM | World Community Grid | If this happens repeatedly you may need to reset the project.
1/1/2015 1:16:11 PM | World Community Grid | Task E227454_357_S.256.C34H22N2S1.HEPVMXLHWJGLHY-UHFFFAOYSA-N.9_s1_14_0 exited with zero status but no 'finished' file
1/1/2015 1:16:11 PM | World Community Grid | If this happens repeatedly you may need to reset the project.
1/1/2015 1:16:11 PM | World Community Grid | Computation for task E227432_996_S.254.C28H12N6S2.KQZHOPHCFKURLS-UHFFFAOYSA-N.3_s1_14_0 finished
1/1/2015 1:16:11 PM | FiND@Home | Sending scheduler request: To fetch work.
1/1/2015 1:16:11 PM | FiND@Home | Requesting new tasks for CPU and NVIDIA GPU
1/1/2015 1:17:03 PM | World Community Grid | Task E227420_961_S.250.C36H27N1.KUHWGGPVXFMOLS-UHFFFAOYSA-N.6_s1_14_0 exited with zero status but no 'finished' file
1/1/2015 1:17:03 PM | World Community Grid | If this happens repeatedly you may need to reset the project.
1/1/2015 1:17:03 PM | World Community Grid | Task E227423_672_S.252.C26H15N7S2.RREHHLBQLUMRNP-UHFFFAOYSA-N.3_s1_14_0 exited with zero status but no 'finished' file
1/1/2015 1:17:03 PM | World Community Grid | If this happens repeatedly you may need to reset the project.
1/1/2015 1:17:03 PM | World Community Grid | Task E227423_721_S.252.C30H21N5O2.HICZTGFSAJMRCA-UHFFFAOYSA-N.16_s1_14_0 exited with zero status but no 'finished' file
1/1/2015 1:17:03 PM | World Community Grid | If this happens repeatedly you may need to reset the project.
1/1/2015 1:17:03 PM | World Community Grid | Task E227427_507_S.252.C32H24N4O1.SIOKLUZBYXREFR-UHFFFAOYSA-N.6_s1_14_0 exited with zero status but no 'finished' file
1/1/2015 1:17:03 PM | World Community Grid | If this happens repeatedly you may need to reset the project.
1/1/2015 1:17:03 PM | World Community Grid | Task E227429_654_S.252.C27H20N10.LUFRHPJDNNECAF-UHFFFAOYSA-N.17_s1_14_1 exited with zero status but no 'finished' file
1/1/2015 1:17:03 PM | World Community Grid | If this happens repeatedly you may need to reset the project.
1/1/2015 1:17:03 PM | World Community Grid | Task E227431_2_S.254.C32H20N6.IIOWKCXICUURSX-UHFFFAOYSA-N.12_s1_14_0 exited with zero status but no 'finished' file
1/1/2015 1:17:03 PM | World Community Grid | If this happens repeatedly you may need to reset the project.
1/1/2015 1:17:04 PM | World Community Grid | Started upload of E227432_996_S.254.C28H12N6S2.KQZHOPHCFKURLS-UHFFFAOYSA-N.3_s1_14_0_0
1/1/2015 1:17:04 PM | World Community Grid | Started upload of E227432_996_S.254.C28H12N6S2.KQZHOPHCFKURLS-UHFFFAOYSA-N.3_s1_14_0_1
1/1/2015 1:17:05 PM | World Community Grid | Finished upload of E227432_996_S.254.C28H12N6S2.KQZHOPHCFKURLS-UHFFFAOYSA-N.3_s1_14_0_0
1/1/2015 1:17:05 PM | World Community Grid | Started upload of E227432_996_S.254.C28H12N6S2.KQZHOPHCFKURLS-UHFFFAOYSA-N.3_s1_14_0_2
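
To see whether the restarts cluster in time, the log can be tallied by timestamp. In the excerpt above, the message appears for five tasks at 1:16:11 PM and six at 1:17:03 PM. A rough sketch, assuming the event log has been saved as text in the same "time | project | message" layout shown above:

```python
from collections import Counter

MARKER = "exited with zero status but no 'finished' file"


def count_restarts(log_text: str) -> Counter:
    """Count 'no finished file' messages per timestamp in BOINC event log text."""
    counts = Counter()
    for line in log_text.splitlines():
        # Each line is "timestamp | project | message".
        parts = [p.strip() for p in line.split("|")]
        if len(parts) >= 3 and MARKER in parts[2]:
            counts[parts[0]] += 1
    return counts


# Three lines copied from the log above, used as sample input.
sample = """\
1/1/2015 1:16:11 PM | World Community Grid | Task E227433_692_S.254.C34H22N4.QRZDCTKWRXZCIL-UHFFFAOYSA-N.7_s1_14_0 exited with zero status but no 'finished' file
1/1/2015 1:16:11 PM | World Community Grid | If this happens repeatedly you may need to reset the project.
1/1/2015 1:17:03 PM | World Community Grid | Task E227420_961_S.250.C36H27N1.KUHWGGPVXFMOLS-UHFFFAOYSA-N.6_s1_14_0 exited with zero status but no 'finished' file
"""
restarts = count_restarts(sample)
for stamp, n in restarts.items():
    print(stamp, n)
```

Several restarts sharing one timestamp would suggest a common trigger rather than individual task failures.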
[Jan 1, 2015 10:20:21 PM]
Mamajuanauk
Master Cruncher
United Kingdom
Joined: Dec 15, 2012
Post Count: 1900
Status: Offline
Re: PC Stays on, LAIM On, Project Still Restarts

It's been recorded in the forum that I've been running 64 instances of CEP2 on the same machine. The point is that running many at once is not a problem per se; I found the main problems were the availability of RAM and starting too many instances at the same time.

Manually suspending the tasks/WUs and staggering their restarts should help.
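
If clicking through the Manager gets tedious, the staggering could in principle be scripted with boinccmd, whose --task option takes a project URL, a task name and an operation such as resume. This is only a sketch: the project URL, the task names and the 120-second delay are placeholders, and it defaults to a dry run that prints the commands instead of running them.

```python
import subprocess
import time


def resume_commands(project_url, task_names):
    """Build one 'boinccmd --task <url> <name> resume' command per task."""
    return [["boinccmd", "--task", project_url, name, "resume"] for name in task_names]


def staggered_resume(project_url, task_names, delay_seconds=120, dry_run=True):
    """Resume tasks one at a time, pausing between them so start-up I/O does not overlap."""
    cmds = resume_commands(project_url, task_names)
    for i, cmd in enumerate(cmds):
        if dry_run:
            print(" ".join(cmd))  # show what would be run
        else:
            subprocess.run(cmd, check=True)
            if i < len(cmds) - 1:
                time.sleep(delay_seconds)
    return cmds


# Placeholder URL and task names; use the values your own client lists.
planned = staggered_resume("http://example.org/", ["task_a", "task_b"])
```

Pass dry_run=False only after checking that the printed commands match the tasks you suspended.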

I also have my machines on UPS/battery backup to prevent any power glitches causing problems...
----------------------------------------
Mamajuanauk is the Name! Crunching is the Game!



[Jan 1, 2015 10:23:44 PM]
Dayle Diamond
Senior Cruncher
Joined: Jan 31, 2013
Post Count: 440
Status: Offline
Re: PC Stays on, LAIM On, Project Still Restarts

64 instances is amazing!
Okay, my first step is manually restarting the tasks one by one.
I'll let you know how it goes.

Not sure if I can do anything about the power supply for the next few days.
[Jan 1, 2015 11:21:37 PM]
Mamajuanauk
Master Cruncher
United Kingdom
Joined: Dec 15, 2012
Post Count: 1900
Status: Offline
Re: PC Stays on, LAIM On, Project Still Restarts

I've just moved one of my machines back to CEP2, loaded it up with 64 wu's and ensured they all started at different times...

I'll post any errors...

UPS not essential, just helps if there are any power glitches.
[Jan 2, 2015 7:24:36 AM]
Dayle Diamond
Senior Cruncher
Joined: Jan 31, 2013
Post Count: 440
Status: Offline
Re: PC Stays on, LAIM On, Project Still Restarts

Alright, I've paid more attention to my BOINC Manager than I thought possible and caught one of my work units restarting. I kept only 2 CEP2 tasks running at a time and found one with just 40 seconds of runtime. I checked the log, and it hadn't been updated for a few minutes; no activity was recorded.

The second CEP2 was running with no interruption. Does that still point to a power issue?
[Jan 3, 2015 1:43:17 PM]
Mamajuanauk
Master Cruncher
United Kingdom
Joined: Dec 15, 2012
Post Count: 1900
Status: Offline
Re: PC Stays on, LAIM On, Project Still Restarts

Check how much memory and disk space you have allocated locally in BOINC Manager. Be careful to check the actual memory available in the log; it's within the first few lines at startup.

The percentage allocation can restrict what is actually available: even though you have allocated 20 GB, if the % is restricted to 50, or the "leave xx GB free" setting applies, there may be less than you intend.

Check Task manager to see how much memory is being used in total.

Post some log entries from Boinc again, the first 20-30 from the start of the log and some of the latest ones...

Edit - Look out for these lines-
31/12/2014 15:38:58 |  | max memory usage when active: 29399.16MB
31/12/2014 15:38:58 | | max memory usage when idle: 31032.45MB
31/12/2014 15:38:58 | | max disk usage: 25.00GB

----------------------------------------
[Edit 1 times, last edit by Mamajuanauk at Jan 3, 2015 2:20:59 PM]
[Jan 3, 2015 2:18:43 PM]
Dayle Diamond
Senior Cruncher
Joined: Jan 31, 2013
Post Count: 440
Status: Offline
Re: PC Stays on, LAIM On, Project Still Restarts

My event log doesn't go back any farther than yesterday; it's full of FiND & Malaria Control projects that have been running while I slowly restart CEP2.

Settings are to use 70% memory while in use, 90% while idle. Should I increase that, and if so, to what? Again, I'm running 16GB of DDR3.
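
For reference, here is what those percentages work out to on this machine. It is just the arithmetic, assuming the client applies the percentage to the full 16 GB; the startup lines in the event log remain the authoritative figures.

```python
def memory_limit_gb(total_ram_gb: float, percent: float) -> float:
    """Memory BOINC may use, given total RAM and the percentage setting."""
    return total_ram_gb * percent / 100.0


# 16 GB total, with the 70% (in use) and 90% (idle) settings from this post.
limit_in_use = memory_limit_gb(16, 70)
limit_idle = memory_limit_gb(16, 90)
print(f"in use: {limit_in_use:.1f} GB, idle: {limit_idle:.1f} GB")
```

So the cap is about 11.2 GB while the computer is in use and 14.4 GB while idle.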

Edit: Screw it, Harvard or no, I can't justify wasting any more months of computation to this project, I'll stick with the default mix.
----------------------------------------
[Edit 1 times, last edit by Dayle Diamond at Jan 3, 2015 3:06:54 PM]
[Jan 3, 2015 2:40:30 PM]