Total posts in this thread: 38
This topic has been viewed 3755 times and has 37 replies
Former Member
Cruncher
Joined: May 22, 2018
Post Count: 0
Re: "400 slots directories" work unit error

Just seeing your RAM disk info addition: is the disk being backed up in real time or at intervals? Could that process be locking parts of the RAM, causing data not to get updated when it needs to? Asking the developers on the alpha mailing list would be my next step.

On RAM disks, I saw this a little while ago about dynamic RAM disk sizing, which frees up memory when storage needs are small and grows when there's demand: http://betanews.com/2014/01/26/imdisk-toolkit-adds-dynamic-ram-disks/
[Mar 27, 2014 2:52:37 PM]
Mgruben
Advanced Cruncher
Joined: May 26, 2013
Post Count: 94
Re: "400 slots directories" work unit error

The RAMdisk is not backed up to the SSD (personal choice; I know I risk losing days of work in a power outage). I do copy the /slots directory to the SSD before system restarts and reload that data into the RAMdisk before restarting the BOINC client.

I am unsure about locking.

If the odd behavior you mention is actually present, I am unsure why it would only surface now, when the machine had been a dedicated CEP2 cruncher from August 2013 until recently.
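A minimal sketch of the manual save/restore routine described above, assuming the client has already been shut down (for example with boinccmd --quit) and that the RAM disk is mounted at the slots path; the backup location on the SSD and the function names are hypothetical examples, not BOINC tools:

```shell
#!/bin/bash
# Sketch: copy a RAM-disk slots directory to persistent storage and back.
# Assumes the BOINC client is stopped while either function runs.

# save_slots SRC DEST: replace DEST with a fresh copy of SRC.
save_slots() {
    local src=$1 dest=$2
    if [ ! -d "$src" ] || [ -z "$dest" ]; then
        echo "usage: save_slots SRC_DIR DEST_DIR" >&2
        return 1
    fi
    rm -rf -- "$dest"              # drop the copy left from the previous restart
    mkdir -p -- "$dest"
    cp -a -- "$src/." "$dest/"     # -a keeps modes, timestamps and (as root) ownership
}

# restore_slots SRC DEST: reload the saved copy into the RAM disk mount point.
restore_slots() {
    local src=$1 dest=$2
    if [ ! -d "$src" ] || [ -z "$dest" ]; then
        echo "usage: restore_slots SRC_DIR DEST_DIR" >&2
        return 1
    fi
    mkdir -p -- "$dest"
    cp -a -- "$src/." "$dest/"
}

# Example (hypothetical backup path on the SSD):
#   save_slots    /var/lib/boinc/slots /mnt/ssd/boinc-slots-backup
#   restore_slots /mnt/ssd/boinc-slots-backup /var/lib/boinc/slots
```

Copying with "$src/." rather than "$src" puts the slot directories directly inside the destination instead of nesting a second "slots" directory.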
[Mar 27, 2014 3:00:53 PM]
Mgruben
Advanced Cruncher
Joined: May 26, 2013
Post Count: 94
Re: "400 slots directories" work unit error

The program you link to appears to mimic the behavior of Linux's tmpfs and ramfs file systems. tmpfs has a hard size limit (7.7 GB in my case), while ramfs doesn't really obey limits; neither will hog that 7.7 GB unless it is actually in use.
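To confirm which of the two RAM-backed file systems a slots directory actually sits on, GNU coreutils' stat can print the file system type with -f -c %T. A small sketch, assuming GNU stat; the function names are illustrative and the example path is the data directory shown in the client log later in this thread:

```shell
#!/bin/bash
# Print the file system type backing a directory (GNU coreutils stat).
# With -f, the %T format prints the type in human-readable form,
# e.g. "tmpfs", "ramfs" or "ext2/ext3".
fs_type() {
    stat -f -c %T "$1"
}

# Report whether a directory lives on a RAM-backed file system.
describe_slots_fs() {
    local t
    t=$(fs_type "$1") || return 1
    case $t in
        tmpfs) echo "$1: tmpfs (RAM-backed, size-limited)" ;;
        ramfs) echo "$1: ramfs (RAM-backed, no enforced size limit)" ;;
        *)     echo "$1: $t (not tmpfs or ramfs)" ;;
    esac
}

# Example: describe_slots_fs /var/lib/boinc/slots
```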
[Mar 27, 2014 3:11:58 PM]
Former Member
Cruncher
Joined: May 22, 2018
Post Count: 0
Re: "400 slots directories" work unit error

Pass; I've run out of obvious ideas, such as backtracking what may have changed around the time the trouble surfaced and running thorough hardware diagnostics.
[Mar 27, 2014 3:15:19 PM]
Mgruben
Advanced Cruncher
Joined: May 26, 2013
Post Count: 94
Re: "400 slots directories" work unit error

Haha, not a problem; thanks for your persistence in trying to work through my problems for me.
[Mar 27, 2014 6:00:44 PM]
BobCat13
Senior Cruncher
Joined: Oct 29, 2005
Post Count: 295
Re: "400 slots directories" work unit error

Mgruben,

The next time you see this, could you run the following command from the /slots/ directory:

echo */ | wc

I just ran it on my Linux Mint install that has slots 0-6 (7 total) and the output looked like this:

1 7 21

1=/slots/ directory itself
7=0-6 numbered directories
21=total number of directories within /slots/ according to linux

It appears . and .. are counted as directories, so that is why the total directories under /slots/ is 21 on my machine. Running several tasks of cep2 may add up with its subdirectories when counting the . and .. as well.
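For reference, wc prints three numbers: the line count, the word count and the byte count. With slots 0-6, echo */ prints the single line "0/ 1/ 2/ 3/ 4/ 5/ 6/", so 1 is the line count, 7 the word count (one word per slot) and 21 the number of bytes in that line including its newline. A sketch of counting directories directly with find instead, assuming a find that supports -mindepth and -maxdepth (GNU and BSD find both do); the function names are illustrative:

```shell
#!/bin/bash
# Count directories under a BOINC slots directory. find prints one path
# per line, so wc -l counts directories; tr strips any padding that some
# wc implementations add.

# Top-level slot directories only (0, 1, 2, ...).
count_slot_dirs() {
    find "$1" -mindepth 1 -maxdepth 1 -type d | wc -l | tr -d ' '
}

# Every directory below the slots directory, at any depth.
count_all_dirs() {
    find "$1" -mindepth 1 -type d | wc -l | tr -d ' '
}

# Example: count_slot_dirs /var/lib/boinc/slots
```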
[Mar 27, 2014 6:43:09 PM]
Mgruben
Advanced Cruncher
Joined: May 26, 2013
Post Count: 94
Re: "400 slots directories" work unit error

Bobcat,

The 400 slots error resumed this morning, so I followed your suggestion:
[root@system home]# cd /var/lib/boinc/slots
[root@system slots]# echo */ | wc
1 6 18
This command, however, does not appear to count beyond a depth of 1, unlike the following command:
[root@system boinc]# find slots -mindepth 1 -type d | wc -l
1025

[user@system ~]$ boinccmd --get_messages | tail
1001: 28-Mar-2014 04:46:32 (internal error) [World Community Grid] [error] Can't create task for MCM1_0003454_7553_3
1002: 28-Mar-2014 04:46:33 (internal error) [World Community Grid] [error] exceeded limit of 400 slot directories
1003: 28-Mar-2014 04:46:33 (internal error) [World Community Grid] [error] Can't create task for MCM1_0003454_7553_3
1004: 28-Mar-2014 04:46:39 (low) [rosetta@home] Finished download of rb_03_18_46666_91028_h003__antprot1_aah003_13_05.200_v1_3.gz
1005: 28-Mar-2014 04:46:39 (internal error) [World Community Grid] [error] exceeded limit of 400 slot directories
1006: 28-Mar-2014 04:46:39 (internal error) [World Community Grid] [error] Can't create task for MCM1_0003454_7553_3
1007: 28-Mar-2014 04:47:28 (internal error) [World Community Grid] [error] exceeded limit of 400 slot directories
1008: 28-Mar-2014 04:47:28 (internal error) [World Community Grid] [error] Can't create task for MCM1_0003454_7553_3
1009: 28-Mar-2014 04:47:29 (internal error) [World Community Grid] [error] exceeded limit of 400 slot directories
1010: 28-Mar-2014 04:47:29 (internal error) [World Community Grid] [error] Can't create task for MCM1_0003454_7553_3
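To see how often the error recurs, the message log can be filtered. A sketch that reads message output on standard input (for example piped from boinccmd --get_messages, which assumes the client is running and reachable) and counts the slot-limit errors; the matched text is taken from the log above and the function names are illustrative:

```shell
#!/bin/bash
# Count "exceeded limit of N slot directories" errors in BOINC message
# output read from standard input.
count_slot_limit_errors() {
    grep -c 'exceeded limit of [0-9]* slot directories'
}

# List the distinct task names the client failed to create, with counts.
failed_tasks() {
    grep -o "Can't create task for [^ ]*" \
        | sed "s/^Can't create task for //" \
        | sort | uniq -c
}

# Example: boinccmd --get_messages | count_slot_limit_errors
```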

[Edit 2 times, last edit by Mgruben at Mar 28, 2014 10:00:38 AM]
[Mar 28, 2014 9:48:38 AM]
BobCat13
Senior Cruncher
Joined: Oct 29, 2005
Post Count: 295
Re: "400 slots directories" work unit error

You are correct; echo does not go deep enough. Sorry about that.

I would be curious to see how many directories are in one of the slots that has a CEP2 task in it. Could you find which slot has a CEP2 task, cd into it, and then run the find command again on that slot only?
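The same check can be run over every slot at once. A sketch that prints, for each top-level slot directory, how many directories it contains (the slot itself included), largest first; it assumes sort supports -n and -r, and the function name is illustrative. The example path is the slots directory from the earlier post:

```shell
#!/bin/bash
# For each top-level slot directory, print "<directory count> <slot path>",
# counting the slot directory itself plus everything nested inside it.
dirs_per_slot() {
    local slot n
    for slot in "$1"/*/; do
        [ -d "$slot" ] || continue       # the glob matched nothing
        slot=${slot%/}
        n=$(find "$slot" -type d | wc -l | tr -d ' ')
        printf '%s %s\n' "$n" "$slot"
    done | sort -nr
}

# Example: dirs_per_slot /var/lib/boinc/slots
```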
[Mar 28, 2014 2:52:49 PM]
Mgruben
Advanced Cruncher
Joined: May 26, 2013
Post Count: 94
Re: "400 slots directories" work unit error

Folder 1 contains a CEP2 WU, which is suspended while BOINC lets Rosetta catch up; its directory count alone was 219.

As an aside, I note that:
(1) the 1000+ directory count noted above was taken while the rig was working on four Rosetta@Home work units. If this is a BOINC-level problem, then the error appearing even outside a WCG context would make sense;
(2) even though the rig has been quiet (no 400 slot directory errors) for the past 6 hours, the current output of "find slots -mindepth 1 -type d | wc -l" is 1026. One would think such a high directory count would cause errors when BOINC attempts to start new tasks, but apparently not.

Log since this morning (I exited the BOINC client to disable my slots RAMdisk, unmounted it, then restarted the client):
1: 28-Mar-2014 04:52:32 (low) [] cc_config.xml not found - using defaults
2: 28-Mar-2014 04:52:32 (low) [] Starting BOINC client version 7.2.42 for x86_64-pc-linux-gnu
3: 28-Mar-2014 04:52:32 (low) [] log flags: file_xfer, sched_ops, task
4: 28-Mar-2014 04:52:32 (low) [] Libraries: libcurl/7.35.0 OpenSSL/1.0.1f zlib/1.2.8 libssh2/1.4.3
5: 28-Mar-2014 04:52:32 (low) [] Data directory: /var/lib/boinc
6: 28-Mar-2014 04:52:32 (low) [] No usable GPUs found
7: 28-Mar-2014 04:52:32 (low) [] Host name: Archer
8: 28-Mar-2014 04:52:32 (low) [] Processor: 4 GenuineIntel Intel(R) Core(TM) i5-3470T CPU @ 2.90GHz [Family 6 Model 58 Stepping 9]
9: 28-Mar-2014 04:52:32 (low) [] Processor features: fpu vme de pse tsc msr pae mce cx8 apic sep mtrr pge mca cmov pat pse36 clflush dts acpi mmx fxsr sse sse2 ss ht tm pbe syscall nx rdtscp lm constant_tsc arch_perfmon pebs bts rep_good nopl xtopology nonstop_tsc aperfmperf eagerfpu pni pclmulqdq dtes64 monitor ds_cpl vmx smx est tm2 ssse3 cx16 xtpr pdcm pcid sse4_1 sse4_2 x2apic popcnt tsc_deadline_timer aes xsave avx f16c rdrand lahf_lm ida arat epb xsaveopt pln pts dtherm tpr_shadow vnmi flexpriority ept vpid fsgsbase smep erms
10: 28-Mar-2014 04:52:32 (low) [] OS: Linux: 3.13.6-1-ARCH
11: 28-Mar-2014 04:52:32 (low) [] Memory: 11.64 GB physical, 1024.00 MB virtual
12: 28-Mar-2014 04:52:32 (low) [] Disk: 54.90 GB total, 8.98 GB free
13: 28-Mar-2014 04:52:32 (low) [] Local time is UTC -5 hours
14: 28-Mar-2014 04:52:32 (low) [rosetta@home] URL http://boinc.bakerlab.org/rosetta/; Computer ID 1751517; resource share 100
15: 28-Mar-2014 04:52:32 (low) [World Community Grid] URL http://www.worldcommunitygrid.org/; Computer ID 2757007; resource share 100
16: 28-Mar-2014 04:52:32 (low) [World Community Grid] General prefs: from World Community Grid (last modified 23-Feb-2014 17:04:20)
17: 28-Mar-2014 04:52:32 (low) [World Community Grid] Computer location: home
18: 28-Mar-2014 04:52:32 (low) [] General prefs: using separate prefs for home
19: 28-Mar-2014 04:52:32 (low) [] Preferences:
20: 28-Mar-2014 04:52:32 (low) [] max memory usage when active: 10727.47MB
21: 28-Mar-2014 04:52:32 (low) [] max memory usage when idle: 11919.41MB
22: 28-Mar-2014 04:52:32 (low) [] max disk usage: 10.21GB
23: 28-Mar-2014 04:52:32 (low) [] don't use GPU while active
24: 28-Mar-2014 04:52:32 (low) [] (to change preferences, visit a project web site or select Preferences in the Manager)
25: 28-Mar-2014 04:52:32 (low) [] Not using a proxy
26: 28-Mar-2014 04:52:36 (internal error) [World Community Grid] [error] exceeded limit of 400 slot directories
27: 28-Mar-2014 04:52:36 (internal error) [World Community Grid] [error] Can't create task for MCM1_0003454_7553_3
28: 28-Mar-2014 04:52:37 (internal error) [World Community Grid] [error] exceeded limit of 400 slot directories
29: 28-Mar-2014 04:52:37 (internal error) [World Community Grid] [error] Can't create task for MCM1_0003454_7553_3
30: 28-Mar-2014 04:52:56 (internal error) [World Community Grid] [error] exceeded limit of 400 slot directories
31: 28-Mar-2014 04:52:56 (internal error) [World Community Grid] [error] Can't create task for MCM1_0003454_7553_3
32: 28-Mar-2014 04:52:57 (internal error) [World Community Grid] [error] exceeded limit of 400 slot directories
33: 28-Mar-2014 04:52:57 (internal error) [World Community Grid] [error] Can't create task for MCM1_0003454_7553_3
34: 28-Mar-2014 04:53:58 (low) [World Community Grid] Starting task MCM1_0003454_7553_3
35: 28-Mar-2014 05:00:01 (low) [] Suspending network activity - time of day
36: 28-Mar-2014 06:03:56 (low) [rosetta@home] Computation for task tj_3_25_refine_X17_BBGB_17_GB_o_1_s_.5__wb_fragments_relax_SAVE_ALL_OUT_155675_22_0 finished
37: 28-Mar-2014 07:29:07 (low) [rosetta@home] Computation for task foldit_997258_1003_fold_SAVE_ALL_OUT_155430_1155_0 finished
38: 28-Mar-2014 07:29:07 (low) [rosetta@home] Starting task gr032714_ama1_longee_try10_relax_SAVE_ALL_OUT_156507_10_0
39: 28-Mar-2014 08:42:55 (low) [rosetta@home] Computation for task 2k820001__fold_SAVE_ALL_OUT_156520_18_0 finished
40: 28-Mar-2014 08:42:55 (low) [rosetta@home] Starting task rb_03_27_46817_91383__t000__0_C1_SAVE_ALL_OUT_IGNORE_THE_REST_156483_529_0
41: 28-Mar-2014 09:08:45 (low) [World Community Grid] Computation for task MCM1_0003454_7553_3 finished
42: 28-Mar-2014 09:08:45 (low) [rosetta@home] Starting task 4_2_revert_W130_fold_SAVE_ALL_OUT_156418_371_0
43: 28-Mar-2014 09:43:03 (low) [rosetta@home] Computation for task yrssfrv2d3_4_fold_SAVE_ALL_OUT_155177_2795_0 finished
44: 28-Mar-2014 09:43:03 (low) [rosetta@home] Starting task gr032614_ama1_longee_try177_fold_SAVE_ALL_OUT_156383_251_0

[Mar 28, 2014 4:18:17 PM]
PMH_UK
Veteran Cruncher
UK
Joined: Apr 26, 2007
Post Count: 771
Re: "400 slots directories" work unit error

Could it be a misleading message caused by a problem creating a new slot directory?
If something like permissions allowed existing slot directories to be used but prevented new ones from being created, you might get this message, because the code does not expect any other kind of failure when creating a slot directory.
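One way to test this hypothesis is to check whether the account the client runs under can actually create (and remove) a new directory inside the slots directory. A sketch, with a hypothetical function and probe-directory name; it should be run as the client's own user (assumed here to be "boinc"), because running it as root bypasses ordinary permission checks:

```shell
#!/bin/bash
# Try to create and remove a throwaway directory inside the slots directory.
# Returns 0 if the current user can create directories there, 1 otherwise.
can_create_slot_dir() {
    local probe="$1/.slot_write_probe_$$"   # hypothetical name; $$ is this shell's PID
    if mkdir "$probe" 2>/dev/null; then
        rmdir "$probe"                      # leave the slots directory as we found it
        return 0
    fi
    echo "cannot create directories in $1" >&2
    return 1
}

# Example, as the client's user: can_create_slot_dir /var/lib/boinc/slots
```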

Paul.
[Mar 28, 2014 6:15:15 PM]