World Community Grid Forums
Thread Status: Active Total posts in this thread: 38
Mumak
Senior Cruncher Joined: Dec 7, 2012 Post Count: 477 Status: Offline |
Hmm, and what if somebody has a machine with more than 400 threads? ;)
||
|
Former Member
Cruncher Joined: May 22, 2018 Post Count: 0 Status: Offline |
As was posted earlier in this thread, the slot limit is ncpus * 100, meaning that a device with 8 cores can have 800 slots without a problem. If a device has 400 threads, the possible slot count would be, well, use the calculator.
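For anyone who would rather not reach for the calculator, here is a minimal sketch of that arithmetic. It takes the ncpus * 100 rule exactly as quoted in this post; the multiplier has not been checked against the BOINC source, and the names `SLOTS_PER_CPU` and `max_slots` are made up for illustration.

```python
# Multiplier as quoted in this thread; an assumption, not verified against BOINC.
SLOTS_PER_CPU = 100


def max_slots(ncpus: int) -> int:
    """Return the slot-directory limit implied by the ncpus * 100 rule."""
    if ncpus < 1:
        raise ValueError("ncpus must be at least 1")
    return ncpus * SLOTS_PER_CPU


if __name__ == "__main__":
    # 8 cores is the example from the post; 400 threads is the question asked.
    for ncpus in (4, 8, 400):
        print(f"{ncpus} threads -> {max_slots(ncpus)} slots")
```

Under that rule a 4-thread host would come out at 400, the same number that appears in the error message reported in this thread, though that may be a coincidence.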
||
|
Mgruben
Advanced Cruncher Joined: May 26, 2013 Post Count: 94 Status: Offline |
Slots subdirectories are numbered 0 through 5, but contain many, many subdirectories (as shown in
||
|
Mgruben
Advanced Cruncher Joined: May 26, 2013 Post Count: 94 Status: Offline |
So yesterday I also attached rosetta@home due to these CEP2 shenanigans, in the hopes of lessening the occurrence of this 400 slots error, but it has occurred again.
This time, it's telling me that the rosetta work units can't be started, which means that I've been posting information about potentially-innocent work units all along (my other rigs have no problem with Rosetta units). Here are the CEP2 work units which were active at the time of the error message:
======== Tasks ========
||
|
Mgruben
Advanced Cruncher Joined: May 26, 2013 Post Count: 94 Status: Offline |
Also, oddly enough, this problem appears to repeat every day shortly after my network availability window opens (open from 3 a.m. to 5 a.m.).
||
|
Former Member
Cruncher Joined: May 22, 2018 Post Count: 0 Status: Offline |
To me it looks like a mismatch between what boinc thinks, based on what is in memory, and the true state on the disc, or vice versa: directory structure information not being updated fast enough. Networking takes a fair amount of cpu, although one would think a subsystem would deal with that. Set the cc_config to only allow 1 thread uploading or downloading at any time. Your 2 hour window would be enough; also, concurrent uploading of cep2 to harvard does not result in any time gain at the end, and the overhead could be making it even slower. For instance, try the config options:
<max_file_xfers>2</max_file_xfers>
<max_file_xfers_per_project>1</max_file_xfers_per_project>
Think you resolved the issue of not uploading/fetching/reporting until 30 minutes before net close. That's a boinc feature btw, to then do it immediately until T minus zero. Reporting is hardcoded in the latest test agents to happen at least every 60 minutes, so no more waiting for up to 24 hours. This applies to both network and cpu scheduling.
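The two options above are only a fragment. In cc_config.xml they sit inside an `<options>` element under the `<cc_config>` root. As a rough sketch, the following builds a complete file body with the values from this post; the function name `build_cc_config` is hypothetical, and where the file should be written depends on your BOINC data directory.

```python
import xml.etree.ElementTree as ET


def build_cc_config(max_file_xfers: int = 2,
                    max_file_xfers_per_project: int = 1) -> str:
    """Build a cc_config.xml body carrying the two transfer limits from the post."""
    root = ET.Element("cc_config")
    options = ET.SubElement(root, "options")
    ET.SubElement(options, "max_file_xfers").text = str(max_file_xfers)
    ET.SubElement(options, "max_file_xfers_per_project").text = str(
        max_file_xfers_per_project
    )
    # Pretty-print when available (ET.indent was added in Python 3.9).
    if hasattr(ET, "indent"):
        ET.indent(root)
    return ET.tostring(root, encoding="unicode")


if __name__ == "__main__":
    # Print the XML; redirect it into cc_config.xml in the BOINC data directory.
    print(build_cc_config())
```

The client has to re-read its configuration before the new limits apply, for example by restarting the service.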
||
|
Mgruben
Advanced Cruncher Joined: May 26, 2013 Post Count: 94 Status: Offline |
lava,
Does your suggestion remain the same even though the project updates (both uploads and downloads) complete successfully every morning?
||
|
Mgruben
Advanced Cruncher Joined: May 26, 2013 Post Count: 94 Status: Offline |
"Congratulations for being the first with that error message. I suggest that you reboot and see if you can get it again. If so, please post the first 50 or so lines in your event log so that everybody can see what sort of system you have."
Lawrence,
Since this thread began, I have rebooted my system, and the error has recurred (my more recent posts) after this reboot.
After running (on Arch Linux)
sudo systemctl restart boinc.service
the following is displayed:
[user@system ~]$ boinccmd --get_messages
Also, immediately after restarting the boinc client, the 400 slots directories errors resume:
33: 27-Mar-2014 09:33:57 (internal error) [rosetta@home] [error] exceeded limit of 400 slot directories
[Edit 2 times, last edit by Mgruben at Mar 27, 2014 2:39:12 PM]
||
|
Mgruben
Advanced Cruncher Joined: May 26, 2013 Post Count: 94 Status: Offline |
"When you look in the /boinc/slots place now, does it show that many, i.e. slots/399 as the highest? If not, do slots plus sub-directories thereof add up to this number? As lawrenceharding commented, not seen here before your report, there's something special about your system. Is it caching the disc structures and not writing the updates to disc? Look at write to disc delays. If there's a cache-flush command in linux, run that."
The "something special" may be that my /var/lib/boinc/slots directory resides on a 7.7GB RAMdisk allocation, though I'm personally not seeing how that would be relevant.
[user@system ~]$ df -h
[Edit 2 times, last edit by Mgruben at Mar 27, 2014 2:45:58 PM]
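The two questions quoted above (what is the highest numbered slot, and do the slots plus their sub-directories add up to 400) can be answered with a short script. This is only a sketch: the function name `summarize_slots` is made up, the default path is the /var/lib/boinc/slots location mentioned in this post, and whether the client counts nested directories toward its limit is not something this thread establishes.

```python
import os


def summarize_slots(slots_dir: str) -> dict:
    """Count numbered slot directories and all directories below slots_dir."""
    # Top-level entries whose names are plain numbers (0, 1, 2, ...).
    numbered = [
        int(entry.name)
        for entry in os.scandir(slots_dir)
        if entry.is_dir(follow_symlinks=False) and entry.name.isdecimal()
    ]
    # Every directory at any depth; os.walk does not descend into symlinks.
    total_dirs = sum(len(dirs) for _root, dirs, _files in os.walk(slots_dir))
    return {
        "numbered_slots": len(numbered),
        "highest_slot": max(numbered) if numbered else None,
        "total_dirs": total_dirs,
    }


if __name__ == "__main__":
    path = "/var/lib/boinc/slots"  # location given in this post; adjust as needed
    if os.path.isdir(path):
        print(summarize_slots(path))
    else:
        print(f"{path} not found")
```

If `total_dirs` lands near 400 while `highest_slot` stays small, the nested sub-directories would be the more likely suspect.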
||
|
Former Member
Cruncher Joined: May 22, 2018 Post Count: 0 Status: Offline |
gruby, yes, as to me it's evident that the linux system in question is not able to keep up in some way. Setting the traps can eliminate possibilities, especially now that you've added that it happens specifically during the networking window.
In a previous post I also mentioned write delays and cache flushing. Maybe your disk subsystem needs investigating. Is the controller doing fine, for instance? Otherwise, I suggest you carry this riddle to the developers. There are a lot of very knowledgeable people on the alpha mailing list.