World Community Grid Forums
Thread Status: Active Total posts in this thread: 63
PMH_UK
Veteran Cruncher UK Joined: Apr 26, 2007 Post Count: 772 Status: Offline |
I am also seeing high memory usage, real and swap, on multiple PCs.

DSFL_00000050_0000028_0236 shows 720 MB real plus 720 MB swap. Wingman had:

<core_client_version>6.10.17</core_client_version>
<![CDATA[
<message>
too many exit(0)s
</message>
]]>

The other task on that 2 GB dual core, DSFL_00000049_0000024_0657, is at about 200 + 200.

I just had to abort DSFL_00000050_0000008_0326 on a 1 GHz 512 MB laptop, as nothing was running. One wingman is in progress; another had this error:

<core_client_version>6.13.6</core_client_version>
<![CDATA[
<message>
- exit code 195 (0xc3)
</message>
<stderr_txt>
INFO: No state to restore. Start from the beginning.
[02:52:48] Number of tasks = 72
[02:52:48] Starting job 0,CPU time is 0.000000.
[02:52:49] ./ZINC06746586.pdbqt size = 21 3
../../projects/www.worldcommunitygrid.org/dsfl.target_00000050.pdbqt size = 10543 0
Application exited with RC = 0x1
02:53:23 (772): called boinc_finish
</stderr_txt>
]]>

That laptop is now running DSFL_00000049_0000027_0430 at about 220 + 220. Looking back, some recent work units appear to have taken much longer than the CPU time reported. I have changed the selection on that laptop to FAAH for now.

Paul.
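For readers comparing error reports: the `exit code 195 (0xc3)` in the log above is one number written in decimal and in hex. A later log in this thread prints the same code as `(0xc3, -61)`; the -61 appears to be the same byte read as a signed 8-bit value, which is an inference from the arithmetic rather than something the thread states. A minimal Python sketch (the helper name `as_signed_byte` is made up for illustration):

```python
def as_signed_byte(code: int) -> int:
    """Reinterpret an exit status in 0..255 as a signed 8-bit value."""
    if not 0 <= code <= 255:
        raise ValueError("exit status must fit in one byte")
    # Values 128..255 wrap around to -128..-1 in two's complement.
    return code - 256 if code >= 128 else code


exit_code = 195
print(hex(exit_code))             # 0xc3
print(as_signed_byte(exit_code))  # -61
```

So reports showing 195, 0xc3, or -61 are most likely describing the same failure.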
|
||
|
uplinger
Former World Community Grid Tech Joined: May 23, 2005 Post Count: 3952 Status: Offline |
We have stopped sending new work for DSFL. We are in the process of cancelling the workunits that have the large search box, which is what is causing the workunits' memory usage to be so large.
Thank you for your patience, -Uplinger |
||
|
Jim1348
Veteran Cruncher USA Joined: Jul 13, 2009 Post Count: 1066 Status: Offline |
I would be happy to have the option to get the large work units if you want to offer that sometime. I have had two batch 49/50 work units validated, and the others are still in progress, with none invalid. I have a dedicated machine with a lot of memory that needs to be used.
----------------------------------------[Edit 1 times, last edit by Jim1348 at Oct 31, 2011 8:40:25 PM] |
||
|
Former Member
Cruncher Joined: May 22, 2018 Post Count: 0 Status: Offline |
Well, after reading your comments here I decided to give a waiting WU from series 50 a run on my Core i5 vPro (4 processors, XP, 4 GB RAM). I now have three 49ers and one from series 50 running. Memory usage is about 140-230 MB for the series 49 tasks and 900 MB for the one from batch 50. So far everything is running fine: no paging, no unusual effects. I decided to manage the outstanding batch 50 tasks manually; in this case it amounts to balancing the system's resources by hand. It would be a sad thing if I had to stop the calculation on the batch 50 WUs in my queue.
If some of you still have some 49ers in your queue, you are lucky, because that gives you a chance to handle this issue: mix them carefully with batch 50. Another option, if your RAM is too small, is to reduce the number of available cores. Hope this gets solved quickly by the techs. Have a good crunch |
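The balancing described in this post can be checked with simple arithmetic. The sketch below is only an illustration: the function names and the 1024 MB reserved for the operating system are assumptions, while the 230 MB and 900 MB per-task figures and the 4 GB, 4-core machine come from the post.

```python
def max_concurrent_tasks(ram_mb: int, reserve_mb: int, task_mb: int, cores: int) -> int:
    """How many equal-sized tasks fit in RAM, capped by the core count."""
    usable = max(ram_mb - reserve_mb, 0)
    return min(cores, usable // task_mb)


def fits(ram_mb: int, reserve_mb: int, task_sizes_mb: list[int]) -> bool:
    """True if a given mix of tasks fits in RAM after the OS reserve."""
    return sum(task_sizes_mb) <= ram_mb - reserve_mb


RAM_MB = 4096      # 4 GB machine from the post
RESERVE_MB = 1024  # assumed headroom for the OS and other programs

# Three batch 49 tasks at their upper figure plus one batch 50 task.
print(fits(RAM_MB, RESERVE_MB, [230, 230, 230, 900]))   # True
# Four batch 50 tasks at once would not fit under this assumption.
print(fits(RAM_MB, RESERVE_MB, [900] * 4))              # False
# Cores worth using if only batch 50 tasks are left.
print(max_concurrent_tasks(RAM_MB, RESERVE_MB, 900, 4)) # 3
```

Under these assumptions the mix in the post fits comfortably, while a queue of only batch 50 tasks would call for limiting the machine to three cores.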
||
|
KerSamson
Master Cruncher Switzerland Joined: Jan 29, 2007 Post Count: 1673 Status: Offline |
Hi Uplinger,
I've just noticed the general cancellation of batch 50. Do you know when new WUs will be available? Should we switch to other projects in the meantime? Cheers, Yves |
||
|
PMH_UK
Veteran Cruncher UK Joined: Apr 26, 2007 Post Count: 772 Status: Offline |
Note that batch 50 work units are being aborted by the server.
If your wingman has already reported, it may be worth running yours; if not, it is unlikely to be.
Paul.
|
||
|
sk..
Master Cruncher http://s17.rimg.info/ccb5d62bd3e856cc0d1df9b0ee2f7f6a.gif Joined: Mar 22, 2007 Post Count: 2324 Status: Offline |
As luck would have it, during the weekend I set an Ubuntu 11.10 system (i7-2600, but only 4 GB) to 'Run Always' to finish off something else. I checked the system today and it was frozen, with no response to user input. I had to force a shutdown, boot, and exit BOINC (repository install). I won't be using 'Run Always' any time soon.

The legacy: 31 failed DSFL tasks and 26 failed GPUGrid tasks!

Result Name: DSFL_ 00000050_ 0000033_ 0584_ 1--

<core_client_version>6.12.33</core_client_version>
<![CDATA[
<message>
process exited with code 195 (0xc3, -61)
</message>
<stderr_txt>
INFO: No state to restore. Start from the beginning.
[17:41:02] Number of tasks = 40
[17:41:02] Starting job 0,CPU time is 0.000000.
[17:41:02] ./ZINC15674635.pdbqt size = 30 6
../../projects/www.worldcommunitygrid.org/dsfl.target_00000050.pdbqt size = 10543 0
17:52:41 (7869): No heartbeat from core client for 30 sec - exiting
17:52:53 (7869): No heartbeat from core client for 30 sec - exiting
17:52:54 (7869): No heartbeat from core client for 30 sec - exiting
17:52:55 (7869): No heartbeat from core client for 30 sec - exiting
17:53:09 (7869): No heartbeat from core client for 30 sec - exiting
17:53:10 (7869): No heartbeat from core client for 30 sec - exiting
No heartbeat: Exiting
INFO: No state to restore. Start from the beginning.
[17:58:01] Number of tasks = 40
[17:58:01] Starting job 0,CPU time is 0.000000.
[17:58:01] ./ZINC15674635.pdbqt size = 30 6
../../projects/www.worldcommunitygrid.org/dsfl.target_00000050.pdbqt size = 10543 0
Quit requested: Exiting
INFO: No state to restore. Start from the beginning.
[18:15:18] Number of tasks = 40
[18:15:18] Starting job 0,CPU time is 0.000000.
[18:15:18] ./ZINC15674635.pdbqt size = 30 6
../../projects/www.worldcommunitygrid.org/dsfl.target_00000050.pdbqt size = 10543 0
18:18:09 (7999): No heartbeat from core client for 30 sec - exiting
18:18:10 (7999): No heartbeat from core client for 30 sec - exiting
18:18:11 (7999): No heartbeat from core client for 30 sec - exiting
No heartbeat: Exiting
18:18:17 (7999): No heartbeat from core client for 30 sec - exiting
INFO: No state to restore. Start from the beginning.
[18:18:52] Number of tasks = 40
[18:18:52] Starting job 0,CPU time is 0.000000.
[18:18:52] ./ZINC15674635.pdbqt size = 30 6
../../projects/www.worldcommunitygrid.org/dsfl.target_00000050.pdbqt size = 10543 0
18:23:20 (8030): No heartbeat from core client for 30 sec - exiting
No heartbeat: Exiting
18:23:27 (8030): No heartbeat from core client for 30 sec - exiting
INFO: No state to restore. Start from the beginning.
[18:24:37] Number of tasks = 40
[18:24:37] Starting job 0,CPU time is 0.000000.
[18:24:37] ./ZINC15674635.pdbqt size = 30 6
../../projects/www.worldcommunitygrid.org/dsfl.target_00000050.pdbqt size = 10543 0
Application exited with RC = 0xb
18:33:02 (8054): called boinc_finish
</stderr_txt>
]]>

I was down to HFCC and DSFL, so I had better add a project to avoid task outages. |
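A log like the one in this post is easier to compare across tasks when reduced to a few counts. The following is a rough Python sketch, not a tool from BOINC or World Community Grid: the function name `summarize_stderr` and the result keys are invented here, and the patterns match only the message strings that appear in this thread.

```python
import re

RESTART = "INFO: No state to restore. Start from the beginning."
HEARTBEAT = re.compile(r"No heartbeat from core client")
EXIT_RC = re.compile(r"Application exited with RC = (0x[0-9a-fA-F]+)")


def summarize_stderr(text: str) -> dict:
    """Count restarts and heartbeat warnings, and extract the final RC."""
    lines = text.splitlines()
    rc = EXIT_RC.search(text)
    return {
        # Each of these lines means the task began again without a checkpoint.
        "starts_from_scratch": sum(1 for line in lines if RESTART in line),
        "heartbeat_warnings": sum(1 for line in lines if HEARTBEAT.search(line)),
        "final_rc": int(rc.group(1), 16) if rc else None,
    }


# Shortened excerpt of the last two attempts in the log above.
SAMPLE = """INFO: No state to restore. Start from the beginning.
[18:18:52] Number of tasks = 40
18:23:20 (8030): No heartbeat from core client for 30 sec - exiting
No heartbeat: Exiting
18:23:27 (8030): No heartbeat from core client for 30 sec - exiting
INFO: No state to restore. Start from the beginning.
[18:24:37] Number of tasks = 40
Application exited with RC = 0xb
18:33:02 (8054): called boinc_finish
"""

print(summarize_stderr(SAMPLE))
```

Run on the full log, the same function would show how many times the task started over before the final exit.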
||
|
deltavee
Ace Cruncher Texas Hill Country Joined: Nov 17, 2004 Post Count: 4890 Status: Offline |
Several of my batch 50 WUs just went from "Pending Validation" to "Too Late".
|
||
|
boulmontjj
Senior Cruncher France Joined: Nov 17, 2004 Post Count: 317 Status: Offline |
I don't know if it is specifically DSFL, BOINC, or Windows.
I have different versions of BOINC on three of my PCs. On the first one I crunch WCG DSFL, on the second I crunch for Rosetta, and on the third I crunch for Malaria. All three PCs do the same thing: the WUs use more and more memory, as if the memory is never given back. On the two Quad Q6600s, four tasks are running and using a large amount of memory, and the PCs are very slow when we want to do something else. On the third one, a Core 2 Duo, just one task (Malaria) is running and the other one is waiting for memory, and it is impossible to do anything else. I have never seen that. I hope this helps the investigation. |
||
|
Former Member
Cruncher Joined: May 22, 2018 Post Count: 0 Status: Offline |
Hello Apis Tintinnambulator,
The FAQ referred to is for the project program. This helps the project scientists. We just ignore it. Anyway, this is just another trick that the universe is playing on us. Lawrence |
||
|