World Community Grid Forums
Thread Status: Active | Thread Type: Sticky | Total posts in this thread: 27
knreed
Former World Community Grid Tech | Joined: Nov 8, 2004 | Post Count: 4504 | Status: Offline
There appears to be an issue with recent jobs causing excessive memory consumption on end-user machines. We have stopped sending work for these tasks while we investigate the issue.
----------------------------------------
[Edit 1 times, last edit by knreed at Oct 7, 2021 4:17:47 PM]
rcthardcore
Cruncher | United States | Joined: Jan 29, 2009 | Post Count: 13 | Status: Offline
The same problem is present on the Mapping Cancer Markers project too. You might want to check into this.
----------------------------------------
AMD Ryzen 9 5950x
NVIDIA RTX 3090 FE | 128 GB DDR4-3200 | Windows 10 64-bit 21H1
knreed
Former World Community Grid Tech | Joined: Nov 8, 2004 | Post Count: 4504 | Status: Offline
The direction that we are going to take with this is as follows:
- For workunits that have copies already sent out, we are changing the resource configuration for memory to reflect the much larger RAM used (either 4 GB or 1.5 GB depending on the specific job). We are also boosting the disk size to accommodate the extra space. This will allow BOINC to send resends only to computers that can handle them (it will also keep multiple copies from running at once).
- For the other jobs, we are going to rebuild them but limit them to 20 per workunit so that this issue doesn't consume massive memory. This will result in a period of quick jobs with a somewhat larger than normal memory footprint, but it will not cause the issues that these workunits have caused.
- Longer term, the memory leak will need to be fixed (it is 14 MB per job within the workunit).

Testing the changes and getting things in place will take a bit of time, so CPU work will be stopped for the next 24-48 hours while we prepare and test.
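To put rough numbers on the plan above, here is a small back-of-the-envelope sketch. It is a hypothetical illustration, not WCG's actual tooling: only the 14 MB per-job leak, the 20-job cap, and the 4 GB / 1.5 GB bounds come from this thread, while the base footprint and headroom factor are assumptions.

```python
# Hypothetical sketch, not WCG's actual tooling or figures beyond those cited
# in this thread: estimate the peak memory of a rebuilt 20-job workunit and
# the kind of value one might place in its rsc_memory_bound field.

MB = 1024 * 1024

LEAK_PER_JOB_MB = 14          # leak reported per job within a workunit (from this thread)
JOBS_PER_REBUILT_WU = 20      # new cap for rebuilt workunits (from this thread)
BASE_FOOTPRINT_MB = 200       # assumption: typical footprint without the leak
HEADROOM = 1.5                # assumption: safety margin on the declared bound

def estimated_peak_mb(jobs_in_workunit: int) -> int:
    """Rough peak memory if the leak accumulates over every job in the workunit."""
    return BASE_FOOTPRINT_MB + jobs_in_workunit * LEAK_PER_JOB_MB

def memory_bound_bytes(peak_mb: float) -> int:
    """Bytes value of the kind used for a BOINC workunit's rsc_memory_bound."""
    return int(peak_mb * HEADROOM * MB)

if __name__ == "__main__":
    peak = estimated_peak_mb(JOBS_PER_REBUILT_WU)
    print(f"rebuilt 20-job workunit: ~{peak} MB peak, "
          f"declared bound ~{memory_bound_bytes(peak):,} bytes")
    # Copies already in the field keep their original job counts, so their
    # bounds are instead raised to 4 GB or 1.5 GB and the scheduler only
    # resends them to hosts with that much RAM available.
```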
Grumpy Swede
Master Cruncher | Svíþjóð | Joined: Apr 10, 2020 | Post Count: 2195 | Status: Recently Active
Thanks knreed for the update.
Release the Kraken
knreed
Former World Community Grid Tech | Joined: Nov 8, 2004 | Post Count: 4504 | Status: Offline
Release the Kraken

The krakens have been released.

We have re-enabled sending work for OpenPandemics CPU. However, at this time the only jobs available are the repair jobs for the workunits that require large memory. The workunits have been modified to state that they need either 4 GB or 1.5 GB depending on the job. This will ensure that they run successfully. There are only 48,000 of these at the moment, but given the high requirements I don't know how long it will take to distribute them.

We continue to work on recreating the other batches with limits to prevent the high memory use.
Grumpy Swede
Master Cruncher | Svíþjóð | Joined: Apr 10, 2020 | Post Count: 2195 | Status: Recently Active
Thanks for the Kraken, knreed.
So far, only resends for normal "No Reply" or "Error" tasks, from earlier batches than the problem batches.
BladeD
Ace Cruncher | USA | Joined: Nov 17, 2004 | Post Count: 28976 | Status: Offline
Release the Kraken

The krakens have been released.

We have re-enabled sending work for OpenPandemics CPU. However, at this time the only jobs available are the repair jobs for the workunits that require large memory. The workunits have been modified to state that they need either 4 GB or 1.5 GB depending on the job. This will ensure that they run successfully. There are only 48,000 of these at the moment, but given the high requirements I don't know how long it will take to distribute them.

We continue to work on recreating the other batches with limits to prevent the high memory use.

I thought we were talking about more GPU WUs.
mwroggenbuck
Advanced Cruncher | USA | Joined: Nov 1, 2006 | Post Count: 77 | Status: Offline
The workset is apparently still too large for the Raspberry Pi 400. The server sends a message saying that there is not enough memory (3814.70 MB RAM needed but only 3455.31 MB available). Note this is an ARM 64 machine, but running a 32-bit OS.

Something still needs to be fixed. I have been running this project for several months on my Raspberry Pi. It stopped when this forum entry was created.
----------------------------------------
[Edit 1 times, last edit by mwroggenbuck at Oct 6, 2021 11:56:51 PM]
knreed
Former World Community Grid Tech | Joined: Nov 8, 2004 | Post Count: 4504 | Status: Offline
The workset is apparently still too large for the Raspberry Pi 400. The server sends a message saying that there is not enough memory (3814.70 MB RAM needed but only 3455.31 MB available). Note this is an ARM 64 machine, but running a 32-bit OS. Something still needs to be fixed. I have been running this project for several months on my Raspberry Pi. It stopped when this forum entry was created.

Actually, that is exactly what we want to have happen. When I wrote:

We have re-enabled sending work for OpenPandemics CPU. However, at this time the only jobs available are the repair jobs for the workunits that require large memory. The workunits have been modified to state that they need either 4 GB or 1.5 GB depending on the job. This will ensure that they run successfully.

What this means is that the memory limit for these jobs has been reset so that those jobs will not be sent to computers with insufficient memory to handle them. Your Raspberry Pi is protected from these jobs because of this limit.

Most of the 4 GB resends have now been sent out and we are about to start going through the jobs that require 1.5 GB. There are currently 21,888 of them to send out. Once we get through those, we will then be sending out much more normal-sized jobs and your computer should start to receive them at that point. That should be in another hour or two.
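As an illustration of the check described above, here is a small sketch; it is not the actual BOINC scheduler code, just a simplified comparison using the two figures quoted in the Raspberry Pi 400 message and the 4 GB / 1.5 GB bounds mentioned earlier in the thread.

```python
# Illustrative sketch only, not the actual BOINC scheduler code: compare a
# workunit's declared memory bound against the RAM a host reports as usable.
# The numbers reproduce the message the Raspberry Pi 400 received.

MB = 1024 * 1024

PI_400_AVAILABLE_MB = 3455.31               # usable RAM reported by the Pi 400
LARGE_RESEND_BOUND_BYTES = 4_000_000_000    # 4 GB bound on the large resends
SMALLER_RESEND_BOUND_BYTES = 1_500_000_000  # 1.5 GB bound on the other resends

def can_send(memory_bound_bytes: int, host_available_mb: float) -> bool:
    """Return True if the host has enough usable RAM for the workunit's bound."""
    needed_mb = memory_bound_bytes / MB
    if needed_mb > host_available_mb:
        print(f"{needed_mb:.2f} MB RAM needed but only {host_available_mb:.2f} MB available")
        return False
    print(f"{needed_mb:.2f} MB RAM needed, {host_available_mb:.2f} MB available: OK to send")
    return True

if __name__ == "__main__":
    can_send(LARGE_RESEND_BOUND_BYTES, PI_400_AVAILABLE_MB)    # 3814.70 MB needed -> withheld
    can_send(SMALLER_RESEND_BOUND_BYTES, PI_400_AVAILABLE_MB)  # ~1430.51 MB -> passes this simplified check
```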