Thread Status: Active
Thread Type: Sticky Thread
Total posts in this thread: 27
This topic has been viewed 7538 times and has 26 replies
knreed
Former World Community Grid Tech
Joined: Nov 8, 2004
Post Count: 4504
CPU work for OpenPandemics stopped

There appears to be an issue with recent jobs causing excessive memory consumption on end-user machines. We have stopped sending work for these tasks while we investigate the issue.
----------------------------------------
[Edit 1 times, last edit by knreed at Oct 7, 2021 4:17:47 PM]
[Oct 5, 2021 4:22:10 PM]
rcthardcore
Cruncher
United States
Joined: Jan 29, 2009
Post Count: 13
Re: CPU work for OpenPandemics stopped

The same problem is present on the Mapping Cancer Markers project too. You might want to check into this.
----------------------------------------
AMD Ryzen 9 5950x
NVIDIA RTX 3090 FE
128 GB DDR4-3200
Windows 10 64-bit 21H1
[Oct 5, 2021 8:34:12 PM]
knreed
Former World Community Grid Tech
Joined: Nov 8, 2004
Post Count: 4504
Re: CPU work for OpenPandemics stopped

The direction that we are going to take with this is as follows:

- For workunits that have copies already sent out, we are changing the resource configuration for memory to reflect the much larger RAM used (either 4 GB or 1.5 GB depending on the specific job). We are also boosting the disk size to accommodate the extra size. This will allow BOINC to send resends only to computers that can handle them (it will also keep multiple copies from running at once).
- For the other jobs, we are going to rebuild them but limit them to 20 jobs per workunit so that this issue doesn't consume massive memory. This will result in a period of quick jobs with a slightly larger than normal memory footprint, but it will not cause issues like these have.
- Longer term, the memory leak will need to be fixed (it is 14 MB per job within the workunit).

Testing the changes and getting things in place will take a bit of time, so CPU work will be stopped for the next 24-48 hours while we prepare and test.
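
As a rough illustration of why capping workunits at 20 jobs bounds the problem, here is a minimal Python sketch of the arithmetic. It assumes the leak accumulates linearly at the stated 14 MB per job and is only released when the workunit ends; that behaviour, and the function name, are assumptions for illustration and not taken from the application itself.

```python
# Leak size stated in the post: 14 MB per job within a workunit.
LEAK_MB_PER_JOB = 14


def leaked_mb(jobs_in_workunit: int, leak_mb_per_job: int = LEAK_MB_PER_JOB) -> int:
    """Extra memory (MB) held by the time the last job in a workunit finishes.

    Assumes (hypothetically) that every job leaks the same amount and that
    nothing is freed until the whole workunit exits.
    """
    return jobs_in_workunit * leak_mb_per_job


# With the planned cap of 20 jobs per workunit:
print(leaked_mb(20))  # 280
```

Under those assumptions the rebuilt workunits would carry at most a few hundred MB of leaked memory on top of their normal footprint, which fits the "slightly larger than normal" description above.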
[Oct 6, 2021 1:58:52 AM]
Grumpy Swede
Master Cruncher
Svíþjóð
Joined: Apr 10, 2020
Post Count: 1881
Re: CPU work for OpenPandemics stopped

Thanks knreed for the update.
Release the Kraken :D
[Oct 6, 2021 2:16:22 AM]
knreed
Former World Community Grid Tech
Joined: Nov 8, 2004
Post Count: 4504
Re: CPU work for OpenPandemics stopped

Release the Kraken :D


The krakens have been released. ;)


We have re-enabled sending work for OpenPandemics CPU. However, at this time the only jobs available are the repair jobs for the workunits that require large memory. The workunits have been modified to state that they need either 4 GB or 1.5 GB depending on the job. This will ensure that they run successfully.

There are only 48,000 of these at the moment, but given the high requirements I don't know how long it will take to distribute them.

We continue to work on recreating the other batches with limits to prevent the high memory use.
[Oct 6, 2021 9:31:46 PM]
Grumpy Swede
Master Cruncher
Svíþjóð
Joined: Apr 10, 2020
Post Count: 1881
Re: CPU work for OpenPandemics stopped

Thanks for the Kraken, knreed. :)
So far, I have only received resends for normal "No Reply" or "Error" tasks, from batches earlier than the problem batches.
[Oct 6, 2021 10:07:40 PM]
BladeD
Ace Cruncher
USA
Joined: Nov 17, 2004
Post Count: 28976
Re: CPU work for OpenPandemics stopped

Release the Kraken :D


The krakens have been released. ;)


We have re-enabled sending work for OpenPandemics CPU. However, at this time the only jobs available are the repair jobs for the workunits that require large memory. The workunits have been modified to state that they need either 4 GB or 1.5 GB depending on the job. This will ensure that they run successfully.

There are only 48,000 of these at the moment, but given the high requirements I don't know how long it will take to distribute them.

We continue to work on recreating the other batches with limits to prevent the high memory use.

I thought we were talking about more GPU WUs. :(
[Oct 6, 2021 11:09:43 PM]
mwroggenbuck
Advanced Cruncher
USA
Joined: Nov 1, 2006
Post Count: 77
Re: CPU work for OpenPandemics stopped

The workset is apparently still too large for the Raspberry Pi 400. The server sends a message saying that there is not enough memory (3814.70 MB RAM needed but only 3455.31 available). Note this is an ARM 64 machine, but running a 32-bit OS.

Something still needs to be fixed. I have been running this project for several months on my Raspberry Pi. It stopped when this forum entry was created.
----------------------------------------
[Edit 1 times, last edit by mwroggenbuck at Oct 6, 2021 11:56:51 PM]
[Oct 6, 2021 11:52:48 PM]
knreed
Former World Community Grid Tech
Joined: Nov 8, 2004
Post Count: 4504
Re: CPU work for OpenPandemics stopped

The workset is apparently still too large for the Raspberry Pi 400. The server sends a message saying that there is not enough memory (3814.70 MB RAM needed but only 3455.31 available). Note this is an ARM 64 machine, but running a 32-bit OS.

Something still needs to be fixed. I have been running this project for several months on my Raspberry Pi. It stopped when this forum entry was created.


Actually, that is exactly what we want to happen. When I wrote:

We have re-enabled sending work for OpenPandemics CPU. However, at this time the only jobs available are the repair jobs for the workunits that require large memory. The workunits have been modified to state that they need either 4 GB or 1.5 GB depending on the job. This will ensure that they run successfully.


What this means is that the memory limit for these jobs has been reset so that they will not be sent to computers with insufficient memory to handle them. Your Raspberry Pi is protected from these jobs because of this limit.

Most of the 4 GB resends have now been sent out and we are about to start going through the jobs that require 1.5 GB. There are currently 21,888 of them to send out. Once we get through those, we will be sending out much more normal-sized jobs and your computer should start to receive them at that time. That should be in another hour or two.
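
For readers wondering how the numbers in the server message line up, here is a minimal Python sketch of the kind of check described above. It is not the actual scheduler code; the function names are hypothetical, and reading "4 GB" as 4 × 10^9 bytes is an inference from the fact that this value, expressed in units of 1024 × 1024 bytes, comes out to the 3814.70 MB quoted in the message.

```python
BYTES_PER_MIB = 1024 * 1024


def to_mib(n_bytes: int) -> float:
    """Convert a byte count to units of 1024 * 1024 bytes."""
    return n_bytes / BYTES_PER_MIB


def can_receive(required_bytes: int, available_mib: float) -> bool:
    """Hypothetical eligibility test: send the job only if it fits in available RAM."""
    return to_mib(required_bytes) <= available_mib


# Assumed byte values for the two memory limits mentioned in the thread.
LARGE_JOB_BYTES = 4_000_000_000   # "4 GB"
MEDIUM_JOB_BYTES = 1_500_000_000  # "1.5 GB"

# Available RAM reported for the Raspberry Pi 400 in the server message.
PI_AVAILABLE_MIB = 3455.31

print(f"{to_mib(LARGE_JOB_BYTES):.2f}")                 # 3814.70
print(can_receive(LARGE_JOB_BYTES, PI_AVAILABLE_MIB))   # False
print(can_receive(MEDIUM_JOB_BYTES, PI_AVAILABLE_MIB))  # True
```

Under these assumptions the 4 GB resends are withheld from the Pi, while a job declared at 1.5 GB would pass this simple memory test. A real scheduler may apply further criteria that are not modelled here.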
[Oct 7, 2021 12:03:04 AM]