World Community Grid Forums
Thread Status: Active Total posts in this thread: 81
goben_2003
Advanced Cruncher Joined: Jun 16, 2006 Post Count: 146 Status: Offline
> Thank you for all the help. I checked and TRIM is enabled. I reinstalled ImDisk with the same settings as last time, except I turned off dynamic disk, and now it seems to be working! Maybe that was it: my system did not like the dynamic disk setting. After that I also changed it so it will only contain the slots folder. I restarted a couple of times without starting up WCG. The final test is to see if it holds up after running WCG for a day and then restarting. Should I always do a manual sync to be safe? Maybe even do a copy the first time so I don't lose anything?

You're welcome! Yes, it seems the dynamic disk setting is causing issues for people. Feel free to do a manual sync to be safe. I had it save the ramdisk to an image file just to be safe when I was testing it. I never actually had to use the image file though.

To save the image file:
1. Open up "ImDisk Virtual Disk Driver". It is in the start menu and at Control Panel->ImDisk Virtual Disk Driver.
2. Select the drive.
3. Click save image.
4. I used the option with 0 offset (no MBR) since it is mounted to a folder instead of as a drive. Click OK.
5. This will pop up a warning if it is in use. Make sure BOINC is completely shut down and you do not have any of the files open. I did not save it if it popped up a warning.
6. Select where you want to save it and what to call it. Click Save.
||
|
Andyman
Cruncher Joined: Apr 9, 2021 Post Count: 17 Status: Offline
Great, I will try that!
|
||
|
Dayle Diamond
Senior Cruncher Joined: Jan 31, 2013 Post Count: 452 Status: Offline
I come back after two days and this thread blew up!
Still no ETA from the WCG. Many thanks to those posters helping each other make RAM disks, although personally I don't have a lot of RAM headroom in my system (just didn't need more before this bug), and won't buy more RAM if the fix is coming anytime soon. Keep crunching & stay safe out there! For those who are counting, I'm at 8.4 TB written.
----------------------------------------
[Edit 1 times, last edit by Dayle Diamond at May 2, 2021 6:33:26 AM]
||
|
bozz4science
Advanced Cruncher Germany Joined: May 3, 2020 Post Count: 104 Status: Offline
Thanks for providing this very helpful step-by-step guide on page 2!!
Once you know how to implement this RAMdisk solution, it is very straightforward. I initially had some trouble rebooting after setting it up, which caused some weird BIOS issues at first. Finally, it seems to work perfectly. My reason for shifting to the RAMdisk approach was my Evo970 Plus being completely trashed with writes, accumulating about 25 TB since the start of the stress test. It is now sitting at <1 MB/s most of the time, except for the occasional new work download. Before, my NVMe SSD registered write activity somewhere between 70 and 90 MB/s with 13 concurrent GPU WUs. I opted to mount the slots directory only and sized it accordingly at 4 GB (the directory is 3.8 GB). So far everything works smoothly. Tasks finish and validate as before. Kudos to you for this elegant approach!
----------------------------------------
AMD Ryzen 3700X @ 4.0 GHz / GTX1660S
Intel i5-4278U CPU @ 2.60GHz
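To put the write rates above into perspective, here is a minimal sketch that converts a sustained write rate into a daily volume. It assumes the rate really is sustained around the clock and uses decimal units (1 TB = 1,000,000 MB); the function name is made up for illustration.

```python
SECONDS_PER_DAY = 24 * 60 * 60


def daily_writes_tb(rate_mb_per_s: float) -> float:
    """Decimal terabytes written per day at a sustained rate given in MB/s."""
    return rate_mb_per_s * SECONDS_PER_DAY / 1_000_000


# Rates mentioned in the post: 70-90 MB/s before the RAM disk, about 1 MB/s after.
for rate in (70, 90, 1):
    print(f"{rate:>3} MB/s -> {daily_writes_tb(rate):.2f} TB/day")
```

At that pace a figure like 25 TB would take only a few days of continuous crunching to accumulate, which is why moving the slots directory off the SSD makes such a difference.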
||
|
sam6861
Advanced Cruncher Joined: Mar 31, 2020 Post Count: 107 Status: Offline
Slow computer, looks ok.
Intel Atom N270, 2GB RAM, no OpenCL, Linux, EXT4, single SATA SSD storage.
143860298 sectors (0.06 TiB) read, 136 days, /proc/diskstats column 3.
314117464 sectors (0.14 TiB) written, 136 days, /proc/diskstats column 7.
643195368 sectors (0.29 TiB) written, lifetime from smartctl.

Fast computer, too many writes.
Intel i7-2600, 16GB RAM, AMD RX 580, Linux Debian, BTRFS RAID1.
53341744 sectors (0.02 TiB) read, 5 days, SATA SSD.
54264880 sectors (0.02 TiB) read, 5 days, USB flash drive.
3917875560 sectors (1.8 TiB) written, 5 days, SATA SSD.
3917875560 sectors (1.8 TiB) written, 5 days, USB flash drive.
13279749026 sectors (6.1 TiB) written, SATA SSD lifetime from gsmartcontrol.

Also, the fast computer sort of goes unresponsive at times due to a slow 10 MB/s USB flash drive and constant writes, and CPU usage sometimes drops because of the slow storage. Oh, and I wonder which drive will fail first in a RAID1, the USB flash drive or the SATA SSD.

I am sure my Windows 10 machine with a fast AMD 5500 XT has similar problems as well.

OPNG is probably writing far too many checkpoint files just to complete in 20 minutes.

A Linux tmpfs can make it work faster and reduce the amount of writes:

    service boinc-client stop
    cd /var/lib/boinc-client
    mv slots slots_old
    mkdir slots
    mount -t tmpfs -o size=8G tmpfs slots
    cp -rp slots_old/* slots/
    service boinc-client start

Edit: Also do:

    chown boinc:boinc /var/lib/boinc-client/slots

so BOINC can make more slot folders when needed.

The problem with tmpfs or a RAM drive is that I somewhat don't want ARP1 tasks, with their super long runtime of 30 hours, to lose all checkpoints when the computer crashes, freezes, or loses power.
----------------------------------------
[Edit 1 times, last edit by sam6861 at May 2, 2021 9:29:00 AM]
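For anyone who wants to check the sector figures above, here is a minimal sketch of the conversion. It assumes 512-byte sectors, which is the unit /proc/diskstats counts in; SMART lifetime counters can use other units depending on the drive, so treat those with more caution. The helper names are made up for illustration.

```python
SECTOR_BYTES = 512  # assumption: 512-byte sectors, the unit used by /proc/diskstats
TIB = 2**40         # bytes in one binary terabyte


def sectors_to_tib(sectors: int, sector_bytes: int = SECTOR_BYTES) -> float:
    """Convert a sector count to TiB."""
    return sectors * sector_bytes / TIB


def tib_per_day(sectors: int, days: float) -> float:
    """Average TiB per day over the given observation window."""
    return sectors_to_tib(sectors) / days


# Write counters quoted in the post above.
slow_written = 314117464   # Atom N270, 136 days
fast_written = 3917875560  # i7-2600 with RX 580, 5 days

print(f"slow: {sectors_to_tib(slow_written):.3f} TiB, {tib_per_day(slow_written, 136):.4f} TiB/day")
print(f"fast: {sectors_to_tib(fast_written):.3f} TiB, {tib_per_day(fast_written, 5):.4f} TiB/day")
```

The TiB values in the post appear to be truncated rather than rounded, so the last digit can differ slightly from what this prints.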
||
|
maeax
Advanced Cruncher Joined: May 2, 2007 Post Count: 142 Status: Offline
Have set it to 1200 sec. (20 min.)
BOINC Manager - Preferences: "Request tasks to checkpoint at most every N seconds". This controls how often tasks save their state to disk, so they can be restarted later.
----------------------------------------
AMD Ryzen Threadripper PRO 3995WX 64-Cores/ AMD Radeon (TM) Pro W6600. OS Win11pro
|
||
|
Crystal Pellet
Veteran Cruncher Joined: May 21, 2008 Post Count: 1323 Status: Offline
The problem is that these OPNG GPU tasks don't obey the BOINC preference 'write to disk' every ... seconds.
I have a slow GPU card, and 1 GPU task with ~70 jobs needs about 43 minutes elapsed and will write a checkpoint to disk at least 70 times. You can imagine what's happening with the normal standard GPUs nowadays. Those tasks will write to disk every few seconds.
||
|
goben_2003
Advanced Cruncher Joined: Jun 16, 2006 Post Count: 146 Status: Offline
> The problem is that these OPNG GPU tasks don't obey the BOINC preference 'write to disk' every ... seconds. I have a slow GPU card, and 1 GPU task with ~70 jobs needs about 43 minutes elapsed and will write a checkpoint to disk at least 70 times. You can imagine what's happening with the normal standard GPUs nowadays. Those tasks will write to disk every few seconds.

I think the key thing here is that OPN* does respect the write to disk request within jobs. However, it writes the result of each job when it completes. For OPNG, each job normally takes less (Intel GPU) or way less (more powerful discrete GPU) time to complete than the minimum write to disk request time.
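A rough way to see why the checkpoint preference barely matters here: a minimal sketch, assuming each job writes its result exactly once on completion and that checkpoints inside a job follow the disk interval. The function names and this simplified model are illustrative assumptions, not how OPNG is actually implemented.

```python
import math


def writes_per_work_unit(jobs: int, wu_seconds: float, disk_interval: float) -> int:
    """Estimated disk writes for one work unit under the simplified model."""
    job_seconds = wu_seconds / jobs
    # Checkpoints only happen inside a job if the job outlasts the interval.
    checkpoints_per_job = math.floor(job_seconds / disk_interval)
    # Every job also writes its result once when it completes.
    return jobs * (1 + checkpoints_per_job)


def seconds_between_writes(jobs: int, wu_seconds: float, disk_interval: float) -> float:
    """Average spacing between writes over the whole work unit."""
    return wu_seconds / writes_per_work_unit(jobs, wu_seconds, disk_interval)


# Slow GPU from the quoted post: ~70 jobs in about 43 minutes, 1200 s interval.
print(writes_per_work_unit(70, 43 * 60, 1200))
print(round(seconds_between_writes(70, 43 * 60, 1200), 1))

# Hypothetical fast GPU finishing the same 70 jobs in 2 minutes.
print(round(seconds_between_writes(70, 120, 1200), 1))
```

Under these assumptions the disk interval never comes into play, because no single job lasts long enough to reach it; the write count is driven entirely by the number of jobs.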
||
|
maeax
Advanced Cruncher Joined: May 2, 2007 Post Count: 142 Status: Offline
init_data.xml shows <disk_interval>1200.000000</disk_interval>
BUT there are still a lot of writes to the SSD in Resource Monitor, OMG
----------------------------------------
AMD Ryzen Threadripper PRO 3995WX 64-Cores/ AMD Radeon (TM) Pro W6600. OS Win11pro
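To check the same value across several slot directories without opening each file by hand, something like the following sketch could work. It assumes init_data.xml parses as ordinary XML and that `<disk_interval>` sits somewhere below the root element; the sample fragment and its `<app_init_data>` root are a cut-down stand-in, not a full copy of a real file.

```python
import xml.etree.ElementTree as ET
from typing import Optional

# Cut-down stand-in for an init_data.xml file (assumption: real files hold many more fields).
SAMPLE = """<app_init_data>
<disk_interval>1200.000000</disk_interval>
</app_init_data>"""


def read_disk_interval(xml_text: str) -> Optional[float]:
    """Return the disk interval in seconds, or None if the element is missing."""
    root = ET.fromstring(xml_text)
    node = root.find(".//disk_interval")
    if node is None or not node.text:
        return None
    return float(node.text)


print(read_disk_interval(SAMPLE))
```

A value of 1200 here only confirms that the preference reached the task; it says nothing about how often the application actually writes.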
|
||
|
nyanthiss
Cruncher Joined: Nov 23, 2012 Post Count: 15 Status: Offline
I don't think there is a need to write out intermediate results within a single job of a WU (which is the case currently; AFAICT it only writes out the final result of each job).
OTOH, it's not enough to just write once at the end of a WU. While there are GPUs which run the entire WU in maybe 2 minutes, there are GPUs which can take 1.5 hours per WU (and they still do useful work).
----------------------------------------
Intel Xeon E3-1231 v3
AMD A10 7800
AMD Ryzen 5 3500U
AMD Ryzen 1700X
AMD Ryzen 5900X
2x RaspberryPi, 1x Odroid