World Community Grid Forums
Thread Status: Active | Total posts in this thread: 162
bozz4science
Advanced Cruncher | Germany | Joined: May 3, 2020 | Post Count: 104 | Status: Offline
Got roughly 3 pages worth of beta tasks, all but one in PV. Only error was this WU: https://www.worldcommunitygrid.org/ms/device/...s.do?workunitId=550367778 (Win 10, 1660 Super/750Ti)
Do we already have insights into how OC of a GPU might affect the stability of the results? Has anyone tried running multiple GPU WUs concurrently on the same GPU? I was wondering whether you can increase WU output by forcing the GPU to hold its load at a constantly high level instead of these short bursts up to 100% and back down to 0%.

Very impressive speedups: I'm seeing runtimes between 2 and 6 min depending on WU size on my GTX 1660 Super, and 6-12 min on my 750 Ti. That's a huge efficiency gain vs. CPU-computed WUs. However, due to the inherent nature of these WUs, the GPUs' VRMs are getting kicked hard: they continuously have to adjust the GPU chip's voltage up and down to follow the short, intensive bursts of computation. On the 1660 Super the voltage was all over the place.

I am definitely here for the science and to help fight the pandemic from home, but is a base credit of 2.6 really adequate for computing >100 CPU jobs in one run, albeit in much less time?

----------------------------------------
AMD Ryzen 3700X @ 4.0 GHz / GTX1660S
Intel i5-4278U CPU @ 2.60GHz
[Edit 1 times, last edit by bozz4science at Mar 2, 2021 9:12:25 AM]
widdershins
Veteran Cruncher | Scotland | Joined: Apr 30, 2007 | Post Count: 674 | Status: Offline
@uplinger a possible shortcut in your line of research might be to contact NVIDIA. Whilst some random member of the public may not get a reply, I would expect an enquiry from a tech at IBM working on a GPU compute project to get better service.

I'd imagine that if there is a problem with OpenCL on some of their older cards, they'd already know about it and, even better, be able to point you to the possible cause with a lot less effort on your part. Or at least confirm that it will never work correctly on certain cards, saving you any further work.
Vester
Senior Cruncher | USA | Joined: Nov 18, 2004 | Post Count: 325 | Status: Offline
Bozz4science said, "Anyone tried so far running multiple GPU WUs concurrently on the same GPU?"
Yes. I am running three per GPU on an AMD Radeon HD 7990 rig with 8 GPUs. No failures.
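For anyone who wants to replicate this, the BOINC client reads a per-project app_config.xml. The following is a minimal sketch for three tasks per GPU; the short app name used here (opng) is an assumption, so check client_state.xml for the actual name, and the file goes in the World Community Grid project directory:

    <app_config>
        <app>
            <name>opng</name>               <!-- assumed app name; verify in client_state.xml -->
            <gpu_versions>
                <gpu_usage>0.33</gpu_usage> <!-- three tasks share one GPU -->
                <cpu_usage>1.0</cpu_usage>  <!-- reserve one CPU core per task -->
            </gpu_versions>
        </app>
    </app_config>

After saving, use Options > Read config files in the BOINC Manager (or restart the client) so the change takes effect.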
nanoprobe
Master Cruncher | Classified | Joined: Aug 29, 2008 | Post Count: 2998 | Status: Offline
Quoting widdershins: "@uplinger a possible shortcut in your line of research might be to contact NVIDIA. [...] I'd imagine that if there is a problem with OpenCL on some of their older cards they'd already know about it, and even better, be able to point you to the possible cause with a lot less effort on your part."

IIRC this was also a problem with the HCC GPU app. Certain older cards were not capable of running the app and were therefore put on an ignore list, so to speak.
In 1969 I took an oath to defend and protect the U.S. Constitution against all enemies, both foreign and domestic. There was no expiration date.
goben_2003
Advanced Cruncher | Joined: Jun 16, 2006 | Post Count: 146 | Status: Offline
Quoting Grumpy_Swede: "Edit 2: Yup, with driver 425.31 the GTX660M shows up as OpenCL 1.2. I doubt the card is really OpenCL 1.2 compliant, though. The driver supports OpenCL 1.2, but the card may not, even though it is marketed as OpenCL 1.2 capable. Strange, though, that driver 306.14, which is from 2012, does not show the card as OpenCL 1.2. But then, the OpenCL 1.2 specification was announced on November 15, 2011, so perhaps the 306.14 driver does not support OpenCL 1.2 and the card is actually only OpenCL 1.1. So, I'll forget about the GTX660M when it comes to GPU crunching here."

@uplinger Hopefully this is helpful. From looking at the specs for the GTX 600-800 mobile series, they support OpenCL 1.1. The driver, however, covers everything through the RTX 20 mobile series. So the driver reports support for a higher OpenCL version than the card itself supports.

Notes: this is probably also true for the non-mobile GTX 600-800 series. The spec pages I looked at for the GTX 900 series through the RTX 20 series do not list the OpenCL version.

Sources:
https://www.nvidia.com/en-us/geforce/gaming-l...-gtx-660m/specifications/
https://www.nvidia.com/en-us/geforce/gaming-l...-gtx-680m/specifications/
https://www.nvidia.com/en-us/geforce/gaming-l...-gtx-760m/specifications/
https://www.nvidia.com/en-us/geforce/gaming-l...-gtx-860m/specifications/
https://www.nvidia.com/en-us/drivers/results/145874/ (425.31 drivers)
https://www.nvidia.com/en-us/geforce/graphics...gtx-660ti/specifications/
https://www.nvidia.com/en-us/geforce/gaming-l...-gtx-960m/specifications/
https://www.nvidia.com/en-sg/geforce/products/10series/geforce-gtx-1060/
https://www.nvidia.com/en-us/geforce/graphics-cards/rtx-2060/

Edit 1: @Grumpy_Swede: I am curious what the output of Nvidia's OpenCL Device Query is for your GTX 660M, specifically whether it mentions OpenCL 1.1 anywhere. The Nvidia card I have right now supports OpenCL 1.2, so I cannot tell whether it reports OpenCL 1.2 only because of the driver. Here is an example of some of the output from my card:

OpenCL SW Info:

The Windows / Linux / Mac Nvidia OpenCL Device Query can be found on this page. For the Windows 64-bit version, it is in the zip at NVIDIA GPU Computing SDK\OpenCL\bin\win64\Release. If you unzip it and run oclDeviceQuery.exe, it will generate oclDeviceQuery.txt with the output in the same folder.

[Edit 1 times, last edit by goben_2003 at Mar 2, 2021 1:52:00 PM]
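The same device-versus-driver distinction can be checked with a few lines of OpenCL. This is a minimal sketch, not the oclDeviceQuery tool itself: it assumes a single platform and omits error checking.

    #include <stdio.h>
    #include <CL/cl.h>

    int main(void) {
        cl_platform_id platform;
        cl_device_id devices[16];
        cl_uint ndev = 0;
        char buf[256];

        /* First platform only -- a simplifying assumption. */
        clGetPlatformIDs(1, &platform, NULL);
        clGetDeviceIDs(platform, CL_DEVICE_TYPE_GPU, 16, devices, &ndev);

        for (cl_uint i = 0; i < ndev; i++) {
            clGetDeviceInfo(devices[i], CL_DEVICE_NAME, sizeof(buf), buf, NULL);
            printf("Device: %s\n", buf);
            /* What the device claims to support. */
            clGetDeviceInfo(devices[i], CL_DEVICE_VERSION, sizeof(buf), buf, NULL);
            printf("  CL_DEVICE_VERSION: %s\n", buf);
            /* The OpenCL C compiler version available for kernels. */
            clGetDeviceInfo(devices[i], CL_DEVICE_OPENCL_C_VERSION, sizeof(buf), buf, NULL);
            printf("  OpenCL C version:  %s\n", buf);
            /* The driver build, which is a separate thing entirely. */
            clGetDeviceInfo(devices[i], CL_DRIVER_VERSION, sizeof(buf), buf, NULL);
            printf("  CL_DRIVER_VERSION: %s\n", buf);
        }
        return 0;
    }

Build with something like gcc query.c -lOpenCL. If CL_DEVICE_VERSION prints "OpenCL 1.2" on a card that keeps failing 1.2 workloads, that supports the theory above that the reported version is not the whole story.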
Jim1348
Veteran Cruncher | USA | Joined: Jul 13, 2009 | Post Count: 1066 | Status: Offline
"Do we already have insights into how OC of a GPU might affect the stability of the results?"

Yes. It won't help it. I would never EVER overclock on a critical scientific project like this one. You just jeopardize the results for everyone.
uplinger
Former World Community Grid Tech | Joined: May 23, 2005 | Post Count: 3952 | Status: Offline
Quoting nanoprobe: "IIRC this was also a problem with the HCC GPU app. Certain older cards were not capable of running the app and therefore were put on an ignore list so to speak."

Wow, your memory is better than mine. I went back into the history and found that we used to exclude these cards specifically:

    // HCC-era scheduler code: skip devices whose name matches the known-bad list
    if ( !strcmp(c.prop.name, "ION") || !strcmp(c.prop.name, "GeForce 210") || !strcmp(c.prop.name, "GeForce 310") ||
         !strcmp(c.prop.name, "GeForce 310M") || !strcmp(c.prop.name, "GeForce 315") || !strcmp(c.prop.name, "GeForce 315M") ||
         !strcmp(c.prop.name, "GeForce 405") || !strcmp(c.prop.name, "GeForce 410M") || !strcmp(c.prop.name, "GeForce 610M") ||
         !strcmp(c.prop.name, "GeForce 8200") || !strcmp(c.prop.name, "GeForce 8400") || !strcmp(c.prop.name, "GeForce 8400GS") ||
         !strcmp(c.prop.name, "GeForce 8400 GS") || !strcmp(c.prop.name, "GeForce 8500 GT") || !strcmp(c.prop.name, "GeForce 8600 GS") ||
         !strcmp(c.prop.name, "GeForce 8600 GT") || !strcmp(c.prop.name, "GeForce 8600M GS") || !strcmp(c.prop.name, "GeForce 8600M GT") ||
         !strcmp(c.prop.name, "GeForce 8600 GTS") || !strcmp(c.prop.name, "GeForce 8700M GT") || !strcmp(c.prop.name, "GeForce 8800 GT") ||
         !strcmp(c.prop.name, "GeForce 8800 GTS 512") || !strcmp(c.prop.name, "GeForce 8800M GTS") || !strcmp(c.prop.name, "GeForce 9200") ||
         !strcmp(c.prop.name, "GeForce 9300 GE") || !strcmp(c.prop.name, "GeForce 9300M GS") || !strcmp(c.prop.name, "GeForce 9400 GT") ||
         !strcmp(c.prop.name, "GeForce 9500 GS") || !strcmp(c.prop.name, "GeForce 9500 GT") || !strcmp(c.prop.name, "GeForce 9600 GS") ||
         !strcmp(c.prop.name, "GeForce 9600 GSO") || !strcmp(c.prop.name, "GeForce 9600 GSO 512") || !strcmp(c.prop.name, "GeForce 9600 GT") ||
         !strcmp(c.prop.name, "GeForce 9600M GT") || !strcmp(c.prop.name, "GeForce 9800 GT") || !strcmp(c.prop.name, "GeForce 9800 GTX+") ||
         !strcmp(c.prop.name, "GeForce 9800 GTX/9800 GTX+") || !strcmp(c.prop.name, "GeForce 9800 S") || !strcmp(c.prop.name, "GeForce G102M") ||
         !strcmp(c.prop.name, "GeForce G210") || !strcmp(c.prop.name, "GeForce GT 120") || !strcmp(c.prop.name, "GeForce GT 130") ||
         !strcmp(c.prop.name, "GeForce GT 130M") || !strcmp(c.prop.name, "GeForce GT 220M") || !strcmp(c.prop.name, "GeForce GT 230") ||
         !strcmp(c.prop.name, "GeForce GT 230M") || !strcmp(c.prop.name, "GeForce GT 325M") || !strcmp(c.prop.name, "GeForce GT 330") ||
         !strcmp(c.prop.name, "GeForce GT 330M") || !strcmp(c.prop.name, "GeForce GT 420") || !strcmp(c.prop.name, "GeForce GT 510") ||
         !strcmp(c.prop.name, "GeForce GT 520") || !strcmp(c.prop.name, "GeForce GT 520M") || !strcmp(c.prop.name, "GeForce GT 610") ||
         !strcmp(c.prop.name, "GeForce GT 630") || !strcmp(c.prop.name, "GeForce GTS 240") || !strcmp(c.prop.name, "GeForce GTS 250") ||
         !strcmp(c.prop.name, "GeForce GTX 260M") || !strcmp(c.prop.name, "GeForce GTX 280M") || !strcmp(c.prop.name, "GeForce GTX 660M") ||
         !strcmp(c.prop.name, "NVS 300") || !strcmp(c.prop.name, "NVS 3100M") || !strcmp(c.prop.name, "NVS 4200M") ||
         !strcmp(c.prop.name, "NVS 5100M") || !strcmp(c.prop.name, "Quadro 400") || !strcmp(c.prop.name, "Quadro FX 1600M") ||
         !strcmp(c.prop.name, "Quadro FX 1700") || !strcmp(c.prop.name, "Quadro FX 1800") || !strcmp(c.prop.name, "Quadro FX 2700M") ||
         !strcmp(c.prop.name, "Quadro FX 2800M") || !strcmp(c.prop.name, "Quadro FX 3700") || !strcmp(c.prop.name, "Quadro FX 380") ||
         !strcmp(c.prop.name, "Quadro FX 570") || !strcmp(c.prop.name, "Quadro FX 570M") || !strcmp(c.prop.name, "Quadro FX 580") ||
         !strcmp(c.prop.name, "Quadro FX 770M") || !strcmp(c.prop.name, "Quadro FX 880M") || !strcmp(c.prop.name, "Quadro NVS 160M") ||
         !strcmp(c.prop.name, "Quadro NVS 290") )

Back then we ran everything through code in the scheduler, but for this new beta we are using BOINC's more popular plan-class method: the rules are defined in a configurable XML file. https://boinc.berkeley.edu/trac/wiki/AppPlanSpec

I'm hoping to find a trend in the cards that have errors that would allow me to exclude them easily with this XML file. My fallback plan is to let those devices fail out and get limited to 1 task per day; that feature did not exist back when HCC was running. I do have a minimum OpenCL version of 1.2 set, but as some have noted, cards that report 1.2 still fail. This leads me to believe that they weren't 100% compatible with version 1.2...

Also, thank you all for the information. I will be turning on the validator in a bit to see what kind of errors we catch. As for the points, I have not paid much attention to them at the moment, as getting the science done is a higher priority.

On the question of checkpointing and disk usage: the application writes results to disk every time it finishes a ligand. That is where a checkpoint takes place, and it is the easiest point to restore from. The researchers are planning to send out more difficult ligands for GPU, which, per my recommendation, should average around 5 minutes per ligand. This means that if you have an awesome card (you are very lucky, and I'm jealous), checkpointing may still happen every 30 seconds or so. The amount of data written per checkpoint is very small; this should not wear out an SSD.

Thanks,
-Uplinger