World Community Grid Forums
Thread Status: Active | Total posts in this thread: 77
nanoprobe
Master Cruncher | Classified | Joined: Aug 29, 2008 | Post Count: 2998 | Status: Offline
I don't know why WCG didn't test that app, but it exists and it's on their GitHub. My best guess is that the researchers never sent tasks to beta test the CUDA app.
In 1969 I took an oath to defend and protect the U.S. Constitution against all enemies, both foreign and domestic. There was no expiration date.
uplinger
Former World Community Grid Tech | Joined: May 23, 2005 | Post Count: 3952 | Status: Offline
There are a few reasons the CUDA version was not tested or used.

From early on, having only one version to maintain has been a lot easier for testing and general support. Focusing on OpenCL also allows all three GPU vendors (NVIDIA, AMD, and Intel) to participate instead of just one. And because of that focus, the OpenCL version is actually faster than the CUDA version.

Thanks,
-Uplinger
Ian-n-Steve C.
Senior Cruncher | United States | Joined: May 15, 2020 | Post Count: 180 | Status: Offline
uplinger wrote: "There are a few reasons the CUDA version was not tested or used. ... Due to this focus, the OpenCL version is actually faster than the CUDA version."

I totally understand the ease of support in having only a single code base to manage and maintain. But I don't think you can definitively say that the OpenCL version is faster than the CUDA version if you never even tested it.

If you mean "faster" in terms of faster completion of the entire project, by considering the larger user base, and not that NVIDIA OpenCL is faster than NVIDIA CUDA on the same card (which I would strongly contest), then you'd need to weigh the speedup of CUDA against how many AMD and Intel devices would cease to contribute. Do you have any statistics to share on what percentage of the project's total FLOPS comes from each device type? Without knowing that, you really can't draw that kind of conclusion.

If you don't want to complicate your support efforts with two app versions for different device types, like I said, I understand. But the bigger issue a lot of us have is the constant 0-100% load behavior, and that can be better optimized. I hope the team is considering improvements rather than taking the "it's good enough" approach.

EPYC 7V12 / [5] RTX A4000
EPYC 7B12 / [5] RTX 3080Ti + [2] RTX 2080Ti
EPYC 7B12 / [6] RTX 3070Ti + [2] RTX 3060
[2] EPYC 7642 / [2] RTX 2080Ti
Bryn Mawr
Senior Cruncher | Joined: Dec 26, 2018 | Post Count: 346 | Status: Offline
Ian-n-Steve C. wrote: "I totally understand ease of support having to only manage and maintain a single code base. But I don't think you can definitively say that the OpenCL version is faster than the CUDA version if you never even tested it."

There are many layers of testing involved before you get to beta, and one of those is performance testing, so yes, they would know definitively which is faster.
mhammond
Advanced Cruncher | USA | Joined: Dec 22, 2011 | Post Count: 130 | Status: Offline
I was really looking forward to running GPU tasks on my laptops, but I will be opting out after trying it. Over the last few days, ever since running on GPU, my laptop has frequently powered off without warning. And it isn't crashing, either: on restart it does not prompt with the usual safe-mode options that are offered after an unexpected power-down. I have two Africa Rainfall Project tasks that have been running for almost three days now, due to their infrequent checkpoints combined with the shutdowns, and my results over the last two days have been lower than any two-day period in years.

I looked for anyone else reporting this issue and didn't see it; I don't have time to weed through everything, and I'm not sure if this is the best place or thread, but I wanted to get it out there. I will gladly and happily try again in the future.

regards,
mike
spRocket
Senior Cruncher | Joined: Mar 25, 2020 | Post Count: 277 | Status: Offline
Ian-n-Steve C. wrote: "But the bigger issue a lot of us have is with the constant 0-100 behavior, and that can be better optimized. I hope the team is considering improvements rather than taking the 'it's good enough' approach."

I managed to catch my system running a batch of GPU units and saw that when one finished, the next started immediately. This is on a GTX 960, for what it's worth. Interestingly, I see 100% GPU utilization but less than half of the maximum power consumption, and the fan loafs along at 22%, with units completing in 4-6 minutes or so. I'm not overclocking it, or at least I'm not trying to. ETA: I recall seeing about 90-95 W back when I was running GPUGRID. My nvidia-smi output with an OPNG work unit running:
[Edited once; last edit by spRocket at Apr 8, 2021 9:00:42 PM]
PMH_UK
Veteran Cruncher | UK | Joined: Apr 26, 2007 | Post Count: 772 | Status: Offline
My laptop goes into standby when running GPU units if the lid is closed (I use an external monitor via a KVM switch). When not running GPU units it runs OK.

Paul.
Jorlin
Advanced Cruncher | Germany | Joined: Jan 22, 2020 | Post Count: 89 | Status: Offline
spRocket wrote: "Interestingly, I see 100% GPU utilization, but less than half of the maximum power consumption, and the fan loafs along at 22%, with units completing in 4-6 minutes or so."

The 960 is already a very cool-running card, and yes, the power consumption and heat production on these WUs is quite low. I'm running two simultaneously on a GTX 960 and three on a 1050 Ti. The core sits at 100% but runs cooler than a single PrimeGrid task that I cap at 50% CPU resources (so that task isn't actually at 100% core load, since it has to wait for the CPU to feed it).
Ian-n-Steve C.
Senior Cruncher | United States | Joined: May 15, 2020 | Post Count: 180 | Status: Offline
nvidia-smi only provides an instantaneous snapshot. Watch the load fluctuation with nvidia-settings, or run nvidia-smi in loop mode (for example, nvidia-smi -lms 200 --query-gpu=utilization.gpu --format=csv) and dump the output to a file. You'll see the GPU load fluctuating from 0-100% constantly.

From what others have mentioned, this is because each work unit is packaged with multiple jobs. If that is the case, the app could use something like a mutex lock to preload data and prepare for the next computation while the current one is ongoing. That would allow the GPU to remain at near 100% the entire time. We did this with the SETI CUDA app and recorded as little as 1 ms (probably the limit of our measurement ability) between one WU and the next.

EPYC 7V12 / [5] RTX A4000
EPYC 7B12 / [5] RTX 3080Ti + [2] RTX 2080Ti
EPYC 7B12 / [6] RTX 3070Ti + [2] RTX 3060
[2] EPYC 7642 / [2] RTX 2080Ti
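[Editor's note] The preloading idea described above can be sketched as a simple double-buffered pipeline. This is a hypothetical illustration, not WCG's actual app code: load_workunit and run_on_gpu are stand-ins for the real input preparation and kernel launch, and a bounded queue plays the role of the mutex-guarded buffer that hands the next job to the compute loop with no idle gap.

```python
import threading
import queue

def load_workunit(i):
    # Stand-in for reading and preparing the next job's input data.
    return [i] * 4

def run_on_gpu(data):
    # Stand-in for the actual GPU kernel launch on one job.
    return sum(data)

def crunch(num_jobs):
    """Overlap loading of job i+1 with computation of job i."""
    ready = queue.Queue(maxsize=1)  # holds at most one preloaded job

    def loader():
        for i in range(num_jobs):
            # Blocks when the buffer is full, so loading stays
            # exactly one job ahead of the compute loop.
            ready.put(load_workunit(i))

    t = threading.Thread(target=loader)
    t.start()
    # The compute loop never waits on I/O: the next job is already staged.
    results = [run_on_gpu(ready.get()) for _ in range(num_jobs)]
    t.join()
    return results
```

With real work, the loader would stage host-to-device transfers while the kernel for the previous job is still running, which is what keeps utilization pinned near 100% instead of sawtoothing between jobs.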
bozz4science
Advanced Cruncher | Germany | Joined: May 3, 2020 | Post Count: 104 | Status: Offline
Yeah, that's indeed what causes the 0-100% fluctuations. You can easily verify this on your results page: open the projects drop-down menu --> Open Pandemics --> click the "VALID" hyperlink on one of your computed OPNG tasks, and you'll see that multiple jobs are packed into one work unit, along with the runtime for each and a short log in between.

AMD Ryzen 3700X @ 4.0 GHz / GTX 1660S
Intel i5-4278U @ 2.60 GHz