Total posts in this thread: 162
This topic has been viewed 1941124 times and has 161 replies
nanoprobe
Master Cruncher
Classified
Joined: Aug 29, 2008
Post Count: 2998
Status: Offline
Re: OpenPandemics GPU Beta Test - Feb 27 2021 [ Issues Thread ]

Back when the HCC GPU app was in use, Nvidia lagged far behind AMD in its OpenCL implementation because it was concentrating on developing CUDA. Its DP (double-precision) capabilities were also less than half those of AMD, except on its very high-end industrial cards. I haven't kept up with all that, but hopefully they have moved their OpenCL development along to be more useful. That being said, it would not surprise me if some of their cards will not run this app even though they are claimed to be OpenCL 1.2 compatible.
----------------------------------------
In 1969 I took an oath to defend and protect the U S Constitution against all enemies, both foreign and Domestic. There was no expiration date.


[Mar 2, 2021 4:03:42 PM]
Crystal Pellet
Veteran Cruncher
Joined: May 21, 2008
Post Count: 1323
Status: Offline
Re: OpenPandemics GPU Beta Test - Feb 27 2021 [ Issues Thread ]

On a very low-end card: OpenCL: AMD/ATI GPU 0: AMD Radeon(TM) R3 Graphics (Mullins)

BETA_OPNG_0021068_00061 - 54 ligands processed without interruption
CPU time 1183 seconds - Elapsed time 5.25 hours

BETA_OPNG_0021068_00001 - 68 ligands processed, restarted twice from saved checkpoints
The first restart was just a test. The second restart was needed because the graphics driver crashed and, although the task kept running, there was no real progress.
Since I have checkpoint_debug enabled in cc_config.xml, I easily discovered that there was no progress during job 41.
CPU time 1952 seconds - Elapsed time 5.19 hours

BETA_OPNG_0021068_00007 - 46 ligands processed without interruption
CPU time 1082 seconds - Elapsed time 4.12 hours

BETA_OPNG_0021068_00012 - 65 ligands just started . . .
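
The figures above can be turned into an average time per ligand and a rough CPU share, which makes the three finished tasks easier to compare. This is only a small arithmetic sketch in Python; the function names are made up for illustration, and the second task's elapsed time includes its two restarts.

```python
# Figures copied from the three finished tasks reported above.
# Each entry: (task name, ligands processed, CPU seconds, elapsed hours)
TASKS = [
    ("BETA_OPNG_0021068_00061", 54, 1183, 5.25),
    ("BETA_OPNG_0021068_00001", 68, 1952, 5.19),
    ("BETA_OPNG_0021068_00007", 46, 1082, 4.12),
]


def per_ligand_minutes(ligands, elapsed_hours):
    """Average elapsed (wall-clock) minutes spent per ligand."""
    return elapsed_hours * 60.0 / ligands


def cpu_share(cpu_seconds, elapsed_hours):
    """Fraction of the elapsed time during which the CPU was busy."""
    return cpu_seconds / (elapsed_hours * 3600.0)


for name, ligands, cpu_s, hours in TASKS:
    print(
        f"{name}: {per_ligand_minutes(ligands, hours):.1f} min/ligand, "
        f"CPU busy {cpu_share(cpu_s, hours):.1%} of elapsed time"
    )
```

A low CPU share is what one would expect when most of the work is done on the GPU and the CPU mostly waits.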
[Mar 2, 2021 4:05:13 PM]
bozz4science
Advanced Cruncher
Germany
Joined: May 3, 2020
Post Count: 104
Status: Offline
Re: OpenPandemics GPU Beta Test - Feb 27 2021 [ Issues Thread ]

Yes. I am running three per GPU on AMD Radeon HD 7990 rig with 8 GPUs. No failures.

Perfect. As soon as the production-ready version of the GPU app is deployed, I will give it a try, starting with 2 concurrent WUs at a time.

Yes. It won't help it. I would never EVER overclock on a critical scientific project like this one.

Upon reflection on my prior question, I now see this in a more nuanced way than before. The runtimes are really short, and sub-tasks that put bursts of 100% load lasting only a few seconds on the GPU are really not the best use case for OC as a way to increase WU throughput. By letting 2 WUs (or any n>1 WUs) compute concurrently, I think the GPU load can be balanced more evenly and sustained at a high rate.
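
Running several WUs per GPU is normally configured in BOINC through an app_config.xml file in the project's directory, where gpu_usage is the fraction of a GPU each task claims. Below is a minimal Python sketch that generates such a file's contents. The app name `example_gpu_app` is a placeholder (the real short name of the GPU application is not given in this thread), and reserving one CPU core per task is an assumption.

```python
import xml.etree.ElementTree as ET


def build_app_config(app_name, tasks_per_gpu, cpu_per_task=1.0):
    """Return app_config.xml text that lets `tasks_per_gpu` tasks share one GPU.

    app_name is a placeholder here; use the short application name that
    the project actually reports in the BOINC client.
    """
    if tasks_per_gpu < 1:
        raise ValueError("tasks_per_gpu must be at least 1")

    root = ET.Element("app_config")
    app = ET.SubElement(root, "app")
    ET.SubElement(app, "name").text = app_name

    gpu_versions = ET.SubElement(app, "gpu_versions")
    # Each task claims 1/n of a GPU, so n tasks fit on one device.
    ET.SubElement(gpu_versions, "gpu_usage").text = f"{1.0 / tasks_per_gpu:.3g}"
    # Assumed CPU reservation per GPU task.
    ET.SubElement(gpu_versions, "cpu_usage").text = f"{cpu_per_task:.3g}"

    return ET.tostring(root, encoding="unicode")


if __name__ == "__main__":
    # Two concurrent tasks per GPU, as discussed above.
    print(build_app_config("example_gpu_app", 2))
```

The client has to re-read its config files (or be restarted) before a new app_config.xml takes effect.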

I would never overclock like crazy; I usually apply only a memory clock offset to undo the penalty that NVIDIA places on the memory clock for compute workloads in P0 power state. I might also nudge the core clock up ever so slightly to see what that does to runtimes. I think GPU OC is also a common practice on other GPU distributed computing projects, no?

And out of curiosity: suppose, for the sake of argument, someone were to OC their GPUs like crazy based on gaming benchmarks, with settings that would never be stable for a GPU compute workload. Wouldn't those WUs result in an error and be assigned an 'invalid' flag by the WU validator?

The researchers are planning to send out more difficult ligands for GPU which from my recommendation should sit around 5 minutes per ligand on average.

That's awesome. Looking forward to seeing my GPUs sweat.

What is being written is a very small amount of data; this should not wear out an SSD.
That's reassuring. Didn't really check for that when my last beta WUs were computed.
----------------------------------------

AMD Ryzen 3700X @ 4.0 GHz / GTX1660S
Intel i5-4278U CPU @ 2.60GHz
----------------------------------------
[Edit 2 times, last edit by bozz4science at Mar 2, 2021 4:59:42 PM]
[Mar 2, 2021 4:53:40 PM]
nanoprobe
Master Cruncher
Classified
Joined: Aug 29, 2008
Post Count: 2998
Status: Offline
Re: OpenPandemics GPU Beta Test - Feb 27 2021 [ Issues Thread ]

Going to agree with Jim 1348 about overclocking the GPUs. Very little gain if any for the increased power consumption and heat. Experience says there will be errors.

I would never overclock like crazy; I usually apply only a memory clock offset to undo the penalty that NVIDIA places on the memory clock for compute workloads in P0 power state. I might also nudge the core clock up ever so slightly to see what that does to runtimes. I think GPU OC is also a common practice on other GPU distributed computing projects, no?

None that I know of currently.

And out of curiosity: suppose, for the sake of argument, someone were to OC their GPUs like crazy based on gaming benchmarks, with settings that would never be stable for a GPU compute workload. Wouldn't those WUs result in an error and be assigned an 'invalid' flag by the WU validator?


They would be flagged as errors, not invalids. A task is flagged as invalid when its results are compared to known valid results and they don't match. Even if it ends up flagged as invalid, such a task runs to completion. A task that errors out stops at the point of the error and does not complete.
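
The distinction can be sketched as a small decision function. This is a hypothetical Python model of the idea only, not the project's actual validator; the names, the list-of-numbers result format, and the tolerance are all assumptions made for illustration.

```python
from enum import Enum


class Outcome(Enum):
    ERROR = "error"      # task stopped before finishing
    INVALID = "invalid"  # task finished, but results disagree with the reference
    VALID = "valid"      # task finished and results agree


def classify(completed, result, reference, tolerance=1e-6):
    """Classify one returned task (hypothetical model).

    completed: whether the task ran to the end
    result:    numbers produced by this task
    reference: numbers from a result already known to be valid
    """
    if not completed:
        # An errored task never reaches the comparison step.
        return Outcome.ERROR
    if len(result) != len(reference):
        return Outcome.INVALID
    if all(abs(a - b) <= tolerance for a, b in zip(result, reference)):
        return Outcome.VALID
    return Outcome.INVALID


if __name__ == "__main__":
    print(classify(False, [], [1.0, 2.0]))         # crashed mid-run
    print(classify(True, [1.0, 2.5], [1.0, 2.0]))  # finished, wrong numbers
    print(classify(True, [1.0, 2.0], [1.0, 2.0]))  # finished, matching numbers
```

In this model an unstable overclock can land in either bucket: a driver crash gives an error, while silent miscalculation gives a completed task that fails the comparison.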
[Edit 1 times, last edit by nanoprobe at Mar 2, 2021 5:28:18 PM]
[Mar 2, 2021 5:23:37 PM]
uplinger
Former World Community Grid Tech
Joined: May 23, 2005
Post Count: 3952
Status: Offline
Re: OpenPandemics GPU Beta Test - Feb 27 2021 [ Issues Thread ]

Hello all,

I have turned on the validator. I noticed almost instantly that there was an issue; however, about 197 results were incorrectly marked as invalid. I stopped the validation, fixed the bug and started it up again. I have marked the results that were invalid to be rerun for validation, and they should clean up. If you notice any weirdness in your results, please bring it to my attention so I can review the logs and the results.

Thanks,
-Uplinger
[Mar 2, 2021 5:32:28 PM]
bozz4science
Advanced Cruncher
Germany
Joined: May 3, 2020
Post Count: 104
Status: Offline
Re: OpenPandemics GPU Beta Test - Feb 27 2021 [ Issues Thread ]

Thanks for keeping us up to date, Uplinger! Curious to see if all my PV WUs will change their status to valid.

Over at GPU Grid there is a lively exchange on all things 'overclocking': comparing risks and benefits, personal experiences, best practices, etc.
It is in another context, sure, but it is still a GPU computing project. Some of the long-time crunchers over there, who have been participating for years, do overclock, at least by setting an offset for the memory clock on NVIDIA cards to get back to "effectively" the default stock memory clock settings (because of the penalty incurred when the power state is switched).
[Mar 2, 2021 5:40:55 PM]
Grumpy Swede
Master Cruncher
Svíþjóð
Joined: Apr 10, 2020
Post Count: 2214
Status: Offline
Re: OpenPandemics GPU Beta Test - Feb 27 2021 [ Issues Thread ]

At the moment not so concentrated on WCG. I noticed that I had lots of dustbunnies in the corners, so for now it's time to grab the vacuum cleaner. And while I'm at it, I might as well mop the floors too.

Back later.

Edit, added before I start.

How can this one of mine be marked as "Too Late", when I finished it very much in time, and all the others errored out on the WU?

https://www.worldcommunitygrid.org/ms/device/...s.do?workunitId=538631913
----------------------------------------
[Edit 1 times, last edit by Grumpy Swede at Mar 2, 2021 6:00:44 PM]
[Mar 2, 2021 5:55:50 PM]
Crystal Pellet
Veteran Cruncher
Joined: May 21, 2008
Post Count: 1323
Status: Offline
Re: OpenPandemics GPU Beta Test - Feb 27 2021 [ Issues Thread ]

I noticed almost instantly that there was an issue; however, about 197 results were incorrectly marked as invalid. I stopped the validation, fixed the bug and started it up again.
Thanks,
-Uplinger

I suppose from these In(in)valids resends were sent out asap . . .
[Mar 2, 2021 6:15:20 PM]
uplinger
Former World Community Grid Tech
Joined: May 23, 2005
Post Count: 3952
Status: Offline
Re: OpenPandemics GPU Beta Test - Feb 27 2021 [ Issues Thread ]

I noticed almost instantly that there was an issue; however, about 197 results were incorrectly marked as invalid. I stopped the validation, fixed the bug and started it up again.
Thanks,
-Uplinger

I suppose from these In(in)valids resends were sent out asap . . .


Yes there were some. Couldn't stop them fast enough.

Thanks,
-Uplinger
[Mar 2, 2021 6:54:37 PM]
Vester
Senior Cruncher
USA
Joined: Nov 18, 2004
Post Count: 325
Status: Offline
Re: OpenPandemics GPU Beta Test - Feb 27 2021 [ Issues Thread ]

I have 11 pending validation because the other computers have not returned their results.

BETA_OPNG_0021099_00068_1 4HD7990 Pending Validation 3/2/21 01:30:03 3/2/21 02:08:58 0.41 / 0.44 2.6 / 0.0
BETA_OPNG_0021077_00214_1 4HD7990 Pending Validation 3/2/21 00:47:42 3/2/21 01:16:13 0.04 / 0.23 2.6 / 0.0
BETA_OPNG_0021073_00163_1 4HD7990 Pending Validation 3/2/21 00:39:24 3/2/21 01:06:50 0.06 / 0.28 2.6 / 0.0
BETA_OPNG_0021073_00181_1 4HD7990 Pending Validation 3/2/21 00:39:24 3/2/21 01:02:39 0.04 / 0.21 2.6 / 0.0
BETA_OPNG_0021067_00122_0 4HD7990 Pending Validation 3/2/21 00:28:59 3/2/21 00:47:41 0.05 / 0.22 2.6 / 0.0
BETA_OPNG_0021064_00042_0 4HD7990 Pending Validation 3/2/21 00:22:42 3/2/21 00:37:21 0.04 / 0.22 2.6 / 0.0
BETA_OPNG_0021064_00036_0 4HD7990 Pending Validation 3/2/21 00:22:41 3/2/21 00:33:08 0.03 / 0.14 2.6 / 0.0
BETA_OPNG_0021039_00300_1 4HD7990 Pending Validation 3/1/21 23:29:25 3/1/21 23:35:36 0.02 / 0.10 494.7 / 0.0
BETA_OPNG_0021036_00192_1 4HD7990 Pending Validation 3/1/21 23:17:23 3/1/21 23:31:30 0.03 / 0.14 2.6 / 0.0
BETA_OPNG_0021036_00107_1 4HD7990 Pending Validation 3/1/21 23:17:23 3/1/21 23:33:32 0.04 / 0.19 2.6 / 0.0
BETA_OPNG_0021034_00153_0 4HD7990 Pending Validation 3/1/21 23:10:54 3/1/21 23:23:39 0.04 / 0.20 2.6 / 0.0
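
Rows like these can be parsed to compare turnaround times and claimed credit. The Python sketch below uses two of the rows above and rests on an assumed column order that the listing itself does not label: result name, device, status, sent time, return time, CPU / elapsed hours, claimed / granted credit. The field names are invented for illustration.

```python
from datetime import datetime

# Two rows copied from the list above.
ROWS = """\
BETA_OPNG_0021099_00068_1 4HD7990 Pending Validation 3/2/21 01:30:03 3/2/21 02:08:58 0.41 / 0.44 2.6 / 0.0
BETA_OPNG_0021039_00300_1 4HD7990 Pending Validation 3/1/21 23:29:25 3/1/21 23:35:36 0.02 / 0.10 494.7 / 0.0
"""

STAMP = "%m/%d/%y %H:%M:%S"


def parse_row(line):
    """Split one result row into named fields (column meanings are assumed)."""
    t = line.split()
    # The last ten tokens have a fixed layout; the status may span several words.
    return {
        "name": t[0],
        "device": t[1],
        "status": " ".join(t[2:-10]),
        "sent": datetime.strptime(f"{t[-10]} {t[-9]}", STAMP),
        "returned": datetime.strptime(f"{t[-8]} {t[-7]}", STAMP),
        "cpu_hours": float(t[-6]),
        "elapsed_hours": float(t[-4]),
        "claimed": float(t[-3]),
        "granted": float(t[-1]),
    }


if __name__ == "__main__":
    for line in ROWS.splitlines():
        row = parse_row(line)
        turnaround_min = (row["returned"] - row["sent"]).total_seconds() / 60.0
        print(
            f"{row['name']}: {row['status']}, "
            f"turnaround {turnaround_min:.1f} min, claimed {row['claimed']}"
        )
```

Parsed this way, the 494.7 claimed on BETA_OPNG_0021039_00300_1 stands out against the 2.6 claimed on every other row.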
[Mar 2, 2021 7:30:10 PM]