Thread Status: Active
Total posts in this thread: 162
This topic has been viewed 1943369 times and has 161 replies
uplinger
Former World Community Grid Tech
Joined: May 23, 2005
Post Count: 3952
Re: OpenPandemics GPU Beta Test - Feb 27 2021 [ Issues Thread ]

Good afternoon,

Sorry for the delay on everything. I am working through a few issues on my end.

1. The scheduler was sending out work for Intel OpenCL, but the tasks were being executed as CPU work, which of course broke them. The failure rate was 100%, which should make this easy to debug; however, the same scheduler (md5sum matches) is running in our QA environment and I am not seeing the issue there. I am working through what could be causing the difference between the two environments.

2. My build environment for Linux was not properly set up. I corrected that over the weekend and proved that it works, so I'm confident in that environment going forward.

3. AutoDock GPU does not allow element 14 (silicon) in its atom types. Support for it was a custom addition to the CPU version, but it is not needed for the GPU version. Using CPU work units for the GPU, however, caused these failures. This was determined from ThreadRipper's result above. I have changed the application to print this type of error to stderr so that it is easier to identify going forward. I am going to provide a list of work units whose ligands reference Si in their pdbqt files. I will be removing these from the database and preventing them from running in this beta test. The researchers will not include ligands containing Si in future production work.

4. The plan_class definition file had another error in it: a tag that was not closed. This has been corrected on the server.
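
An unclosed tag like the one in issue 4 is the kind of error a well-formedness check can catch before a file is deployed. The following is a minimal sketch, assuming the plan class definitions are plain XML. The function name `first_xml_error` and the sample fragments are illustrative only and are not taken from the actual WCG file.

```python
import xml.etree.ElementTree as ET


def first_xml_error(text):
    """Return None if text is well-formed XML, else the parser's error message."""
    try:
        ET.fromstring(text)
    except ET.ParseError as exc:
        return str(exc)
    return None


# Hypothetical fragments for illustration only (not the real server file).
GOOD = (
    "<plan_classes><plan_class><name>opencl_intel_gpu</name>"
    "</plan_class></plan_classes>"
)
# Same fragment with the closing </plan_class> tag missing.
BAD = "<plan_classes><plan_class><name>opencl_intel_gpu</name></plan_classes>"

if __name__ == "__main__":
    for label, doc in (("good", GOOD), ("bad", BAD)):
        print(label, "->", first_xml_error(doc) or "well-formed")
```

The check only confirms that tags are balanced; it says nothing about whether the values inside them are valid for the scheduler.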

As you can tell from the 4 major issues in this weekend's beta test, I have a plan going forward. However, I need to look further into issue 1, as I do not yet have a firm answer on how to correct it. My goal is to have an answer for that today.

Others on the WCG team have reassured me that this is why we run beta tests. I hope that we can get through these issues and have a solid beta test soon. Thank you for your help in getting the bugs and issues sorted out.

Thanks,
-Uplinger

Bad workunits (These may disappear from the database as I clean them out.)

OPNG_0021002_00048.job
OPNG_0021002_00089.job

OPNG_0021003_00040.job
OPNG_0021003_00069.job
OPNG_0021003_00011.job

OPNG_0021004_00037.job
OPNG_0021004_00074.job
OPNG_0021004_00070.job
OPNG_0021004_00051.job

OPNG_0021005_00008.job

OPNG_0021006_00022.job

OPNG_0021007_00030.job
OPNG_0021007_00075.job
OPNG_0021007_00050.job
OPNG_0021007_00013.job

OPNG_0021008_00004.job
OPNG_0021008_00060.job
OPNG_0021008_00073.job
OPNG_0021008_00015.job

OPNG_0021012_00137.job

OPNG_0021013_00050.job
OPNG_0021013_00080.job
OPNG_0021013_00070.job
OPNG_0021013_00082.job
OPNG_0021013_00137.job
OPNG_0021013_00110.job
OPNG_0021013_00109.job
OPNG_0021013_00060.job
OPNG_0021013_00040.job
OPNG_0021013_00114.job
OPNG_0021013_00027.job
OPNG_0021013_00108.job
OPNG_0021013_00033.job
OPNG_0021013_00136.job
OPNG_0021013_00091.job
OPNG_0021013_00002.job
OPNG_0021013_00078.job
OPNG_0021013_00043.job
OPNG_0021013_00011.job

OPNG_0021014_00176.job
OPNG_0021014_00066.job
OPNG_0021014_00035.job

OPNG_0021015_00045.job
OPNG_0021015_00020.job

OPNG_0021022_00088.job
OPNG_0021022_00077.job
OPNG_0021022_00071.job

OPNG_0021027_00078.job

OPNG_0021042_00076.job

OPNG_0021065_00177.job

OPNG_0021076_00060.job

OPNG_0021079_00084.job

OPNG_0021088_00084.job

OPNG_0021089_00013.job

OPNG_0021095_00207.job
OPNG_0021095_00229.job
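
For anyone who wants to check a ligand file locally, here is a minimal sketch of how a pdbqt file could be scanned for silicon. It assumes the atom type is the last whitespace-separated field of each ATOM/HETATM record, as in typical pdbqt files. The function names and the two-atom sample are made up for illustration and do not come from a real work unit or from the application's own check.

```python
def atom_types(pdbqt_text):
    """Collect atom types from the ATOM/HETATM records of a PDBQT string.

    Assumption: the atom type is the last whitespace-separated field
    of each record.
    """
    types = set()
    for line in pdbqt_text.splitlines():
        if line.startswith(("ATOM", "HETATM")):
            fields = line.split()
            if fields:
                types.add(fields[-1])
    return types


def references_silicon(pdbqt_text):
    """Return True if any atom record has the type Si (case-insensitive)."""
    return any(t.lower() == "si" for t in atom_types(pdbqt_text))


# Made-up two-atom ligand for illustration; not from a real work unit.
SAMPLE = (
    "ATOM      1  C1  LIG     1       0.000   0.000   0.000"
    "  0.00  0.00     0.050 C\n"
    "ATOM      2 SI1  LIG     1       1.850   0.000   0.000"
    "  0.00  0.00     0.300 Si\n"
)

if __name__ == "__main__":
    print(references_silicon(SAMPLE))
```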
[Mar 1, 2021 7:31:34 PM]
uplinger
Former World Community Grid Tech
Joined: May 23, 2005
Post Count: 3952

Oh, I forgot to add one more item. Since we are combining many ligands together, we need to increase the disk usage and output file size limits going forward. I am going to update the database to help the currently loaded batches, and the configuration change has already been made for future work and for the OPNG project in general.

Thanks,
-Uplinger
[Mar 1, 2021 8:18:13 PM]
koschi
Cruncher
Joined: Dec 16, 2007
Post Count: 5

Keep your head up, this is why we test :-)

Thanks for the update!
[Mar 1, 2021 8:18:17 PM]
rendition54
Master Cruncher
USA
Joined: Aug 16, 2005
Post Count: 2609

Thanks for the update, Uplinger!
[Mar 1, 2021 8:18:41 PM]
ThreadRipper
Veteran Cruncher
Sweden
Joined: Apr 26, 2007
Post Count: 1322

Thanks for the extensive update, Uplinger!
That is exactly why we have beta tests: to catch errors so that they don't make it into production. I am sure everyone here is glad to help (having opted in to the beta). Of course, everyone is very eager to crunch GPU WUs again; the last time was the HCC project, a very long time ago.

However, just keep testing and going through the errors systematically and I am sure all shall be well - in the end, we're all in this together! :)
----------------------------------------

Join The International Team: https://www.worldcommunitygrid.org/team/viewTeamInfo.do?teamId=CK9RP1BKX1

AMD TR2990WX @ PBO, 64GB Quad 3200MHz 14-17-17-17-1T, RX6900XT @ Stock
AMD 3800X @ PBO
AMD 2700X @ 4GHz
[Mar 1, 2021 8:34:36 PM]
Jim1348
Veteran Cruncher
USA
Joined: Jul 13, 2009
Post Count: 1066

I think that is outstandingly good work, and reporting. The beta is really going quite well, as anyone who has been through them can attest.

But at the risk of incurring the wrath of the Intel GPU crunchers, why not just get rid of them, at least temporarily? Isn't there an "off" switch? They will be doing a very minor part of the work anyway.
[Mar 1, 2021 8:37:11 PM]
uplinger
Former World Community Grid Tech
Joined: May 23, 2005
Post Count: 3952

@Jim,

That is not off the table. However, I have seen some pretty decent runtimes from the Intel GPU in my laptop, which rivaled its NVIDIA graphics card. It should be possible to get this working on many devices, and lots of members would appreciate being able to contribute with both.

I'm currently recompiling the scheduler with some minor fixes and extra reporting to help debug issues on more than just Intel OpenCL.

Thanks,
-Uplinger
[Mar 1, 2021 9:04:20 PM]
Grumpy Swede
Master Cruncher
Svíþjóð
Joined: Apr 10, 2020
Post Count: 2217

It'll be fine in the end, I'm sure.

However, I'm not sure that the GPU choice settings in "Device Profiles" really work as they should.

I know that there is no GPU work now, but before the "end" of this latest beta, when I had "YES" set for both Intel and NVIDIA, my computer with both GPUs did ask for work for both. That even worked in between the two beta runs. (iGPU: HD4600. Discrete GPU: NVIDIA GTX980 Strix.)

Now, though, it only asks for Intel work. I've shut it down for now. It's DeviceId 6750942.
----------------------------------------
[Edit 1 times, last edit by Grumpy Swede at Mar 1, 2021 9:36:13 PM]
[Mar 1, 2021 9:34:16 PM]
Richard Haselgrove
Senior Cruncher
United Kingdom
Joined: Feb 19, 2021
Post Count: 360

@GS,
If you wake up that cruncher, go to the projects tab, select WCG, then click 'Properties', you may get some useful clues.

Just tried it on this machine, and I get:
Don't request tasks for CPU - Project preference
Don't request tasks for NVIDIA GPU - Project has no apps for NVIDIA GPU
Don't request tasks for Intel GPU - Project preference
Last scheduler reply - 28/02/2021 06:37:38
The first and third I knew about (I set them!), but the second is new to me, and it is up to the project to sort out as the beta progresses. Once you see comments here that NV work is flowing again, do a manual update and it should start working again.
----------------------------------------
[Edit 1 times, last edit by Richard Haselgrove at Mar 1, 2021 9:55:05 PM]
[Mar 1, 2021 9:53:11 PM]
Grumpy Swede
Master Cruncher
Svíþjóð
Joined: Apr 10, 2020
Post Count: 2217

Thanks Richard, you're right about that. My results from the "Properties":
(I've set my prefs not to ask for CPU work on that computer.)

Don't request tasks for CPU - Project preference
Don't request tasks for NVIDIA GPU - Project has no apps for NVIDIA GPU
Intel GPU task request deferred for 00:01:26

----------------------------------------
[Edit 2 times, last edit by Grumpy Swede at Mar 1, 2021 10:14:10 PM]
[Mar 1, 2021 10:11:07 PM]