World Community Grid Forums
Thread Status: Active Total posts in this thread: 14
Bryn Mawr
Senior Cruncher, Joined: Dec 26, 2018, Post Count: 344
I guess that the time has finally come to turn my GPU off.
Whilst it would churn through one of the OPNG WUs in about 4-5 hours, since the restart it’s failing them all within about a second.
https://www.worldcommunitygrid.org/contribution/workunit/282352221
Error: boinc_get_opencl_ids() failed with error -1
I had not realised that the tasks would be changing during the interim.
||
|
Aperture_Science_Innovators
Advanced Cruncher, United States, Joined: Jul 6, 2009, Post Count: 139
I don't know the details of your GPU, but I was facing this on one of my systems a few months back. It turned out that the power supply in the computer wasn't up to the task, and when the GPU loaded up with the OPNG WUs, the computer wouldn't fully hang, but the screen would go black, and the GPU would become nonfunctional until a restart. New(er) power supply sorted it out, and it runs properly now.
I *think* I've also had this happen when I try to remote into the systems (with RDP, to either Windows or Linux hosts). It seems to do something with the GPU rendering sent over the network that messes up local access to the GPU :/ I don't have enough firm data to claim this concretely though.
||
|
TPCBF
Master Cruncher, USA, Joined: Jan 2, 2011, Post Count: 1948
Bryn Mawr: I guess that the time has finally come to turn my GPU off.
What kind of GPU are you running? On what OS? Any recent updates, either OS or GPU drivers?
Bryn Mawr: Whilst it would churn through one of the OPNG WUs in about 4-5 hours, since the restart it’s failing them all within about a second. https://www.worldcommunitygrid.org/contribution/workunit/282352221 Error: boinc_get_opencl_ids() failed with error -1 I had not realised that the tasks would be changing during the interim.
My programming laptop, with an NVidia GeForce GTX 1060, processes the latest batch of OPNG WUs in about 25 min each. Successfully...
Ralf
||
|
Grumpy Swede
Master Cruncher, Svíþjóð, Joined: Apr 10, 2020, Post Count: 2154
My old but still working GTX660M crunches these 80+ "jobs" WUs in a little over 6 hours.
Example: https://www.worldcommunitygrid.org/contribution/workunit/280054486
Still worth running it, since the BOINC credit / WCG points those WUs give per hour are far higher than an extra CPU core can produce in the same number of hours.
My also old GTX980 Strix will crunch two 80+ "jobs" OPNG WUs at the same time in around 15 minutes. But until there are plenty of OPNG tasks available, I'll let that computer rest.
[Edit 4 times, last edit by Grumpy Swede at Apr 14, 2023 6:32:34 PM]
||
|
Bryn Mawr
Senior Cruncher, Joined: Dec 26, 2018, Post Count: 344
TPCBF:
What kind of GPU are you running? On what OS? Any recent updates, either OS or GPU drivers?
My programming laptop, with an NVidia GeForce GTX 1060, processes the latest batch of OPNG WUs in about 25 min each. Successfully...
Ralf

OK, so I’ve started to investigate rather than just react.
Both of my machines are Ryzen 3900s running Ubuntu 22.04.2 and BOINC 7.20.5, fitted with GT710 GPUs. One has 64 GB of RAM whilst the other has 16 GB.
It appears that only one of them is downloading and attempting to run OPNG, and that’s the smaller one, which had been on holiday until a couple of weeks ago. When I restarted it I had RAM problems and had to clean and reseat the DIMMs. It is possible that I’ll have to do the same with the GPU.
The other one, which has been running throughout, has lost the OpenCL driver; I’ll reload that in the morning and see if that works OK.
||
|
Bryn Mawr
Senior Cruncher, Joined: Dec 26, 2018, Post Count: 344
So of course, as soon as I say all are failing one runs through to completion! It took 11 hours but it got there in the end.
I’ve reloaded the Nvidia drivers on the other machine and confirmed that OpenCL is now running so it’s now just a case of waiting until both machines download more OPNG to see what happens.
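For anyone wanting to repeat the "confirm that OpenCL is running" step, here is a rough sketch of one way to do it on Linux. It assumes the `clinfo` utility is installed and that `clinfo -l` prints one `Platform #N: ...` line per OpenCL platform; the helper names are made up for illustration, and nothing in the thread says this is how the check was actually done.

```python
import shutil
import subprocess


def count_opencl_platforms(clinfo_output: str) -> int:
    """Count 'Platform #N: ...' lines in the text printed by `clinfo -l`.

    The line format is an assumption about clinfo's list output.
    """
    return sum(
        1
        for line in clinfo_output.splitlines()
        if line.strip().startswith("Platform #")
    )


def opencl_visible() -> bool:
    """Return True if clinfo is installed and lists at least one platform."""
    clinfo = shutil.which("clinfo")
    if clinfo is None:
        # clinfo is not installed, so this check cannot see any platform.
        return False
    result = subprocess.run([clinfo, "-l"], capture_output=True, text=True)
    return count_opencl_platforms(result.stdout) > 0


if __name__ == "__main__":
    print("OpenCL platform visible:", opencl_visible())
```

If this reports no platform after a driver reinstall, BOINC is unlikely to find an OpenCL device either, which would fit the boinc_get_opencl_ids() error quoted earlier in the thread.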
||
|
Bryn Mawr
Senior Cruncher, Joined: Dec 26, 2018, Post Count: 344
So both machines are now downloading and successfully running tasks in just over 11 hours.
However, about 15-20% of the tasks fail at 7.88 hours with time limit exceeded. Is there any reason why some tasks have a shorter time limit - more importantly, is there any way of predicting which jobs will be affected?
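One possible explanation, not confirmed anywhere in this thread: as far as I understand it, the BOINC client aborts a task once its elapsed time passes a limit derived from the workunit's `rsc_fpops_bound` divided by the speed estimate for the app version on that device. A fixed cut-off like 7.88 hours would fit that pattern. The sketch below only shows the arithmetic; both input numbers are hypothetical and were picked so the result lands on 7.88 hours.

```python
def max_elapsed_hours(rsc_fpops_bound: float, device_flops: float) -> float:
    """Elapsed-time limit in hours: operation bound divided by estimated speed."""
    return rsc_fpops_bound / device_flops / 3600.0


def would_be_aborted(expected_runtime_hours: float, limit_hours: float) -> bool:
    """True if a task needing this long would run past the limit."""
    return expected_runtime_hours > limit_hours


# Hypothetical values, NOT taken from any real OPNG workunit.
bound = 2.8368e14   # floating-point operation bound for the workunit
speed = 1.0e10      # estimated device speed in FLOPS

limit = max_elapsed_hours(bound, speed)
print(f"limit: {limit:.2f} h")
print("11 h task aborted:", would_be_aborted(11.0, limit))
```

If this is what is happening, tasks whose bound is sized for a faster card would hit the limit on a GT710 while others with a more generous bound would finish.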
||
|
bfmorse
Senior Cruncher, US, Joined: Jul 26, 2009, Post Count: 296
Is the “time limit exceeded” based solely on processing time or has the WU’s “Deadline” time been passed?
If it is the latter, you might check your queue settings to make sure they are not set too high, causing the WUs to sit idle until it is their turn to be processed. Meanwhile, the clock ticks on and the deadline expires.
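The queue point can be put into numbers with a quick back-of-the-envelope sketch. It assumes all queued tasks were downloaded at the same moment, share one deadline, and run one at a time back to back on a single GPU. The 11 hours per task comes from the runtimes reported above; the 7-day deadline and the queue size of 20 are hypothetical.

```python
def tasks_at_risk(queued: int, hours_per_task: float, deadline_hours: float) -> int:
    """Count queued tasks that would finish after the deadline.

    Tasks are assumed to run strictly one after another, so the task in
    position n (1-based) finishes n * hours_per_task after download.
    """
    at_risk = 0
    for position in range(1, queued + 1):
        finish_time = position * hours_per_task
        if finish_time > deadline_hours:
            at_risk += 1
    return at_risk


# 20 queued tasks (hypothetical), 11 h each, 7-day deadline (hypothetical).
print(tasks_at_risk(queued=20, hours_per_task=11.0, deadline_hours=7 * 24))
```

With these numbers only the first 15 tasks finish inside the 168 hours, so a smaller queue setting would avoid the wasted downloads.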
||
|
adriverhoef
Master Cruncher, The Netherlands, Joined: Apr 3, 2009, Post Count: 2153
Bryn Mawr:
So both machines are now downloading and successfully running tasks in just over 11 hours.
However, about 15-20% of the tasks fail at 7.88 hours with time limit exceeded. Is there any reason why some tasks have a shorter time limit - more importantly, is there any way of predicting which jobs will be affected?

Read through this thread and you might be able to understand and correct the problem on your machine.
Adri
||
|
Bryn Mawr
Senior Cruncher, Joined: Dec 26, 2018, Post Count: 344
adriverhoef:
Read through this thread and you might be able to understand and correct the problem on your machine.
Adri

No, purely based on the runtime.