World Community Grid Forums
Thread Status: Active Total posts in this thread: 822
BladeD
Ace Cruncher USA Joined: Nov 17, 2004 Post Count: 28976 Status: Offline
Someone released some work for the night shift.
||
|
Speedy51
Veteran Cruncher New Zealand Joined: Nov 4, 2005 Post Count: 1297 Status: Offline
"I like to see all GPU work units use replication=2 or higher, if possible. Looks like we have lots of GPUs waiting for GPU tasks. Sometimes slightly annoyed to see most of my OPNG work units are replication=1 and my GPU is idling 80% of the time."

My understanding is that if you see work higher than _1, this is not a good sign, because it means either there is an issue with the work unit or the work unit has failed on multiple GPUs.
||
|
goben_2003
Advanced Cruncher Joined: Jun 16, 2006 Post Count: 145 Status: Offline
"I like to see all GPU work units use replication=2 or higher, if possible. Looks like we have lots of GPUs waiting for GPU tasks. Sometimes slightly annoyed to see most of my OPNG work units are replication=1 and my GPU is idling 80% of the time."

"My understanding is that if you see work higher than _1, this is not a good sign, because it means either there is an issue with the work unit or the work unit has failed on multiple GPUs."

I think that they are referring to when only _0 is required, such as:

Name: OPNG_0000485_00204

The only ones of mine that are replication of 2 are when I am assigned a _1 as a wingman for a device that is not yet a reliable host.

If the science does not need a replication of 2, then setting it to replication of 2 would be sending out busy work. It is bad to send out busy work, as it is not actually free for volunteers to contribute (electricity, etc.).
||
|
kittyman
Advanced Cruncher Joined: May 14, 2020 Post Count: 140 Status: Offline
Well, the kitties have not seen another OPNG since the single one they got and munched yesterday. So, we'll try again............
----------------------------------------
Kitties haz OPNG, pleeze?
||
|
sam6861
Advanced Cruncher Joined: Mar 31, 2020 Post Count: 107 Status: Offline
"If the science does not need a replication of 2, then setting it to replication of 2 would be sending out busy work."

Replication of 2 allows data to be verified by checking that they match. With replication of 1, some bad or broken data can randomly and silently get past and show up as "Valid", as there is no wingman computer to verify the result. GPU calculation errors are sometimes caused by an overclock, an undervolt, automatic updates to Windows drivers, a blown capacitor, or just failing hardware.

Looking at some results with replication=2 and an invalid, there appears to be a bug? When the data don't match, 2 more tasks were sent instead of 1 more task, and later 1 was aborted by the server. I counted only those with replication=2: 50 OPNG work units total, of which 5 work units had an invalid and were resent. Here are 3 of them.

https://www.worldcommunitygrid.org/ms/device/...s.do?workunitId=606928354 OPNG_0000133_00275
https://www.worldcommunitygrid.org/ms/device/...s.do?workunitId=608243626 OPNG_0000350_00045
https://www.worldcommunitygrid.org/ms/device/...s.do?workunitId=607653744 OPNG_0000164_00083
||
|
Dennis-TW
Cruncher Joined: Apr 28, 2010 Post Count: 13 Status: Offline
"Replication of 2 allows data to be verified by checking that they match. With replication of 1, some bad or broken data can randomly and silently get past and show up as "Valid", as there is no wingman computer to verify the result. GPU calculation errors are sometimes caused by an overclock, an undervolt, automatic updates to Windows drivers, a blown capacitor, or just failing hardware."

Please check this statement from Uplinger with regard to validation. FWIW, I think it is safer to trust the WCG Tech Team in the way they are handling things than average users like us with no insight at all.
||
|
sam6861
Advanced Cruncher Joined: Mar 31, 2020 Post Count: 107 Status: Offline
"Please check this statement from Uplinger with regard to validation. FWIW, I think it is safer to trust the WCG Tech Team in the way they are handling things than average users like us with no insight at all."

OK, so that partly explains just having replication=1.

But are those cases of a single invalid followed by a double resend, replication=3, and then a server abort or a triple valid, a bug? This can possibly go from "yay, got a few GPU tasks" to "No, server aborted". So far, an OPNG server abort hasn't happened on my computer.
||
|
nanoprobe
Master Cruncher Classified Joined: Aug 29, 2008 Post Count: 2998 Status: Offline
"But are those cases of a single invalid followed by a double resend, replication=3, and then a server abort or a triple valid, a bug?"

It's not a bug. It's the way the server is set up. An invalid sends out 2 more copies of the task for comparison. The first one returned that matches is valid. There is no need for a second, so the server aborts it. In some cases both tasks have started before one is returned. In that case both of the resends can be valid. Hope this clears up the subject.
In 1969 I took an oath to defend and protect the U S Constitution against all enemies, both foreign and Domestic. There was no expiration date.
[Edit 3 times, last edit by nanoprobe at Apr 8, 2021 1:02:20 PM]
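The resend behaviour described in the post above can be sketched roughly as follows. This is only an illustration of the described outcome, not the actual WCG server code. The function names, the state strings, and the idea that resends simply take the next `_N` suffixes are all assumptions made for the example.

```python
# Rough sketch of the resend handling described in the thread.
# Everything here is hypothetical illustration, not WCG/BOINC server code.

RESEND_COUNT = 2  # per the post: one invalid result triggers two extra copies


def resend_names(workunit, issued_so_far):
    """Name the extra copies, assuming tasks are numbered workunit_0, workunit_1, ..."""
    return [f"{workunit}_{i}" for i in range(issued_so_far, issued_so_far + RESEND_COUNT)]


def settle_resends(resends, first_back, other_started):
    """Final state of each resend once the first matching result is returned.

    first_back:    name of the resend whose matching result came back first.
    other_started: True if the other resend was already running at that moment.
    """
    outcome = {}
    for task in resends:
        if task == first_back or other_started:
            # An already-running copy is allowed to finish; this sketch
            # assumes its result also matches, so it ends up valid too.
            outcome[task] = "valid"
        else:
            # Not started yet and no longer needed: the server aborts it.
            outcome[task] = "server aborted"
    return outcome


if __name__ == "__main__":
    names = resend_names("OPNG_0000133_00275", issued_so_far=2)
    print(settle_resends(names, first_back=names[0], other_started=False))
    print(settle_resends(names, first_back=names[0], other_started=True))
```

With `other_started=False` the second copy ends up "server aborted"; with `other_started=True` both resends end up "valid", which matches the two cases nanoprobe describes.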
||
|
sam6861
Advanced Cruncher Joined: Mar 31, 2020 Post Count: 107 Status: Offline
"An invalid sends out 2 more copies of the task for comparison."

I guess OPNG is different from ARP1, where 1 invalid just sends 1 more copy.

Now on to work unit availability: I do get a few OPNG tasks, about 10 per day, by using report_results_immediately in app_config.xml while running CPU tasks, so that work is requested more frequently.

[Edit 1 times, last edit by sam6861 at Apr 8, 2021 1:45:27 PM]
||
|
spRocket
Senior Cruncher Joined: Mar 25, 2020 Post Count: 274 Status: Offline
Just noticed a few more OPNG units sailed by while I was asleep... not as many as yesterday. I'm glad to see more coming, even if they're few and far between.