World Community Grid Forums
Category: Support Forum: Website Support Thread: Percentage of resends seems too high |
Thread Status: Active Total posts in this thread: 10
|
Former Member
Cruncher Joined: May 22, 2018 Post Count: 0 Status: Offline |
Now that MCM has resumed, it seems we are back to executing about 50% resends. At any given time, about half of the WUs executing have due dates 3 days away or less. I have at least 4 pages of pending validations that are due to NO REPLY WUs. Seems so inefficient. Just venting. Crunch on.
|
||
|
Sgt.Joe
Ace Cruncher USA Joined: Jul 4, 2006 Post Count: 7219 Status: Offline Project Badges: |
I think a lot of those no replies are due to the Ripple crowd and their use of cloud computing. If they get bid off of a machine, those WUs go off to oblivion and need to be re-issued. I think Uplinger alluded to this phenomenon a while back.
----------------------------------------
Cheers
Sgt. Joe
*Minnesota Crunchers* |
||
|
Byteball_730a2960
Senior Cruncher Joined: Oct 29, 2010 Post Count: 318 Status: Offline Project Badges: |
Didn't Kevin also say that results from the faulty batches would be sent out again? Would they come under the umbrella of being resends?
|
||
|
pcwr
Ace Cruncher England Joined: Sep 17, 2005 Post Count: 10903 Status: Offline Project Badges: |
All of my resends are due to "no reply". I would have thought that the faulty batches would have been aborted by the system and not just left to try and complete.
----------------------------------------
Patrick |
||
|
jonnieb-uk
Ace Cruncher England Joined: Nov 30, 2011 Post Count: 6105 Status: Offline Project Badges: |
The faulty WUs were withdrawn automatically by the server, as indicated in knreed's post.
---------------------------------------- |
||
|
Former Member
Cruncher Joined: May 22, 2018 Post Count: 0 Status: Offline |
Just received a batch of 41 resends on one machine. That's just too crazy. Moving my cache to 3 days so none of my 104 cores will be considered reliable, and letting someone else clean up the Ripple crowd's mess....
|
||
|
Sgt.Joe
Ace Cruncher USA Joined: Jul 4, 2006 Post Count: 7219 Status: Offline Project Badges: |
I am seeing a fair number of resends also, but I just let them run. My cache is small enough that I don't hit panic mode.
----------------------------------------
Cheers
Sgt. Joe
*Minnesota Crunchers* |
||
|
seippel
Former World Community Grid Tech Joined: Apr 16, 2009 Post Count: 392 Status: Offline Project Badges: |
We are currently investigating this issue, which we believe is related to checkpointing on the type of work unit that is being run now (MCM1 has many different work unit types). We will report back when we have more information. Changing the BOINC setting to leave the application in memory may help reduce the likelihood of a restart.
Seippel |
||
|
Former Member
Cruncher Joined: May 22, 2018 Post Count: 0 Status: Offline |
I don't think we are talking about the same issue. My concern is the percentage of WUs that miss the return deadline and are resent to the grid with a shortened return deadline. It just seems very inefficient to send work, have 50% miss the deadline, and then resend it to a smaller pool of resources with a shortened deadline. Before Ripple, it was about 10 to 15% resends, but recently I was executing almost 50% (sometimes higher) resends. My protest is to not have my resources be indicated as reliable. I'm getting almost all new work now.
|
||
|
seippel
Former World Community Grid Tech Joined: Apr 16, 2009 Post Count: 392 Status: Offline Project Badges: |
The percentage of work units that miss the deadline and end up as "no reply" has remained fairly consistent at 3.1%. What may be happening, though, is that fewer machines are considered reliable due to the checkpointing issue, so those machines that are still considered reliable are getting a higher percentage of resends.
Seippel |
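To illustrate the arithmetic behind that explanation, here is a rough back-of-the-envelope sketch in Python. It is not how the WCG scheduler actually works; it assumes a deliberately simple model in which resends make up roughly the "no reply" fraction of all tasks, every resend goes only to reliable machines, and reliable machines handle a fixed share of all tasks. The function name and the reliable-share values are hypothetical; only the 3.1% figure comes from the post above.

```python
def resend_share_on_reliable_hosts(no_reply_rate: float, reliable_share: float) -> float:
    """Estimate the fraction of a reliable host's tasks that are resends.

    Simplified model (an assumption, not the real scheduler logic):
      - resends are about `no_reply_rate` of all tasks issued,
      - all resends are routed to reliable hosts,
      - reliable hosts process `reliable_share` of all tasks issued.
    """
    if not 0.0 <= no_reply_rate <= 1.0:
        raise ValueError("no_reply_rate must be between 0 and 1")
    if not 0.0 < reliable_share <= 1.0:
        raise ValueError("reliable_share must be greater than 0 and at most 1")
    # Resends are concentrated on the reliable pool, so their share of that
    # pool's workload grows as the pool shrinks. It cannot exceed 100%.
    return min(1.0, no_reply_rate / reliable_share)


if __name__ == "__main__":
    NO_REPLY_RATE = 0.031  # the 3.1% figure quoted in the thread
    # Hypothetical shares of total work handled by reliable machines.
    for reliable_share in (0.25, 0.10, 0.062):
        share = resend_share_on_reliable_hosts(NO_REPLY_RATE, reliable_share)
        print(f"reliable machines do {reliable_share:.1%} of the work "
              f"-> about {share:.0%} of their tasks are resends")
```

Under these assumptions, a steady overall no-reply rate is compatible with individual reliable machines seeing a much higher proportion of resends, simply because the pool receiving them has become smaller.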
||