World Community Grid Forums
Thread Status: Active Total posts in this thread: 781
|
widdershins
Veteran Cruncher Scotland Joined: Apr 30, 2007 Post Count: 674 Status: Offline |
Just enough time for dinner before hitting that retry button 100 times per second. |
||
|
knreed
Former World Community Grid Tech Joined: Nov 8, 2004 Post Count: 4504 Status: Offline |
Quick update on the status of the stress test. Over the weekend (mostly later in the day yesterday), large numbers of batches started to complete. This allowed us to start packaging them and sending them back to the researchers. However, the packaging phase is fairly I/O intensive, and that has put additional load on the filesystem, which is what caused the slowdowns you have seen. We have been digging into that yesterday and today to see if we can do anything to increase the throughput. We have made some adjustments to the configuration of the clustered filesystem that we hope will help, but we don't expect a dramatic improvement.
The current outage was caused by what should have been a quick restart of the filesystem after making the configuration changes. Unfortunately, some hung processes from last Monday were still stuck, and we were not able to shut down the cluster cleanly. As a result, when we started the cluster back up, a couple of nodes had been "kicked out", and the system has to go through a filesystem scan before we can bring it back online. Once we are back up we will see if the changes are an improvement. |
||
|
True54Blue
Advanced Cruncher Joined: Nov 17, 2004 Post Count: 97 Status: Offline |
My last ten results are showing "Error" as their status. Is this caused by the server, or has something gone terribly wrong with my computer? I'm noticing that others who returned those jobs at the same time are also showing errors, and now the jobs are being sent out again. |
||
|
spRocket
Senior Cruncher Joined: Mar 25, 2020 Post Count: 274 Status: Offline |
I think I might bump up the queue length on my main cruncher once this all passes. I've already exhausted all of my GPU tasks about an hour and a half ago, and the remaining CPU tasks in the queue are rapidly dwindling.
It's a tradeoff, though - longer queues mean slower turnaround time for units. |
||
|
True54Blue
Advanced Cruncher Joined: Nov 17, 2004 Post Count: 97 Status: Offline |
I see some of them are being sent 7 times now, e.g.
OPNG_0020000_00103
OPNG_0026809_00108 |
||
|
Grumpy Swede
Master Cruncher Svíþjóð Joined: Apr 10, 2020 Post Count: 2158 Status: Offline |
Well, all my tasks are uploaded and reported now. Getting replacements, though, doesn't seem possible at the moment. |
||
|
Jorlin
Advanced Cruncher Deutschland Joined: Jan 22, 2020 Post Count: 89 Status: Offline |
"Well, all my tasks are uploaded and reported now. Getting replacements, though, doesn't seem possible at the moment."
Not getting NVIDIA jobs, but Intel ones are coming in. |
||
|
DennyInDurham
Cruncher USA Joined: Aug 4, 2020 Post Count: 23 Status: Offline |
"Well, all my tasks are uploaded and reported now. Getting replacements, though, doesn't seem possible at the moment."
Yes, it would seem the filesystem restart didn't help much... apparently the Stress Test has found another bottleneck. |
||
|
spRocket
Senior Cruncher Joined: Mar 25, 2020 Post Count: 274 Status: Offline |
Got a few new CPU tasks not long ago, but uploads are spotty. Had over a page of them that were backed off past three hours that I just restarted, and I'm now back to the "some work, some don't" situation.
EDIT: Had to restart transfers on one of my Raspberry Pis as well.
EDIT 2: For a while, CPU work was going smoothly, and I got a few GPU units, but it was trying for more and not getting any. Now I just saw a bunch more GPU WUs coming in. I wonder just how many teraflops (petaflops?) we're dumping into the project?
[Edit 2 times, last edit by spRocket at May 3, 2021 10:03:15 PM] |
||
|
cehunt
Senior Cruncher CANADA Joined: Oct 10, 2011 Post Count: 172 Status: Offline |
Hi:
I have a system with an Intel i7-8700K CPU and an NVIDIA GeForce GTX 1070 GPU. I am interested in getting more bang for my buck. On the task page, it shows 0.929 CPU + 1 GPU when the GPU is crunching. Can I change the GPU setting to 0.125 and thereby increase the number of GPU WUs that the GPU crunches at the same time?
Clive |
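For reference, the arithmetic behind this question can be sketched as follows. This is only a rough illustration, assuming the client runs as many tasks on one GPU as whole multiples of the per-task GPU fraction fit into 1.0, and that each task keeps its 0.929 CPU reservation. The function names are made up for this example and are not part of BOINC. It says nothing about whether total throughput actually improves, since concurrent tasks share the same GPU.

```python
import math


def tasks_per_gpu(gpu_fraction: float) -> int:
    """Whole number of tasks that fit on one GPU if each claims gpu_fraction of it.

    Assumption (not confirmed in this thread): the client packs tasks until
    the fractions sum to at most 1.0.
    """
    if not 0 < gpu_fraction <= 1:
        raise ValueError("gpu_fraction must be in the range (0, 1]")
    # The small epsilon guards against floating-point results such as 2.9999999.
    return math.floor(1.0 / gpu_fraction + 1e-9)


def cpu_threads_reserved(gpu_fraction: float, cpu_per_task: float) -> float:
    """Total CPU reservation across all tasks running concurrently on one GPU."""
    return tasks_per_gpu(gpu_fraction) * cpu_per_task


if __name__ == "__main__":
    fraction = 0.125      # GPU fraction proposed in the post above
    cpu_per_task = 0.929  # CPU share shown on the task page
    print("concurrent GPU tasks:", tasks_per_gpu(fraction))
    print("CPU threads reserved:", round(cpu_threads_reserved(fraction, cpu_per_task), 3))
```

Under these assumptions, a GPU fraction of 0.125 means eight tasks sharing the card, which together would reserve most of eight CPU threads, so the CPU side needs to have that much headroom.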
||