Thread Status: Active
Total posts in this thread: 781
Posts: 781   Pages: 79   [ Previous Page | 22 23 24 25 26 27 28 29 30 31 | Next Page ]
This topic has been viewed 544493 times and has 780 replies
hnapel
Advanced Cruncher
Netherlands
Joined: Nov 17, 2004
Post Count: 82
Status: Offline
Re: OpenPandemics - GPU Stress Test

Good morning,

I should have posted sooner, since I've been up for 2 hours. We are still working on tweaking some of the load balancer values. It seems like things are running a bit smoother right now. We are only about 20 minutes into the latest changes. Please let us know if you are noticing anything on your end.

Thanks,
-Uplinger


Uploads are going smoother now. I've got 4 PCs (with GPUs) running on this project; I just retried all pending uploads and they all went through. It certainly looks quieter on the upload front right now.
[Apr 27, 2021 2:22:42 PM]
spRocket
Senior Cruncher
Joined: Mar 25, 2020
Post Count: 274
Status: Offline
Re: OpenPandemics - GPU Stress Test

So far, so good here as well. Just turned in a stack of "big" OPNG units and got a stack of mixed units back with no hiccups.
[Apr 27, 2021 2:25:36 PM]
Michael Goetz
Cruncher
United States
Joined: Dec 11, 2017
Post Count: 35
Status: Offline
Re: OpenPandemics - GPU Stress Test



"c:\program files\boinc\boinccmd.exe" --network_available

Then add loop/repeat controls as appropriate to your desires and scripting language.


Thanks for the tip! Does that also retry stalled transfers?


Yes. That's the whole point. :)

Also, if you're using BOINCTasks and right-click on any file transfer, there's an option labelled "Retry All". There are several BOINC features that, strangely, are not supported by the official BOINC GUI but are supported by third-party GUIs such as BOINCTasks. This is one of them.
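
For anyone who wants the "loop/repeat controls" spelled out, here is a minimal sketch in Python. It only assumes the boinccmd path and the --network_available option quoted above; the function names, the attempt count and the delay are made up for illustration, so adjust them to taste.

```python
import subprocess
import time
from typing import Callable, Sequence

# Path as quoted in the post above; change it to match your own installation.
BOINCCMD = r"c:\program files\boinc\boinccmd.exe"


def run_boinccmd(argv: Sequence[str]) -> int:
    """Run one boinccmd command line and return its exit code."""
    return subprocess.run(argv, check=False).returncode


def retry_transfers(
    runner: Callable[[Sequence[str]], int],
    attempts: int,
    delay_seconds: float,
    sleep: Callable[[float], None] = time.sleep,
) -> int:
    """Issue `--network_available` `attempts` times, pausing between calls.

    Returns how many of the calls exited with code 0. The runner and sleep
    arguments are injectable (a hypothetical design choice) so the loop can
    be exercised without a BOINC client present.
    """
    successes = 0
    for i in range(attempts):
        if runner([BOINCCMD, "--network_available"]) == 0:
            successes += 1
        if i < attempts - 1:
            sleep(delay_seconds)  # no pause needed after the final call
    return successes
```

Calling retry_transfers(run_boinccmd, attempts=12, delay_seconds=300) would nudge the client every five minutes for roughly an hour. Please don't set the delay too low; the servers are already under enough stress.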
----------------------------------------
[Edit 1 times, last edit by Michael Goetz at Apr 27, 2021 2:28:14 PM]
[Apr 27, 2021 2:26:26 PM]
Richard Haselgrove
Senior Cruncher
United Kingdom
Joined: Feb 19, 2021
Post Count: 360
Status: Offline
Re: OpenPandemics - GPU Stress Test

'Retry Pending Transfers' is available in the official BOINC Manager, but it's a menu item rather than a button.
Advanced view, Tools menu.
[Apr 27, 2021 2:30:34 PM]
Ian-n-Steve C.
Senior Cruncher
United States
Joined: May 15, 2020
Post Count: 180
Status: Offline
Re: OpenPandemics - GPU Stress Test

One Ellesmere with HDD and one Ellesmere with SSD.

The SSD one is crashing very often because of a lot of checkpoints.
BOINC is set to 1200 sec. for backup, but OPNG ignores this.
Now the long-running OPNG tasks are running on it (1 hour!).
Something is wrong with checkpointing and SSD.
https://www.worldcommunitygrid.org/ms/device/...s.do?workunitId=639284992


Either something is wrong with your SSD, or something else is wrong with the system with the SSD. My systems are much faster and running 6-8 GPUs and producing many more writes to the SSD, but with no issues. SSDs in general are capable of many orders of magnitude more IOPS than an HDD. Your problem is likely system-specific, not SSD-specific.

No problem with Einstein@Home!

And Einstein has longer running tasks which might not expose the issues with your SSD. You can’t really compare apples and oranges.

Like I said, I’m processing at a MUCH higher volume on OPNG, with no SSD issues. If it was a generic SSD issue, someone like me with many more writes would see this issue too, but we don’t. That points to your issue being related to something with your system specifically.
----------------------------------------

EPYC 7V12 / [5] RTX A4000
EPYC 7B12 / [5] RTX 3080Ti + [2] RTX 2080Ti
EPYC 7B12 / [6] RTX 3070Ti + [2] RTX 3060
[2] EPYC 7642 / [2] RTX 2080Ti
[Apr 27, 2021 2:31:54 PM]
uplinger
Former World Community Grid Tech
Joined: May 23, 2005
Post Count: 3952
Status: Offline
Re: OpenPandemics - GPU Stress Test

While I have a few minutes, I'm going to go back through about 200 posts from everyone :) That may be the biggest bottleneck :P Also, no worries on putting more stress on the system. That's what all this fun is about....

<sarcastically long pause>


.....oh yeah, and the science :P

Thanks,
-Uplinger
[Apr 27, 2021 2:35:01 PM]
stevemtu
Cruncher
Joined: Sep 7, 2005
Post Count: 12
Status: Offline
Re: OpenPandemics - GPU Stress Test

For the first time since yesterday, I now have no tasks waiting to upload or download. Looking better.
[Apr 27, 2021 2:37:23 PM]
Grumpy Swede
Master Cruncher
Svíþjóð
Joined: Apr 10, 2020
Post Count: 2167
Status: Offline
Re: OpenPandemics - GPU Stress Test

The validator doesn't seem to be interested in validating these larger WUs from batches 13345 - 41773. The previous lower batches were validated very soon after they were finished. No wingman, I'm _0 on all of these WUs, but the validator isn't interested in even trying to validate.

Edit: Why does this "large" WU start at "job" #56? Never seen that before either. All previous WUs always started at job #1.

https://www.worldcommunitygrid.org/ms/device/...og.do?resultId=1657133303

These "new" types of WUs do have some very different behaviour.

Edit 2: And the validator still doesn't try to validate them. I wonder if the validator is deliberately set up not to validate these "larger" WUs?
----------------------------------------
[Edit 3 times, last edit by Grumpy Swede at Apr 27, 2021 2:50:46 PM]
[Apr 27, 2021 2:37:43 PM]
Jim1348
Veteran Cruncher
USA
Joined: Jul 13, 2009
Post Count: 1066
Status: Offline
Re: OpenPandemics - GPU Stress Test

Also, no worries on putting more stress on the system. That's what all this fun is about....

I can do that.
[Apr 27, 2021 2:41:48 PM]
uplinger
Former World Community Grid Tech
Joined: May 23, 2005
Post Count: 3952
Status: Offline
Re: OpenPandemics - GPU Stress Test

The validator doesn't seem to be interested in validating these larger WUs from batches 13345 - 41773. The previous lower batches were validated very soon after they were finished. No wingman, I'm _0 on all of these WUs, but the validator isn't interested in even trying to validate.

Edit: Why does this "large" WU start at "job" #56? Never seen that before either. All previous WUs always started at job #1.

https://www.worldcommunitygrid.org/ms/device/...og.do?resultId=1657133303


It shows starting at 56 because BOINC only uploads the last X bytes in the stderr to us. This limits what you see on the website.
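
To illustrate the effect, here is a rough sketch in Python. It is not the actual BOINC code: the "Job #N: done" line format, the function names and the 640-byte limit are all invented for the example, and the real value of X is not stated here. The point is only that keeping the tail of a long log drops the early job lines.

```python
from typing import Optional


def tail_bytes(log: bytes, limit: int) -> bytes:
    """Keep only the last `limit` bytes, as a size-capped upload would."""
    if limit <= 0:
        return b""
    return log[-limit:]


def first_visible_job(log: bytes, limit: int) -> Optional[int]:
    """Return the first job number whose line survives truncation intact.

    Assumes a hypothetical log format of one 'Job #N: done' line per job.
    A line cut in half by the truncation loses its 'Job #' prefix and is
    therefore skipped.
    """
    text = tail_bytes(log, limit).decode("ascii", errors="replace")
    for line in text.splitlines():
        if line.startswith("Job #"):
            return int(line.split("#", 1)[1].split(":", 1)[0])
    return None


# A made-up stderr with 100 jobs, one line each (1392 bytes in total).
full_log = "".join(f"Job #{n}: done\n" for n in range(1, 101)).encode("ascii")

print(first_visible_job(full_log, 10_000))  # whole log fits: prints 1
print(first_visible_job(full_log, 640))     # only the tail fits: prints 56
```

So the task itself still began at job #1; the earlier lines simply never make it to the website.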

As for the validator, I have bumped it up to 8 cores and I'm monitoring where it is at.

Thanks,
-Uplinger
[Apr 27, 2021 2:48:48 PM]