This topic has been viewed 2290 times and has 10 replies.
Platoon
Advanced Cruncher
Russia
Joined: Jun 28, 2006
Post Count: 62
Computer hangs when ARP tasks more than 2 // Help needed

Hello everyone!
Sorry for the long explanation, but I have no idea how to describe this problem in fewer words :)

Some time ago (end of September) my main computer started to hang.
The system is Win10 Pro. Nothing changed in the system, no programs were installed, no hardware was changed, and yet, for no apparent reason, dead hangs began. It looks like this: an "hourglass" appears on the screen in the form of a spinning-circle cursor, and it follows the mouse, but the system does not react to anything else. Task Manager (via Ctrl+Alt+Del) does not appear either, there is no reaction to any keyboard shortcut, and only a reset helps. In addition, CPU utilization drops to zero (the CPU fan RPM decreases to the minimum).

I've tried many ways to diagnose the problem and made several hardware changes, such as:
- checked the RAM with Memtest (no errors)
- moved the DIMMs to other slots
- checked the CPU temperature: it is normal (~70 °C)
- tested the SSD with diagnostic tools: no errors
- scanned the system with an antivirus
- updated all drivers (motherboard, SSD controller, GPU)
- installed a new BIOS version
- replaced the power supply unit
- checked the Windows event log: it doesn't show any errors or warnings
- even reinstalled Win10 several times (including on another SSD), but the problem still exists
etc. Nothing helped.

After some time I discovered that this problem appears only when Africa Rainfall Project is running. To be exact, the computer hangs if more than 2 ARP tasks are running: just add one more task, and after some time the system freezes.

Any idea why this is happening? Maybe it is some intermittent problem with memory or something (but then why doesn't Memtest show any problems?)... Anyway, HWMonitor shows that no more than 16-20 GB of RAM is used at any time.

For now I have manually limited ARP to 2 concurrent tasks, so the problem doesn't bother me anymore, but still.
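One way to make such a limit stick is BOINC's per-project `app_config.xml`, whose `<max_concurrent>` element caps how many tasks of one application run at once. The sketch below only builds that XML; the application short name `arp1` is an assumption (check the BOINC event log or `client_state.xml` for the real name on your machine). The file would go in the World Community Grid project folder inside the BOINC data directory, followed by "Options > Read config files" in BOINC Manager.

```python
import xml.etree.ElementTree as ET


def build_app_config(app_name: str, max_concurrent: int) -> str:
    """Return app_config.xml text limiting how many tasks of one BOINC
    application may run at the same time."""
    if max_concurrent < 1:
        raise ValueError("max_concurrent must be at least 1")
    root = ET.Element("app_config")
    app = ET.SubElement(root, "app")
    ET.SubElement(app, "name").text = app_name
    ET.SubElement(app, "max_concurrent").text = str(max_concurrent)
    ET.indent(root)  # pretty-print with two-space indentation (Python 3.9+)
    return ET.tostring(root, encoding="unicode")


if __name__ == "__main__":
    # "arp1" is an ASSUMED app name for ARP; verify it before using the output.
    print(build_app_config("arp1", 2))
```

Printing rather than writing the file keeps the sketch side-effect free; save the output as `app_config.xml` by hand once the app name is confirmed.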

HW configuration:
Gigabyte X470 AORUS Gaming 5 WIFI
AMD Ryzen 9 3900X
32 GB RAM (4x8 GB 2666 Ballistix)
Nvidia RTX 2080 Ti
Drives:
Samsung NVMe 970 Pro 512 GB: system, WCG
Intel SSDSC2CW180A3 (180 GB)
SATA: WDC WD20PURX 2 TB
SATA: ST3500630AS (500 GB)
Power: be quiet! Dark Power Pro 11 850 W
----------------------------------------
" forever forge ahead and keep the dream in sight!"

[Dec 28, 2020 2:18:05 PM]
Sgt.Joe
Ace Cruncher
USA
Joined: Jul 4, 2006
Post Count: 7667
Re: Computer hangs when ARP tasks more than 2 // Help needed

I think you are running into a problem where the work units are all competing for the same resource over the same path at the same time, which would cause a system lock. Just what that resource is, I don't know. One clue is that the system ran fine until September and then you started having problems. I suspect one of the many Win 10 updates changed something.
One thing you could try is to ditch Windows 10, run Linux, and see if the problem recurs. You could set it up as a dual-boot system, install Linux on your other drive, or just run Linux off a flash drive and test it for a while.
Good luck
Cheers
----------------------------------------
Sgt. Joe
*Minnesota Crunchers*
[Dec 28, 2020 2:49:40 PM]
Mike.Gibson
Ace Cruncher
England
Joined: Aug 23, 2007
Post Count: 12386
Re: Computer hangs when ARP tasks more than 2 // Help needed

Platoon

Has this happened after a restart? If so, because the units restart from the last checkpoint, there could be a capacity problem at the next checkpoint if they all try to checkpoint at the same time. You could try suspending some of them for varying amounts of time so the checkpointing is spread out.

You should be able to run ARP on half of your threads.
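As plain arithmetic, the "half of your threads" rule of thumb works out as below. A Ryzen 9 3900X has 12 cores / 24 threads, so this suggests about 12 concurrent ARP tasks. The rule is Mike's suggestion, not an official WCG limit, and the helper name is hypothetical.

```python
import os
from typing import Optional


def half_of_threads(logical_cpus: Optional[int] = None) -> int:
    """Hypothetical helper: suggested ARP task count, i.e. half the
    logical CPUs, but never less than 1."""
    if logical_cpus is None:
        # os.cpu_count() counts logical CPUs and may return None.
        logical_cpus = os.cpu_count() or 1
    return max(1, logical_cpus // 2)


if __name__ == "__main__":
    print(half_of_threads(24))  # Ryzen 9 3900X: 24 threads -> 12
    print(half_of_threads())    # depends on the machine running this
```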

Mike
[Dec 28, 2020 4:02:23 PM]
sam6861
Advanced Cruncher
Joined: Mar 31, 2020
Post Count: 107
Re: Computer hangs when ARP tasks more than 2 // Help needed

Test the CPU with these. Does it pass with no errors and no freezes?
- Prime95 torture test using Small FFTs for about an hour.
- IntelBurnTest with 100 runs.

If this fails, then either the CPU is defective, or it needs slightly more voltage in the motherboard BIOS settings. My old Intel i7-2600 needed a +0.05 V core voltage offset to be stable.
[Dec 28, 2020 6:52:41 PM]
Platoon
Advanced Cruncher
Russia
Joined: Jun 28, 2006
Post Count: 62
Re: Computer hangs when ARP tasks more than 2 // Help needed

Thank you for the replies!

Some additional info from me:
- I've already tried installing several versions of Win10 with different update packs.
- I don't think the CPU is the problem, because: a) it runs with no problems at 100% utilization on other WCG projects, and b) I've run overnight AIDA64 stability tests on the CPU and RAM with no problems found.
- As I mentioned before, the problem appeared suddenly; no HW or SW changes were made. The system used to work with 16 ARP tasks. After buying the new Ryzen ~1 year ago, I tested the system with different numbers of ARP tasks and chose 16 as optimal.
- I've tried slightly increasing the CPU core voltage to see if anything changes: no effect.
- I reinstalled Win10 several times, as well as WCG itself (the problem still shows up on a clean system), so I don't think it's related to unit restart points, checkpoints in ARP tasks, WCG restarts, etc.
- The swap file is set to a system-managed size (the system-recommended size is 4985 MB).
- Someone on a HW forum suggested it could be a problem with my old power supply (it was bought ~10 years ago, so the capacitors or something could have degraded), and I replaced it with a new one. All other components of the system are relatively new (0-2 years).


I believe there could be the following reasons:
- some HW problem with the RAM or motherboard has shown up on my system, but it only appears together with high RAM utilization (though it's strange that HW stress tests can't reproduce it).
- there could be some changes in the ARP code or WCG updates that have deeply hidden compatibility issues with my specific HW.
- demons or evil aliens?

Seeing what happens on Linux is a good idea, but unfortunately I can't do it right now; maybe some time later.
Any idea whether running WCG with ARP in a virtual machine or container could help? I have never worked with such software before, so I would be grateful for any suggestions/recommendations.
----------------------------------------
" forever forge ahead and keep the dream in sight!"

[Dec 30, 2020 12:04:01 PM]
TonyEllis
Senior Cruncher
Australia
Joined: Jul 9, 2008
Post Count: 261
Re: Computer hangs when ARP tasks more than 2 // Help needed

Have you tried bumping up the memory voltage slightly?
Have you tried relaxing memory timings?
How long did you run the memory tests?

I've had a couple of cases where slightly bumping the memory voltage solved intermittent hangs, even though memory tests ran clean.
----------------------------------------
[Dec 30, 2020 12:24:03 PM]
sam6861
Advanced Cruncher
Joined: Mar 31, 2020
Post Count: 107
Re: Computer hangs when ARP tasks more than 2 // Help needed

Quote (Platoon): It looks like this: an "hourglass" appears on the screen in the form of a circle-cursor, and it reacts to the mouse, but the system does not react to anything else.
My guess is a problem with the SSD where Windows is installed, or a loose connection to it. It could also be the drive holding the Windows swap file, if that is on a different drive letter: swapping memory to a faulty D: or E: drive can freeze or crash Windows.

Completely turn off the swap file in the Windows system settings and restart. Does the freezing stop?
Did you reinstall Windows to a different SSD or different storage?
Another thing to try: reduce the RAM speed to 2400 or lower in the motherboard settings.
[Dec 30, 2020 7:35:42 PM]