World Community Grid Forums
Thread Status: Active. Total posts in this thread: 11
Platoon
Advanced Cruncher · Russia · Joined: Jun 28, 2006 · Post Count: 62
Hello everyone!
Sorry for the long explanation, but I have no idea how to describe this problem in fewer words :)

Some time ago (end of September) my main computer started to hang. The system is Win10 Pro. Nothing changed in the system, no programs were installed, no hardware was changed, and for no apparent reason, dead hangs began. It looks like this: an "hourglass" appears on the screen in the form of a circle cursor, and it reacts to the mouse, but the system does not react to anything else. Task Manager via Ctrl+Alt+Del does not appear either; there is no reaction to any keyboard shortcut, and only a reset helps. In addition, CPU utilization drops to zero (the CPU fan RPM decreases to the minimum).

I've tried many ways to diagnose the problem and made several changes to the HW, such as:
- checked the RAM with memtest (no errors)
- moved the DIMMs to other slots
- checked the CPU temperature: it is normal (~70 °C)
- tested the SSD with diagnostic tools: no errors
- scanned the system with an antivirus
- updated all drivers (motherboard, SSD controller, GPU)
- installed a new BIOS version
- replaced the power supply unit
- the Windows system event log doesn't show any errors or warnings, etc.

Nothing helped. I even tried reinstalling Win10 several times (including installing it on another SSD), but the problem still exists.

After some time I managed to discover that this problem appears only when the Africa Rainfall Project (ARP) is running. To be exact, the computer hangs if more than 2 ARP tasks are running: just add one more task, and after some time the system freezes. Any idea why this is happening? Maybe it could be some intermittent problem with memory or something (but then why doesn't Memtest show any problems?)... In any case, HW Monitor shows that no more than 16-20 GB of RAM is in use at any time. For now I've manually limited ARP to 2 tasks, so this problem isn't bothering me anymore, but still.
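For readers who want to reproduce the cap of two concurrent ARP tasks described above: one common way to do this in the BOINC client is an `app_config.xml` file with a `max_concurrent` limit. This is a minimal sketch, not necessarily how the limit was set in this thread. It assumes the ARP application's short name is `arp1` (check the actual name in the BOINC event log or in `client_state.xml` before relying on it) and that the file is saved in the World Community Grid project folder inside the BOINC data directory.

```xml
<app_config>
  <app>
    <!-- Assumption: "arp1" is the Africa Rainfall Project app name; verify it locally -->
    <name>arp1</name>
    <!-- Run at most 2 tasks of this application at the same time -->
    <max_concurrent>2</max_concurrent>
  </app>
</app_config>
```

After saving the file, use Options → Read config files in BOINC Manager (or restart the client) so the limit takes effect. The cap applies only to this application, so the remaining threads stay available for tasks from other WCG projects.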
HW configuration:
- Motherboard: Gigabyte X470 AORUS Gaming 5 WIFI
- CPU: AMD Ryzen 9 3900X
- RAM: 32 GB (4 × 8 GB, 2666, Ballistix)
- GPU: Nvidia RTX 2080 Ti
- Drives:
  - Samsung 970 PRO NVMe 512 GB (system, WCG)
  - Intel SSD SC2CW 180 GB A3
  - SATA: WDC WD20PURX 2 TB
  - SATA: ST3500630AS 500 GB
- Power: be quiet! Dark Power Pro 11 850 W
" forever forge ahead and keep the dream in sight!"
Sgt.Joe
Ace Cruncher · USA · Joined: Jul 4, 2006 · Post Count: 7667
I think you are running into a problem where the work units are all competing for the same resource over the same path at the same time, which would cause a system lock. Just what that resource is, I don't know. One clue is that the system ran fine until September and then you started having problems. I would suspect that one of the many Win 10 updates changed something.

One thing you could try is to ditch Windows 10, run Linux, and see if the problem recurs. You could set it up as a dual-boot system, install Linux on your other drive, or just run Linux off a flash drive and test it for a while.

Good luck
Cheers
Sgt. Joe
*Minnesota Crunchers*
Mike.Gibson
Ace Cruncher · England · Joined: Aug 23, 2007 · Post Count: 12386
Platoon
Has this happened after a restart? If so, it could be that, because the units restart from their last checkpoint, there is a capacity problem at the next checkpoint if they all try to checkpoint at the same time. You could try suspending some of them for varying amounts of time so that the checkpointing is spread out. You should be able to run ARP on half of your threads.
Mike
sam6861
Advanced Cruncher · Joined: Mar 31, 2020 · Post Count: 107
Test the CPU with these. Does it pass with no errors and no freezes?
- Prime95 torture test using Small FFTs for about an hour.
- IntelBurnTest with 100 runs.

If this fails, then either the CPU is defective, or it may need slightly more voltage in the motherboard BIOS settings. My old Intel i7-2600 needed a +0.05 V core voltage offset to be stable.
Platoon
Advanced Cruncher · Russia · Joined: Jun 28, 2006 · Post Count: 62
Thank you for the replies!
Some additional info from me:
- I've already tried installing several versions of Win10 with different update packs.
- I don't think the problem is with the CPU, because a) it runs without problems at 100% utilization on other WCG projects, and b) I've run overnight AIDA64 stability tests for the CPU and RAM, and no problems were found.
- As I mentioned before, the problem appeared suddenly; no changes to HW or SW were made. The system worked with 16 ARP tasks. Previously, after buying the new Ryzen about a year ago, I tested the system with different numbers of ARP tasks and chose 16 as optimal.
- I've tried slightly raising the CPU core voltage to see if anything changes: no effect.
- I reinstalled Win10 several times, as well as WCG itself (the problem still shows up on a clean system), so I don't think it's related to the unit restart point, checkpoints in ARP tasks, WCG restarts, etc.
- The swap file is set to system-managed (the system-recommended size is 4985 MB).
- Someone on a HW forum pointed out that it could be a problem with my old power supply (it was bought about 10 years ago, so the capacitors or something could have degraded), and I replaced it with a new one. All other components of the system are relatively new (0-2 years old).

I believe the possible reasons are:
- some HW problem with the RAM or motherboard has shown up on my system, but only appears together with high RAM utilization (though it's strange that HW stress tests can't reproduce it);
- there could be some changes in the ARP code or updates in WCG that have a deeply hidden compatibility issue with my specific HW;
- demons or evil aliens?

Good idea to see what happens on Linux, but unfortunately I can't do it right now; maybe some time later. Any idea whether running WCG with ARP in a virtual machine or container could help? I've never worked with such SW before, so I would be grateful for any suggestions or recommendations.
" forever forge ahead and keep the dream in sight!"
TonyEllis
Senior Cruncher · Australia · Joined: Jul 9, 2008 · Post Count: 261
Have you tried bumping up the memory voltage slightly?
Have you tried relaxing the memory timings? How long did you run the memory tests? I've had a couple of cases where slightly bumping the memory voltage solved intermittent hangs, even though the memory tests ran clean.
Run Time Stats https://grassmere-productions.no-ip.biz/
sam6861
Advanced Cruncher · Joined: Mar 31, 2020 · Post Count: 107
> It looks like this: an "hourglass" appears on the screen in the form of a circle-cursor, and it reacts to the mouse, but the system does not react to anything else.

My guess is a problem with the SSD, or a loose connection to the SSD where Windows is installed. It could also be the storage holding the Windows swap file, if that is on a different drive letter: swapping memory to a faulty D: or E: drive can freeze or crash Windows. Completely turn off the swap file in the Windows system settings and restart; does the freezing stop? Did you reinstall Windows to a different SSD or other storage? Another thing to try: reduce the RAM speed to 2400 or lower in the motherboard settings.