Random crash/reboots during inactivity

psychedelicbeast

Active member
Hi All
I finally received my desktop PC from PCSpecialist on Friday. However, I am seeing some issues. At least 3-4 times (maybe more) since Friday, my PC has randomly rebooted during sleep mode. The latest one happened 30 minutes ago. When I checked Event Viewer, this is what I found.

"
A fatal hardware error has occurred.

Reported by component: Processor Core
Error Source: Machine Check Exception
Error Type: Cache Hierarchy Error
Processor APIC ID: 20

The details view of this entry contains further information."

While the details were:
1619518581694.png

Honestly, I have no clue what this means. Any help would be greatly appreciated. My spec is attached below. I have also added an extra storage HD (WD Black 2TB - 2010 edition):
Case: THERMALTAKE CORE X71 TEMPERED GLASS EDITION GAMING CASE
  Change to: CORSAIR OBSIDIAN SERIES™ 500D SE CASE
Processor (CPU): AMD Ryzen 9 5950X 16 Core CPU (3.4GHz-4.9GHz/72MB CACHE/AM4)
Motherboard: ASUS® CROSSHAIR VIII HERO (DDR4, PCIe 4.0, CrossFireX/SLI) - RGB Ready!
Memory (RAM): 32GB Corsair VENGEANCE DDR4 3600MHz (4 x 8GB)
  Change to: 32GB Corsair VENGEANCE DDR4 3600MHz (2 x 16GB)
Graphics Card: 24GB ASUS ROG STRIX GEFORCE RTX 3090 - HDMI, DP
1st Storage Drive: NOT REQUIRED
1st M.2 SSD Drive: 500GB SAMSUNG 980 PRO M.2, PCIe NVMe (up to 6900MB/R, 5000MB/W)
  Change to: 1TB SAMSUNG 980 PRO M.2, PCIe NVMe (up to 7000MB/R, 5000MB/W)
2nd M.2 SSD Drive: 2TB SEAGATE FIRECUDA 520 GEN 4 PCIe NVMe (up to 5000MB/R, 4400MB/W)
DVD/BLU-RAY Drive: 16x BLU-RAY WRITER DRIVE, 16x DVD ±R/±RW & SOFTWARE
  Change to: NOT REQUIRED
Memory Card Reader: USB 3.0 EXTERNAL SD/MICRO SD CARD READER
Power Supply: CORSAIR 850W RMx SERIES™ MODULAR 80 PLUS® GOLD, ULTRA QUIET
  Change to: CORSAIR 1000W RMx SERIES™ MODULAR 80 PLUS® GOLD, ULTRA QUIET
Power Cable: 1 x 1 Metre UK Power Cable (Kettle Lead)
Processor Cooling: Corsair H115i RGB PLATINUM Hydro Series High Performance CPU Cooler
  Change to: Corsair H115i ELITE CAPELLIX RGB Hydro Series High Performance CPU Cooler
Thermal Paste: STANDARD THERMAL PASTE FOR SUFFICIENT COOLING
LED Lighting: 2x 50cm RGB LED Strip
Extra Case Fans: 3x Corsair LL120 RGB LED Fan + Controller Kit
  Change to: NONE
Sound Card: ONBOARD 6 CHANNEL (5.1) HIGH DEF AUDIO (AS STANDARD)
Network Card: 10/100/1000 GIGABIT LAN PORT (Wi-Fi NOT INCLUDED)
Wireless Network Card: WIRELESS INTEL® Wi-Fi 6 AX200 2,400Mbps/5GHz, 300Mbps/2.4GHz PCI-E CARD + BT 5.0
USB/Thunderbolt Options: 2 PORT (1 x TYPE A, 1 x TYPE C) USB 3.1 PCI-E CARD + STANDARD USB PORTS
Operating System: Windows 10 Professional 64 Bit - inc. Single Licence [MUP-00003]
  Change to: Windows 10 Home 64 Bit - inc. Single Licence [KUK-00001]
Operating System Language: United Kingdom - English Language
Windows Recovery Media: Windows 10 Multi-Language Recovery Image - Unlimited Downloads from Online Account
Office Software: FREE 30 Day Trial of Microsoft 365® (Operating System Required)
Anti-Virus: NO ANTI-VIRUS SOFTWARE
Browser: Microsoft® Edge (Windows 10 Only)
Warranty: 3 Year Platinum Warranty (3 Year Collect & Return, 3 Year Parts, 3 Year labour)
Delivery: STANDARD INSURED DELIVERY TO UK MAINLAND (MON-FRI)
 

Deleted member 17413

Guest
Firstly, I would turn off sleep mode. It can complicate things and cause issues itself.

Have you got any dump files from the crashes? @ubuysa may be able to figure out what's happening.
In the meantime, memtest may be useful, and also a list of anything you've done with it since arrival (checked components, reseated anything, updates etc).
 

ubuysa

The BSOD Doctor
I rather fear this may be another example of the 4 x 3600MHz RAM in an AMD build issue. I had hoped these problems had been resolved (it's an AMD issue, not a PCS one).

I would phone PCS ASAP (don't email) and discuss this with them.

Do upload the memory dump to the cloud with a link to it here but I rather fear I know what I'm going to find.

You can try downclocking the RAM until you can get it stable.
 

psychedelicbeast

Active member
Hi @ubuysa,
For some reason, my updated spec didn't get saved here (I have amended it above). I actually have 2 x 16GB RAM (3600MHz), so the 4-stick RAM issue shouldn't apply.

Also, there is a file called MEMORY.DMP in C:\Windows. Is that the one to upload? It seems to have been last updated at 8:49AM, whereas Event Viewer shows me a crash at 10:33AM.
 

ubuysa

The BSOD Doctor
Yep, that's the file to upload. It will likely be in excess of 1GB so it needs to go to a cloud service with a link here. :)

Also check my second post here and upload the logs etc. as well please.
 

psychedelicbeast

Active member
Here you go. I have added all the requisite files in the same zip.

 

psychedelicbeast

Active member
I just replicated the issue by manually putting the PC into sleep mode and then waking it up. I get the same error in Event Viewer:

1619531098920.png

The memory.dmp file hasn't updated i.e. I still see it as having last updated in the morning.
 

ubuysa

The BSOD Doctor
This dump is dated Tue Apr 27 10:48:42.192 2021 and the stop code is DPC_WATCHDOG_VIOLATION. The first bug check argument (0x1) indicates that the system ran at an elevated IRQL for too long. Basically that means that something stalled whilst running at an elevated IRQL, and usually that's a third party driver.

The stack trace clearly shows that the driver at fault is nvlddmkm.sys, the Nvidia graphics driver. This bug check at least then was caused by your graphics driver - or the graphics card itself of course.

However, in your system log you have several WHEA errors logged (WHEA is the Windows Hardware Error Architecture) with a machine check exception - and that indicates a hardware failure. In all cases the errors report a 'Cache Hierarchy Error' but for different processors. This is very similar to the BSODs I've been seeing with 4 x 3600MHz RAM cards in an AMD build.
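For anyone who wants to tally these WHEA entries themselves, here is a minimal Python sketch (it is not how the log was analysed above). It assumes the event messages have been saved as plain text in the same layout as the entry quoted in the first post, for example copied out of Event Viewer. The function name `summarise_whea` and the second sample entry (APIC ID 4) are made up for illustration.

```python
import re
from collections import Counter

# Field names as they appear in the WHEA event text quoted in this thread.
ERROR_TYPE_RE = re.compile(r"^\s*Error Type:\s*(.+?)\s*$", re.MULTILINE)
APIC_ID_RE = re.compile(r"^\s*Processor APIC ID:\s*(\d+)\s*$", re.MULTILINE)


def summarise_whea(event_text):
    """Count the error types and processor APIC IDs found in saved WHEA event text."""
    error_types = Counter(ERROR_TYPE_RE.findall(event_text))
    apic_ids = Counter(int(value) for value in APIC_ID_RE.findall(event_text))
    return error_types, apic_ids


if __name__ == "__main__":
    # Illustrative input only: the first entry mirrors the one quoted in the
    # thread, the second (APIC ID 4) is a hypothetical extra entry.
    sample = """A fatal hardware error has occurred.
Reported by component: Processor Core
Error Source: Machine Check Exception
Error Type: Cache Hierarchy Error
Processor APIC ID: 20

A fatal hardware error has occurred.
Reported by component: Processor Core
Error Source: Machine Check Exception
Error Type: Cache Hierarchy Error
Processor APIC ID: 4
"""
    types, ids = summarise_whea(sample)
    print("Error types:", dict(types))
    print("APIC IDs:", dict(ids))
```

If the same error type shows up against several different APIC IDs, that matches the pattern described above of one error being reported for different processors.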

There are other BSODs recorded in the system log, all for the same stop code as the one above.

In addition to all that, the system log contains a number of Kernel Boot errors related to either fast startup or resume from hibernation. Fast startup uses hibernation for the kernel, but none of these seem to be related to an earlier system crash; they look to be related to normal startup. I would in any case disable the Windows Fast Startup feature, you don't need it with those NVMe SSDs.

I'm now wondering whether your BSOD above (and the others probably) was not nvlddmkm.sys itself but was triggered by the hardware issue you appear to have.

I think you have to work with the hard information you have however, so we need to address the graphics card and driver. First ensure that the graphics card is properly seated, stuff does move in transit. Pop it out and re-seat it fully if you can. Do the same with your RAM cards too. I would also use DDU to uninstall the existing driver, then download the latest driver from the Nvidia website and manually install it (not through GeForce Experience). Be sure NOT to select any Nvidia audio components, we've had issues with those in the past. If it BSODs again be sure to copy the C:\Windows\Memory.dmp file to somewhere safe, because each new BSOD overwrites the existing dump file. It will be very useful to have a set of kernel dumps rather than just one.
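One way to keep a set of dumps, sketched in Python under some assumptions: the helper name `archive_dump` and the `D:\DumpArchive` destination are hypothetical, and reading C:\Windows\MEMORY.DMP will normally need an elevated (administrator) prompt. The idea is simply to copy the dump to a file named after its last-modified time, so a later BSOD cannot overwrite the copy.

```python
import shutil
from datetime import datetime
from pathlib import Path


def archive_dump(dump_path, archive_dir):
    """Copy a dump file into archive_dir under a name stamped with its last-modified time.

    Returns the destination path, or None if the dump file does not exist.
    """
    dump_path = Path(dump_path)
    archive_dir = Path(archive_dir)
    if not dump_path.is_file():
        return None
    archive_dir.mkdir(parents=True, exist_ok=True)
    modified = datetime.fromtimestamp(dump_path.stat().st_mtime)
    stamp = modified.strftime("%Y%m%d-%H%M%S")
    dest = archive_dir / f"{dump_path.stem}-{stamp}{dump_path.suffix}"
    if not dest.exists():
        # copy2 also preserves the file's timestamps on the copy
        shutil.copy2(dump_path, dest)
    return dest


if __name__ == "__main__":
    # Hypothetical destination folder; change it to wherever you keep the copies.
    print(archive_dump(r"C:\Windows\MEMORY.DMP", r"D:\DumpArchive"))
```

Running it after each crash leaves one copy per BSOD; running it twice on the same dump does not create a duplicate, because the name is derived from the dump's own timestamp.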

I've just seen your latest post - definitely turn off Fast Startup. If you use hibernation try turning that off too to see whether that is giving problems.

BTW, I notice that you're using ESET Security. It would be wise to temporarily uninstall that and use Windows Defender for a time. Third party security systems often cause BSODs; I've debugged more than I can remember that were security software related (none for ESET yet though). There is no indication anywhere that ESET is a problem, but these tools are a common source of problems.

LATER EDIT: Pop those M.2 drives out and reseat them too. We've had several strange issues caused by an M.2 drive that wasn't seated properly.
 

psychedelicbeast

Active member
Thanks. I have turned off Fast Startup, and will reseat the GPU and SSDs in the evening. Given that I have only 2 RAM sticks, do you expect the 4-stick RAM issue to be happening here?
 

psychedelicbeast

Active member
I have reseated the RAM and SSDs. The GPU seems very tight and stable, to be honest. I have reinstalled the GPU driver (not through GeForce Experience) after removing the previous driver with DDU. I removed ESET, and also turned off Fast Startup and sleep. Let's see how it goes.
 