Screen pixelated and pc reset

Zel

Member
Had a problem where my screen had large pixels appearing on screen before the top half turned black. The pc seemed to freeze as I couldn't open the task manager or see the mouse cursor. At this point the pc rebooted itself.

Also at one point previously while watching Youtube, the video sound was distorted for a few seconds before returning to normal after I reloaded the page.

PC Specs
Case
CORSAIR 4000D AIRFLOW TEMPERED GLASS GAMING CASE
Processor (CPU)
AMD Ryzen 7 7700X Eight Core CPU (4.5GHz-5.4GHz/40MB CACHE/AM5)
Motherboard
ASUS® TUF GAMING B650-PLUS WIFI (DDR5, USB 3.2, 6Gb/s)
Memory (RAM)
32GB Corsair VENGEANCE DDR5 5600MHz (2 x 16GB)
Graphics Card
16GB AMD RADEON™ RX 6950 XT - HDMI, DP - DX® 12
1st M.2 SSD Drive
500GB SAMSUNG 970 EVO PLUS M.2, PCIe NVMe (up to 3500MB/R, 3200MB/W)
2nd M.2 SSD Drive
1TB INTEL® 670p M.2 NVMe PCIe SSD (up to 3500MB/sR | 2500MB/sW)
Power Supply
CORSAIR 1000W RMx SERIES™ - MODULAR 80 PLUS GOLD, ULTRA QUIET
Power Cable
1 x 1.5 Metre UK Power Cable (Kettle Lead, 1.0mm Core)
Processor Cooling
PCS FrostFlow 150 Series High Performance CPU Cooler
Thermal Paste
STANDARD THERMAL PASTE FOR SUFFICIENT COOLING
Sound Card
ONBOARD 6 CHANNEL (5.1) HIGH DEF AUDIO (AS STANDARD)
Network Card
ONBOARD 2.5Gbe LAN PORT
USB/Thunderbolt Options
MIN. 2 x USB 3.0 & 2 x USB 2.0 PORTS @ BACK PANEL + MIN. 2 FRONT PORTS
Operating System
Windows 11 Home 64 Bit - inc. Single Licence [KUK-00003]
Warranty
3 Year Silver Warranty (1 Year Collect & Return, 1 Year Parts, 3 Year Labour)
 

Scott

Behold The Ford Mondeo
Moderator
I would have a little look at your temps as well. Run HWMonitor, go for a gaming spree, then post the temps as a screenshot. I'm not convinced that cooler is going to be enough when pushing the 7700X.
 

SpyderTracks

We love you Ukraine
Yeah, I was in two minds about that
 

ubuysa

The BSOD Doctor
Were there no minidumps in the folder C:\Windows\Minidump?

I don't see anything in your logs that might account for the issue you describe. There are some event ID 11 errors for a Solidigm storage drive controller...
Code:
Log Name:      System
Source:        solidnvm
Date:          14/09/2023 22:32:23
Event ID:      11
Task Category: None
Level:         Error
Keywords:      Classic
User:          N/A
Computer:      Andrew
Description:
The description for Event ID 11 from source solidnvm cannot be found. Either the component that raises this event is not installed on your local computer or the installation is corrupted. You can install or repair the component on the local computer.

If the event originated on another computer, the display information had to be saved with the event.

The following information was included with the event: 

\Device\RaidPort2
Command set: 0x1
Opcode: 0x81
SCT: 0x0
SC: 0x2
CSTS: 0x1

The message resource is present but the message was not found in the message table
This puzzled me for a while because you don't have a Solidigm NVMe in your spec. The system info does show it...
Code:
Name    Solidigm NVMe Storage Controller    
Manufacturer    Solidigm Technology    
Status    OK    
PNP Device ID    PCI\VEN_8086&DEV_F1AA&SUBSYS_390F8086&REV_03\6&9197FF2&0&00000011    
Memory Address    0xFC600000-0xFC603FFF    
IRQ Channel    IRQ 4294967255    
IRQ Channel    IRQ 4294967254    
IRQ Channel    IRQ 4294967253    
IRQ Channel    IRQ 4294967252    
IRQ Channel    IRQ 4294967251    
IRQ Channel    IRQ 4294967250    
IRQ Channel    IRQ 4294967249    
IRQ Channel    IRQ 4294967248    
IRQ Channel    IRQ 4294967247    
IRQ Channel    IRQ 4294967246    
IRQ Channel    IRQ 4294967245    
IRQ Channel    IRQ 4294967244    
IRQ Channel    IRQ 4294967243    
IRQ Channel    IRQ 4294967242    
IRQ Channel    IRQ 4294967241    
IRQ Channel    IRQ 4294967240    
Driver    C:\WINDOWS\SYSTEM32\DRIVERS\SOLIDNVM.SYS (2.2.0.1017, 358.20 KB (366,792 bytes), 23/06/2023 06:15)
I looked up the VEN & DEV identifiers there and they relate to your Intel 670p NVMe drive. I don't know whether that's your system drive. If it is, then I would be a little concerned about these errors, which occur regularly. If it's just a data drive then it's probably not an issue. I would however suggest that you download the Intel Memory and Storage Tool and use that to run diagnostics on that drive.
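
For anyone wanting to repeat that lookup, here is a minimal Python sketch that pulls the vendor and device identifiers out of the PNP Device ID string shown above. The function name `parse_pci_id` is just illustrative, and the script only extracts the fields; you would still need to look the values up in a PCI ID database yourself. Vendor 8086 is Intel's PCI vendor ID.

```python
import re


def parse_pci_id(pnp_id: str) -> dict:
    """Extract the VEN/DEV/SUBSYS/REV hex fields from a PCI PNP Device ID."""
    # Each field appears as KEY_<hex digits>, separated by '&'.
    fields = dict(re.findall(r"(VEN|DEV|SUBSYS|REV)_([0-9A-Fa-f]+)", pnp_id))
    return {key: value.upper() for key, value in fields.items()}


# The PNP Device ID copied from the system info above.
pnp = r"PCI\VEN_8086&DEV_F1AA&SUBSYS_390F8086&REV_03\6&9197FF2&0&00000011"
ids = parse_pci_id(pnp)
print("Vendor ID:", ids["VEN"])
print("Device ID:", ids["DEV"])
```

The vendor/device pair is what identifies the hardware regardless of which driver name Windows shows for it.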

From what you describe I'm not at all sure that this is a software or driver issue, so I'll pass you on to the hardware experts on here.
 

Zel

Member
Couldn't find any minidumps.

Ran diagnostics on the Intel drive (checked and it's the data drive not the system one). Updated the firmware and ran diagnostics. Had to use this one, as the one you linked wouldn't let me run diagnostics, but it didn't detect any issues.
 

ubuysa

The BSOD Doctor
Ok, it's obviously a Solidigm drive badged as Intel then. Those error 11 messages aren't terribly serious anyway. No dumps makes me more certain that this is hardware, and I'd suspect the graphics card first.
 

Zel

Member
Found these dmp files. Don't know if they'll help.

Problem happened three times again today while online. Twice without the pixels appearing, rebooting itself all three times. But the third time after rebooting the screen was blank and I had to turn the pc off and back on again myself. The problem hasn't happened while gaming so far.

Used HWMonitor while online with nothing else running. Screenshot attached.
 

Attachments

  • hwmonitor01.png
  • hwmonitor02.png
  • hwmonitor03.png
  • hwmonitor04.png

SpyderTracks

We love you Ukraine
I can't see any load during these readings. You need to take screenshots while a game is running so we can see thermals under a game load, or you can apply a stress load with something like Prime95 for the CPU and FurMark for the GPU.
 

ubuysa

The BSOD Doctor
The three dumps you uploaded are live kernel dumps. These are dumps written when a problem occurs but Windows is able to recover. You found them in the C:\Windows\LiveKernelReports folder - which was a smart thing to do. (y)
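
If you want to check both locations in one go, a small Python sketch along these lines would do it. This is only an illustration: the helper name `find_dumps` is made up, it assumes the default Windows folder locations mentioned in this thread, and you will probably need to run it from an elevated prompt to be allowed to read those folders.

```python
from pathlib import Path

# Default locations (assumed): BSOD minidumps and live kernel dumps.
DUMP_DIRS = [
    Path(r"C:\Windows\Minidump"),
    Path(r"C:\Windows\LiveKernelReports"),
]


def find_dumps(dirs):
    """Return every .dmp file under the given folders, newest first."""
    found = []
    for folder in dirs:
        if folder.is_dir():
            # rglob also searches subfolders of each location.
            found.extend(p for p in folder.rglob("*.dmp") if p.is_file())
    return sorted(found, key=lambda p: p.stat().st_mtime, reverse=True)


if __name__ == "__main__":
    for dump in find_dumps(DUMP_DIRS):
        print(dump)
```

Folders that don't exist are simply skipped, so an empty result means no dumps were found in either place.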

All three dumps are identical. They are all 0x141 - VIDEO_ENGINE_TIMEOUT_DETECTED bugchecks, and the dump triage analysis blames your graphics driver (amdkmdag.sys)...
Code:
FAILURE_BUCKET_ID:  LKD_0x141_IMAGE_amdkmdag.sys
However, the 0x141 happens because the graphics card did not respond in a reasonable time to a (Windows) driver command, this could of course be a graphics driver fault or a graphics card fault. In the call stack you can see what's happening...
Code:
11: kd> knL
 # Child-SP          RetAddr               Call Site
00 ffff9484`bb9aee50 fffff805`a0e38f24     watchdog!WdpDbgCaptureTriageDump+0xb7   <=== Live kernel dump written
01 ffff9484`bb9aeec0 fffff805`a0c9da9d     watchdog!WdDbgReportRecreate+0xd4
02 ffff9484`bb9aef20 fffff805`a0c9c500     dxgkrnl!TdrUpdateDbgReport+0x11d
03 ffff9484`bb9aef80 fffff805`d3d8bd6f     dxgkrnl!TdrCollectDbgInfoStage1+0x300   <=== dump data collection starts
04 ffff9484`bb9af0c0 fffff805`d3e56899     dxgmms2!VidSchiResetEngine+0x303
05 ffff9484`bb9af270 fffff805`d3e266ef     dxgmms2!VidSchiResetEngines+0xb1   <=== FAILURE - graphics engines reset (this is the fault)
06 ffff9484`bb9af2c0 fffff805`d3df8d9f     dxgmms2!VidSchiCheckHwProgress+0x2d90f   <=== checking progress of graphics command
07 ffff9484`bb9af340 fffff805`d3d5b4e9     dxgmms2!VidSchiWaitForSchedulerEvents+0x37f   <=== wait for graphics card to respond
08 ffff9484`bb9af410 fffff805`d3e05405     dxgmms2!VidSchiScheduleCommandToRun+0x309    <=== graphics card command
09 ffff9484`bb9af4e0 fffff805`d3e0537a     dxgmms2!VidSchiRun_PriorityTable+0x35
0a ffff9484`bb9af530 fffff805`6be12667     dxgmms2!VidSchiWorkerThread+0xca   <=== start new graphics operation
0b ffff9484`bb9af570 fffff805`6c0370a4     nt!PspSystemThreadStartup+0x57
0c ffff9484`bb9af5c0 00000000`00000000     nt!KiStartSystemThread+0x34
Read this from the bottom up; I've added notes to show what's happening. You can see both the dxgmms2.sys and dxgkrnl.sys drivers being called. These are Microsoft DirectX drivers, so they're not at fault. Although you can't see it in this stack, the amdkmdag.sys driver will be called (by the dxgmms2!VidSchiScheduleCommandToRun function call) to get the graphics instructions executed by the graphics card. Then we schedule a wait for the graphics card to respond to the command, and the graphics card (or amdkmdag.sys) never responds. The timeout detection and recovery function (TDR) then resets both the graphics driver and the graphics card, which will crash whatever app was using it at the time, and we then see dump data collection starting and a live kernel dump written.

Note that this wasn't a BSOD. The system remained active, but there will have been a crash to the desktop of whatever app was running on the card. Since in post #1 you talk about large pixels and a half black screen, I would put money on the graphics card being the problem.
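
As a rough illustration of reading the stack bottom-up, this Python sketch parses the frames from the debugger output above (with my notes stripped) and lists the modules in the order they were called. The regular expression and the helper names `parse_stack` and `modules_in_call_order` are my own assumptions about the text layout, not part of any debugger tooling.

```python
import re

# The call stack from the dump, without the annotations.
STACK = """\
 # Child-SP          RetAddr               Call Site
00 ffff9484`bb9aee50 fffff805`a0e38f24     watchdog!WdpDbgCaptureTriageDump+0xb7
01 ffff9484`bb9aeec0 fffff805`a0c9da9d     watchdog!WdDbgReportRecreate+0xd4
02 ffff9484`bb9aef20 fffff805`a0c9c500     dxgkrnl!TdrUpdateDbgReport+0x11d
03 ffff9484`bb9aef80 fffff805`d3d8bd6f     dxgkrnl!TdrCollectDbgInfoStage1+0x300
04 ffff9484`bb9af0c0 fffff805`d3e56899     dxgmms2!VidSchiResetEngine+0x303
05 ffff9484`bb9af270 fffff805`d3e266ef     dxgmms2!VidSchiResetEngines+0xb1
06 ffff9484`bb9af2c0 fffff805`d3df8d9f     dxgmms2!VidSchiCheckHwProgress+0x2d90f
07 ffff9484`bb9af340 fffff805`d3d5b4e9     dxgmms2!VidSchiWaitForSchedulerEvents+0x37f
08 ffff9484`bb9af410 fffff805`d3e05405     dxgmms2!VidSchiScheduleCommandToRun+0x309
09 ffff9484`bb9af4e0 fffff805`d3e0537a     dxgmms2!VidSchiRun_PriorityTable+0x35
0a ffff9484`bb9af530 fffff805`6be12667     dxgmms2!VidSchiWorkerThread+0xca
0b ffff9484`bb9af570 fffff805`6c0370a4     nt!PspSystemThreadStartup+0x57
0c ffff9484`bb9af5c0 00000000`00000000     nt!KiStartSystemThread+0x34
"""

# frame number, two address columns, then module!function+0xoffset
FRAME = re.compile(r"^([0-9a-f]+)\s+\S+\s+\S+\s+(\w+)!(\w+)\+0x[0-9a-f]+")


def parse_stack(text):
    """Return (frame_number, module, function) tuples in the order listed."""
    frames = []
    for line in text.splitlines():
        match = FRAME.match(line.strip())
        if match:
            frames.append((int(match.group(1), 16), match.group(2), match.group(3)))
    return frames


def modules_in_call_order(frames):
    """List each module once, oldest call first (bottom of the stack up)."""
    seen = []
    for _, module, _ in reversed(frames):
        if module not in seen:
            seen.append(module)
    return seen


frames = parse_stack(STACK)
for number, module, function in reversed(frames):
    print(f"{number:02x}  {module}!{function}")
print("Modules, oldest call first:", modules_in_call_order(frames))
```

The module order shows the thread starting in the kernel, the scheduler in dxgmms2 issuing the command and resetting the engine, then dxgkrnl and watchdog collecting and writing the dump.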

You should stress test the graphics card.
 

Zel

Member
Live Kernel Dump

Tried running HWMonitor alongside Prime95 and FurMark, only for the problem to happen again with the pixels and pc resetting. When the pc rebooted an AMD bug report appeared saying there was a display driver issue and that it was now operating in safe mode.

The screen went black again (no pixels) with the fan going loudly. Turned pc off and on again only for the screen to still be black showing nothing. Reset pc again and the screen is working normally for now.
 

SpyderTracks

We love you Ukraine
Hmm, that does sound like a hardware issue, I'd be looking at an RMA for the GPU.
 

Scott

Behold The Ford Mondeo
Moderator
It's a shame if it is the hardware, but these issues do tend to present quite early, so it's good to get the system perfect so that you have confidence in it going forward.

It can be the drivers; you would be amazed at the trouble they can cause. But the fact that it actually does a hard reset suggests it's likely the hardware. Motherboard faults and CPU faults tend to hard crash the PC; drivers, software and RAM tend to blue screen; whereas GPU driver faults tend to CTD. It's not a hard and fast rule, it's just what happens more often than not.
 

ubuysa

The BSOD Doctor
Hmmm, yes. But I'd never advise anyone to try and diagnose a problem based on how it presents (fails).

A CTD, for example, could be TDR recovering from a graphics hang, where we'd get a live kernel dump, or it could just be a failure in user-mode code, where we might get clues in the logs.

Bad RAM can cause all kinds of failures, it depends on what type of code is in the bad page(s). I've seen complete crashes, BSODs, freezes, and CTDs caused by bad RAM.

Bad drivers do almost always BSOD, but so do many RAM problems and some CPU problems too (0x101 bugchecks for example).

Troubleshooting is like detective work, you must be careful not to form an opinion up front, because then you only look for evidence that supports your hypothesis. Instead, you have to keep an open mind and collect all the available evidence before attempting to make a diagnosis. Even then we often end up with more than one potential culprit. IMO of course.
 