Random Crash during Gaming - Happened consistently

ChrisCooney

Silver Level Poster
If there isn't one or if an update doesn't help I would pressure PCS to RMA it.

Edit: it fails on two RAM sticks, so swap the two sticks over - just in case.
Just got off the phone with PCS. They said that I am on quite an old BIOS version and that the version I'm on is no longer even listed on the Asus site... which is good fun. He recommended my next port of call be a BIOS update and that I should go to the latest available version. I am presently pulling things off of my USB stick so I've got some room! Will update when I've done that.
 

ubuysa

The BSOD Doctor
Ah that sounds very encouraging. :)

Edit: I would advise buying a new USB stick to be honest.
 

ChrisCooney

Silver Level Poster
Well I wish I'd read this before I did it! But the update went fine and I now have the latest version of the BIOS. I've gone back to 4x8GB DIMMs at 3533MHz (worked perfectly in almost everything except Apex) and I'm going to play a little Apex to see what happens.
 

ChrisCooney

Silver Level Poster
WELL - one hour in and no crash so far, which is a good bit longer than I was getting last night (20 mins tops). Gonna intermittently play games for the next few hours... you know... for research...

Get back to you with more updates!
 

ChrisCooney

Silver Level Poster
I was going to say - I may go back up to 3600MHz now and see if the crashing issues from a few months ago persist. If they don't then we know that the only thing that's changed between now and then is the BIOS update and we can be confident that this latest BIOS version (or one of the previous versions) has fixed it.
 

SpyderTracks

We love you Ukraine
I'd wait until it’s tested and settled as it is. Once that’s proving stable, then up it again to 3600.
 

ChrisCooney

Silver Level Poster
I personally think it’s game related.

There are a lot of reports related to the game if you Google it.
Given that it only happens in that game, some interplay between my mobo, CPU, RAM, GPU and the game is quite possible. The fact that the game was also working fine and then stopped makes me wonder if they've recently patched it and done something funky.

The thing that makes me think otherwise is the WHEA error in the event viewer + the L1 cache error in the dump. I don't see how a game with some dodgy code causes my CPU to panic. Even the venerable Ubuysa couldn't see a way through for that. Just confuses me a lot!
 

ChrisCooney

Silver Level Poster
After some messing about, I can definitely say it's crashing less frequently - I wonder if there have been incremental improvements to the whole RAM thing in later BIOS versions and I'm just experiencing some edge cases they haven't got to yet.
 

ubuysa

The BSOD Doctor
I personally think it’s game related.

There are a lot of reports related to the game if you Google it.
I'm downloading the latest dump. I'd agree about it being game related except that I know of no way that software on its own can cause a machine check - and this is definitely a machine check BSOD.

If the game was misbehaving we'd get BSODs for sure, but they'd be related to whatever way the game was fouling up (bad buffer allocation, wrong pointers etc.). I don't believe it's possible for software to screw up in any way that causes a machine check. That's a hardware issue.

I still think that the hardware has some sort of (possibly inherent) weakness that the game pushes over the edge - possibly in the way the game fouls up?

I would suggest that @ChrisCooney downloads Memtest, makes a bootable USB stick with it, and then boots from that stick. Let it complete all four iterations of the 13 different tests and (if you can) when it's finished boot it again immediately and run all four iterations a second time. That really will stress the RAM...

I'll report back on the latest dump....
 

ubuysa

The BSOD Doctor
So this latest dump is another machine check error, the error reason is another processor level 1 cache read error (BUSL1_SRC_IRD_I_NOTIMEOUT_ERR (Proc 10 Bank 1)).

The process in control was the game again (r5apex.exe) and once again the machine check occurred as soon as the kernel was entered.

One additional thing worth mentioning from the dump is that the trap frame, which captures all the CPU registers at the time of the error, has the value 0x00007ff739e8cec5 in the RIP register. This is the instruction pointer which points at the failing instruction, yet the contents of memory location 0x00007ff739e8cec5 are invalid in the dump (they show up as only ??). This of course is the return address following the system service call and I suppose it's possible that we could see an L1 cache error if the return address pointed to invalid memory. The problem is that a kernel dump doesn't contain user mode code so I can't verify whether the location 0x00007ff739e8cec5 really is invalid or it just appears invalid because it's not included in the dump. The only way to know would be to take a complete memory dump - but that would be huge and would take a month of Sundays to upload!
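
As a rough illustration of why that address can't be checked from a kernel dump: on 64-bit Windows with 48-bit canonical addressing (an assumption here, though it is the common case), anything at or below 0x00007FFFFFFFFFFF is user-mode address space, which is the part a kernel dump leaves out. The small Python sketch below only classifies an address by range. The constant and function names are made up for illustration and it does not read the dump at all.

```python
# Assumed layout: 48-bit canonical addressing on x64.
# These boundaries are illustrative assumptions, not values read from the dump.
USER_MODE_MAX = 0x00007FFFFFFFFFFF    # top of the lower (user) half
KERNEL_MODE_MIN = 0xFFFF800000000000  # bottom of the upper (kernel) half
ADDRESS_MAX = 0xFFFFFFFFFFFFFFFF      # largest 64-bit address


def classify_address(addr: int) -> str:
    """Return 'user', 'kernel' or 'non-canonical' for a 64-bit virtual address."""
    if 0 <= addr <= USER_MODE_MAX:
        return "user"
    if KERNEL_MODE_MIN <= addr <= ADDRESS_MAX:
        return "kernel"
    return "non-canonical"


if __name__ == "__main__":
    rip = 0x00007FF739E8CEC5  # RIP value from the trap frame
    print(hex(rip), "->", classify_address(rip))
```

Under those assumptions the RIP value falls in the user half, which fits with it showing as ?? in a kernel dump whether or not the memory behind it is actually valid.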

I'm softening my stance a bit now, and concede that this may be the game. If the game is somehow screwing with the instruction pointer then the L1 cache could become polluted with garbage. I think we have to trust that the trap frame is genuinely telling us that the return address is invalid. The problem with this theory is that if it were true we'd be seeing it everywhere: the Internet would be flooded with people having BSODs in this game, and yet clearly it's not affecting everyone playing it.

I really would like to see a couple of Memtest runs on this hardware if you can?
 

ChrisCooney

Silver Level Poster
Your wish is my command. I’ll get right on that.

I ran Prime95 for a few hours yesterday to see, and that reported no errors. Would imagine that’s quite a bit less rigorous though.

Chris
 

ChrisCooney

Silver Level Poster
Unfortunately, this is back and I suspect that @ubuysa was right all along - there's something hardware related going on here. It now affects multiple games from multiple vendors and happens intermittently.

The new behaviour now is that it no longer writes out a memory.dmp file, so I can't show much there unfortunately, which is painful.

I'm going to run Memtest overnight tonight and see what that turns up, but it's looking like it might have to be an RMA :( I know that the PC Specialist folks will want me to run memtest anyway to be sure, so I'll do that and report back here.

My understanding of replacing a processor is that you just unscrew the cooler, clean the paste off, lift the lever and replace the chip, then reapply thermal paste and reapply the cooler. Is that right? If it's that simple, I may ask that they send me out a replacement, rather than sending the whole machine back.

EDIT for Clarity: The old solution that used to work was to run at 16GB RAM (2x8GB) so I've been doing that to see if it will work but it's made no difference.

Chris
 

SpyderTracks

We love you Ukraine
My understanding of replacing a processor is that you just unscrew the cooler, clean the paste off, lift the lever and replace the chip, then reapply thermal paste and reapply the cooler. Is that right? If it's that simple, I may ask that they send me out a replacement, rather than sending the whole machine back.
That's exactly right. The hardest bit is getting the cooler off and back on.

What speeds were you running the RAM at?
 

SpyderTracks

We love you Ukraine
The ram is set at 3533, down from 3600, as originally that removed the crashing issue.

I've also run it at 3200 in the past to no avail :(
Yeah, do a memtest as you say, but I think you're right that it's likely a hardware issue given the testing you've done. We've had this before and there was some kind of issue with the CPU itself.
 

ChrisCooney

Silver Level Poster
Yeah, when you google this issue it's all AMD Ryzen 9 5900X CPUs, so it's not a coincidence. It's a bit of a pain, but it's not the end of the world. The PC is still pretty functional for the time being anyway. Been a bit of a saga though!

Chris
 

carlos726811

Bright Spark
I've just been reading this and noticed you're getting a WHEA error message. Just like to say I was having the exact same issue when playing Destiny 2. I was on the phone to PCS and we did some tests:
1. CPU stress test: the computer never crashed.
2. Memory test: I went from 4x 32GB to 8GB. It never crashed.

I sent my GPU back thinking it was that, as I had no issues with a 1050 Ti installed. With the replacement GPU I was still having issues, so I rang PCS back and they ran some tests on my NVMe drive. They checked the speed of my main drive and I wasn't getting the speed I should, so they think it might be my main NVMe drive. What I did was reinstall Windows onto my second NVMe drive, wipe the main drive clean and send it back. Since installing Windows on the second drive I have had no issues or error codes, touch wood. I'm waiting for my main NVMe drive to arrive back so I can try again.
 