Random Crash during Gaming - Happened consistently

ubuysa

The BSOD Doctor
I don't know why I didn't see this thread earlier, apologies. :oops:

I've looked at both dumps and @SpyderTracks was spot on in his very first post. This issue IS the "4 x 8GB RAM sticks in an AMD build" timing error and is clear in both your dumps. You have a hardware issue here.

In a WHEA dump a hardware error record is created that contains details of the error. In the first dump (when the process in control was the System process) the error is reported as...
Code:
Error         : BUSL1_SRC_IRD_I_NOTIMEOUT_ERR (Proc 9 Bank 1)

....and in the second dump (when the process in control was r5apex.exe) the error is reported as...
Code:
Error         : DCACHEL1_DRD_ERR (Proc 8 Bank 0)

Both are level 1 processor cache errors. The first indicates that there was an error in loading the level 1 cache (IRD is an instruction read/prefetch) and the second indicates that there was an error in reading the level 1 cache (DRD is a data read). These are exactly the hardware errors we've been seeing with this 4 RAM sticks issue.
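For anyone who wants to pick these error names apart themselves, here is a minimal Python sketch. It assumes the names follow the usual x86 machine-check compound error code layout (a cache form and a bus/interconnect form); the lookup tables are partial, the function name `decode` is just for illustration, and any field not in the tables is passed through undecoded rather than guessed at.

```python
import re

# Partial lookup tables based on the x86 machine-check "compound error code"
# naming scheme. Anything not listed here is passed through undecoded.
REQUEST = {
    "ERR": "generic error",
    "RD": "generic read",
    "WR": "generic write",
    "DRD": "data read",
    "DWR": "data write",
    "IRD": "instruction fetch",
    "PREFETCH": "prefetch",
    "EVICT": "eviction",
    "SNOOP": "snoop",
}
CACHE_TYPE = {"I": "instruction", "D": "data", "G": "generic"}

# Cache form: {TT}CACHE{LL}_{RRRR}_ERR
CACHE_RE = re.compile(r"^(?P<tt>[IDG])CACHE(?P<level>L[0-2G])_(?P<req>[A-Z]+)_ERR$")
# Bus/interconnect form: BUS{LL}_{PP}_{RRRR}_{II}_{T}_ERR
BUS_RE = re.compile(
    r"^BUS(?P<level>L[0-2G])_(?P<pp>[A-Z]+)_(?P<req>[A-Z]+)"
    r"_(?P<ii>[^_]+)_(?P<t>NOTIMEOUT|TIMEOUT)_ERR$"
)
# The "Error : NAME (Proc N Bank M)" line as it appears in the dump output.
LINE_RE = re.compile(
    r"Error\s*:\s*(?P<code>\S+)\s*\(Proc (?P<proc>\d+) Bank (?P<bank>\d+)\)"
)


def decode(line):
    """Split one 'Error : ...' line into its named fields."""
    m = LINE_RE.search(line)
    if m is None:
        raise ValueError("not an 'Error : NAME (Proc N Bank M)' line")
    code = m.group("code")
    info = {"code": code, "proc": int(m.group("proc")), "bank": int(m.group("bank"))}
    cache = CACHE_RE.match(code)
    bus = BUS_RE.match(code)
    if cache:
        req = cache.group("req")
        info.update(
            kind="cache",
            cache_type=CACHE_TYPE[cache.group("tt")],
            level=cache.group("level"),
            request=REQUEST.get(req, req),
        )
    elif bus:
        req = bus.group("req")
        info.update(
            kind="bus/interconnect",
            level=bus.group("level"),
            participation=bus.group("pp"),  # left undecoded on purpose
            request=REQUEST.get(req, req),
            timeout=bus.group("t") == "TIMEOUT",
        )
    else:
        info["kind"] = "unrecognised"
    return info


if __name__ == "__main__":
    for line in (
        "Error         : BUSL1_SRC_IRD_I_NOTIMEOUT_ERR (Proc 9 Bank 1)",
        "Error         : DCACHEL1_DRD_ERR (Proc 8 Bank 0)",
    ):
        print(decode(line))
```

Note that "Proc" and "Bank" here are the logical processor and the machine-check bank that reported the error. The sketch only splits the name into fields; it does not tell you why the error occurred.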

I don't know what the solution is. I do know that the incidence of these BSODs has decreased markedly over recent weeks which makes me think that this may be an AGESA/BIOS issue that has since been resolved. I do believe that it's some sort of timing error. I also know that some people have had a CPU swapped by PCS and that seems to have solved the issue for them.

I would suggest that you phone PCS, point them to this thread and make sure they have the links to your two kernel dumps. See what they say about this.

In the meantime you might try either (or both) of removing two RAM sticks (2 x 8GB setups suffered much less from this issue) and downclocking the RAM to 3200MHz (people with 3200MHz RAM suffered much less from this issue too).
 

ChrisCooney

Silver Level Poster
Really appreciate your sagely wisdom Ubuysa.

In the past with the 4x8GB sticks error, downclocking from 3600 -> 3533 completely removed the crashing issue. Back then the crashes occurred when the computer was idle. This time the crashing is _only_ happening when I play Apex Legends, specifically. Does this line up with the experiences of other users?

I appreciate the dump is pretty conclusive. I'm just very confused as to why it's only happening in this game when I've played more demanding games without a crash.
 

ubuysa

The BSOD Doctor
In the first dump the process in control was the System process, not your game. I have no idea why that game in particular should trigger the issue, but the problem is almost certainly with the hardware, not with the game. That downclocking your RAM stops the crashes is strong evidence of that.

Downclocking to 3533MHz (or lower) has been a common workaround for others and if that works for you I would stick with it. The alternative is a pretty long wait for an RMA....
 

ChrisCooney

Silver Level Poster
Really strange! I wonder if the game puts the RAM under a certain amount of stress or something....

Just to be clear, my RAM is currently downclocked to 3533 which previously solved the issue (see attached screenshot)
cap1.PNG


Nothing has changed in that respect, but I wonder if going down to 3200 will do anything. I'll downclock further and see how I get on!

Thanks for the pointers Ubuysa.
 

ubuysa

The BSOD Doctor
Ah, so you're getting these crashes at 3533MHz then? Try running with just two RAM sticks for a while and/or downclocking further. One or two people had this issue with 3200MHz RAM though....

My gut feel has always been that this is some sort of timing issue. I don't think it's anything actually faulty, it's just some sort of timing glitch under some peculiar set of circumstances.

Ask PCS whether there is a BIOS update for your build and whether they think it might help with this issue. AMD ship their AGESA firmware as part of the BIOS, so an AGESA upgrade would show up as a new BIOS version.
 

ChrisCooney

Silver Level Poster
I've downclocked to 3200 MHz and left my PC running while I work on my mac - it's sat on the Apex holding screen which was enough to trigger a crash before. I'll leave it hanging there for a while and see if anything comes up. Will consider an hour without a crash to be an improvement -> 5 hours to be suitably solved.

I think, because I'm downclocking RAM to a lower spec than what I've paid for, and by a bit of a jump now, I'll be speaking with PCS anyway. I guess their phone is the best way to get in touch, right?
 

SpyderTracks

We love you Ukraine
Yep, phone would be best. I’d enquire about a BIOS update.
 

ChrisCooney

Silver Level Poster
Update: 3200MHz and the error persists. Again, only when I boot Apex. Really bizarre!

I'll go down to two DIMMs and report back. Good news is it seems to be pretty easy to trigger the crash which means we've got a nice repeatable test case.
 

ubuysa

The BSOD Doctor
Could you upload the dumps produced? It'll be interesting to see whether I can see anything different.
 

ubuysa

The BSOD Doctor
Ran with 2x8GB and it ran into the same issue this time. For clarity, 2x8GB downclocked to 3200MHz.

Once again, the crash ONLY occurs when I jump into an Apex Legends game. Other games (Mount and Blade 2, Call of Duty Warzone, Cyberpunk 2077, Chivalry 2) do not cause a crash.

Dump link: https://drive.google.com/file/d/1zYElr4Dt_3q1uydLZBRRwvVejn4pLJqP/view?usp=sharing

@ubuysa - Dump for you here mate!
The latest dump is a mirror image (pretty much) of the earlier one (where r5apex.exe was the process in control). The process in control in this one is also r5apex.exe and the WHEA error record identifies DCACHEL1_DRD_ERR (Proc 6 Bank 0) as the error.

The stack trace shows that the machine check occurs immediately the kernel is entered from user mode code at address 0x00007ff6`94df9d94.....
Code:
Child-SP          RetAddr           : Args to Child                                                           : Call Site
ffffda81`089db938 fffff807`6deb440a : 00000000`00000124 00000000`00000000 ffff9888`8c97e028 00000000`bc000800 : nt!KeBugCheckEx
ffffda81`089db940 fffff807`6cce15b0 : 00000000`00000000 ffff9888`8c97e028 ffff9888`87ed3960 ffff9888`8c97e028 : nt!HalBugCheckSystem+0xca
ffffda81`089db980 fffff807`6dfb61ee : 00000000`00000000 ffffda81`089dba29 ffff9888`8c97e028 ffff9888`87ed3960 : PSHED!PshedBugCheckSystem+0x10
ffffda81`089db9b0 fffff807`6deb5d31 : ffff9888`932ea900 ffff9888`932ea900 ffff9888`87ed39b0 ffff9888`87ed3960 : nt!WheaReportHwError+0x46e
ffffda81`089dba90 fffff807`6deb60a3 : 00000000`00000006 ffff9888`87ed39b0 ffff9888`87ed3960 00000000`00000006 : nt!HalpMcaReportError+0xb1
ffffda81`089dbc00 fffff807`6deb5f80 : ffff9888`87cfa450 0001ba88`00000000 ffffda81`089dbe00 fb810001`ba7a860f : nt!HalpMceHandlerCore+0xef
ffffda81`089dbc50 fffff807`6deb54c5 : ffff9888`87cfa450 ffffda81`089dbef0 00000000`00000000 48002c22`31058b48 : nt!HalpMceHandler+0xe0
ffffda81`089dbc90 fffff807`6deb7c85 : ffff9888`87cfa450 8bff9884`41e80000 0366e8d7`4d8d48d8 85486f4d`8b48ffbf : nt!HalpHandleMachineCheck+0xe9
ffffda81`089dbcc0 fffff807`6df0d519 : 8548ff98`6dfae805 cf8b48d2`330a74ff 48c38bff`f46c3be8 5e5f0000`00c8c481 : nt!HalHandleMcheck+0x35
ffffda81`089dbcf0 fffff807`6de05cfa : 89481824`74894808 57415641`5520247c 3360ec83`48ec8b48 f0458948`c0570fc0 : nt!KiHandleMcheck+0x9
ffffda81`089dbd20 fffff807`6de059b7 : 00000000`00000000 00000000`00000000 00000000`00000003 00000000`00000000 : nt!KxMcheckAbort+0x7a
ffffda81`089dbe60 00007ff6`94df9d94 : 00000000`00000000 00000000`00000000 00000000`00000000 00000000`00000000 : nt!KiMcheckAbort+0x277 (TrapFrame @ ffffda81`089dbe70)
000000d9`1f7ef7f0 00000000`00000000 : 00000000`00000000 00000000`00000000 00000000`00000000 00000000`00000000 : 0x00007ff6`94df9d94
You read these from the bottom up. At the bottom is address 0x00007ff6`94df9d94, which is user mode code (the high-order address bits are 0x0000; kernel code has 0xFFFF as its high-order address bits) and which I've established is from r5apex.exe. Immediately above it is nt!KiMcheckAbort, which is the hardware error being detected. The calls that follow are WHEA documenting the error for the eventual dump, and finally at the top is the bug check that causes the BSOD.
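The high-bits rule of thumb can be sketched in a few lines of Python. This is only an illustration: it assumes standard 48-bit x64 canonical addressing, the helper names (`parse_addr`, `classify`, `ret_addrs_bottom_up`) are made up for the example, and the frames are a shortened copy of the trace above with the argument columns elided.

```python
# Assumes 48-bit canonical x64 addressing: the user half ends at
# 0x00007FFF`FFFFFFFF and the kernel half starts at 0xFFFF8000`00000000.
USER_MAX = 0x0000_7FFF_FFFF_FFFF
KERNEL_MIN = 0xFFFF_8000_0000_0000


def parse_addr(text):
    """Convert a debugger-style address such as fffff807`6deb440a to an int."""
    return int(text.replace("`", ""), 16)


def classify(addr):
    """Return 'user', 'kernel' or 'non-canonical' for a 64-bit address."""
    if addr <= USER_MAX:
        return "user"
    if addr >= KERNEL_MIN:
        return "kernel"
    return "non-canonical"


def ret_addrs_bottom_up(frames):
    """Yield (call site, return-address class) pairs, innermost frame last."""
    for line in reversed(frames):
        columns = line.split(" : ")
        ret_addr = columns[0].split()[1]  # second field of the first column
        yield columns[-1].strip(), classify(parse_addr(ret_addr))


if __name__ == "__main__":
    # Three frames from the trace, with the "Args to Child" column elided.
    frames = [
        "ffffda81`089db938 fffff807`6deb440a : ... : nt!KeBugCheckEx",
        "ffffda81`089dbe60 00007ff6`94df9d94 : ... : nt!KiMcheckAbort+0x277",
        "000000d9`1f7ef7f0 00000000`00000000 : ... : 0x00007ff6`94df9d94",
    ]
    for call_site, kind in ret_addrs_bottom_up(frames):
        print(f"{call_site:28} returns to {kind} space")
```

The frame for nt!KiMcheckAbort is the one whose return address falls in the user half, which is the hand-off point described above.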

A kernel dump doesn't contain user mode code, so we can't look at what this is directly, but by looking down the thread control blocks for all the threads in the r5apex.exe process we can clearly see that the address 0x00007ff6`94df9d94 belongs to an r5apex.exe thread.

We thus know that the kernel takes a machine check as soon as a specific r5apex.exe thread calls it. I've never heard of a malformed service call causing a hardware machine check error before, but I suppose anything is possible. Have you checked the Apex forums for similar issues?

There are a couple of other things I notice in your dumps....

One is that you're at Windows version 10.0.19041.1, which (I think) is 20H1, and that's over a year old now. I don't like upgrades in place, but I do wonder whether there has been a fix in either 20H2 or 21H1 that solves whatever issue is causing these machine checks. I rather think you'd be wise to upgrade to 21H1 (10.0.19043.1055) to be sure, and I would recommend doing that via a clean install and not as an upgrade-in-place.

The other thing is that you're running EasyAntiCheat. I can see the easyanticheat.sys driver loaded. I've seen a couple of BSODs in the past that were caused by this driver - though not WHEA BSODs. I wonder whether it would be worth uninstalling this (or not installing it if you go to 21H1) just to eliminate it as a possible cause. Again though it's difficult to see how software can cause a machine check....

You might try checking the component store with dism /online /cleanup-image /checkhealth and the system files with sfc /scannow too.
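If you'd rather run both checks in one go, a small Python wrapper along these lines would do it. Treat it as a sketch: the two command lines are exactly the ones above, but the wrapper itself (`CHECKS`, `run_checks`) is just an illustration, and it assumes you start it from an elevated (administrator) prompt on the Windows machine in question.

```python
import platform
import subprocess

# The two checks suggested above, as argument lists for subprocess.
CHECKS = [
    ["dism", "/online", "/cleanup-image", "/checkhealth"],
    ["sfc", "/scannow"],
]


def run_checks(runner=subprocess.run):
    """Run each check in turn and return (command, exit code) pairs.

    Both tools need administrator rights, so this should be started from an
    elevated prompt. Output goes straight to the console.
    """
    if platform.system() != "Windows":
        raise RuntimeError("dism and sfc are only available on Windows")
    results = []
    for cmd in CHECKS:
        completed = runner(cmd)
        results.append((" ".join(cmd), completed.returncode))
    return results


if __name__ == "__main__":
    for command, code in run_checks():
        print(f"{command}: exit code {code}")
```

A non-zero exit code from either tool is worth reading the console output for, rather than something to interpret from the number alone.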

This still feels like a hardware issue to me.
 

ChrisCooney

Silver Level Poster
I'm happy to start working through the upgrade process.

I'm just a little confused about how a hardware failure can only occur when only running one specific game, you know? I can't find another game that will cause the PC to completely fail like this. There are games that crash to desktop (Mordhau) but that's obviously not the same thing.
 

ChrisCooney

Silver Level Poster
AND while I'm on this, that game previously worked just fine and has only begun crashing like this in the past few days. It's so very specific!
 

ubuysa

The BSOD Doctor
That's why I'd suggest looking for an Apex forum to ask.

Generally, if a badly formed system service call is made we'd get a more common BSOD, related to the nature of the invalid request.

A WHEA BSOD happens because the hardware failed somehow. In the three dumps of yours that's been a level 1 cache failure (on different processors and different machine-check banks).

The only thing I can think of is that Apex is either working (or probably failing) in such an unusual way that this hardware issue is exposed.

IMO even if it turns out that Apex is doing something catastrophically wrong, there is still a hardware issue there.

I really would speak to PCS at the earliest opportunity. Find out whether there is a BIOS update that might help.
 

ubuysa

The BSOD Doctor
That sounds even more like a hardware issue then. One that is likely getting worse.
 

ChrisCooney

Silver Level Poster
Okay, next thing to do is to speak with PCS then. Understood. I'll call them ASAP and report the issue. You mentioned before it sounds like a "timing" issue rather than something being wrong with the hardware itself? That is to say, an RMA wouldn't make much sense right now. Am I understanding that correctly?
 

ubuysa

The BSOD Doctor
That's why I think checking for a new AGESA/BIOS update is the best next step.
 

ubuysa

The BSOD Doctor
Yep, understood - just wanted to make sure I'm not getting anything wrong.
If there isn't one or if an update doesn't help I would pressure PCS to RMA it.

Edit: it fails on two RAM sticks, so swap the two sticks over - just in case.
 