Rare thirty-second OS freezes related to NVMe SSD

I’m running Win11 Pro on a new 9970X CPU + ASUS TRX50 (older non-A variant) board with a Samsung 9100 Pro SSD and a SuperFlower Leadex Platinum ATX 3.1 1000W PSU.

Very rarely, the system freezes for almost exactly thirty seconds, then goes back to normal.

After some web searching and a look at Event Viewer, I found events (I forgot the IDs) related to NVMe (“stornvme.sys”, IIRC) saying something like “write blocked/failed” at about the same time the issue occurs.

Searching for the issue, I found some similar reports online suggesting a defect in either the board, the SSD (a firmware issue or just a defect) or, in some cases, the PSU. The suggested workaround is to enable a Windows power-saving setting via a registry edit, something to do with “AHCI sleep”, which I tried, and it didn’t help.

Does anyone have any pointers or suggestions about this issue? Or should I start replacing the parts one by one?

Side note: my board is really old inventory (manufactured over a year ago!), which makes me want to replace it first, before the other parts.

Thoughts?

Oh, what about any BIOS settings to try? There’s some power-related option, IIRC, like “idle current typical” or “idle blah”. Gosh, I should really look it up and write it properly here…

I have also been experiencing something similar on my PC, though it’s not exactly 30 seconds every time like in your case. I haven’t been able to figure out what’s causing it.

Do you see a similar log in Event Viewer, related to “stornvme” or “storahci” or “raid/disk0”, “write timeout”, etc.? I’m going to try to trigger it more often, if I can, by running multiple disk operations (torrents, code compilations, etc.) for long durations. Then, if I can repro it, I’ll start by changing the BIOS setting to “idle power current typical”, wherever that is. If it still happens, I’ll have to think about next options.
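
A scripted version of that kind of sustained disk load could look roughly like the sketch below. It is only a sketch: the function name `probe_write_stalls` and all of its parameters are made up for illustration, and there is no guarantee that this particular write pattern triggers the freeze. It writes blocks to a temporary file on the suspect drive, forces each one to disk, and records a timestamp whenever a single write takes suspiciously long, so the times can be compared with Event Viewer.

```python
import os
import tempfile
import time


def probe_write_stalls(directory, rounds=100, block_size=1 << 20, stall_seconds=1.0):
    """Write and fsync blocks repeatedly; return (timestamp, seconds) for slow rounds.

    All names and defaults here are hypothetical, chosen for illustration only.
    """
    stalls = []
    payload = os.urandom(block_size)
    # The temp file lives in `directory`, so point it at the drive under suspicion.
    fd, path = tempfile.mkstemp(dir=directory)
    try:
        with os.fdopen(fd, "wb") as handle:
            for _ in range(rounds):
                started = time.monotonic()
                handle.write(payload)
                handle.flush()
                os.fsync(handle.fileno())  # ask the OS to push the data to the drive
                elapsed = time.monotonic() - started
                if elapsed >= stall_seconds:
                    # Wall-clock timestamp so it can be matched against the event log.
                    stalls.append((time.strftime("%Y-%m-%d %H:%M:%S"), elapsed))
    finally:
        os.remove(path)  # clean up the scratch file even if a write fails
    return stalls
```

Calling `probe_write_stalls("D:\\scratch", rounds=5000)` (the path is a placeholder) in a loop alongside normal use would give a list of stall times, and an empty list means no single write crossed the threshold.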

Honestly, it happens like twice a week, so I didn’t really look into it much. The next time it happens I’ll be sure to check Event Viewer.

What are your computer specs/config?

How does it behave in safe mode?

Have you tried installing the NVMe drive’s drivers from the manufacturer’s website?

You might want to set the default Power and sleep options to max/high performance mode, and also toggle the options that keep the storage from going into sleep mode, etc.

Yup, there are many things I can try before starting to replace parts, but one hurdle is that I can’t trigger the problem on demand. If I change a setting, I have to wait and see whether the issue recurs, and if it doesn’t, I still don’t know whether it’s solved or just hasn’t happened again yet.
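
To put a rough number on that waiting problem, here is a small sketch. It assumes the freezes arrive at random with a steady average rate (a Poisson model, which is my assumption and not something measured on this system), and the function names are made up for illustration.

```python
import math


def chance_of_quiet_period(freezes_per_week, quiet_days):
    """Probability of seeing no freeze in quiet_days even though nothing was fixed.

    Assumes freezes follow a Poisson process with a constant average rate.
    """
    rate_per_day = freezes_per_week / 7.0
    return math.exp(-rate_per_day * quiet_days)


def days_needed(freezes_per_week, max_false_hope=0.05):
    """Freeze-free days to wait so that an unfixed system would look this clean
    by luck at most max_false_hope of the time."""
    rate_per_day = freezes_per_week / 7.0
    return math.log(1.0 / max_false_hope) / rate_per_day
```

At two freezes a week, for example, `days_needed(2)` comes out to roughly 10.5 quiet days before a clean run is 95% unlikely to be luck. The rarer the freeze, the longer each setting change needs to be left alone, which is why a reliable repro would help so much.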

Could it be due to something like this: The ASUS Gaming Laptop ACPI Firmware Bug: A Deep Technical Investigation?

5700x3d, x570 tuf gaming plus, 4080 super, mwe 750 v2 gold, kc3000 2tb for windows boot drive. 980 pro 2tb, 870 qvo 2tb, 860 evo 256mb, a 1tb hdd and a 2 tb hdd

Please post your errors from Event Viewer exactly as they are.
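
If the log is long, something like the sketch below could narrow it down to the storage entries logged around the time of a freeze. It is only an illustration: it assumes the events have already been exported and loaded as dictionaries with `time`, `source` and `message` keys (hypothetical field names, not Event Viewer’s own export format), and the source names are simply the ones mentioned in this thread.

```python
from datetime import datetime, timedelta

# Source names taken from this thread; adjust to whatever the log actually shows.
STORAGE_SOURCES = {"stornvme", "storahci", "disk"}


def events_near_freeze(records, freeze_time, window_seconds=60, sources=STORAGE_SOURCES):
    """Return storage-related records logged within window_seconds of freeze_time.

    Each record is a dict with hypothetical keys: "time" (datetime),
    "source" (str) and "message" (str).
    """
    window = timedelta(seconds=window_seconds)
    return [
        record
        for record in records
        if record["source"].lower() in sources
        and abs(record["time"] - freeze_time) <= window
    ]
```

Passing in the time you noticed the hang, for example `events_near_freeze(records, datetime(2025, 1, 1, 12, 0))` with a placeholder date, returns only the matching entries, which are the ones worth posting verbatim.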

I’ll try to remember next time it happens :saluting_face:

I bet the issue doesn’t occur when you are actively using the system but only when you leave it idle either locked or on the desktop.

I ran into a similar situation where I got a BSOD if I locked the system or left it idle for a minute or two, or for some random period.

Every BSOD pointed to different hardware or drivers: storage, Nvidia drivers, monitor, ASUS, etc. (but not to any audio), and much more. I immediately realized that no component was really at fault and that it was just Windows going bonkers!

Rebooted into safe mode and uninstalled the chipset drivers, Nvidia drivers, and so on.

Did a system restore to a point some 1.5 years back.

Ran a few sfc /scannow and DISM commands. All of this took no more than 2 hours in total.

In the end, Windows is working great again.

As a fallback, I had already installed Win11 on a brand-new SSD, and that SSD is still lying in my drawer, ready to be plugged in if my current Win10 acts up again!

My kc3000 (Windows SSD) has a weird issue where I have to refresh the Explorer window after I delete something. If I don’t refresh, the folder/file keeps showing there.

Also, my secondary 980 pro got fucked by Samsung’s buggy fw a couple of years ago. It’s been working fine as a secondary SSD, though. The only issue that crops up is that if I copy something large to it at once, the drive shows as disconnected (and sometimes some files become unreadable/corrupt), and I have to do a shutdown to get it to show again. Can an issue with the secondary SSD cause issues in Windows?

EDIT: Also, I’ve been thinking of getting a new SSD anyway. Which 4TB SSD would you recommend as a main system drive?

Thanks for the responses, but my issue has occurred a couple of times: once during normal usage (browsing etc., actually right while closing a Chrome window with many tabs) and the other time in the middle of busy usage (building my software projects in multiple simultaneous runs). Nothing during idle, really.

As for 4TB SSD recommendations, I guess you can open a separate thread for everyone to chime in. Personally, I’ve always preferred Samsung and Crucial; I avoid the others even though they’re popular, even WD etc. I can’t really say why, my gut just tells me to stay away, haha.

I’ve had a few Samsung and Crucial SSDs (SATA 2.5-inch and M.2 NVMe PCIe) over the years, all working perfectly fine, touch wood.

I have two systems running Windows 10 where I have faced an issue similar to yours: freezing, and then a BSOD a few seconds later.

My first system https://imgur.com/a/sff-itx-gaming-pc-amd-6600xt-8gb-super-slim-10-ltr-e56yxRR has an old MB with an NVMe slot + SATA slots, occupied by an NVMe drive and a 2.5” SATA SSD, with the NVMe being the boot drive, and just like you I’d get freezing occasionally. Coincidentally, I thought of trying an M.2 SATA drive instead of the NVMe, and the issue has never recurred even once.

My second system https://imgur.com/a/minipc-egpu-setups-rZ0tbfw is a MiniPC + eGPU system. It had a Gen 4 NVMe drive and I would see BSODs once a week at random. For several months I searched for a solution on Reddit etc., and everyone said that eGPU systems over OCuLink can be finicky, so I left it at that. Just coincidentally, I replaced the Gen 4 drive with an old, poor-health Toshiba Gen 3 NVMe last month and, touch wood, I haven’t seen a single BSOD since.

In both cases I downgraded the generation of the drives. The drives were not at fault, the generation was.

Interesting, my issue is a 30-second hang then it recovers back to normal and I see those event viewer logs - do you also see any logs in event viewer during the time period the issue occurs?

Also, I have disabled the “CPU watchdog timer” option in the BIOS so that, if there is an issue, the OS hangs/freezes instead of automatically rebooting, which I believe is what this option is meant for.

No BSODs at all for me though.

edit: also, no such issue on my other PC with a gen5 crucial T700 nvme and of course different component specs/models…

The first thing to do in case of issues with SSDs is to check the firmware version and see if it has known problems. Next, figure out whether there is a new firmware, and again check Reddit to see if that firmware resolves those issues.

Before you update, check the SMART status and back up all your data ASAP.

A shortcut would be to check the latest firmware and whether it has any known issues. A lot of SSDs have firmware issues that cause delayed I/O. Samsung is known to patch these quickly, since most of their SSDs use their own controllers. Try it and let us know.

IIRC the Event Viewer logs were very vague. I remember googling them but not getting anywhere. They were not related to the SSD, though, as far as I recall. For me, there was never a recovery after the hang; it was always a BSOD. I wonder if Gen 5 is so new that the drivers are still on the way to maturity... who knows.

Yeah, no shit, Sherlock. (as the saying goes)

Got any suggestions to reliably repro the issue? Right now it happens very rarely, not to mention that I don’t use the PC all day every day, which makes it even harder to address this issue methodically.

So yeah, like you say, the SSD is a brand-new model, so it’s on its first and latest firmware already.

The motherboard is on the first BIOS update for the new CPU, so that’s another one I’m looking forward to getting updates for.

Finally, I do have some next steps in mind (while waiting for firmware updates), but I want to do one thing at a time to see what it is that resolves (or works around) the issue. Again, that’s painfully slow unless I find out how to trigger the issue reliably on demand.

Thanks again guys for your inputs!