MSI, App-V, SCCM, Appsense, Citrix. This blog contains hints and tips on these technologies. Primarily it is an online notebook of items that I may need to refer to in the future, or things I constantly forget!
Saturday, May 30, 2020
Random SSD failure with Windows 10
After migrating to Windows 10 one of the SSDs in the system would randomly fail. This article logs the issue for reference.
Motherboard
Asus Z87M-PLUS
IDE ATA/ATAPI controller reported by Windows device manager:
Intel(R) 8 series/C220 Chipset Family SATA AHCI Controller
Samsung 850 EVO MZ-75E500RW 500GB SSD connected to SATA port 2, disk 0
Samsung 840 EVO MZ-7TE500BW 500GB SSD connected to SATA port 3, disk 1
Samsung HD204UI 2TB HDD connected to SATA port 5, disk 2
Under Windows 8.1 all disks functioned normally with no special settings or drivers in use. Where hardware works sufficiently with default drivers and settings, I will leave it alone unless there is a problem. Initially Windows 8.1 was running on the 840 EVO SSD. Later Windows 10 was installed on the 850 EVO and configured to dual boot. In this way either Windows 10 could be used on 850 EVO or Windows 8.1 on the 840 EVO.
While there were no issues accessing any disks under Windows 8.1, under Windows 10 the EVO 840 would randomly deny access. Sometimes it would work, but after a while there was no access until a full shutdown and start cycle.
Later on Windows 10 on the EVO 850 became the main operating system used and Windows 8.1 was removed from the EVO 840 drive to be used for additional storage. The problems accessing the EVO 840 drive persisted, leading to a very long troubleshooting process. How could simply changing the OS to Windows 10 cause such serious problems? I was fairly sure it could be rectified with the correct driver/settings configuration. I was wrong.
To begin with, a pattern was identified of drive disconnection at exactly 11 minutes 10 seconds after System Event ID 12 Kernel-General "operating system started". The disconnection was signified by a telltale Event ID 129 storahci "Reset to device, \Device\RaidPort0, was issued". Once that occurred, the EVO 840 was useless until the shutdown and start cycle.
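The interval between the two events can be checked with a few lines of Python once the timestamps have been copied out of Event Viewer. This is only a minimal sketch: the function names and the example timestamps are hypothetical placeholders, not values taken from the actual logs.

```python
from datetime import datetime

# Timestamp layout assumed for this sketch; adjust to match how the
# event times were copied out of Event Viewer.
TIME_FORMAT = "%Y-%m-%d %H:%M:%S"


def seconds_after_boot(boot_time: str, reset_time: str) -> float:
    """Return the seconds between Event ID 12 (boot) and Event ID 129 (reset)."""
    boot = datetime.strptime(boot_time, TIME_FORMAT)
    reset = datetime.strptime(reset_time, TIME_FORMAT)
    return (reset - boot).total_seconds()


def format_interval(seconds: float) -> str:
    """Format a number of seconds as 'M minutes S seconds'."""
    minutes, secs = divmod(int(seconds), 60)
    return f"{minutes} minutes {secs} seconds"


if __name__ == "__main__":
    # Hypothetical timestamps for illustration only.
    boot = "2020-05-01 09:00:00"    # Event ID 12, Kernel-General
    reset = "2020-05-01 09:11:10"   # Event ID 129, storahci
    print(format_interval(seconds_after_boot(boot, reset)))
```

Running it over several boots shows quickly whether the gap is always the same, which points at a timer or power setting rather than at load on the drive.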
At this point I wondered if the drive was the problem. In the past, it had completely filled up and data loss and other generally bad behaviour had occurred. Although I had not used it before, I thought this would be a good time to use the Samsung Magician software to keep an eye on the Samsung drives. The Samsung Magician confirmed that the firmware was current. I used this software to Secure Erase the EVO 840. No effect.
Attention then turned to the controller drivers, especially as the Magician software flagged the driver with this message:
"Under the current system environment, some functions in Magician CANNOT be run. To enable these Magician functions and ensure optimal performance it is recommended to: Visit website below and updated driver to the latest version." Always good advice - maybe. The website was the Intel site for the Intel Rapid Storage Technology (RST) interface and driver. I was not convinced this was really the best place to get a driver for the controller in the motherboard, especially as the latest version did not include my motherboard chipset in the supported devices list. Next stop, the motherboard manufacturer support website, and sure enough, there was the SATA driver for Windows 10.
This installed the motherboard manufacturer recommended version of Intel RST for Windows 10 and it stopped the 11 minute 10 second disconnection problem. Unfortunately it did not resolve the issue completely. Once the EVO 840 was brought into use, sooner or later it would disconnect. Initially I configured a swap file on it, which had the amusing effect of BSODing the whole system when the EVO 840 disconnected and the OS couldn't see its swap file anymore. The Event ID source was now iaStorA because the driver had changed.
BIOS version? OK, worth a try. So that was upgraded to the latest version. No effect.
At this point the BIOS, SSD firmware, and controller drivers were all correct so configuration became the next focus. This article looked like a winner. Deep dive troubleshooting of the "Reset to device, \Device\RaidPort0, Event 129" problem. Power management is always a suspect. Nothing here solved the problem.
While the EVO 840 drive was not used it would happily remain available. Once it was asked to do any real work it would fall quickly on its bum.
Another big indicator that this was a Windows driver problem emerged from the behaviour of Acronis backup/restore software. If a restore was attempted to the EVO 840 drive with bootable WinPE based software, it would always randomly fail. The same attempts under a Linux based version would have no problems.
This article looked interesting
https://docs.microsoft.com/en-gb/archive/blogs/ntdebugging/understanding-storage-timeouts-and-event-129-errors
It indicated that this registry value could have an effect: HKLM\System\CurrentControlSet\Services\Disk\TimeOutValue. I doubled the value from 41 to 82 but this did not solve the problem.
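For future reference, the same check and change can be made from the command line with the built-in reg tool rather than regedit. The sketch below only builds the command lines and runs nothing. The helper names are hypothetical, the key path and the values 41 and 82 are the ones noted above, and it assumes the value is a REG_DWORD holding a number of seconds. The commands would need an elevated prompt, and a restart is probably needed before a new value takes effect.

```python
from typing import List

# Registry location quoted in the text above.
DISK_KEY = r"HKLM\System\CurrentControlSet\Services\Disk"
VALUE_NAME = "TimeOutValue"


def query_command() -> List[str]:
    """Build the reg.exe command that shows the current TimeOutValue."""
    return ["reg", "query", DISK_KEY, "/v", VALUE_NAME]


def set_command(seconds: int) -> List[str]:
    """Build the reg.exe command that writes a new TimeOutValue.

    Assumes the value is a REG_DWORD; /f overwrites without prompting.
    """
    if seconds <= 0:
        raise ValueError("timeout must be a positive number of seconds")
    return ["reg", "add", DISK_KEY, "/v", VALUE_NAME,
            "/t", "REG_DWORD", "/d", str(seconds), "/f"]


if __name__ == "__main__":
    current = 41  # value found on this system, as noted above
    print(" ".join(query_command()))
    print(" ".join(set_command(current * 2)))  # doubled to 82
```

Printing the commands first, rather than running them straight away, leaves a chance to check the key path before anything is written to the registry.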
Finally because there were no problems when running from Windows 8.1, I restored it back to the EVO 840 drive and booted back into that to extract the original Windows 8.1 storahci driver. I then forced this driver into Windows 10 but sadly it did not resolve it.
I have to conclude that, with this particular setup, it is not possible to resolve this issue.
A 1TB EVO 860 was bought and installed. The Windows 10 partitions were restored to it from a backup of the Windows 10 install on the EVO 850. The EVO 850 was then used as additional storage and the paging file moved to it. Result: Windows 10 nice and quick.
Perhaps the EVO 840 will live again in the future connected to an updated motherboard / disk controller.
Links to related and similar reports:
https://answers.microsoft.com/en-us/windows/forum/windows_10-hardware-winpc/event-id-129-storahci-resetting-raidport0/7b30c512-6597-438b-80cb-22fb2f85d62e?page=3
https://answers.microsoft.com/en-us/windows/forum/windows_10-hardware/ssd-samsung-850-pro-512gb-freezes-randomly-with/9a8178d0-c72d-49f1-9ea7-da01a24eea41?page=2
https://answers.microsoft.com/en-us/windows/forum/windows_10-update/random-freeze-event-129-iastora/142abe12-3eaf-4e51-8d50-22e28a48e186