Saturday, May 30, 2020

Random SSD failure with Windows 10


After migrating to Windows 10, one of the SSDs in the system would randomly fail.  This article logs the issue for reference.

Motherboard
Asus Z87M-PLUS
IDE ATA/ATAPI controller reported by Windows device manager:
Intel(R) 8 series/C220 Chipset Family SATA AHCI Controller

Samsung 850 EVO MZ-75E500RW 500GB SSD connected to SATA port 2, disk 0
Samsung 840 EVO MZ-7TE500BW 500GB SSD connected to SATA port 3, disk 1
Samsung HD204UI 2TB HDD connected to SATA port 5, disk 2

Under Windows 8.1 all disks functioned normally with no special settings or drivers in use.  Where hardware works sufficiently with default drivers and settings, I will leave it alone unless there is a problem.  Initially Windows 8.1 was running on the 840 EVO SSD.  Later Windows 10 was installed on the 850 EVO and configured to dual boot.  In this way either Windows 10 could be used on the 850 EVO or Windows 8.1 on the 840 EVO.

It was observed that while there were no issues accessing any disks under Windows 8.1, with Windows 10 the 840 EVO would seem to randomly deny access.  Sometimes it would work, but after a while there was no access until a full shutdown and start cycle.

Later on, Windows 10 on the 850 EVO became the main operating system and Windows 8.1 was removed from the 840 EVO so that the drive could be used for additional storage.  The problems accessing the 840 EVO persisted, leading to a very long troubleshooting process.  How could simply changing the OS to Windows 10 cause such serious problems?  I was fairly sure it could be rectified with the correct driver/settings configuration.  I was wrong.

To begin with, a pattern was identified: the drive disconnected exactly 11 minutes 10 seconds after System Event ID 12, Kernel-General, "operating system started".  The disconnection was signified by a telltale Event ID 129 from storahci, "Reset to device, \Device\RaidPort0, was issued".  Once that occurred, the 840 EVO was useless until the next shutdown and start cycle.
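A fixed delay like this is easy to confirm once the event times are written down.  The following is only a minimal sketch: the timestamps are invented for illustration and would have to be copied by hand from Event Viewer, and the function name is my own.

```python
from datetime import datetime, timedelta

# Hypothetical (timestamp, event ID, source) rows; the times are invented
# for illustration and are not taken from the real event log.
events = [
    ("2020-05-01 08:00:00", 12, "Kernel-General"),
    ("2020-05-01 08:11:10", 129, "storahci"),
    ("2020-05-02 19:30:05", 12, "Kernel-General"),
    ("2020-05-02 19:41:15", 129, "storahci"),
]


def boot_to_reset_intervals(rows):
    """Return the delay from each boot (ID 12) to the first reset (ID 129) after it."""
    intervals = []
    boot_time = None
    for stamp, event_id, _source in rows:
        when = datetime.strptime(stamp, "%Y-%m-%d %H:%M:%S")
        if event_id == 12:
            boot_time = when
        elif event_id == 129 and boot_time is not None:
            intervals.append(when - boot_time)
            boot_time = None  # only count the first reset per boot
    return intervals


intervals = boot_to_reset_intervals(events)
for delta in intervals:
    print(delta)
```

If every boot gives the same interval, the disconnection is being triggered by a timer rather than by load on the drive.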

At this point I wondered if the drive was the problem.  In the past, it had completely filled up and data loss and other generally bad behaviour had occurred.  Although I had not used it before, I thought this would be a good time to use the Samsung Magician software to keep an eye on the Samsung drives.  Samsung Magician confirmed that the firmware was current.  I used this software to Secure Erase the 840 EVO.  No effect.

Attention then turned to the controller drivers, especially as the Magician software flagged the driver with this message:
"Under the current system environment, some functions in Magician CANNOT be run.  To enable these Magician functions and ensure optimal performance it is recommended to: Visit website below and updated driver to the latest version."  Always good advice - maybe.  The website was the Intel site for the Intel Rapid Storage Technology (RST) interface and driver.  I was not convinced this was really the best place to get a driver for the controller on the motherboard, especially as the latest version did not include my motherboard chipset in its supported devices list.  Next stop, the motherboard manufacturer's support website, and sure enough, there was the SATA driver for Windows 10.

This installed the motherboard manufacturer's recommended version of Intel RST for Windows 10, and it stopped the 11 minute 10 second disconnection problem.  Unfortunately it did not resolve the issue completely.  Once the 840 EVO was brought into use, sooner or later it would disconnect.  Initially I configured a swap file on it, which had the amusing effect of BSODing the whole system when the 840 EVO disconnected and the OS couldn't see its swap file anymore.  The Event ID source was now iaStorA because the driver had changed.

BIOS version?  OK, worth a try.  So that was upgraded to the latest version.  No effect.

At this point the BIOS, SSD firmware, and controller drivers were all correct so configuration became the next focus.  This article looked like a winner.  Deep dive troubleshooting of the "Reset to device, \Device\RaidPort0, Event 129" problem.  Power management is always a suspect.  Nothing here solved the problem.

While the 840 EVO was not used, it would happily remain available.  Once it was asked to do any real work, it would quickly fall on its bum.

Another big indicator that this was a Windows driver problem emerged from the behaviour of the Acronis backup/restore software.  If a restore to the 840 EVO was attempted with the bootable WinPE-based version, it would fail at a random point every time.  The same attempts with the Linux-based version had no problems.

This article looked interesting
https://docs.microsoft.com/en-gb/archive/blogs/ntdebugging/understanding-storage-timeouts-and-event-129-errors
As it indicated that this registry value could have an effect, HKLM\System\CurrentControlSet\Services\Disk\TimeOutValue, I doubled the value from 41 to 82, but this did not solve the problem.
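For reference, the change can be made from an elevated command prompt with the standard reg tool instead of regedit.  The sketch below only builds and prints the command lines.  It assumes the 41 and 82 quoted above are decimal values, and the helper names are my own.

```python
# Registry key that holds the disk class driver timeout.
KEY = r"HKLM\System\CurrentControlSet\Services\Disk"


def reg_query_command():
    """Command that shows the current TimeOutValue."""
    return ["reg", "query", KEY, "/v", "TimeOutValue"]


def reg_add_command(value):
    """Command that writes a new TimeOutValue as a REG_DWORD (decimal assumed)."""
    if value <= 0:
        raise ValueError("TimeOutValue must be a positive integer")
    return ["reg", "add", KEY, "/v", "TimeOutValue",
            "/t", "REG_DWORD", "/d", str(value), "/f"]


query_line = " ".join(reg_query_command())
add_line = " ".join(reg_add_command(82))
print(query_line)
print(add_line)
```

Running the query first gives a record of the old value so the change can be reverted if it makes no difference, as was the case here.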

Finally, because there were no problems when running from Windows 8.1, I restored it to the 840 EVO and booted back into it to extract the original Windows 8.1 storahci driver.  I then forced this driver into Windows 10, but sadly it did not resolve the problem.

I have to conclude that, with this particular setup, it is not possible to resolve this issue.

A 1TB 860 EVO was bought and installed.  The Windows 10 partitions were restored to it from a backup of the Windows 10 install on the 850 EVO.  The 850 EVO was then used as additional storage and the paging file moved to it.  Result: Windows 10 nice and quick.

Perhaps the 840 EVO will live again in the future, connected to an updated motherboard / disk controller.


Links to related and similar reports:

https://answers.microsoft.com/en-us/windows/forum/windows_10-hardware-winpc/event-id-129-storahci-resetting-raidport0/7b30c512-6597-438b-80cb-22fb2f85d62e?page=3

https://answers.microsoft.com/en-us/windows/forum/windows_10-hardware/ssd-samsung-850-pro-512gb-freezes-randomly-with/9a8178d0-c72d-49f1-9ea7-da01a24eea41?page=2

https://answers.microsoft.com/en-us/windows/forum/windows_10-update/random-freeze-event-129-iastora/142abe12-3eaf-4e51-8d50-22e28a48e186

Wednesday, May 27, 2020

Restoring UEFI OS to a second HDD and dual booting


The starting configuration is a UEFI booting OS on, say, disk 0.  The requirement is to restore a backed-up UEFI OS to a separate HDD/SSD, say disk 1, and enable dual booting.

Restore the backup to disk 1.

You've gone into Advanced Startup of the OS on disk 0 (Troubleshoot > Advanced options > Command Prompt) and used bootrec /scanos to locate the restored OS on disk 1.  You've tried bootrec /rebuildbcd to add the newly restored OS to the boot configuration database (BCD).

bootrec /rebuildbcd locates the OS on disk 1 and offers to add it to the boot list.

Successfully scanned Windows installations.
Total identified windows installations: 1
[1]   D:\Windows
Add installation to boot list? Yes(Y)/No(N)/All(A):

You select Y thinking your task is nearly complete but instead of success you get

The system cannot find the path specified.

Exit and reboot normally back into the OS on disk 0.

Instead, you will need to manually add the entry to the BCD using bcdedit in an elevated command window.

Run CMD.EXE as admin and enter bcdedit to see the existing entries.  You probably have a Windows Boot Manager section and a Windows Boot Loader section, which together control booting into the OS on disk 0.

You need to add another Windows Boot Loader entry for the restored OS on disk 1.

The existing entry in the list is referred to as {current}

Make a copy of it using this command.  It uses the /d switch to label the entry for its appearance on the dual boot screen.  Below I have used Windows 8.1.

bcdedit /copy {current} /d "Windows 8.1"

Now run bcdedit again to see the result.  You have another Windows Boot Loader entry with a description of Windows 8.1.  Notice that an arbitrary GUID identifier has automatically been generated and applied to it.  You need to copy the GUID into the commands below to identify the Windows Boot Loader entry you are modifying.
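The GUID is also printed by bcdedit /copy itself when the copy succeeds, so it can be picked out of that output rather than retyped.  A minimal sketch follows.  The GUID in the sample line is made up, and the helper name is hypothetical.

```python
import re

# Matches a brace-wrapped GUID such as {01234567-89ab-cdef-0123-456789abcdef}.
GUID_PATTERN = re.compile(
    r"\{[0-9a-fA-F]{8}(?:-[0-9a-fA-F]{4}){3}-[0-9a-fA-F]{12}\}"
)


def extract_guid(output):
    """Return the first brace-wrapped GUID in the text, or None if there is none."""
    match = GUID_PATTERN.search(output)
    return match.group(0) if match else None


# Example success message from bcdedit /copy; this GUID is invented.
sample = "The entry was successfully copied to {01234567-89ab-cdef-0123-456789abcdef}."
guid = extract_guid(sample)
print(guid)
```

The braces are kept in the result because bcdedit expects them around the identifier.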

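Now customise it for the OS on disk 1.  Use the drive letter that the restored Windows partition has in the running OS on disk 0, which may differ from the one bootrec reported in the recovery environment.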

bcdedit /set {copiedGUID} device partition=D:
bcdedit /set {copiedGUID} osdevice partition=D:

Finally modify the Windows Boot Manager settings so that the dual boot selection screen appears when the computer is booted.

bcdedit /set {bootmgr} displaybootmenu yes

bcdedit /set {bootmgr} timeout 20
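To avoid typing mistakes, the four commands above can be generated from the copied GUID and the drive letter.  This is only a sketch: the function name and its parameters are my own, the GUID shown is made up, and the output is meant to be pasted into an elevated command window instead of being run from the script.

```python
def dual_boot_commands(guid, drive_letter, timeout_seconds=20):
    """Build the bcdedit commands that point a copied entry at another partition."""
    drive = drive_letter.rstrip(":").upper()
    if len(drive) != 1 or not drive.isalpha():
        raise ValueError("drive_letter must be a single letter such as 'D'")
    return [
        f"bcdedit /set {guid} device partition={drive}:",
        f"bcdedit /set {guid} osdevice partition={drive}:",
        "bcdedit /set {bootmgr} displaybootmenu yes",
        f"bcdedit /set {{bootmgr}} timeout {timeout_seconds}",
    ]


# Invented GUID for illustration; use the one bcdedit generated for the copy.
commands = dual_boot_commands("{01234567-89ab-cdef-0123-456789abcdef}", "d:")
for line in commands:
    print(line)
```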

Run bcdedit again to review how the settings have been updated.

Now reboot and select the restored OS from the list to boot into it on disk 1.