Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Possible regression: 2021.08.12 shim boots whereas 2023.01.18 freezes in a chain-loaded Secure Boot context #558

Closed
pbatard opened this issue Mar 29, 2023 · 4 comments

Comments

@pbatard
Copy link
Contributor

pbatard commented Mar 29, 2023

Issue description

It appears that, when running in a Secure Boot environment, the current shim introduced a regression, that is currently manifesting itself and affecting users trying to create NTFS-based boot media from Ubuntu Studio 22.04.2 LTS (this was tested with ubuntustudio-22.04.2-dvd-amd64.iso) when using the NTFS chain loader from the Rufus boot media creation utility.

Basically, because the Ubuntu ISO contains a file that is larger than 4 GB, Rufus extracts the ISO content onto an NTFS partition rather than a FAT32 partition (since FAT cannot accommodate files larger than 4 GB) and then, for UEFI systems that do not have a native NTFS driver, chains load the Red Hat Shim from a small additional FAT partition that contains a UEFI boot loader that can load a UEFI NTFS driver and then hands of boot to the NTFS-residing bootloaders.

This chain loader, called UEFI:NTFS is itself Secure Boot signed, along with its read-only NTFS driver that is derived from ntfs-3g. Note that this driver was restricted to read-only as a means to improve its resilience against potential vulnerabilities and therefore facilitate the Secure Boot validation process.

However, we have recently found out that, when Secure Boot is active (the issue does not manifest itself when Secure Boot is Disabled or in Setup mode), the process above appears to freeze during shim execution when using the 2023.01.18 Secure Boot signed version. Yet, when replacing the shim with a 2021.08.12 Secure Boot signed version, everything works as expected.

Thus, it would appear that a regression was introduced between 2021.07 and 2023.01.18 that causes a freezout when shim is chain loaded from a read-only NTFS partition.

Steps to replicate

The attached shim.zip contains a 8 MB uncompressed VHD disk image, which, for all intents and purposes will work the same as a dd disk image (since uncompressed VHD = regular disk image + 512 bytes footer), that you should be able to use with any USB disk media to recreate a test disk that includes:

  • A 6 MB NTFS partition with both the 2023.01.18 and 2021.08.12 shim bootloaders (with the 2023 version being labelled as /efi/boot/bootx64.efi, and therefore the one that will execute by default, and the 2021 version labelled as /efi/boot/bootx64.old, and provided for testing purposes).
  • A 1 MB FAT12 partition with the UEFI:NTFS bootloaders and drivers

Thus, if you dd write that image, you will end up with a GPT partitioned bootable drive can be used to replicate the issue.

From there, you should to test this drive on a UEFI based system that meets the following two conditions:

  1. Has Secure Boot enabled.
  2. Does NOT natively include a NTFS driver (more about this below).

On such a system, and provided that your UEFI firmware isn't pedantic about the fact that the backup GPT of the drive you just created will be invalid (which most firmwares will happily ignore), you should end up with a screen similar to the one below, indicating, after providing some info about your system, first, that the NTFS driver was successfully loaded, then that the shim was successfully located on the NTFS partition and finally that the UEFI boot process has successfully been handed over to it:

C4ZEh

Then, provided that you can mount the NTFS partition from the drive you created, if you rename /efi/boot/bootx64.old to /efi/boot/bootx64.efi, you will find that the process does go past the screen above up to the stage where shim reports, as expected if the execution is working, that it could not locate grubx64.efi, which tend to support the idea that we are, in effect, dealing with a shim regression.

Note that the shim bootloaders used in this test image come from ubuntustudio-22.04.2-dvd-amd64.iso and ubuntu-22.10-desktop-amd64.iso, with the older one actually being from 22.10.

Investigation notes

  • Since we have been able to replicate this issue on two separate PCs (a rather recent(?) Dell Vostro 15 and a somewhat old Gigabyte based AMD system), we are currently assuming that this is likely to affect all UEFI-based PC users where the UEFI firmware does not provide a native NTFS driver.
  • On systems where the UEFI firmware does provide a native UEFI driver (currently we have only tested on an Intel NUC, but recent motherboards from Gigabyte or Asus are also known to come with one), it appears that the issue does not manifest itself. If you test it there, you will see that the output from the UEFI:NTFS bootloader reports that it will not install its own driver but reuse the one from the system before the shim produces the expected messages. And of course, the problem also does not manifest itself when booting directly from FAT32. So, it does appear that the underlying driver being used to access the file system also plays a role in triggering the issue, and we are of course investigating this. The obvious suspicion, since the main difference between most native UEFI firmware drivers and the one use by UEFI:NTFS is that the formers are r/w and the latter is ro, would hint at a shim code change that might attempt to access the ESP with write access during boot. We have however seen no evidence of this yet... And obviously, while we do have other UEFI NTFS drivers we could test with, and are planning to do just that, the fact that the issue only manifests itself when Secure Boot is enabled makes it a lot more cumbersome to investigate properly. It should also be pointed out that the NTFS driver used was designed with UEFI compliance in mind and is derived from the well established and stable ntfs-3g codebase, so, while it is of course very possible that the issue, or at least part of it, resides with the driver, especially considering that using a different proprietary driver appears to "solve" the problem (yet this can't be used as a workaround, since there is no public Secure Boot signed proprietary NTFS driver available for general use), one might want to be careful about trying to dismiss the problem as a pure NTFS driver issue, even more so as UEFI:NTFS is widely used to chain-load Windows bootloaders and does so without trouble.
  • Along with further investigation with the NTFS driver, we are obviously also in the process of trying to refining the time range where this potential shim regression might have been introduced, but, this means trying to dig up Secure Boot signed versions of x86_64 shim binaries, so one question we have is: Is there a public repository of Secure-Boot signed shims that one can download binaries from? Ideally, I would strongly suggest that every time Red Hat gets a shim Secure Boot signed, they upload it as a GitHub release in this repository, like what's done for the UEFI:NTFS bootloader code as well as the UEFI:NTFS driver, as it would probably help people wondering where one can obtained past version of the signed shims...
  • Since we are still in the process of investigating what looks like a time-consumming issue to troubleshoot, we are quite eager to obtain insight from the shim developers as to whether they have some idea of what could have changed with the shim in the past 2 years, especially in the context of file I/O, that could explain the change of behaviour. This is also part of the reason why we are reporting this issue early and, at this stage, any hint would be welcome
  • For reference, the driver loading and chain loading code is pretty straightforward and can be found at https://github.com/pbatard/uefi-ntfs/blob/master/boot.c. Especially, the actual shim handover boils down to executing LoadImage()/StartImage() (since Secure Boot validation is handled by LoadImage()) and likewise, the code to load the driver and locate the bootloader on the NTFS target partition is relatively straightforward. In short, while we have questions above the NTFS driver code, we don't currently believe that the issue has to do with the UEFI:NTFS bootloader itself.
  • As pointed above, all the code being used to chain load the shim is Open Source (See https://github.com/pbatard/uefi-ntfs and https://github.com/pbatard/ntfs-3g) along with the UEFI toolchain environments (Linux/Ubuntu using gcc and the latest EDK2 at the time of release, through GitHub Actions) and, since it was Secure Boot signed, has been vetted by Microsoft.
  • Finally, still for reference, the original issue was reported at https://askubuntu.com/questions/1461218/booting-from-the-flash-ubuntu-studio-22-04-created-with-rufus-on-dell-vostro-1 with subsequent investigation being carried out in Ubuntu ISO with >4GB casper file freezes on boot through UEFI:NTFS when Secure Boot is active pbatard/rufus#2210.
@pbatard
Copy link
Contributor Author

pbatard commented Mar 30, 2023

Further investigation has allowed me to conclude that this is a duplicate of the issue reported in #547, and especially the problem is that our NTFS driver is not specs compliant when it comes to EFI_FILE_PROTOCOL.Read() that states (in Section 13.5 File Protocol):

If This is a directory, the function reads the directory entry at the file’s current position and returns the entry in Buffer. If the Buffer is not large enough to hold the current directory entry, then EFI_BUFFER_TOO_SMALL is returned and (...) BufferSize is set to be the size of the buffer needed to read the entry.

As we were able to confirm, the root of the issue is that, when reading the efi/boot/ directory with BufferSize set to 0, the NTFS driver is returning 0 instead of the size of the buffer needed to read the entry, which leads pre 2023.02.01 Shim code to enter an infinite loop.

Obviously, we will fix the NTFS driver for compliance, to avoid the issue altogether, but it's nice to see that Shim has also taken steps to try to alleviate it.

pbatard added a commit to pbatard/shim that referenced this issue Mar 31, 2023
Following the discovery of more problematic firmwares and drivers
affected by the issue c28b9855d0befbd07f46ced5c06c6d799c9d9d00 is
designed to address (e.g. rhboot#558),
this patch further improves the code so that, instead of simply bailing
out, we progressively increase the buffer sizes, until either success
or a maximum size limit is reached.

In most cases, this workaround should be enough to ensure completion
of the directory read and thus provide full shim functionality (while
still warning the user about the non-compliance of their environment).
pbatard added a commit to pbatard/shim that referenced this issue Mar 31, 2023
Following the discovery of more problematic firmwares and drivers
affected by the issue c28b9855d0befbd07f46ced5c06c6d799c9d9d00 is
designed to address (e.g. rhboot#558),
this patch further improves the code so that, instead of simply bailing
out, we progressively increase the buffer sizes, until either success
or a maximum size limit is reached.

In most cases, this workaround should be enough to ensure completion
of the directory read and thus provide full shim functionality (while
still warning the user about the non-compliance of their environment).

Signed-off-by: Pete Batard <pete@akeo.ie>
pbatard added a commit to pbatard/shim that referenced this issue Mar 31, 2023
Following the discovery of more problematic firmwares and drivers
affected by the issue f23883c is
designed to address (e.g. rhboot#558),
this patch further improves the code so that, instead of simply bailing
out, we progressively increase the buffer sizes, until either success
or a maximum size limit is reached.

In most cases, this workaround should be enough to ensure completion
of the directory read and thus provide full shim functionality (while
still warning the user about the non-compliance of their environment).

Signed-off-by: Pete Batard <pete@akeo.ie>
@pbatard
Copy link
Contributor Author

pbatard commented Mar 31, 2023

I have now confirmed that the latest shim (with the f23883c / #547 patch applied) does bail out when using the problematic driver instead of entering an infinite loop.

However, since I believe that the code from f23883c can be improved to progressively increase the buffer size in order to successfully complete the directory read, I have now submitted PR #560 that does just this.

vathpela pushed a commit that referenced this issue May 2, 2023
Following the discovery of more problematic firmwares and drivers
affected by the issue f23883c is
designed to address (e.g. #558),
this patch further improves the code so that, instead of simply bailing
out, we progressively increase the buffer sizes, until either success
or a maximum size limit is reached.

In most cases, this workaround should be enough to ensure completion
of the directory read and thus provide full shim functionality (while
still warning the user about the non-compliance of their environment).

Signed-off-by: Pete Batard <pete@akeo.ie>
@vathpela
Copy link
Contributor

Can I close this?

@pbatard
Copy link
Contributor Author

pbatard commented Jul 26, 2023

I will close this now, yes.

The issue was fixed in the NTFS driver and with the commit that was integrated in Shim, there is no reason to have this issue opened.

Thanks again!

@pbatard pbatard closed this as completed Jul 26, 2023
brianredbeard pushed a commit to brianredbeard/redhat-efi-boot-shim that referenced this issue Feb 22, 2024
Following the discovery of more problematic firmwares and drivers
affected by the issue f23883c is
designed to address (e.g. rhboot#558),
this patch further improves the code so that, instead of simply bailing
out, we progressively increase the buffer sizes, until either success
or a maximum size limit is reached.

In most cases, this workaround should be enough to ensure completion
of the directory read and thus provide full shim functionality (while
still warning the user about the non-compliance of their environment).

Signed-off-by: Pete Batard <pete@akeo.ie>
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

No branches or pull requests

2 participants