Ubuntu ISO with >4GB casper file freezes on boot through UEFI:NTFS when Secure Boot is active #2210

Closed
pbatard opened this issue Mar 29, 2023 · 4 comments

@pbatard
Owner

pbatard commented Mar 29, 2023

This is basically an investigation of the issue that was reported on askubuntu.com and that I have also been able to replicate on another system.

To cut a long story short:

  • Ubuntu Studio 22.04.2 LTS ISOs (ubuntustudio-22.04.2-dvd-amd64.iso) now use a casper file (casper/filesystem.squashfs) that is larger than 4 GB.
  • Because of this, we can't use FAT32 with ISO mode but instead must switch to NTFS.
  • Because of this, we need to chain load the Ubuntu bootloaders (shim + mokmanager + GRUB) through UEFI:NTFS.
  • If Secure Boot is disabled everything is fine and you get to the GRUB menu.
  • If Secure Boot is enabled, and the embedded UEFI:NTFS read-only NTFS driver is used, shim + mokmanager + GRUB boot freezes (most likely in mmx64.efi but needs to be validated) and you never get to the GRUB menu.
  • If Secure Boot is enabled and an external firmware provided read/write NTFS driver is used (tested on an Intel NUC), shim + mokmanager + GRUB works as expected (NB: We obviously tested this with UEFI:NTFS being invoked, not with direct boot of the NTFS partition).
  • Oh and of course, when using DD mode, and therefore when the shim resides on a FAT partition, everything works (though you do get an Error: file /boot/ not found! message, that doesn't seem to have much of an impact).
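The FAT32 constraint behind the switch to NTFS can be sketched as a simple size check. FAT32 records file sizes in a 32-bit field, so the largest file it can hold is 2^32 - 1 bytes. The helper names below are hypothetical and only illustrate the check; this is not Rufus code.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* FAT32 stores a file's size in a 32-bit field: the limit is 2^32 - 1 bytes. */
#define FAT32_MAX_FILE_SIZE UINT64_C(0xFFFFFFFF)

/* Hypothetical helper: true if a file of this size can be stored on FAT32. */
static bool fits_on_fat32(uint64_t file_size)
{
    return file_size <= FAT32_MAX_FILE_SIZE;
}

/* Hypothetical sanity check around the boundary (assert.h is only used here). */
static void fat32_limit_sanity_check(void)
{
    assert(fits_on_fat32(FAT32_MAX_FILE_SIZE));
    assert(!fits_on_fat32(FAT32_MAX_FILE_SIZE + 1));
}
```

A squashfs image of exactly 4 GiB (4294967296 bytes) already fails this check, which is why ISO mode has to fall back to NTFS, and therefore UEFI:NTFS, for such images.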

So we have an incompatibility somewhere between the UEFI:NTFS bootloader or the UEFI:NTFS ntfs-3g read-only driver and one of shim/mokmanager/GRUB. The initial suspect was mokmanager (which doesn't seem to be invoked when Secure Boot is disabled), but the current most likely suspect is shim (WTF?!?), possibly needing/expecting r/w access to the ESP with one of the code changes introduced in https://github.com/rhboot/shim/commits/main between 2021.07 and 2023.01.18.

Unfortunately, and unlike what UEFI:NTFS does with its very detailed and verbose output, the Red Hat/GRUB/Ubuntu folks borrowed the most patronizing page from Microsoft's rulebook that says "you should hide scary boot details from the user and give them a nice empty screen since it'll look pretty" and ran with it, with the result that they chose not to provide a single point of information about what's currently happening that could give us any clue as to where their process chokes.

Which means that we now have to find a needle in the blind haystack of shim + mokmanager + GRUB to try to figure out what is really happening.

Things we tested:

  • ubuntu-22.10-desktop-amd64.iso (with casper < 4 GB) written in UEFI:NTFS mode boots fine under the same conditions
  • ubuntu-22.04.2-desktop-amd64.iso (with casper < 4 GB) written in UEFI:NTFS mode has the same issue
  • A quick comparison between the above shows that all of bootx64.efi, mmx64.efi and grubx64.efi are different, so it's possible that this is a known Shim/MOK Manager/GRUB issue that has been fixed in the most up to date versions, and that will be picked up by LTS eventually.
  • Removed/renamed grubx64.efi and mmx64.efi → Still froze, which seems to indicate that the issue is with the shim (unless shim is designed to halt if it can't find MM/GRUB).
  • Replaced 22.04.2 LTS Shim with 22.10 Shim → This works. So, correlated with the above, it looks pretty safe to say that the issue is in the Linux Shim. However the thing that now worries me is that the Shim that doesn't work (937 KB) was signed Wednesday 18 January 2023 02:37:51 whereas the Shim that does work (933 KB) was signed Thursday 12 August 2021 22:00:22, which would tend to hint that it's the newer shims that introduced breakage and that future versions of Ubuntu will all have the issue. WTF did Red Hat do to break boot?!?
  • Tried a boot in DD mode to see if the ESP was altered but the ESP was identical before and after boot. This would tend to indicate, though it does not exclude it totally, that this isn't a rw vs ro issue...
  • Use a r/w version of our ntfs-3g driver → Same issue, so this validates that this is not a rw vs ro issue.
  • Use a different NTFS driver. Using the old (ro) GPLv3 NTFS driver from https://efi.akeo.ie/downloads/efifs-1.9/x64/ fixes the issue → Goddammit Microsoft, if you didn't bullshit the world and refuse to sign GPLv3 binaries, that's the driver we would use in the first place and we wouldn't be in this mess!
  • Use an MSVC/gnu-efi compiled version of the driver rather than a gcc/EDK2 one → Still fails. So this is not a toolchain issue...
  • Enable NTFS debug in our driver to try to see what is being accessed when the freeze occurs:
    [screenshot: snapshot_00 01 646]
  • Hmmm, so we fail in ntfs_readdir(), most likely after the ntfs_attr_open() call and in a code section triggering a goto err_out; jump but not a goto dir_err_out; jump, since the latter would override the E2BIG errno we get with EIO instead. On that subject, I can't locate any part of the ntfs-3g code that would explicitly return E2BIG, so it looks like this error code is being returned from the if (HookData->Info->Size < ((UINT64)NameLen + 1) * sizeof(CHAR16)) check in DirHook().
  • Yup, that's where we choke. The RH shim is issuing a Read() of the directory with a 0 sized buffer and this is throwing our driver off:
    [photo: Img_1346]
  • Looking at the UEFI specs for EFI_FILE_PROTOCOL.Read() (Section 13.5 File Protocol) it seems that our issue is that we are returning 0 for the size, whereas we should return the minimum required size, per:

    If This is a directory, the function reads the directory entry at the file’s current position and returns the entry in Buffer. If the Buffer is not large enough to hold the current directory entry, then EFI_BUFFER_TOO_SMALL is returned and (...) BufferSize is set to be the size of the buffer needed to read the entry.

  • So it looks like the problem is that the shim expects the driver to return the minimum required size to read the entry (per the specs) and adjusts its request to use the returned size until it gets a successful read. But since our NTFS driver returns 0 instead of the required size (which is not specs compliant), whatever loop the shim uses to read the directory loops forever.
  • Aaaand, this is the same issue as the one reported in rhboot/shim#547 (Don't loop forever in load_certs() with buggy firmware), for which Red Hat have now applied a workaround. Well, at least now we know what the root of the issue is and what's required to fix it... This could also explain some of the issues reported by folks using Dell computers with their UEFI firmware freezing when a UEFI:NTFS drive is plugged...
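To make the failure mode above concrete, here is a minimal model of the two-call directory Read() pattern, written in plain C so that it builds without EDK2 or gnu-efi. Everything in it (status_t, fake_dir_t, fake_dir_read(), read_entry()) is a hypothetical stand-in, not the actual driver or shim code. The compliant flag switches between reporting the required size and reporting 0, which is the behavior described for our driver.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* Stand-ins for EFI_SUCCESS / EFI_BUFFER_TOO_SMALL (assumption: simplified). */
typedef enum { ST_SUCCESS, ST_BUFFER_TOO_SMALL } status_t;

/* Hypothetical one-entry "directory". */
typedef struct {
    const char *entry;      /* serialized directory entry */
    size_t      entry_size; /* bytes needed to hold it */
    int         compliant;  /* 1: report needed size, 0: report 0 (the bug) */
} fake_dir_t;

/* Models Read() on a directory handle. */
static status_t fake_dir_read(const fake_dir_t *dir, size_t *buffer_size,
                              void *buffer)
{
    assert(dir != NULL && buffer_size != NULL);
    if (*buffer_size < dir->entry_size) {
        /* Per the spec text quoted above, BufferSize must be set to the
         * size needed. The non-compliant path reports 0 instead. */
        *buffer_size = dir->compliant ? dir->entry_size : 0;
        return ST_BUFFER_TOO_SMALL;
    }
    memcpy(buffer, dir->entry, dir->entry_size);
    *buffer_size = dir->entry_size;
    return ST_SUCCESS;
}

/* Caller pattern: probe with a 0 sized buffer, then retry with whatever size
 * the driver reported. Returns the number of Read() calls used, or -1 if it
 * gave up. Without the max_attempts bound, a driver that keeps reporting 0
 * would make this loop spin forever. */
static int read_entry(const fake_dir_t *dir, char *out, size_t out_capacity,
                      int max_attempts)
{
    size_t size = 0;
    for (int attempt = 1; attempt <= max_attempts; attempt++) {
        if (fake_dir_read(dir, &size, out) == ST_SUCCESS)
            return attempt;
        if (size > out_capacity)
            return -1; /* entry cannot fit in the caller's buffer */
    }
    return -1;
}
```

With a compliant directory, read_entry() succeeds on the second call. With compliant set to 0, every call reports 0 again and only the attempt bound stops the loop, which matches the freeze seen here.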
@pbatard
Owner Author

pbatard commented Mar 29, 2023

As referenced above, I also reported this issue in rhboot/shim#558 as there is circumstantial evidence of a possible Linux shim regression...

@pbatard
Owner Author

pbatard commented Mar 30, 2023

Opened pbatard/ntfs-3g#4 to fix the underlying issue.

@pbatard
Owner Author

pbatard commented Mar 31, 2023

While we were at it, we also submitted a PR to improve the Shim code (that currently just bails out on non-compliance, but could try to allocate buffers with increased size, thus ensuring that the directory listing succeeds regardless).
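The idea of allocating buffers with increased size can be sketched as follows. This is not the code from the PR, only an illustration under stated assumptions: the reader callback type, the GROW_START and GROW_LIMIT values, and the deliberately non-compliant buggy_read() are all hypothetical. The caller trusts the reported size when it is larger than what was offered, and otherwise doubles the buffer up to a cap.

```c
#include <assert.h>
#include <stddef.h>
#include <stdlib.h>
#include <string.h>

typedef enum { RD_SUCCESS, RD_BUFFER_TOO_SMALL } rd_status_t;

/* Hypothetical reader callback, modeled on a directory Read() call. */
typedef rd_status_t (*reader_fn)(void *ctx, size_t *buffer_size, void *buffer);

#define GROW_START 64   /* first guess when the driver reports nothing useful */
#define GROW_LIMIT 4096 /* arbitrary cap so the loop always terminates */

/* Returns a malloc'd buffer holding the entry (caller frees), or NULL. */
static void *read_with_growth(reader_fn reader, void *ctx, size_t *out_size)
{
    assert(reader != NULL && out_size != NULL);
    size_t capacity = 0;
    void *buffer = NULL;
    for (;;) {
        size_t size = capacity;
        if (reader(ctx, &size, buffer) == RD_SUCCESS) {
            *out_size = size;
            return buffer;
        }
        /* Trust the reported size only if it exceeds what we offered;
         * otherwise fall back to doubling. */
        size_t next = (size > capacity) ? size
                    : (capacity ? capacity * 2 : GROW_START);
        if (next > GROW_LIMIT)
            break;
        void *grown = realloc(buffer, next);
        if (grown == NULL)
            break;
        buffer = grown;
        capacity = next;
    }
    free(buffer);
    return NULL;
}

/* Example non-compliant reader: needs `needed` bytes but always reports 0. */
typedef struct { size_t needed; int calls; } buggy_ctx_t;

static rd_status_t buggy_read(void *ctx, size_t *buffer_size, void *buffer)
{
    buggy_ctx_t *c = ctx;
    c->calls++;
    if (*buffer_size < c->needed) {
        *buffer_size = 0;
        return RD_BUFFER_TOO_SMALL;
    }
    memset(buffer, 0xAB, c->needed);
    *buffer_size = c->needed;
    return RD_SUCCESS;
}
```

For an entry needing 100 bytes, the sketch offers 0, then 64, then 128 bytes and succeeds on the third call even though the reader never reports a usable size. The cap keeps a reader that can never be satisfied from turning the retry loop into another hang.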

@pbatard pbatard modified the milestones: 3.24, 3.23 Apr 15, 2023
@github-actions

This thread has been automatically locked since there has not been any recent activity after it was closed. Please open a new issue if you think you have a related problem or query.

@github-actions github-actions bot locked as resolved and limited conversation to collaborators Jul 15, 2023