Rework BLS fragment writing to do "two phase commit" #1951

Open
cgwalters opened this issue Oct 21, 2019 · 8 comments
Labels
difficulty/hard (hard complexity/difficulty issue), enhancement, reward/high (fixing this will result in significant benefit), triaged (this issue has been evaluated and is valid)

Comments

@cgwalters
Member

cgwalters commented Oct 21, 2019

Currently we require /boot/loader be a symbolic link, so that we can transactionally replace all of the entries. This causes various problems because it's an OSTree-specific invention.

Another approach would be to use "journaling", something like this:

  • First, write /boot/loader/.ostree-txn which would be a single file containing the new list of OSTree-specific bootloader entries (i.e. things starting with ostree- today)

Completion:

  • If /boot/loader/.ostree-txn exists, add all bootloader entries referenced by it, then delete all unreferenced ones
  • unlink(/boot/loader/.ostree-txn)

Now if we're interrupted between these steps, we can add a systemd unit which does ConditionPathExists=/boot/loader/.ostree-txn and does the work on bootup. If we also detect that this would have changed the default boot entry, then we reboot.
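
The journaling steps above could be sketched roughly as follows. This is only an illustration under stated assumptions, not how OSTree implements (or would implement) it: the journal is assumed to be a JSON object mapping entry file names to their contents, and the helper names `begin`, `complete`, `_write_durably` and `_fsync_dir` are hypothetical. The completion step is written to be idempotent, so a boot-time unit guarded by ConditionPathExists=/boot/loader/.ostree-txn could simply run it again after an interruption.

```python
import json
import os
from pathlib import Path

TXN_NAME = ".ostree-txn"   # journal file name from the proposal
PREFIX = "ostree-"         # only entries with this prefix are managed


def _fsync_dir(directory: Path) -> None:
    """Flush directory metadata (new names, unlinks) to disk (Linux)."""
    fd = os.open(directory, os.O_RDONLY)
    try:
        os.fsync(fd)
    finally:
        os.close(fd)


def _write_durably(path: Path, data: str) -> None:
    """Write to a temp file, fsync it, rename into place, fsync the directory."""
    tmp = path.with_name(path.name + ".tmp")
    with open(tmp, "w") as f:
        f.write(data)
        f.flush()
        os.fsync(f.fileno())
    os.replace(tmp, path)
    _fsync_dir(path.parent)


def begin(loader: Path, entries: dict) -> None:
    """Phase 1: record the complete desired set of ostree entries."""
    # Assumption: the journal carries entry contents, keyed by file name.
    _write_durably(loader / TXN_NAME, json.dumps(entries, sort_keys=True))


def complete(loader: Path) -> bool:
    """Phase 2: apply the journal. Safe to re-run after an interruption."""
    txn = loader / TXN_NAME
    if not txn.exists():
        return False  # nothing pending
    wanted = json.loads(txn.read_text())
    entries_dir = loader / "entries"
    entries_dir.mkdir(exist_ok=True)
    # Add every referenced entry first...
    for name, content in wanted.items():
        _write_durably(entries_dir / name, content)
    # ...then delete ostree-managed entries the journal does not reference.
    for p in entries_dir.iterdir():
        if p.name.startswith(PREFIX) and p.name not in wanted:
            p.unlink()
    _fsync_dir(entries_dir)
    # Only now drop the journal, which marks the transaction as finished.
    txn.unlink()
    _fsync_dir(loader)
    return True
```

Note that this sketch only touches files whose names start with ostree-, so entries written by other tools are left alone. Whether rename and fsync give the needed guarantees on FAT is a separate question that the sketch does not address.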

@AdrianVovk

Cool. This'll solve the sd-boot problem.

Would it be possible for OSTree to also be able to operate directly on /efi (systemd's preferred mount point for the ESP)?

@cgwalters
Member Author

Also just recording some other thoughts here. In some cases, we end up writing kernel+initramfs to FAT. And in general there have been longstanding issues with FS journaling versus bootloaders; see #1049

I think we should create a higher level "protocol" between things writing data in /boot and the ESP and the bootloaders. For example, we could have separate "checksum files" for the BLS fragments, kernel/initramfs etc. Something like this:

/boot/loader/entries/foo.conf
/boot/loader/entries/foo.conf.sha256
/boot/vmlinuz-1
/boot/vmlinuz-1.sha256
/boot/vmlinuz-2
/boot/vmlinuz-2.sha256
/boot/initramfs-1
/boot/initramfs-1.sha256

etc. And the bootloader would validate these before trying to boot a particular entry. Or it'd at least verify the BLS fragment checksum since that should be basically free.

Things writing kernels/initramfs to /boot should try to ensure there's at least one valid bootable entry, and the filesystem is fully sync'd before deleting things.
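
To make the checksum-file idea above concrete, here is a minimal sketch of both sides of such a "protocol". It is written in Python for brevity (a real bootloader would do the reader side in its own environment), and several details are assumptions rather than anything agreed in this thread: the sidecar is assumed to contain a bare hex SHA-256 digest, and the function names `write_checksum` and `is_valid` are hypothetical.

```python
import hashlib
import os
from pathlib import Path


def sha256_file(path: Path) -> str:
    """Return the hex SHA-256 digest of a file, reading it in chunks."""
    h = hashlib.sha256()
    with open(path, "rb") as f:
        for chunk in iter(lambda: f.read(1 << 20), b""):
            h.update(chunk)
    return h.hexdigest()


def write_checksum(path: Path) -> Path:
    """Writer side: create <name>.sha256 next to an already-written file."""
    sidecar = path.with_name(path.name + ".sha256")
    with open(sidecar, "w") as f:
        # Assumed sidecar format: the hex digest followed by a newline.
        f.write(sha256_file(path) + "\n")
        f.flush()
        os.fsync(f.fileno())
    return sidecar


def is_valid(path: Path) -> bool:
    """Reader side: True only if file and sidecar exist and the digests match."""
    sidecar = path.with_name(path.name + ".sha256")
    try:
        expected = sidecar.read_text().split()[0]
        return sha256_file(path) == expected
    except (OSError, ValueError, IndexError):
        # Missing file, missing/garbled/empty sidecar: treat as not bootable.
        return False
```

A writer following the last paragraph would call `write_checksum` for each new file, sync the filesystem, confirm that at least one entry passes `is_valid`, and only then delete old kernels or entries.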

@cgwalters
Member Author

xref https://marc.info/?l=linux-fsdevel&m=157168785821373&w=2
(I replied in that thread also linking to this issue)

@bam80

bam80 commented Feb 21, 2020

What is the status of this? @cgwalters

@damianatorrpm

This would also fix systemd-boot usage in Silverblue, which uses /boot/loader/entries/, not /boot/loader.0/.
@cgwalters Any chance this will happen soon :\ ?

@martinezjavier
Contributor

> I think we should create a higher level "protocol" between things writing data in /boot and the ESP and the bootloaders

I believe that if that's the approach chosen then this protocol should be agreed with systemd and be properly described in the https://systemd.io/BOOT_LOADER_SPECIFICATION/ document.

@martinezjavier
Contributor

> Would it be possible for OSTree to also be able to operate directly on /efi (systemd's preferred mount point for the ESP)

Another option is to use renameat2(..., RENAME_EXCHANGE) to exchange the BLS entries directories instead of using a symbolic link. That was discussed before, but it was pointed out that vfat doesn't implement it.

I've proposed a patch series to add the missing support: https://lkml.org/lkml/2022/5/24/137
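
For readers unfamiliar with the call, this is roughly what an atomic directory exchange looks like from userspace. It is a sketch under assumptions: Python has no built-in wrapper, so it calls the libc `renameat2` function through ctypes (glibc provides it from 2.28), the constants are the Linux values, and the helper name `exchange` is hypothetical. It returns False where the kernel or filesystem rejects the flag, which is what one would expect on vfat without the proposed patches.

```python
import ctypes
import errno
import os

AT_FDCWD = -100           # Linux: resolve relative paths against the cwd
RENAME_EXCHANGE = 1 << 1  # Linux: atomically swap the two paths


def exchange(a: str, b: str) -> bool:
    """Atomically swap paths a and b; return False if unsupported here."""
    libc = ctypes.CDLL(None, use_errno=True)
    fn = getattr(libc, "renameat2", None)  # absent in older C libraries
    if fn is None:
        return False
    fn.argtypes = [ctypes.c_int, ctypes.c_char_p,
                   ctypes.c_int, ctypes.c_char_p, ctypes.c_uint]
    fn.restype = ctypes.c_int
    rc = fn(AT_FDCWD, os.fsencode(a), AT_FDCWD, os.fsencode(b), RENAME_EXCHANGE)
    if rc == 0:
        return True
    err = ctypes.get_errno()
    if err in (errno.ENOSYS, errno.EINVAL):
        # Kernel lacks the syscall, or this filesystem lacks the flag.
        return False
    raise OSError(err, os.strerror(err))
```

With this, a writer could populate a sibling directory (say, a hypothetical entries.new) and then call `exchange` on it and the live entries directory, so that readers see either the old set or the new set and never a mix.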

@cgwalters
Member Author

I was looking at this again as part of cleaning up an "install ostree inside existing booted system" flow, and this bit is definitely important.

It also seems that right now we entirely drop non-ostree BLS entries, which is not cool at all. (Though keeping them gets into a big mess around bootloader entry prioritization.)
