Finalizing staged deployments broken on /boot automount #2543
Comments
The dirty idea I had was to change |
This reverts commit 12d263b. On systems such as PAYG where `/boot` is an automount, `ostree-finalize-staged.service` fails to work correctly if the automount expires before shutdown. Until a solution to that issue is found, go back to the non-staged deployments we've used for years. https://phabricator.endlessm.com/T5658 systemd/systemd#22528 ostreedev/ostree#2543
I think that approach isn't dirty at all - it makes sense to me. The code is already heavily oriented towards using directory file descriptors, so we already have a natural mechanism to hold open the mounts.
And that would actually change us from using |
Alright, I'll put something together. One question I had is, what if someone unstages the deployment? Should it watch However, you would need to change the builtin to not initially lock the sysroot since that would prevent doing anything else with it until the unit was stopped. So, I think you'd want to block on the signal, receive the signal, lock the sysroot, load it again (so the state of the deployments is up to date), and then finalize. |
If you want to handle this case, that sounds good to me, but it doesn't seem at all required to me. It's a real corner case, and we aren't going to be holding open much resident memory. And we can fix it later if someone actually does complain, so probably keep it simple to start. Plus, people should be applying kernel updates and rebooting anyways 😄
If `/boot` or `/sysroot` are automounts, then the unit will be stopped as soon as the automounts expire. That would defeat the purpose of using systemd to delay finalizing the deployment until shutdown. This is not uncommon as `systemd-gpt-auto-generator` will create an automount unit for `/boot` when it's the EFI System Partition and there's no fstab entry.

Instead of relying on systemd to run the command via `ExecStop` at the appropriate time, have `finalize-staged` open `/boot` and `/sysroot` and then block on `SIGTERM`. Having the directories open will prevent the automounts from expiring, and then we presume that systemd will send `SIGTERM` when it's time for the service to stop. Finalizing the deployment still happens when the service is stopped. The difference is that the process is already running.

In order to keep from blocking legitimate sysroot activity prior to shutdown, the sysroot lock is only taken after the signal has been received. Similarly, the sysroot is reloaded to ensure the state of the deployments is current.

Fixes: ostreedev#2543
If `/boot` is an automount, then the unit will be stopped as soon as the automount expires. That would defeat the purpose of using systemd to delay finalizing the deployment until shutdown. This is not uncommon as `systemd-gpt-auto-generator` will create an automount unit for `/boot` when it's the EFI System Partition and there's no fstab entry.

Instead of relying on systemd to run the command via `ExecStop` at the appropriate time, have `finalize-staged` open `/boot` and then block on `SIGTERM`. Having the directory open will prevent the automount from expiring, and then we presume that systemd will send `SIGTERM` when it's time for the service to stop. Finalizing the deployment still happens when the service is stopped. The difference is that the process is already running.

In order to keep from blocking legitimate sysroot activity prior to shutdown, the sysroot lock is only taken after the signal has been received. Similarly, the sysroot is reloaded to ensure the state of the deployments is current.

Fixes: ostreedev#2543
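The hold-then-finalize flow described in this commit message can be sketched roughly as follows. This is an illustration in Python rather than ostree's actual C code; `hold_until_sigterm` and the `finalize` callback are hypothetical names, and the callback stands in for the real steps (lock the sysroot, reload it, finalize the staged deployment). It assumes, as the commit message does, that an open directory descriptor is enough to keep the mount from expiring.

```python
import os
import signal


def hold_until_sigterm(path, finalize):
    """Hold `path` open until SIGTERM arrives, then run `finalize()`.

    `finalize` is a hypothetical callback standing in for the real work:
    lock the sysroot, reload it, and finalize the staged deployment.
    """
    # Block SIGTERM first so it is queued for sigwait() instead of
    # terminating the process through the default action.
    old_mask = signal.pthread_sigmask(signal.SIG_BLOCK, {signal.SIGTERM})
    # An open directory descriptor keeps the mount in use while we wait.
    fd = os.open(path, os.O_RDONLY | os.O_DIRECTORY)
    try:
        # Sleep here until the service manager asks us to stop.
        signal.sigwait({signal.SIGTERM})
        # Only now do the work, so nothing is locked while waiting.
        return finalize()
    finally:
        os.close(fd)
        signal.pthread_sigmask(signal.SIG_SETMASK, old_mask)
```

Deferring all locking to the callback mirrors the point made above: other sysroot activity is not blocked while the process is merely waiting for shutdown.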
The ostree staged deployment process works by waiting until shutdown to swap the `/boot` symlinks to make the new deployment the default. However, when `/boot` is the EFI System Partition and there's no `fstab` entry, `systemd-gpt-auto-generator` sets up an automount so that the VFAT filesystem is only exposed when needed. Unfortunately, there are 2 bugs that make this process very fragile:

* Once a systemd automount unit is scheduled to be stopped, it ignores notifications from autofs that the target filesystem should be mounted. Therefore, if `/boot` isn't mounted when shutdown begins, `ostree admin finalize-staged` will fail. See systemd/systemd#22528.
* autofs is not mount namespace aware, so it will begin the expiration timer for a mount unit unless a process in the root namespace is keeping it active. Since `ostree admin finalize-staged` is run from a mount namespace (either via systemd or its own to ensure `/sysroot` and `/boot` are mounted read-write), the automount daemon (systemd) will try to unmount the filesystem if it expires during this process. See https://bugzilla.redhat.com/show_bug.cgi?id=2056090.

Therefore, if `/boot` is an autofs filesystem, use a full deployment instead of a staged deployment. Since systems with an automounted `/boot` are not common, we want to retain the benefit of staged deployments for more normal systems. See ostreedev/ostree#2543 for potential future fixes in ostree.

https://phabricator.endlessm.com/T33136
The ostree staged deployment process works by waiting until shutdown to swap the `/boot` symlinks to make the new deployment the default. However, when `/boot` is the EFI System Partition and there's no `fstab` entry, `systemd-gpt-auto-generator` sets up an automount so that the VFAT filesystem is only exposed when needed. Unfortunately, there are 2 bugs that make this process very fragile:

* Once a systemd automount unit is scheduled to be stopped, it ignores notifications from autofs that the target filesystem should be mounted. Therefore, if `/boot` isn't mounted when shutdown begins, `ostree admin finalize-staged` will fail. See systemd/systemd#22528.
* autofs is not mount namespace aware, so it will begin the expiration timer for a mount unit unless a process in the root namespace is keeping it active. Since `ostree admin finalize-staged` is run from a mount namespace (either via systemd or its own to ensure `/sysroot` and `/boot` are mounted read-write), the automount daemon (systemd) will try to unmount the filesystem if it expires during this process. See https://bugzilla.redhat.com/show_bug.cgi?id=2056090.

Therefore, if `/boot` is an autofs filesystem, use a full deployment instead of a staged deployment. Since systems with an automounted `/boot` are not common, we want to retain the benefit of staged deployments for more normal systems. See ostreedev/ostree#2543 for potential future fixes in ostree.

https://phabricator.endlessm.com/T33136

(cherry picked from commit a19821a)
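One way to decide whether `/boot` is an autofs filesystem is to look for an `autofs` entry at that mount point in `/proc/self/mountinfo`. The sketch below is only an illustration and is not taken from eos-updater, which may well use a different check; `is_autofs_mount` is a hypothetical helper name. It relies on the documented mountinfo layout: the fifth field is the mount point, and the filesystem type follows the `-` separator.

```python
def is_autofs_mount(mountinfo_text, mount_point):
    """Return True if `mount_point` has an autofs entry in mountinfo text."""
    for line in mountinfo_text.splitlines():
        fields = line.split()
        try:
            # Optional fields start at index 6 and end at a lone "-".
            sep = fields.index("-", 6)
        except ValueError:
            continue  # Blank or malformed line; skip it.
        if sep + 1 >= len(fields):
            continue
        # fields[4] is the mount point; the type follows the separator.
        if fields[4] == mount_point and fields[sep + 1] == "autofs":
            return True
    return False
```

A caller would read `/proc/self/mountinfo` and pass its contents along with `"/boot"`. An automounted `/boot` keeps its `autofs` entry whether or not the real filesystem is currently mounted on top of it, so the check does not depend on the expiry state.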
If `/boot` is an automount, then the unit will be stopped as soon as the automount expires. That would defeat the purpose of using systemd to delay finalizing the deployment until shutdown. This is not uncommon as `systemd-gpt-auto-generator` will create an automount unit for `/boot` when it's the EFI System Partition and there's no fstab entry.

To ensure that systemd doesn't stop the service early when the `/boot` automount expires, introduce a new unit that holds `/boot` open until it's sent `SIGTERM`. This uses a new `--hold` option for `finalize-staged` that loads but doesn't lock the sysroot. A separate unit is used since we want the process to remain active throughout the finalization run in `ExecStop`. That wouldn't work if it was specified in `ExecStart` in the same unit since it would be killed before the `ExecStop` action was run.

Fixes: ostreedev#2543
Recently we changed our updater to use staged deployments in endlessm/eos-updater#298. That worked fine on systems where `/boot` is a persistent mount point, but it fails on systems that use systemd-boot where `/boot` is the automounted EFI system partition. There are 2 problems with this:

* When the `/boot` automount expires, the `ostree-finalize-staged.service` unit runs immediately since it has `RequiresMountsFor=/boot`. With nothing keeping the automount from expiring, this can happen at any point prior to shutdown and ruin the feature. This actually deadlocks in systemd, but it would be bad even without the automounting bugs.
* If `RequiresMountsFor=/boot` is removed and instead just `After=boot.mount` is used, then the service is only triggered on shutdown but the ordering remains. However, if the automount has expired, systemd will ignore the request to remount it since the automount is scheduled to be stopped.

See systemd/systemd#22528 for details. Maybe the solution here is that staged deployments are not supported on `/boot` automounts, but I wanted to open this for discussion.