Add a repo option to auto-prune other deployments (e.g. rollback) when starting upgrade #2670
This would also help with an often-requested feature where some folks want to keep more than 2 deployments by default.
Isn't that already the logic for sysroot cleanup, or are you basically describing #2510? Or do you mean you don't want the temporary ballooning to 3 deployments prior to rebooting into the new deployment? It would be nice to have a config option for the number of deployments to keep, though.
I think the intent is to allow the user to set the number of deployments to keep. So if it's set to 1, you would still temporarily balloon to 2 prior to rebooting.
Yep, this.
If I read this correctly, the number could never be less than 2.
Ahh, I guess that makes sense. But that means the minimum number is 2, right?
Bikeshedding, but I think I'd expect the maximum deployments to be the non-ballooned value. The ballooned number of deployments is a temporary implementation detail. As a user it would bug me that I said […] Also, when you're in the ballooned situation, one of the deployments is shown as staged. I think it would be reasonable to interpret […] In other words, I think the […]
I think the idea is to have […]
Right, I thought of that right after I walked away. They're actually 2 orthogonal concepts to me. Specifying the maximum number of deployments allows you to say you want no rollback deployment, or more than 1 rollback deployment. Saying that you want to delete a rollback deployment before upgrading, so that the number of deployments is strictly capped, is slightly different. For example, does […] So, I really think there are 2 knobs you want: […]
I'd say for the RHCOS bug you want the second knob. I.e., don't worry about changing the number of deployments right now, but allow systems to opt in to aggressively pruning the rollback deployment before upgrading to keep disk space constrained.
Yeah, fair. I think they're strongly related, but yes, viewing them orthogonally makes sense too. Perhaps we start with just […] But... there are corner cases here, specifically: what happens if there are more than 2 deployments (do we only remove 1)? This is really the grey area between […] (Also, another corner case is "what happens if the rollback is pinned?", but I think we should probably silently have the pin win.)
Good points. I think if there are pinned deployments, they should just be ignored for the purposes of pruning deployments. For more than 2 non-pinned deployments, I think they should all be removed. Basically, the same thing […]
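The pruning rule discussed above (keep the booted and newly staged deployments, skip anything pinned, drop every other deployment) could be sketched roughly like this. This is illustrative only; the field names and data model are hypothetical, not the actual libostree representation:

```python
def select_prune_candidates(deployments, keep):
    """Return the deployments to prune: everything that is neither in the
    must-keep set (booted + newly staged) nor pinned."""
    return [d for d in deployments
            if d["id"] not in keep and not d.get("pinned", False)]

deployments = [
    {"id": "new"},                          # freshly staged deployment
    {"id": "booted"},                       # currently booted deployment
    {"id": "rollback-1"},                   # ordinary rollback: pruned
    {"id": "rollback-2", "pinned": True},   # pinned: always kept
]
pruned = select_prune_candidates(deployments, keep={"new", "booted"})
```

With this input, only `rollback-1` is selected for pruning; the pin silently wins for `rollback-2`, matching the behavior proposed above.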
One risk with this approach worth highlighting: a regression in the upgrade path, where something fails after the rollback cleanup, could leave the host with no means of going back to a deployment with working upgrade code. (E.g. failing to merge […])
How so? If the current deployment works, you've got the one you're executing the upgrade from "safe" from deletion.
If you update from A into a broken code base B that is not capable of doing updates past the cleanup stage, then once you remove A to make room for C, and the update with the B code fails, you are stuck on B with no rollback option.
Right. This is why we have upgrade tests. Container Linux was especially susceptible to this with its A/B partition update scheme. If an update bug happened before the secondary partition was nuked, you could just roll back. But if it happened after, you'd be stuck (see e.g. coreos/bugs#2457 (comment)). libostree is better in this regard by only cleaning up the rollback after most fallible operations are done. This would re-introduce some of that fallibility. The tradeoff might be worth it, though, in scenarios where reprovisioning is easier, such as clusters.
OK, I'd like to propose that this option is: […]
Basically we only do the cleanup if doing so would allow us to install a kernel/initramfs when we otherwise couldn't. |
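That decision could be sketched like this. The function and variable names are hypothetical, and real code would measure the sizes of the actual kernel/initramfs files on the bootfs rather than take them as arguments:

```python
def needs_early_prune(bootfs_free, new_sizes, prunable_sizes):
    """Decide whether outgoing deployments must be pruned *before* the new
    boot entries are written.
      "no"         -> the new kernel/initramfs pairs fit alongside the old ones
      "yes"        -> they fit only if the outgoing deployments are pruned first
      "impossible" -> they don't fit even after pruning
    """
    need = sum(new_sizes)
    if bootfs_free >= need:
        return "no"
    if bootfs_free + sum(prunable_sizes) >= need:
        return "yes"
    return "impossible"

# e.g. 50M free, new pair needs 60M, pruning the rollback would free 30M:
print(needs_early_prune(50, [60], [30]))  # → "yes"
```

The "yes" branch is the only case where the riskier early-prune path is taken, which is what makes the heuristic opportunistic.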
This sounds ideal... Almost like it should be the default, though? We only take this (slightly more risky) code path if we couldn't succeed otherwise (not enough space).
Another thing we could do to mitigate risk is to move the old kernel/initrd to a different filesystem (tmpfs or any kind of tmp) rather than deleting it. Upon failure we could attempt to restore the original files. I think it would be nice to make progress on this sooner rather than later, as it appears the compression mitigation might not be enough for […]
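A minimal sketch of that save-and-restore idea, assuming a hypothetical helper (this is not libostree API): move the old artifacts to a temporary directory, attempt the write, and move them back if it fails:

```python
import os
import shutil
import tempfile

def swap_boot_artifacts(bootdir, names, write_new):
    """Move the old kernel/initrd out of the bootfs into a temporary
    directory, attempt to write the new entries via write_new(), and
    restore the originals if that fails (instead of deleting them
    up front)."""
    backup = tempfile.mkdtemp(prefix="boot-backup-")
    moved = []
    try:
        for name in names:
            src = os.path.join(bootdir, name)
            if os.path.exists(src):
                shutil.move(src, os.path.join(backup, name))
                moved.append(name)
        write_new(bootdir)
    except Exception:
        # Restore the original artifacts before propagating the error.
        for name in moved:
            shutil.move(os.path.join(backup, name), os.path.join(bootdir, name))
        raise
    finally:
        shutil.rmtree(backup, ignore_errors=True)
```

Note this sketch ignores the real-world caveat that a tmpfs backup does not survive a crash or power loss mid-update, so it only narrows the risk window rather than eliminating it.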
I briefly looked at this; it's quite messy due to the internal design of trying to do a "transactional swap" of the deployments - we end up needing something like a "pre-pass". Or maybe the higher-level code can pass down a separate "list of deployments to keep if you can". Needs a bit of thought/design.
Just curious @cgwalters @jmarrero: what are the options today for the use case where you just want to keep 2 or 3 rollbacks, etc.?
Is the complexity in handling this arising from trying to handle […]
Yeah, agree that's easier.
I just hit a similar problem here in rawhide on an […] After reboot I see the update didn't apply, and: […]
During the early design of FCOS and RHCOS, we chose a value of 384M for the boot partition. This turned out to be too small: some arches other than x86_64 have larger initrds, kernel binaries, or additional artifacts (like device tree blobs). We'll likely bump the boot partition size in the future, but we don't want to abandon all the nodes deployed with the current size.[[1]]

Because stale entries in `/boot` are cleaned up after new entries are written, there is a window in the update process during which the bootfs must temporarily host all the `(kernel, initrd)` pairs for the union of current and new deployments.

This patch determines whether the bootfs is capable of holding all the pairs. If it can't, but it could hold all the pairs from just the new deployments, the outgoing deployments (e.g. rollbacks) are deleted *before* the new deployments are written. This is done by updating the bootloader in two steps to maintain atomicity.

Since this is a lot of new logic in an important section of the code, this feature is gated for now behind an environment variable (`OSTREE_ENABLE_AUTO_EARLY_PRUNE`). Once we gain more experience with it, we can consider turning it on by default.

This strategy increases the fallibility of the update system, since one would no longer be able to roll back to the previous deployment if a bug is present in the bootloader update logic after auto-pruning (see [[2]] and following). This is mitigated, however, by the fact that the heuristic is opportunistic: the rollback is pruned *only if* it's the only way for the system to update.

[1]: coreos/fedora-coreos-tracker#1247
[2]: ostreedev#2670 (comment)

Closes: ostreedev#2670
In RHCOS we're running up against space constraints: https://bugzilla.redhat.com/show_bug.cgi?id=2104619
I think we should support something like […]
This would tell ostree to auto-prune the rollback deployment (and others) when starting an upgrade.
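For illustration, such an option might live in the sysroot configuration; this is a sketch only, and the section and key names here are hypothetical, not an existing ostree option:

```ini
# Hypothetical sketch; the actual option name would be decided in review.
[sysroot]
# Prune the rollback (and any other non-pinned) deployments when an
# upgrade is staged, keeping bootfs/sysroot disk usage tightly capped.
auto-prune-on-upgrade=true
```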