mount: substantially rework mounting code #82
Merged
d7dfe7a to
b651f81
Compare
1579b6d to
1e2df9f
Compare
...and an O_DIRECTORY for good measure. Signed-off-by: Allison Karlitskaya <allison.karlitskaya@redhat.com>
Firm up our inline alignment rules a bit further: not only does all
inline data need to land in the same filesystem block, but it also needs
to land in the same block with at least 1 byte of the inode+xattr data.
Despite not being flagged by modern versions of fsck.erofs (which we run
all of our images through), this is required to work around a weird
quirk with symlink handling in older kernel versions where we get errors
like:
erofs_fill_symlink: inline data cross block boundary @ nid 77438
and EUCLEAN ("Structure needs cleaning") errors returned to userspace.
We'll start testing against those older kernel versions soon when we add
support for RHEL 9.
Fixes #85
Signed-off-by: Allison Karlitskaya <allison.karlitskaya@redhat.com>
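The alignment rule described in this commit can be sketched as a small predicate. This is only an illustration, not the code from this PR: it assumes the inline data immediately follows the inode+xattr data, and the function name and parameters are hypothetical.

```rust
/// Hypothetical check for the inline alignment rule described above.
///
/// Assumption: the inline data starts directly after the inode+xattr data,
/// i.e. at `inode_start + meta_len`. All values are byte offsets or lengths.
pub fn inline_placement_ok(
    inode_start: u64,
    meta_len: u64,
    inline_len: u64,
    block_size: u64,
) -> bool {
    assert!(block_size > 0 && meta_len > 0);

    // Nothing to place, so nothing can cross a block boundary.
    if inline_len == 0 {
        return true;
    }

    let inline_start = inode_start + meta_len;
    let inline_last = inline_start + inline_len - 1;
    let meta_last = inline_start - 1;

    // Rule 1: all inline data lands in the same filesystem block.
    let same_block = inline_start / block_size == inline_last / block_size;

    // Rule 2: that block also holds at least 1 byte of inode+xattr data.
    let shares_block_with_meta = meta_last / block_size == inline_start / block_size;

    same_block && shares_block_with_meta
}
```

With a 4096-byte block, an inode at offset 4032 with 64 bytes of inode+xattr data would put its inline data exactly at the start of the next block. The first rule is satisfied but the second is not, which is the case this commit newly rejects.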
Instead of selecting the Containerfile by way of passing a `-f` flag to the various examples/*/build scripts, make those scripts accept an OS name as their only parameter. Use this to select the Containerfile and also customize the name of the image according to it. Update the test runner and GitHub Actions workflows accordingly. Signed-off-by: Allison Karlitskaya <allison.karlitskaya@redhat.com>
Update the unified-secureboot example to work with the Cockpit bots libraries in the same way as the other examples do. Add it to CI. When run inside of CI we don't actually enable secure boot (because Cockpit bots doesn't support it) but it will still be checked when we use the ./run script. Now that this example is the only one still using that script, update it for serial console support and move it from the examples/common to the unified-secureboot directory. Fixes #83 Signed-off-by: Allison Karlitskaya <allison.karlitskaya@redhat.com>
We now use the system mount APIs in a more modern way: the filesystem tree is now assembled purely from file descriptors and mounted in place only after it's complete, resulting in more readable code.

This depends on a very new kernel: the merge window on 6.15 isn't closed yet, but we already depend on many of the features of the mount API that got added in this release. In particular, we make use of these features, with fallbacks:
- the ability to use a detached mount as an overlayfs layer
- the ability to supply O_PATH fds as overlayfs layers

At the same time, we preserve backwards compatibility to older kernels via a compatibility layer which remains mostly isolated in a separate file. We even add support for creating loopback devices for compatibility with RHEL 9 (which can't mount erofs from a file).

The inclusion of the compatibility code is controlled by the feature flags `pre-6.15` and `rhel9` (which implies `pre-6.15`). When running on pre-6.15 kernels (which is enabled by default), very little has actually changed in terms of what's happening at the syscall level.

Signed-off-by: Allison Karlitskaya <allison.karlitskaya@redhat.com>
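To make the relationship between the two feature flags concrete, here is a rough model of how they might drive the choice of strategy. In the PR itself these are compile-time feature flags. The runtime struct, the enum and the function names below are hypothetical and exist only for illustration.

```rust
/// Hypothetical runtime model of the `pre-6.15` and `rhel9` feature flags.
#[derive(Debug, Clone, Copy, PartialEq, Eq)]
pub struct CompatFeatures {
    pub pre_6_15: bool,
    pub rhel9: bool,
}

impl CompatFeatures {
    /// `rhel9` implies `pre-6.15`, so requesting the former enables the latter.
    pub fn new(pre_6_15: bool, rhel9: bool) -> Self {
        CompatFeatures {
            pre_6_15: pre_6_15 || rhel9,
            rhel9,
        }
    }
}

/// How the erofs image is handed to the kernel (illustrative names).
#[derive(Debug, PartialEq, Eq)]
pub enum ErofsSource {
    /// Mount the erofs image directly from the file.
    ImageFile,
    /// Create a loopback device first (RHEL 9 can't mount erofs from a file).
    LoopDevice,
}

pub fn erofs_source(features: CompatFeatures) -> ErofsSource {
    if features.rhel9 {
        ErofsSource::LoopDevice
    } else {
        ErofsSource::ImageFile
    }
}

/// Whether the fallbacks for the overlayfs layer features are included.
pub fn overlay_fallbacks_included(features: CompatFeatures) -> bool {
    features.pre_6_15
}
```

A build with neither flag corresponds to the 6.15-only path with no compatibility code, while `CompatFeatures::new(false, true)` models a RHEL 9 build with both the loopback support and the overlayfs fallbacks.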
1e2df9f to
bc5e625
Compare
Johan-Liebert1
requested changes
Apr 2, 2025
Take a copy of the existing composefs-pivot-sysroot command and substantially rewrite it.

The new command understands ostree's /usr/lib/ostree/prepare-root.conf file (transient overlays for /etc and /, also adding support for /var). In addition it's possible to specify that /etc and /var are one of:
- none: no mount, will be readonly contents of composefs at runtime
- bind: straight bind-mount from the state directory
- overlay: state directory contains the upperdir of an overlay
- transient: alias for transient=true (ie: overlay with tmpfs)

This follows the /sysroot/state/ layout discussed in #38. The default for /etc is 'overlay' and the default for /var is 'bind'.

In general the new command focuses less on absolute minimalism: we now have proper commandline parsing and our config file is parsed as toml via serde. This makes the command (which gets included in the initramfs) a fair bit bigger: it's 1.2MB now (but compresses to about half that). We can deal with that later if it's really a problem, though.

We also embrace the "build the tree purely from fds" approach from the previous commit and even mount the various subdirectories (var, etc.) into the still-detached composefs mount object. This also depends on a new feature in 6.15 (mounting into detached mounts) which means we need to adjust the order of operations for older kernels (to mount the new root directory first) but this is conditionalized with 'pre-6.15'.

Signed-off-by: Allison Karlitskaya <allison.karlitskaya@redhat.com>
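The four mount types and their defaults can be sketched as follows. This is a std-only illustration rather than the serde-based parsing the commit describes, and the type and field names are hypothetical.

```rust
use std::str::FromStr;

/// The four ways /etc and /var can be set up, per the list above.
#[derive(Debug, Clone, Copy, PartialEq, Eq)]
pub enum MountType {
    /// No mount: readonly contents of composefs at runtime.
    None,
    /// Straight bind-mount from the state directory.
    Bind,
    /// State directory contains the upperdir of an overlay.
    Overlay,
    /// Overlay with tmpfs.
    Transient,
}

impl FromStr for MountType {
    type Err = String;

    fn from_str(s: &str) -> Result<Self, Self::Err> {
        match s {
            "none" => Ok(MountType::None),
            "bind" => Ok(MountType::Bind),
            "overlay" => Ok(MountType::Overlay),
            "transient" => Ok(MountType::Transient),
            other => Err(format!("unknown mount type: {}", other)),
        }
    }
}

/// Hypothetical container for the per-directory settings.
#[derive(Debug, Clone, Copy, PartialEq, Eq)]
pub struct StateMounts {
    pub etc: MountType,
    pub var: MountType,
}

impl Default for StateMounts {
    /// Defaults stated in the commit message: /etc is 'overlay', /var is 'bind'.
    fn default() -> Self {
        StateMounts {
            etc: MountType::Overlay,
            var: MountType::Bind,
        }
    }
}
```

Starting from `StateMounts::default()` and overriding only the fields that the configuration mentions keeps the documented defaults in a single place.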
Move the examples to make use of the new `composefs-setup-root` command. This is mostly an application of renaming and `sed` but we also have to adjust the directories created on the resulting system image: they need to have a `state/` directory with an `etc/` and `var/` present inside of the deployment.

Signed-off-by: Allison Karlitskaya <allison.karlitskaya@redhat.com>
Remove the ssh-key generation at build time from all of the examples: /etc overlay support is working now and all of the images will generate their ssh keys at first boot, so we no longer need this kludge.

Signed-off-by: Allison Karlitskaya <allison.karlitskaya@redhat.com>
Move the examples to set the feature flags explicitly, depending on which OS image is requested. Right now this is rather boring: all of our images have a pre-6.15 kernel. This is groundwork for the next commit which will introduce two new images. Signed-off-by: Allison Karlitskaya <allison.karlitskaya@redhat.com>
Add two new OS images to CI:
- `rhel9`: to test that feature flag `rhel9` is working correctly
- `rawhide`: with a 6.15 pre-release kernel, requiring no features

This helps round out the verification that our new mount code behaves correctly in all of the situations that it claims to.

Signed-off-by: Allison Karlitskaya <allison.karlitskaya@redhat.com>
Nothing is using this anymore, so we can drop it. Signed-off-by: Allison Karlitskaya <allison.karlitskaya@redhat.com>
This script can be run under `unshare -Umr` to test that composefs-setup-root is working properly, without building a VM image. This helped a lot with manual testing during the original development of composefs-setup-root and will be expanded in the future to include more scenarios (different types of /etc and /var mounting, transient overlays, and so on). Signed-off-by: Allison Karlitskaya <allison.karlitskaya@redhat.com>
Make a minor fix to the build scripts to quote the pwd in case it has a
space in it. While we're at it, use (POSIX specified) ${PWD} instead of
potentially shelling out to $(pwd).
Thanks to Pragyan for the suggestion.
Signed-off-by: Allison Karlitskaya <allison.karlitskaya@redhat.com>
bc5e625 to
f5045e1
Compare
@Johan-Liebert1 thanks for the suggestions. I've addressed ~2.5 of them. I'll leave you to resolve the others if you're satisfied.
Johan-Liebert1
approved these changes
Apr 2, 2025
notes from meeting with Colin:
I'm going to merge this and address Colin's points in followups.
This was referenced Apr 2, 2025
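This represents a substantial rewrite of the mounting code. composefs-pivot-sysroot is renamed to composefs-prepare-root and now understands ostree's /usr/lib/ostree/prepare-root.conf file (supporting transient overlays for /etc and /, also adding support for /var). In addition it's possible to specify that /etc and /var are one of:
- none: no mount, will be readonly contents of composefs at runtime
- bind: straight bind-mount from the state directory
- overlay: state directory contains the upperdir of an overlay
- transient: alias for transient=true (ie: overlay with tmpfs)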
This follows the /sysroot/state/ layout discussed in #38.
The default for /etc is 'overlay' and the default for /var is 'bind'.
In general the new command focuses less on absolute minimalism: we now have proper commandline parsing and our config file is parsed as toml via serde. This makes the command (which gets included in the initramfs) a fair bit bigger: it's 1.2MB now (but compresses to about half that). We can deal with that later if it's really a problem, though.
We now use the system mount APIs in a more modern way: the filesystem tree is now assembled purely from file descriptors and mounted in place only after it's complete, resulting in very readable code. This depends on a very new kernel: the merge window on 6.15 isn't closed yet, but we already depend on many of the features of the mount API that got added in this release. Fortunately, rawhide already has a pre-release version that we can test against: add a new integration test based on it.
At the same time, we preserve backwards compatibility to older kernels via a compatibility layer which remains mostly isolated in a separate file. We even add compatibility with RHEL 9 (and add another integration test for that). The inclusion of the compatibility code is controlled by the feature flags `pre-6.15` and `rhel9` (which implies `pre-6.15`).
Rework the examples a bit to add more explicit support for separate OSes which are now accepted as the $1 parameter to each build script: the OS parameter now controls the Containerfile used as well as the build features.
Also remove the ssh-key generation at build time from all of the examples: /etc overlay support is working now and all of the images will generate their ssh keys at first boot, so we no longer need this kludge.
Add the start of a new integration test which can run unprivileged on the host system inside of a fresh namespace.