@allisonkarlitskaya
Collaborator

This represents a substantial rewrite of the mounting code. composefs-pivot-sysroot is renamed to composefs-prepare-root and now understands ostree's /usr/lib/ostree/prepare-root.conf file (supporting transient overlays for /etc and /, also adding support for /var). In addition it's possible to specify that /etc and /var are one of:

  • none: no mount, will be readonly contents of composefs at runtime
  • bind: straight bind-mount from the state directory
  • overlay: state directory contains the upperdir of an overlay
  • transient: alias for transient=true (ie: overlay with tmpfs)

This follows the /sysroot/state/ layout discussed in #38.

The default for /etc is 'overlay' and the default for /var is 'bind'.
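
As a rough illustration of the four mount types and the two defaults described above, here is a minimal std-only sketch. The names `MountType`, `DEFAULT_ETC`, `DEFAULT_VAR` and `resolve` are hypothetical and chosen for this example; the actual command parses its config as toml via serde and may model this quite differently.

```rust
use std::str::FromStr;

/// How a state directory such as /etc or /var gets mounted.
/// The four names come from the PR description; the type itself is hypothetical.
#[derive(Debug, Clone, Copy, PartialEq, Eq)]
enum MountType {
    /// No mount: the read-only contents of the composefs are visible.
    None,
    /// Straight bind-mount from the state directory.
    Bind,
    /// The state directory contains the upperdir of an overlay.
    Overlay,
    /// Overlay backed by tmpfs (alias for transient=true).
    Transient,
}

impl FromStr for MountType {
    type Err = String;

    fn from_str(s: &str) -> Result<Self, Self::Err> {
        match s {
            "none" => Ok(MountType::None),
            "bind" => Ok(MountType::Bind),
            "overlay" => Ok(MountType::Overlay),
            "transient" => Ok(MountType::Transient),
            other => Err(format!("unknown mount type: {other}")),
        }
    }
}

/// Defaults as stated in the PR description.
const DEFAULT_ETC: MountType = MountType::Overlay;
const DEFAULT_VAR: MountType = MountType::Bind;

/// Use the configured value if one was given, otherwise fall back to the default.
fn resolve(configured: Option<&str>, default: MountType) -> Result<MountType, String> {
    match configured {
        Some(value) => value.parse(),
        None => Ok(default),
    }
}
```

With this sketch, `resolve(None, DEFAULT_ETC)` yields the overlay type, and an unrecognised string is rejected with an error rather than silently falling back to the default.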

In general the new command focuses less on absolute minimalism: we now have proper commandline parsing and our config file is parsed as toml via serde. This makes the command (which gets included in the initramfs) a fair bit bigger: it's 1.2MB now (but compresses to about half that). We can deal with that later if it's really a problem, though.

We now use the system mount APIs in a more modern way: the filesystem tree is now assembled purely from file descriptors and mounted in place only after it's complete, resulting in very readable code. This depends on a very new kernel: the merge window on 6.15 isn't closed yet, but we already depend on many of the features of the mount API that got added in this release. Fortunately, rawhide already has a pre-release version that we can test against: add a new integration test based on it.

At the same time, we preserve backwards compatibility to older kernels via a compatibility layer which remains mostly isolated in a separate file. We even add compatibility with RHEL 9 (and add another integration test for that). The inclusion of the compatibility code is controlled by the feature flags pre-6.15 and rhel9 (which implies pre-6.15).

Rework the examples a bit to add more explicit support for separate OSes which are now accepted as the $1 parameter to each build script: the OS parameter now controls the Containerfile used as well as the build features.

Also remove the ssh-key generation at build time from all of the examples: /etc overlay support is working now and all of the images will generate their ssh keys at first boot, so we no longer need this kludge.

Add the start of a new integration test which can run unprivileged on the host system inside of a fresh namespace.

@allisonkarlitskaya allisonkarlitskaya force-pushed the setup-root branch 6 times, most recently from d7dfe7a to b651f81 Compare April 1, 2025 09:19
@allisonkarlitskaya allisonkarlitskaya force-pushed the setup-root branch 7 times, most recently from 1579b6d to 1e2df9f Compare April 1, 2025 20:11
...and an O_DIRECTORY for good measure.

Signed-off-by: Allison Karlitskaya <allison.karlitskaya@redhat.com>
Firm up our inline alignment rules a bit further: not only does all
inline data need to land in the same filesystem block, but it also needs
to land in the same block with at least 1 byte of the inode+xattr data.

Despite not being flagged by modern versions of fsck.erofs (which we run
all of our images through), this is required to work around a weird
quirk with symlink handling in older kernel versions where we get errors
like:

  erofs_fill_symlink: inline data cross block boundary @ nid 77438

and EUCLEAN ("Structure needs cleaning") errors returned to userspace.

We'll start testing against those older kernel versions soon when we add
support for RHEL 9.
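
The stricter rule can be sketched as a pure check on byte offsets. This is only an illustration of the rule as worded above, not the writer's actual layout code: the function name and parameters are hypothetical, and it assumes the inline data immediately follows the inode+xattr data and that `meta_size` is non-zero.

```rust
/// Returns true if inline data satisfies the stricter alignment rule:
/// all of it lies in one filesystem block, and that block also holds
/// at least one byte of the inode+xattr data.
///
/// Hypothetical helper: `inode_offset` is the byte offset of the inode,
/// `meta_size` the combined size of inode and xattrs (assumed > 0), and
/// the inline data is assumed to start right after them.
fn inline_data_is_aligned(
    inode_offset: u64,
    meta_size: u64,
    inline_size: u64,
    block_size: u64,
) -> bool {
    assert!(meta_size > 0 && block_size > 0);
    if inline_size == 0 {
        // Nothing inline, so nothing can cross a block boundary.
        return true;
    }
    // First byte after the inode+xattr data, i.e. the first inline byte.
    let meta_end = inode_offset + meta_size;
    let last_meta_block = (meta_end - 1) / block_size;
    let first_inline_block = meta_end / block_size;
    let last_inline_block = (meta_end + inline_size - 1) / block_size;
    // Old rule: the inline data stays within a single block.
    let same_block = first_inline_block == last_inline_block;
    // New rule: that block also contains the last byte of inode+xattr data.
    let shares_block_with_meta = last_meta_block == first_inline_block;
    same_block && shares_block_with_meta
}
```

The interesting case is inode+xattr data ending exactly on a block boundary: the inline data then fits entirely inside the next block, which passes the old rule but fails the new one.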

Fixes #85

Signed-off-by: Allison Karlitskaya <allison.karlitskaya@redhat.com>
Instead of selecting the Containerfile by way of passing a `-f` flag to
the various examples/*/build scripts, make those scripts accept an OS
name as their only parameter.  Use this to select the Containerfile and
also customize the name of the image according to it.

Update the test runner and GitHub Actions workflows accordingly.

Signed-off-by: Allison Karlitskaya <allison.karlitskaya@redhat.com>
Update the unified-secureboot example to work with the Cockpit bots
libraries in the same way as the other examples do.  Add it to CI.

When run inside of CI we don't actually enable secure boot (because
Cockpit bots doesn't support it) but it will still be checked when we
use the ./run script.  Now that this example is the only one still using
that script, update it for serial console support and move it from the
examples/common to the unified-secureboot directory.

Fixes #83

Signed-off-by: Allison Karlitskaya <allison.karlitskaya@redhat.com>
We now use the system mount APIs in a more modern way: the filesystem
tree is now assembled purely from file descriptors and mounted in place
only after it's complete, resulting in more readable code.  This depends
on a very new kernel: the merge window on 6.15 isn't closed yet, but we
already depend on many of the features of the mount API that got added in
this release.

In particular, we make use of these features, with fallbacks:
 - the ability to use a detached mount as an overlayfs layer
 - the ability to supply O_PATH fds as overlayfs layers

At the same time, we preserve backwards compatibility to older kernels
via a compatibility layer which remains mostly isolated in a separate
file.  We even add support for creating loopback devices for
compatibility with RHEL 9 (which can't mount erofs from a file).  The
inclusion of the compatibility code is controlled by the feature flags
`pre-6.15` and `rhel9` (which implies `pre-6.15`).

When running on pre-6.15 kernels (which is enabled by default), very
little has actually changed in terms of what's happening at the syscall
level.

Signed-off-by: Allison Karlitskaya <allison.karlitskaya@redhat.com>
@allisonkarlitskaya allisonkarlitskaya marked this pull request as ready for review April 1, 2025 20:54
Take a copy of the existing composefs-pivot-sysroot command and
substantially rewrite it.

The new command understands ostree's /usr/lib/ostree/prepare-root.conf
file (transient overlays for /etc and /, also adding support for /var).  In
addition it's possible to specify that /etc and /var are one of:
  - none: no mount, will be readonly contents of composefs at runtime
  - bind: straight bind-mount from the state directory
  - overlay: state directory contains the upperdir of an overlay
  - transient: alias for transient=true (ie: overlay with tmpfs)

This follows the /sysroot/state/ layout discussed in #38.

The default for /etc is 'overlay' and the default for /var is 'bind'.

In general the new command focuses less on absolute minimalism: we now
have proper commandline parsing and our config file is parsed as toml
via serde.  This makes the command (which gets included in the
initramfs) a fair bit bigger: it's 1.2MB now (but compresses to about
half that).  We can deal with that later if it's really a problem,
though.

We also embrace the "build the tree purely from fds" approach from the
previous commit and even mount the various subdirectories (var, etc.)
into the still-detached composefs mount object.  This also depends on a
new feature in 6.15 (mounting into detached mounts) which means we need
to adjust the order of operations for older kernels (to mount the new
root directory first) but this is conditionalized with 'pre-6.15'.
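
The difference in the order of operations can be modelled as a plain list of steps. This is a simplified sketch and makes no syscalls: `Step` and `mount_plan` are hypothetical names, the `pre_6_15` argument stands in for the feature flag, and "etc" and "var" are used as the example subdirectories.

```rust
/// One step in assembling the new root. Hypothetical model, no syscalls.
#[derive(Debug, Clone, Copy, PartialEq, Eq)]
enum Step {
    /// Build the composefs root as a detached mount, held only as an fd.
    CreateDetachedRoot,
    /// Mount a state directory (e.g. "etc" or "var") onto the root.
    MountSubdir(&'static str),
    /// Attach the root mount at its final location.
    AttachRoot,
}

/// Returns the order of operations for the given kernel capability.
/// `pre_6_15` models the feature flag: older kernels cannot mount into a
/// detached mount, so the root has to be attached before the subdirectories.
fn mount_plan(pre_6_15: bool) -> Vec<Step> {
    let subdirs = [Step::MountSubdir("etc"), Step::MountSubdir("var")];
    let mut plan = vec![Step::CreateDetachedRoot];
    if pre_6_15 {
        // Older kernels: attach first, then mount into the attached tree.
        plan.push(Step::AttachRoot);
        plan.extend(subdirs);
    } else {
        // 6.15 and later: finish the whole tree while it is still detached.
        plan.extend(subdirs);
        plan.push(Step::AttachRoot);
    }
    plan
}
```

On a new kernel the root only becomes visible once it is complete, whereas with the fallback ordering it is briefly visible without its /etc and /var mounts.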

Signed-off-by: Allison Karlitskaya <allison.karlitskaya@redhat.com>
Move the examples to make use of the new `composefs-setup-root` command.
This is mostly an application of renaming and `sed` but we also have to
adjust the directories created on the resulting system image: they need
to have a `state/` directory with an `etc/` and `var/` present inside of
the deployment.

Signed-off-by: Allison Karlitskaya <allison.karlitskaya@redhat.com>
Remove the ssh-key generation at build time from all of the examples:
/etc overlay support is working now and all of the images will generate
their ssh keys at first boot, so we no longer need this kludge.

Signed-off-by: Allison Karlitskaya <allison.karlitskaya@redhat.com>
Move the examples to set the feature flags explicitly, depending on
which OS image is requested.  Right now this is rather boring: all of
our images have a pre-6.15 kernel.  This is groundwork for the next
commit which will introduce two new images.

Signed-off-by: Allison Karlitskaya <allison.karlitskaya@redhat.com>
Add two new OS images to CI:

 - `rhel9`: to test that the feature flag `rhel9` is working correctly
 - `rawhide`: with a 6.15 pre-release kernel, requiring no features

This helps round out the verification that our new mount code behaves
correctly in all of the situations that it claims to.

Signed-off-by: Allison Karlitskaya <allison.karlitskaya@redhat.com>
Nothing is using this anymore, so we can drop it.

Signed-off-by: Allison Karlitskaya <allison.karlitskaya@redhat.com>
This script can be run under `unshare -Umr` to test that
composefs-setup-root is working properly, without building a VM image.

This helped a lot with manual testing during the original development of
composefs-setup-root and will be expanded in the future to include more
scenarios (different types of /etc and /var mounting, transient
overlays, and so on).

Signed-off-by: Allison Karlitskaya <allison.karlitskaya@redhat.com>
Make a minor fix to the build scripts to quote the pwd in case it has a
space in it.  While we're at it, use (POSIX specified) ${PWD} instead of
potentially shelling out to $(pwd).

Thanks to Pragyan for the suggestion.

Signed-off-by: Allison Karlitskaya <allison.karlitskaya@redhat.com>
@allisonkarlitskaya
Collaborator Author

@Johan-Liebert1 thanks for the suggestions. I've addressed ~2.5 of them. I'll leave you to resolve the others if you're satisfied.

@allisonkarlitskaya
Collaborator Author

notes from meeting with Colin:

  • let's do a new config file for composefs and make it toml
  • he uses "tini" in bootc for ini parsing
  • we could also transform the ini file for ostree at the time we build the initramfs (ie: as part of the dracut module) to make sure it's toml
  • need to think about systems without fs-verity

@allisonkarlitskaya
Collaborator Author

I'm going to merge this and address Colin's points in followups.
