builder/daemonless: pass through /dev/kvm if present #410

Open
jlebon wants to merge 3 commits into master

Conversation

@jlebon (Member) commented Nov 12, 2024

Currently, if devices.kubevirt.io/kvm resources are requested in the
build object, the resource request makes it to the build pod, but it
doesn't really have any visible effect because the nested build process
itself doesn't have access to it.
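
For reference, here is a hedged sketch (not code from this PR) of how a build author or tooling could express that request against the OpenShift Build API in Go. The withKVMRequest helper is hypothetical, and it assumes the KubeVirt device plugin exposes the device as the extended resource devices.kubevirt.io/kvm:

```go
package kvmexample

import (
	buildv1 "github.com/openshift/api/build/v1"
	corev1 "k8s.io/api/core/v1"
	"k8s.io/apimachinery/pkg/api/resource"
)

// withKVMRequest adds a devices.kubevirt.io/kvm limit to a Build spec so that
// the device plugin mounts /dev/kvm into the resulting build pod.
func withKVMRequest(spec *buildv1.BuildSpec) {
	if spec.Resources.Limits == nil {
		spec.Resources.Limits = corev1.ResourceList{}
	}
	spec.Resources.Limits[corev1.ResourceName("devices.kubevirt.io/kvm")] = resource.MustParse("1")
}
```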

The only reason we'd have /dev/kvm in our pod is if the user wants to
use it in their build. So just pass it through if we find it.
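
A minimal sketch of that idea (not the PR's actual code): kvmDeviceArgs is a hypothetical helper, and the example assumes the nested build is driven with buildah-style CLI arguments (--device), whereas the real builder may plumb the device through buildah's library API instead.

```go
package main

import (
	"fmt"
	"os"
)

// kvmDeviceArgs returns a buildah-style --device argument for /dev/kvm if the
// device exists. It only appears in the pod when the build requested the
// devices.kubevirt.io/kvm resource, so its presence is the signal to pass it on.
func kvmDeviceArgs() []string {
	if _, err := os.Stat("/dev/kvm"); err != nil {
		return nil
	}
	return []string{"--device", "/dev/kvm"}
}

func main() {
	args := []string{"build", "--file", "Containerfile"}
	args = append(args, kvmDeviceArgs()...)
	args = append(args, ".")
	fmt.Println("nested buildah args:", args)
}
```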

The use case for this is being able to build artifacts which would
normally require privileges.

One example is base bootable container (bootc) images. Building these
currently requires privileges because the build process itself uses
containerization features.

In the future, this should work with user namespacing, currently in Tech
Preview. However, because we need not just uid 0 but CAP_SYS_ADMIN,
and capabilities would still be restricted by default, we would still
require access to non-default SCCs. (And of course, the builder would
also have to be adapted to pass through the capabilities.)

Another example is building disk images and shipping them in container
images. This is done for example by Kubevirt and podman-machine. Two
common ways to build disk images currently are via loopback devices or
virtualization. The former can't be used because loopback devices are
not namespaced and require privileges. This patch enables the latter.

Using virtualization enables us to build these artifacts all while using
the default OpenShift restricted SCC.

Fedora 34 is ancient and EOL. Bump to Fedora 40 so that it's closer to
the buildroot that devs are using locally to build openshift-builder.

As part of the bump, we now need to also pull in netavark.

Not sure on the history there, but there are no daemonsets nowadays in
the openshift-controller-manager namespace. There's a deployment object
though, and I see it cycle the pods when patching the builder pod, so I
assume that's the correct object to watch now.
openshift-ci bot (Contributor) commented Nov 12, 2024

[APPROVALNOTIFIER] This PR is NOT APPROVED

This pull-request has been approved by: jlebon
Once this PR has been reviewed and has the lgtm label, please assign bparees for approval. For more information see the Kubernetes Code Review Process.

The full list of commands accepted by this bot can be found here.

Needs approval from an approver in each of these files:

Approvers can indicate their approval by writing /approve in a comment
Approvers can cancel approval by writing /approve cancel in a comment

@jlebon (Member, Author) commented Nov 12, 2024

Just to emphasize this: doing this would have a pretty drastic impact on image mode convergence. For example, it would immediately enable us to build RHCOS in a container-native way via the OpenShift Build API (because we currently use coreos-assembler, which already uses virtualization for composes). E.g. this was tested to work with this patch:

ARG BUILD_CONTEXTDIR=.
FROM quay.io/coreos-assembler/coreos-assembler:latest as builder
USER root
ENV COSA_SKIP_OVERLAY=1
RUN --mount=type=bind,rw=true,src=.,dst=/src,bind-propagation=shared \
      cosa init --force /src --variant c9s && cosa fetch && cosa build ostree && \
      rm -rf cache/* tmp/* && mv -v builds/latest/$(arch)/*.ociarchive /src/out.ociarchive

FROM oci-archive:${BUILD_CONTEXTDIR}/out.ociarchive
RUN --mount=type=bind,rw=true,src=.,dst=/src,bind-propagation=shared \
      rm /src/out.ociarchive

And this in turn enables us to have better CI on openshift/os, multi-PR testing, integration in clusterbot, etc...

At the higher image mode level, the most obvious impact is on bootc-image-builder. This would allow building image mode disk images unprivileged via OpenShift by just bringing up a podman machine and doing the work there. Otherwise, for base composes, it's obviously very desirable for users to be able to build their base container images with the default OpenShift restricted SCC. The build process there currently assumes privileges, but we could have tooling to make this ergonomic (in the short term, I'm pretty sure cosa supermin-run podman build ... in a Containerfile would work). Or alternatively, keep pursuing the move away from rpm-ostree in a way that drops any required privileges (see also https://gitlab.com/fedora/bootc/tracker/-/issues/32).

Regardless of how things are set up in the future, I think in the short to mid-term, this patch would be extremely valuable.

@jlebon (Member, Author) commented Nov 12, 2024

Sorry, going to add more here. (You can tell I'm excited about this. :)) A lot of this is context for folks I'm discussing this with and not necessarily for the maintainers of this repo directly.

How does this intersect with openshift/enhancements#1637? Well, with that enhancement, the node image becomes a layered build like any other. This means that testing changes to that node layer shouldn't require any privileges or KVM.

And I'd also like for RHCOS itself to eventually become a layered build on top of rhel-bootc. But there's a lot of work needed before that can happen. In the short/mid-term, RHCOS will instead be inheriting definitions (and maybe even lockfiles) from rhel-bootc. And so it would still be a base compose that requires privileges. Suffice it to say, it's extremely valuable to be able to easily test changes to that base compose as part of Prow.

@cgwalters (Member) commented:

> At the higher image mode level, the most obvious impact is on bootc-image-builder. This would allow building image mode disk images unprivileged via OpenShift by just bringing up a podman machine and doing the work there.

I am not sure I'd want to scope in podman-machine for that type of production use; there are already e.g. Kata containers and peer pods, which are designed to run generic workloads and are shipped today.

@cgwalters (Member) commented:

> The only reason we'd have /dev/kvm in our pod is if the user wants to
> use it in their build. So just pass it through if we find it.

This makes sense to me 👍

@jlebon (Member, Author) commented Nov 12, 2024

> At the higher image mode level, the most obvious impact is on bootc-image-builder. This would allow building image mode disk images unprivileged via OpenShift by just bringing up a podman machine and doing the work there.

> I am not sure I'd want to scope in podman-machine for that type of production use; there are already e.g. Kata containers and peer pods, which are designed to run generic workloads and are shipped today.

Yeah, the appeal in this case is the ability to model the workload as an OpenShift Build and to use all the well-known build APIs surrounding that, plus the tooling built on top of it (like ci-operator).

openshift-ci bot (Contributor) commented Nov 12, 2024

@jlebon: The following tests failed, say /retest to rerun all failed tests or /retest-required to rerun all mandatory failed tests:

Test name | Commit | Details | Required | Rerun command
--- | --- | --- | --- | ---
ci/prow/unit | 5ef8a3d | link | true | /test unit
ci/prow/e2e-aws-ovn-builds-techpreview | 5ef8a3d | link | false | /test e2e-aws-ovn-builds-techpreview
ci/prow/e2e-aws-ovn-builds | 5ef8a3d | link | true | /test e2e-aws-ovn-builds
ci/prow/security | 5ef8a3d | link | false | /test security

Full PR test history. Your PR dashboard.

Instructions for interacting with me using PR comments are available here. If you have questions or suggestions related to my behavior, please file an issue against the kubernetes-sigs/prow repository. I understand the commands that are listed here.

@jlebon (Member, Author) commented Nov 12, 2024

Hmm, CI failures don't seem to be related to the changes here. At least, I see the same failures in other PRs.
