Add support for passing CDI specs to --device #5443

Merged (4 commits) on Apr 1, 2024

nalind (Member) commented Mar 28, 2024

What type of PR is this?

/kind feature

What this PR does / why we need it:

Add support for using CDI to resolve --device devices for RUN instructions during buildah build, buildah from, and buildah run, as podman run does.

This generally requires that we stop resolving device specifications (arguments passed to --device) early, and instead defer that until it's time to run a process, because CDI wants to pick over those values, modify a runtime spec to set up the ones that it knows about, and then hand back the list of values that it doesn't know about.

We don't want to do a dry run of this during CLI processing because that would create a window where the underlying hardware state could change, and that could produce some hard-to-diagnose errors.

Being able to test this is going to require that we add the --device flag to buildah run (--security-opt affects how we build the container's layer, so it has to be done at buildah from).

The default configured devices list is pulled in by CLI flag processing during buildah from and buildah build, so it doesn't also need to be explicitly passed to buildah run or the internal Run() method.
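
To make the deferred flow described above concrete, here is a minimal Go sketch of the "pick over the values, hand back the rest" step. It is not the code in this PR and it does not call the CDI library: `isCDIQualifiedName`, `resolveDevices`, the `known` map standing in for a CDI registry, and the `vendor.example/gpu=0` device name are all hypothetical, and the name check is much looser than the rules CDI itself applies.

```go
package main

import (
	"fmt"
	"strings"
)

// isCDIQualifiedName is a simplified, hypothetical check for names shaped
// like "vendor/class=device". The real CDI rules are stricter than this.
func isCDIQualifiedName(s string) bool {
	kind, device, ok := strings.Cut(s, "=")
	if !ok || device == "" {
		return false
	}
	vendor, class, ok := strings.Cut(kind, "/")
	return ok && vendor != "" && class != "" && !strings.Contains(class, "/")
}

// resolveDevices imitates what happens at run time: entries that the
// (hypothetical) registry knows are claimed, and everything else is handed
// back so it can be treated as an ordinary host device specification.
func resolveDevices(specs []string, known map[string]bool) (claimed, leftover []string) {
	for _, s := range specs {
		if isCDIQualifiedName(s) && known[s] {
			claimed = append(claimed, s)
		} else {
			leftover = append(leftover, s)
		}
	}
	return claimed, leftover
}

func main() {
	// Assumed registry contents; a real registry is built from CDI spec files.
	known := map[string]bool{"vendor.example/gpu=0": true}
	args := []string{"vendor.example/gpu=0", "/dev/fuse:/dev/fuse:rwm"}

	claimed, leftover := resolveDevices(args, known)
	fmt.Println("handled by CDI:", claimed)
	fmt.Println("handed back:", leftover)
}
```

Because the partitioning happens only when a process is about to run, the same `--device` arguments can be stored as plain strings at `buildah from` or `buildah build` time without consulting hardware state.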

How to verify it

New integration tests!

Which issue(s) this PR fixes:

Fixes #5432.

Special notes for your reviewer:

It's probably easier to review this commit by commit.
chroot isolation doesn't run hooks, so a CDI configuration that includes hooks won't work correctly.
CDI: https://github.com/cncf-tags/container-device-interface
CDI for Nvidia GPUs: https://docs.nvidia.com/datacenter/cloud-native/container-toolkit/latest/cdi-support.html
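
Since the note above says hooks are not run under chroot isolation, it may help to check whether a given CDI spec relies on them. The sketch below is an illustration only, not part of this PR: it parses a CDI spec in JSON form with the standard library and lists the devices whose edits include hooks. The struct types cover just the fields needed here, the helper name `devicesWithHooks` is hypothetical, and the sample spec (`vendor.example/gpu`) is made up.

```go
package main

import (
	"encoding/json"
	"fmt"
)

// Minimal subset of the CDI spec layout, limited to what this check needs.
type hook struct {
	HookName string `json:"hookName"`
	Path     string `json:"path"`
}

type containerEdits struct {
	Hooks []hook `json:"hooks,omitempty"`
}

type device struct {
	Name           string         `json:"name"`
	ContainerEdits containerEdits `json:"containerEdits"`
}

type cdiSpec struct {
	Kind           string         `json:"kind"`
	Devices        []device       `json:"devices"`
	ContainerEdits containerEdits `json:"containerEdits"`
}

// devicesWithHooks returns "kind=name" for every device that would need
// hooks to run, either through its own edits or through spec-wide edits.
func devicesWithHooks(data []byte) ([]string, error) {
	var spec cdiSpec
	if err := json.Unmarshal(data, &spec); err != nil {
		return nil, err
	}
	var names []string
	for _, d := range spec.Devices {
		if len(spec.ContainerEdits.Hooks) > 0 || len(d.ContainerEdits.Hooks) > 0 {
			names = append(names, spec.Kind+"="+d.Name)
		}
	}
	return names, nil
}

func main() {
	// Made-up spec: device "0" only adds a device node, device "1" adds a hook.
	data := []byte(`{
	  "cdiVersion": "0.6.0",
	  "kind": "vendor.example/gpu",
	  "devices": [
	    {"name": "0", "containerEdits": {"deviceNodes": [{"path": "/dev/null"}]}},
	    {"name": "1", "containerEdits": {"hooks": [{"hookName": "createContainer", "path": "/usr/bin/true"}]}}
	  ]
	}`)

	names, err := devicesWithHooks(data)
	if err != nil {
		fmt.Println("parse error:", err)
		return
	}
	fmt.Println("devices that depend on hooks:", names)
}
```

A device reported by a check like this is one that would likely be set up incompletely when the command runs under chroot isolation.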

Does this PR introduce a user-facing change?

`buildah run` now accepts a `--device` flag.
`--device` can now accept names of devices which are specified using CDI (container device interface).

When the passed-in source location is a symbolic link, dereference it,
because the documentation says that's what we do.

Signed-off-by: Nalin Dahyabhai <nalin@redhat.com>
Avoid generically referring to "the container" where it can be ambiguous
that we're actually talking about the environment we set up for running
a command for a RUN instruction or Run() call.

Signed-off-by: Nalin Dahyabhai <nalin@redhat.com>
@openshift-ci openshift-ci bot added kind/feature Categorizes issue or PR as related to a new feature. approved labels Mar 28, 2024
@nalind nalind force-pushed the cdi branch 2 times, most recently from 8e22ebe to 7a46966 Compare March 28, 2024 21:53
flouthoc (Collaborator) left a comment

PR LGTM

openshift-ci bot (Contributor) commented Mar 28, 2024

[APPROVALNOTIFIER] This PR is APPROVED

This pull-request has been approved by: flouthoc, nalind


Add support for using CDI to resolve `--device` devices for RUN
instructions during `buildah build`, `buildah from`, and `buildah run`,
as `podman run` does.

This generally requires that we stop resolving device specifications
(arguments passed to --device) early, and instead defer that until it's
time to run a process, because CDI wants to pick over those values,
modify a runtime spec to set up the ones that it knows about, and then
hand back the list of values that it doesn't know about.

We don't want to do a dry run of this during CLI processing because that
would create a window where the underlying hardware state could change,
and that could produce some hard-to-diagnose errors.

Being able to test this is going to require that we add the `--device`
flag to `buildah run` (`--security-opt` affects how we build the
container's layer, so it has to be done at `buildah from`).

The default configured devices list is pulled in by CLI flag processing
during `buildah from` and `buildah build`, so it doesn't also need to be
explicitly passed to `buildah run` or the internal Run() method.

Signed-off-by: Nalin Dahyabhai <nalin@redhat.com>
Describe --device in `buildah from` and `buildah run`, where it's new.
Update the description of --device in `buildah build` to note that the
device nodes are only there while RUN instructions are being run, and
not to imply that they end up in the finished image.

Signed-off-by: Nalin Dahyabhai <nalin@redhat.com>
rhatdan (Member) commented Mar 31, 2024

/lgtm
/hold

rhatdan (Member) commented Apr 1, 2024

/hold cancel

@openshift-merge-bot openshift-merge-bot bot merged commit f8cdb7d into containers:main Apr 1, 2024
36 checks passed
@nalind nalind deleted the cdi branch April 1, 2024 18:25
@stale-locking-app stale-locking-app bot locked as resolved and limited conversation to collaborators Jul 1, 2024
Labels
approved kind/feature Categorizes issue or PR as related to a new feature. lgtm locked - please file new issue/PR
Development

Successfully merging this pull request may close these issues.

CDI spec devices are not valid input to --device when invoking the build command
3 participants