unified define k8s-driver-manager image info in values.yaml #1032
Closed
lengrongfu requested review from ArangoGutierrez, cdesiniotis, elezar, shivamerla and tariq1890 as code owners (October 11, 2024 06:57)
tariq1890 reviewed (Nov 22, 2024)
Hi @lengrongfu, thanks for your contribution! Can you rebase this PR?
lengrongfu force-pushed the feat/unified-version branch from 3db2bfa to 5a06912 (November 23, 2024 03:27)
lengrongfu force-pushed the feat/unified-version branch from 5a06912 to 6890ac9 (November 23, 2024 03:29)
Signed-off-by: Evan Lezar <elezar@nvidia.com>
Signed-off-by: Tariq Ibrahim <tibrahim@nvidia.com>
Signed-off-by: Tariq Ibrahim <tibrahim@nvidia.com>
Signed-off-by: Tariq Ibrahim <tibrahim@nvidia.com>
Signed-off-by: Christopher Desiniotis <cdesiniotis@nvidia.com>
…e to Role Signed-off-by: Tariq Ibrahim <tibrahim@nvidia.com>
Signed-off-by: Tariq Ibrahim <tibrahim@nvidia.com>
Signed-off-by: Christopher Desiniotis <cdesiniotis@nvidia.com>
Signed-off-by: Christopher Desiniotis <cdesiniotis@nvidia.com>
Bumps [golangci/golangci-lint-action](https://github.com/golangci/golangci-lint-action) from 5 to 6. - [Release notes](https://github.com/golangci/golangci-lint-action/releases) - [Commits](golangci/golangci-lint-action@v5...v6) --- updated-dependencies: - dependency-name: golangci/golangci-lint-action dependency-type: direct:production update-type: version-update:semver-major ... Signed-off-by: dependabot[bot] <support@github.com>
Bumps nvidia/cuda from 12.4.1-base-ubi8 to 12.5.0-base-ubi8. --- updated-dependencies: - dependency-name: nvidia/cuda dependency-type: direct:production update-type: version-update:semver-minor ... Signed-off-by: dependabot[bot] <support@github.com>
Bumps nvidia/cuda from 12.4.1-base-ubi8 to 12.5.0-base-ubi8. --- updated-dependencies: - dependency-name: nvidia/cuda dependency-type: direct:production update-type: version-update:semver-minor ... Signed-off-by: dependabot[bot] <support@github.com>
Signed-off-by: Christopher Desiniotis <cdesiniotis@nvidia.com>
Signed-off-by: Tariq Ibrahim <tibrahim@nvidia.com>
…ripts This commit updates the driver validation to always create a 'driver-ready' file, regardless of whether the driver is installed on the host. It also populates this file with a list of environment variables, one per line, which are required by some operands. The startup scripts for several operands are simplified to source the content of this file before executing the main program for the container. Signed-off-by: Christopher Desiniotis <cdesiniotis@nvidia.com>
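The 'driver-ready' file format described in this commit message (one environment variable per line, sourced by operand startup scripts) can be illustrated with a minimal Python sketch. The function names and the variable names in the usage example are hypothetical and are not taken from the GPU Operator source; the sketch only assumes simple `KEY=VALUE` lines without quoting.

```python
import os
import tempfile


def write_driver_ready(path, env):
    """Write KEY=VALUE pairs, one per line, so a shell script can source the file."""
    with open(path, "w") as f:
        for key, value in env.items():
            f.write(f"{key}={value}\n")


def read_driver_ready(path):
    """Parse the file back into a dict, mirroring what sourcing it would set
    for simple unquoted values."""
    env = {}
    with open(path) as f:
        for line in f:
            line = line.strip()
            if not line or line.startswith("#"):
                continue  # skip blank lines and comments
            key, _, value = line.partition("=")
            env[key] = value
    return env


if __name__ == "__main__":
    # Hypothetical variable names, used only for illustration.
    ready_file = os.path.join(tempfile.mkdtemp(), "driver-ready")
    write_driver_ready(ready_file, {"EXAMPLE_DRIVER_ROOT": "/run/nvidia/driver"})
    print(read_driver_ready(ready_file))
```

Writing the file unconditionally, as the commit describes, means a consumer never has to distinguish between "file missing" and "driver on host": it reads whatever variables are present.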
…naged by GPU Operator The commit updates our driver validator to only check the presence of .driver-ctr.ready, a file created by our driver daemonset readiness probe, if the driver container is managed by GPU Operator. This allows us to support non-standard environments, like GKE, where a driver container is deployed but not managed by GPU Operator. Signed-off-by: Christopher Desiniotis <cdesiniotis@nvidia.com>
This commit updates our driver validator code to only chroot when validating a host installed driver. When validating a driver container install, we discover the paths to libnvidia-ml.so.1 and nvidia-smi at the driver container root and then run 'LD_PRELOAD=/driverRoot/path/to/libnvidia-ml.so.1 nvidia-smi'. This sets the stage for validating driver container installs where driverRoot does not represent a full filesystem hierarchy that one can 'chroot' into. Signed-off-by: Christopher Desiniotis <cdesiniotis@nvidia.com>
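The two validation paths described in this commit message can be sketched as follows. This is an illustrative Python sketch, not the validator's actual code: `find_under` and `validation_command` are hypothetical helpers, and the search strategy (a plain directory walk) is an assumption.

```python
import os


def find_under(root, name):
    """Return the first path under root whose file name equals name, else None."""
    for dirpath, _dirs, files in os.walk(root):
        if name in files:
            return os.path.join(dirpath, name)
    return None


def validation_command(root, host_driver):
    """Return (extra_env, argv) for the validation run.

    Host installed driver: chroot into the host root and run nvidia-smi.
    Driver container: no chroot; preload libnvidia-ml.so.1 found under the
    driver root and run the nvidia-smi binary found there.
    """
    if host_driver:
        return {}, ["chroot", root, "nvidia-smi"]
    lib = find_under(root, "libnvidia-ml.so.1")
    smi = find_under(root, "nvidia-smi")
    if lib is None or smi is None:
        raise FileNotFoundError("libnvidia-ml.so.1 or nvidia-smi not found under " + root)
    return {"LD_PRELOAD": lib}, [smi]
```

The returned environment and argument list could be passed to `subprocess.run(argv, env={**os.environ, **extra_env})`. Avoiding `chroot` in the container case is what allows the driver root to be something less than a complete filesystem.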
RootFS represents the path to the root filesystem of the host. This is used by components that need to interact with the host filesystem and as such this must be a chroot-able filesystem. Examples include the MIG Manager and Toolkit Container which may need to stop, start, or restart systemd services. Signed-off-by: Christopher Desiniotis <cdesiniotis@nvidia.com> Co-authored-by: Angelos Kolaitis <neoaggelos@gmail.com>
…and containers except the driver-validator

Having a static path inside our containers will make it easier when driverRoot is a configurable field. If driverRoot is set to a custom path, we can transform the host path for the volume while keeping the container path unchanged.

The driver-validation initContainer is the exception to this rule. From the driver validation initContainer, the container path must match the host path otherwise the /dev/char symlinks will not resolve correctly on the host. The target of the symlinks must correspond to the path of the device nodes on the host. For example, when the NVIDIA device nodes are present under `/run/nvidia/driver/dev` on the host, running the following command from inside the container would create an invalid symlink:

    ln -s /driver-root/dev/nvidiactl /host-dev-char/195:255

while running the below command from inside the container would create a valid symlink:

    ln -s /run/nvidia/driver/dev/nvidiactl /host-dev-char/195:255

Signed-off-by: Christopher Desiniotis <cdesiniotis@nvidia.com>
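The reason the symlink target must be a host path is that a symlink stores its target as a plain string, which is resolved in the filesystem view of whoever follows the link. The Python sketch below illustrates this on a Linux filesystem using temporary directories; `make_dev_char_link` and `resolves` are hypothetical helper names, not part of GPU Operator.

```python
import os


def make_dev_char_link(dev_char_dir, target, major, minor):
    """Create <dev_char_dir>/<major>:<minor> pointing at target.

    The target string is stored verbatim; it is not checked at creation time.
    """
    link = os.path.join(dev_char_dir, f"{major}:{minor}")
    os.symlink(target, link)
    return link


def resolves(link):
    """True if following the link reaches an existing file in this filesystem view."""
    return os.path.exists(link)
```

A link whose target is a container-only path such as `/driver-root/dev/nvidiactl` is created without error, but `resolves` returns False when evaluated on the host, where that path does not exist. A link whose target is the host path of the device node resolves on the host.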
This allows for non-standard driver container installations, where the driver installation path and device nodes are rooted at paths other than '/run/nvidia/driver'. Note, setting driverInstallDir to a custom value is currently only supported for driver container installations not managed by GPU Operator. One example is the GKE use case, where a driver daemonset is deployed prior to installing GPU Operator and the GPU Operator managed driver is disabled. The GPU Operator's driver container daemonset still assumes that the full driver installation is made available at '/run/nvidia/driver' on the host, and consequently, we always mount '/run/nvidia/driver' into the GPU Operator managed daemonset. We may consider removing this assumption in the future and supporting driver container implementations which allow for a custom driverInstallDir to be specified. Signed-off-by: Christopher Desiniotis <cdesiniotis@nvidia.com>
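How a configurable driverInstallDir might map to volume mounts can be sketched as below. This is a hedged illustration only: the function name, the dictionary shape, and the static container path `/driver-root` are assumptions made for the example, and the real operator builds Kubernetes volume objects in Go.

```python
DEFAULT_DRIVER_INSTALL_DIR = "/run/nvidia/driver"
CONTAINER_DRIVER_ROOT = "/driver-root"  # assumed static in-container path


def driver_root_mount(driver_install_dir=DEFAULT_DRIVER_INSTALL_DIR, is_validator=False):
    """Describe the driver root mount for one container.

    The host path follows driverInstallDir. The container path stays static,
    except for the driver validator, where it must equal the host path so that
    the /dev/char symlinks it creates resolve on the host.
    """
    mount_path = driver_install_dir if is_validator else CONTAINER_DRIVER_ROOT
    return {"hostPath": driver_install_dir, "mountPath": mount_path}
```

For example, `driver_root_mount("/custom/driver")` keeps the container path at the static value while changing only the host path, whereas `driver_root_mount("/custom/driver", is_validator=True)` uses the custom path on both sides.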
Signed-off-by: Tariq Ibrahim <tibrahim@nvidia.com>
Signed-off-by: Tariq Ibrahim <tibrahim@nvidia.com>
Signed-off-by: Tariq Ibrahim <tibrahim@nvidia.com>
Signed-off-by: Tariq Ibrahim <tibrahim@nvidia.com>
This reverts commit 0efdd6d.
Signed-off-by: Carlos Eduardo Arango Gutierrez <eduardoa@nvidia.com>
Signed-off-by: Tariq Ibrahim <tibrahim@nvidia.com>
…toring Bumps [github.com/prometheus-operator/prometheus-operator/pkg/apis/monitoring](https://github.com/prometheus-operator/prometheus-operator) from 0.76.2 to 0.78.1. - [Release notes](https://github.com/prometheus-operator/prometheus-operator/releases) - [Changelog](https://github.com/prometheus-operator/prometheus-operator/blob/main/CHANGELOG.md) - [Commits](prometheus-operator/prometheus-operator@v0.76.2...v0.78.1) --- updated-dependencies: - dependency-name: github.com/prometheus-operator/prometheus-operator/pkg/apis/monitoring dependency-type: direct:production update-type: version-update:semver-minor ... Signed-off-by: dependabot[bot] <support@github.com>
Signed-off-by: Tariq Ibrahim <tibrahim@nvidia.com>
Signed-off-by: Tariq Ibrahim <tibrahim@nvidia.com>
Bumps [github.com/NVIDIA/nvidia-container-toolkit](https://github.com/NVIDIA/nvidia-container-toolkit) from 1.17.0 to 1.17.2. - [Release notes](https://github.com/NVIDIA/nvidia-container-toolkit/releases) - [Changelog](https://github.com/NVIDIA/nvidia-container-toolkit/blob/v1.17.2/CHANGELOG.md) - [Commits](NVIDIA/nvidia-container-toolkit@v1.17.0...v1.17.2) --- updated-dependencies: - dependency-name: github.com/NVIDIA/nvidia-container-toolkit dependency-type: direct:production update-type: version-update:semver-patch ... Signed-off-by: dependabot[bot] <support@github.com>
Signed-off-by: Tariq Ibrahim <tibrahim@nvidia.com>
Bumps [github.com/onsi/gomega](https://github.com/onsi/gomega) from 1.35.0 to 1.35.1. - [Release notes](https://github.com/onsi/gomega/releases) - [Changelog](https://github.com/onsi/gomega/blob/master/CHANGELOG.md) - [Commits](onsi/gomega@v1.35.0...v1.35.1) --- updated-dependencies: - dependency-name: github.com/onsi/gomega dependency-type: direct:production update-type: version-update:semver-patch ... Signed-off-by: dependabot[bot] <support@github.com>
Signed-off-by: Tariq Ibrahim <tibrahim@nvidia.com>
Bumps [github.com/prometheus/client_golang](https://github.com/prometheus/client_golang) from 1.20.4 to 1.20.5. - [Release notes](https://github.com/prometheus/client_golang/releases) - [Changelog](https://github.com/prometheus/client_golang/blob/main/CHANGELOG.md) - [Commits](prometheus/client_golang@v1.20.4...v1.20.5) --- updated-dependencies: - dependency-name: github.com/prometheus/client_golang dependency-type: direct:production update-type: version-update:semver-patch ... Signed-off-by: dependabot[bot] <support@github.com>
Signed-off-by: Tariq Ibrahim <tibrahim@nvidia.com>
Bumps [golang.org/x/mod](https://github.com/golang/mod) from 0.21.0 to 0.22.0. - [Commits](golang/mod@v0.21.0...v0.22.0) --- updated-dependencies: - dependency-name: golang.org/x/mod dependency-type: direct:production update-type: version-update:semver-minor ... Signed-off-by: dependabot[bot] <support@github.com>
Bumps [github.com/regclient/regclient](https://github.com/regclient/regclient) from 0.7.1 to 0.7.2. - [Release notes](https://github.com/regclient/regclient/releases) - [Changelog](https://github.com/regclient/regclient/blob/v0.7.2/release.md) - [Commits](regclient/regclient@v0.7.1...v0.7.2) --- updated-dependencies: - dependency-name: github.com/regclient/regclient dependency-type: direct:production update-type: version-update:semver-patch ... Signed-off-by: dependabot[bot] <support@github.com>
Signed-off-by: Tariq Ibrahim <tibrahim@nvidia.com>
…B_ENV Signed-off-by: Tariq Ibrahim <tibrahim@nvidia.com>
Signed-off-by: Tariq Ibrahim <tibrahim@nvidia.com>
Bumps [sigs.k8s.io/controller-runtime](https://github.com/kubernetes-sigs/controller-runtime) from 0.19.0 to 0.19.1. - [Release notes](https://github.com/kubernetes-sigs/controller-runtime/releases) - [Changelog](https://github.com/kubernetes-sigs/controller-runtime/blob/main/RELEASE.md) - [Commits](kubernetes-sigs/controller-runtime@v0.19.0...v0.19.1) --- updated-dependencies: - dependency-name: sigs.k8s.io/controller-runtime dependency-type: direct:production update-type: version-update:semver-patch ... Signed-off-by: dependabot[bot] <support@github.com>
Bumps [sigs.k8s.io/controller-tools](https://github.com/kubernetes-sigs/controller-tools) from 0.16.4 to 0.16.5. - [Release notes](https://github.com/kubernetes-sigs/controller-tools/releases) - [Changelog](https://github.com/kubernetes-sigs/controller-tools/blob/main/envtest-releases.yaml) - [Commits](kubernetes-sigs/controller-tools@v0.16.4...v0.16.5) --- updated-dependencies: - dependency-name: sigs.k8s.io/controller-tools dependency-type: direct:production update-type: version-update:semver-patch ... Signed-off-by: dependabot[bot] <support@github.com>
Signed-off-by: Tariq Ibrahim <tibrahim@nvidia.com>
Signed-off-by: Tariq Ibrahim <tibrahim@nvidia.com>
Bumps [github.com/onsi/ginkgo/v2](https://github.com/onsi/ginkgo) from 2.21.0 to 2.22.0. - [Release notes](https://github.com/onsi/ginkgo/releases) - [Changelog](https://github.com/onsi/ginkgo/blob/master/CHANGELOG.md) - [Commits](onsi/ginkgo@v2.21.0...v2.22.0) --- updated-dependencies: - dependency-name: github.com/onsi/ginkgo/v2 dependency-type: direct:production update-type: version-update:semver-minor ... Signed-off-by: dependabot[bot] <support@github.com>
Signed-off-by: Tariq Ibrahim <tibrahim@nvidia.com>
Signed-off-by: rongfu.leng <lenronfu@gmail.com>
lengrongfu force-pushed the feat/unified-version branch from 6890ac9 to b93f0b3 (December 2, 2024 14:19)
Fixes: #642