unified define k8s-driver-manager image info in values.yaml #1032

Closed
wants to merge 293 commits into from
88afd72
Rename master to main
elezar Jun 6, 2024
1541fee
[OCP] add back permissions to set finalizers on CRDs
tariq1890 Jun 7, 2024
10d5b62
[CI] specify crictl version in holodeck yaml config
tariq1890 Jun 6, 2024
bb44552
[OCP] restrict RBAC perms of gpu-operator in OLM bundle
tariq1890 Jun 10, 2024
fac2493
Add H20 to default mig-manager config
cdesiniotis Jun 6, 2024
d13e33d
[CC Mgr VGPU Device Mgr] move pods access permissions from ClusterRol…
tariq1890 Jun 11, 2024
e461bb4
bump gpu drivers to 470.256.02, 535.183.01 and 550.90.07
tariq1890 Jun 12, 2024
c98df67
Add dependabot config
cdesiniotis Jun 14, 2024
30997d0
Add dependabot rule for updating CUDA base image
cdesiniotis Jun 14, 2024
97cb422
Bump golangci/golangci-lint-action from 5 to 6
dependabot[bot] Jun 14, 2024
5dc316e
Bump nvidia/cuda from 12.4.1-base-ubi8 to 12.5.0-base-ubi8 in /docker
dependabot[bot] Jun 14, 2024
4173294
Bump nvidia/cuda from 12.4.1-base-ubi8 to 12.5.0-base-ubi8 in /validator
dependabot[bot] Jun 14, 2024
27c25c4
Bump CUDA base image used by operands to 12.5.0
cdesiniotis Jun 17, 2024
b2da9e0
[RBAC] move namespace-scoped resource permissions to Roles
tariq1890 Jun 11, 2024
288e80b
Create driver-ready file with content and simplify operand startup sc…
cdesiniotis May 7, 2024
3c46971
Add logic to the driver validator to support driver containers not ma…
cdesiniotis May 7, 2024
cecb925
Only chroot when validating a host installed driver
cdesiniotis May 8, 2024
b479c22
Add a 'hostPaths.rootFS' field to ClusterPolicy
cdesiniotis May 9, 2024
43f2b88
Always mount a driver container install to '/driver-root' in our oper…
cdesiniotis May 9, 2024
8b27e96
Add 'hostPaths.driverInstallDir' field to ClusterPolicy
cdesiniotis May 10, 2024
f452888
Bump NVIDIA Container Toolkit to v1.16.0-rc.1
cdesiniotis Jun 18, 2024
779a3c4
Bump k8s-device-plugin to v0.16.0-rc.1
cdesiniotis Jun 18, 2024
e8617aa
[operator-validator] remove redundant daemonset kube get calls
tariq1890 Jun 18, 2024
ce8de9d
add ngc signing job for auto signing
shivakunv Jun 6, 2024
2ceb86f
Fix bug in validator/metrics.go for driver validation
cdesiniotis Jun 24, 2024
1102a93
[MIG] add support for H200 141GB
tariq1890 Jun 24, 2024
82e42ce
Bump mig-manager to v0.8.0-rc.1
cdesiniotis Jun 25, 2024
03e23e1
feat: vfio-manager graphics mode
jojimt Jun 14, 2024
6641f8e
chore: address review comment
jojimt Jun 25, 2024
cbdf4bf
add more checks
jojimt Jun 25, 2024
363cc23
address review comments
jojimt Jun 25, 2024
4d7859f
bump k8s.io and controller-runtime dependencies
tariq1890 Jun 25, 2024
84c5d81
update trunk image ref in OLM bundle
tariq1890 Jun 26, 2024
e8c4716
Bump k8s.io/apiextensions-apiserver
dependabot[bot] Jun 25, 2024
eb2c7da
bump helm client to v0.12.10
tariq1890 Jun 26, 2024
39a510e
Bump github.com/urfave/cli/v2 from 2.27.1 to 2.27.2
dependabot[bot] Jun 26, 2024
7c9592a
Create driver-ready file atomically
elezar Jun 27, 2024
a14dd5b
Bump github.com/NVIDIA/k8s-kata-manager
dependabot[bot] Jun 26, 2024
9bef2c4
Bump github.com/operator-framework/api from 0.23.0 to 0.26.0
dependabot[bot] Jun 27, 2024
d085520
bump the rest of the gpu-operator dependencies
tariq1890 Jun 27, 2024
fc94f73
Update kubevirt-gpu-device-plugin to v1.2.8
visheshtanksale Jun 27, 2024
045aad5
Bump dcgm to 3.3.6-1 and dcgm-exporter to 3.3.6-3.4.2
cdesiniotis Jun 27, 2024
c3e3ffc
standardise object hash generation across the repo
tariq1890 Jun 27, 2024
cf4e390
Exclude attestation manifests (sbom, provenance) during the build usi…
shivakunv Jun 20, 2024
ab9e976
Update vgpu device manager config for vGPU 17.2
cdesiniotis Jun 28, 2024
531e3f6
[gitlab-CI] bump docker buildx version to v0.15.1
tariq1890 Jul 5, 2024
bf52562
bump k8s-driver-manager to v0.6.9
tariq1890 Jul 6, 2024
dbbe603
Bump golang.org/x/mod from 0.18.0 to 0.19.0
dependabot[bot] Jul 7, 2024
c83a29b
upgrade golang version to v1.22.5
tariq1890 Jul 10, 2024
97c33fb
bump container toolkit RC version to v1.16.0-rc.2
tariq1890 Jul 10, 2024
f2d6af2
add dependabot config to bump toolchain deps
tariq1890 Jul 10, 2024
badbf01
[validator] address edge case where nvidia-smi on host is empty
cdesiniotis Jul 5, 2024
1717be8
Bump mig-manager to 0.8.0-rc.2
cdesiniotis Jul 11, 2024
c14bbfd
Bump sigs.k8s.io/kustomize/kustomize/v5 from 5.4.1 to 5.4.2 in /tools
dependabot[bot] Jul 11, 2024
af7f74b
Bump k8s.io/code-generator from 0.30.0 to 0.30.2 in /tools
dependabot[bot] Jul 11, 2024
8e2b714
add prometheus CR permissions back to gpu-operator clusterrole
tariq1890 Jul 11, 2024
f4b0c4f
Bump sigs.k8s.io/controller-tools from 0.14.0 to 0.15.0 in /tools
dependabot[bot] Jul 11, 2024
f7f0adc
regnerate manifests with new controller-gen version
tariq1890 Jul 11, 2024
ca23820
Bump NFD to v0.16.2
cdesiniotis May 29, 2024
5a6f24b
Configure NVIDIA_CDI_HOOK_PATH envvar in mig-manager
cdesiniotis Jul 3, 2024
e3e1364
Use NVIDIA_CDI_HOOK_PATH instead of NVIDIA_CTK_PATH envvar in the dev…
cdesiniotis Jul 3, 2024
2936ed5
set NFD priority class to system-node-critical
tariq1890 Jul 11, 2024
2fd64cc
Adding transformation for kata-manager daemonset for supporting CRI-O
visheshtanksale Jun 27, 2024
6b6b384
Specify open-ended OCP version range in OLM bundle
cdesiniotis Jul 11, 2024
ae4da3d
Bump github.com/regclient/regclient from 0.6.1 to 0.7.0
dependabot[bot] Jul 14, 2024
9e3ad0c
bump regclient in gitlab CI
tariq1890 Jul 15, 2024
8e52842
Bump nvidia/cuda from 12.5.0-base-ubi8 to 12.5.1-base-ubi8 in /docker
dependabot[bot] Jul 15, 2024
809dc3e
Bump nvidia/cuda from 12.5.0-base-ubi8 to 12.5.1-base-ubi8 in /validator
dependabot[bot] Jul 15, 2024
f9bd06d
bump container-toolkit to v1.16.0
tariq1890 Jul 15, 2024
be3245a
bump R535 TRD to version 535.183.06
tariq1890 Jul 15, 2024
8a03145
bump vfioManager and operator init containers to cuda 12.5.1
tariq1890 Jul 15, 2024
6e46b93
Bump gdrcopy driver image to v2.4.1-1
cdesiniotis Jul 15, 2024
2ad1c1e
bump k8s-device-plugin and gfd to v0.16.0
tariq1890 Jul 16, 2024
1ba48ff
bump mig-manager to v0.8.0
tariq1890 Jul 16, 2024
cecb12d
Bump vgpu-device-manager to v0.2.7
cdesiniotis Jul 16, 2024
ed77350
Bump k8s-kata-manager to v0.2.1
cdesiniotis Jul 16, 2024
f693fc2
bump k8s-driver-manager to v0.6.10
tariq1890 Jul 16, 2024
6f2dfeb
move scc privileges to clusterrole as it's cluster-scoped
tariq1890 Jul 17, 2024
9c88354
bump node-feature-discovery to version v0.16.3
tariq1890 Jul 17, 2024
cf17e4a
Allow passing custom mig-parted configmap data in helm chart
frittentheke Jul 2, 2024
f76f268
Update the value of migManager.config.name in values.yaml
cdesiniotis Jul 18, 2024
990405a
Update migManager.config.data to represent the entire configmap data …
cdesiniotis Jul 18, 2024
6621f9b
Update CONTRIBUTING.md following Github migration
cdesiniotis Jul 18, 2024
0443348
Remove stale PR template
cdesiniotis Jul 18, 2024
b20c384
bump kubevirt gpu device plugin to v1.2.9
tariq1890 Jul 18, 2024
cf08a9d
Fix helm templates for migManager config
cdesiniotis Jul 18, 2024
63234e4
Remove stale RELEASE.md file
cdesiniotis Jul 19, 2024
f24070e
Bump k8s-device-plugin to v0.16.1
cdesiniotis Jul 23, 2024
539587d
Bump toolkit to v1.16.1
cdesiniotis Jul 23, 2024
4f24c80
Bump github.com/NVIDIA/go-nvlib from 0.6.0 to 0.6.1
dependabot[bot] Jul 23, 2024
c9efbe7
bump dcgm and dcgm-export to 3.3.7 versions
tariq1890 Jul 24, 2024
ddc65f7
[vgpu-manager] mount necessary directories so that GSP firmware can b…
cdesiniotis Jul 24, 2024
4a71512
[vgpu-manager] align clusterrole with state-nvidia-driver
cdesiniotis Jul 24, 2024
31027ab
Bump project version to v24.6.0
cdesiniotis Jul 26, 2024
87c3aec
Use GitHub image as staging image
cdesiniotis Jul 29, 2024
83f3f26
Fix pull of github staging image for validator
cdesiniotis Jul 29, 2024
72a63cc
Update must-gather.sh URL in github issue template
cdesiniotis Jul 29, 2024
4de5dbe
add RHOCP certified v24.3.0 OLM bundle
tariq1890 Aug 1, 2024
4024b28
Add OLM bundle for 24.6.0
cdesiniotis Jul 31, 2024
a708a07
[ci] move IN_REGISTRY definition for validator to a template
cdesiniotis Aug 1, 2024
14135dc
Add unit tests for common daemonset transformations
cdesiniotis Jul 18, 2024
6ca4048
Add unit tests for transforms to driver-manager and validation init c…
cdesiniotis Jul 23, 2024
1a04162
Add unit test for TransformValidatorShared()
cdesiniotis Jul 26, 2024
948f325
Remove unused function parameter for TransformValidatorShared()
cdesiniotis Jul 26, 2024
d145169
Add unit tests for TransformValidatorComponent()
cdesiniotis Jul 26, 2024
563b1c9
Add unit tests for TransformValidator and TransformSandboxValidator
cdesiniotis Jul 28, 2024
9bf0c88
controller-runtime cache should only list-watch resources in the oper…
tariq1890 Aug 1, 2024
89a8240
[node-status-exporter] fix bug in retrieving the nvidia-driver-daemonset
tariq1890 Aug 2, 2024
eee03d4
[H100 NVL]update all-balanced MIG config
tariq1890 Aug 5, 2024
e66842d
Bump device-plugin to v0.16.2
cdesiniotis Aug 8, 2024
356667a
Pin holodeck gh action to v0.2.1
cdesiniotis Aug 8, 2024
76488a2
Bump project version to v24.6.1
cdesiniotis Aug 8, 2024
e52d996
add the v24.6.1 OLM bundle
tariq1890 Aug 12, 2024
3bcf147
fix govet issues and pin golangci-lint version
tariq1890 Aug 14, 2024
1f20f52
remove R470 driver branch as it is EOL'ed
tariq1890 Aug 14, 2024
bc99038
Bump github.com/regclient/regclient from 0.7.0 to 0.7.1
dependabot[bot] Aug 5, 2024
56940e4
update regctl in gitlab CI yml
tariq1890 Aug 14, 2024
2122769
add H800 GPU to the MIG configmap
tariq1890 Aug 20, 2024
5bf8a63
Bump nvidia/cuda from 12.5.1-base-ubi8 to 12.6.0-base-ubi8 in /docker
dependabot[bot] Aug 13, 2024
3a2775c
Bump nvidia/cuda from 12.5.1-base-ubi8 to 12.6.0-base-ubi8 in /validator
dependabot[bot] Aug 13, 2024
845efe9
Bump NVIDIA/holodeck from 0.2.1 to 0.2.3
dependabot[bot] Aug 19, 2024
6ac54d6
Revert "Bump NVIDIA/holodeck from 0.2.1 to 0.2.3"
tariq1890 Aug 22, 2024
a06ad66
update operator initContainer and vfioManager images to 12.6.0
tariq1890 Aug 26, 2024
e010aff
update base image to the ubi9 variant
tariq1890 Aug 16, 2024
6fcc811
Bump sigs.k8s.io/controller-runtime from 0.18.4 to 0.19.0
dependabot[bot] Aug 18, 2024
1118d95
fix compilation errors after importing controller-runtime v0.19.0
tariq1890 Aug 26, 2024
77b8d90
bump controller-tools to v0.16.1 for compatibility with controller-ru…
tariq1890 Aug 26, 2024
83bb2b6
update prometheus-operator to version v0.75.2
ajayk Jul 30, 2024
74b7f9e
[CVE-2024-41110] bump go-helm-client to v0.12.13
tariq1890 Aug 26, 2024
9d367a8
update dependabot gomod update frequency to daily
tariq1890 Aug 27, 2024
2fa85cf
update go and golangci-lint versions
tariq1890 Aug 27, 2024
a769646
update gitlab CI and vfio-manager to use ubi9 DIST
tariq1890 Aug 27, 2024
2e4cd94
Bump github.com/urfave/cli/v2 from 2.27.2 to 2.27.4
dependabot[bot] Aug 26, 2024
edc7d20
Bump github.com/onsi/gomega from 1.33.1 to 1.34.1
dependabot[bot] Aug 27, 2024
03669d1
Bump github.com/onsi/ginkgo/v2 from 2.19.0 to 2.20.1
dependabot[bot] Aug 27, 2024
81de5e2
Bump sigs.k8s.io/kustomize/kustomize/v5 from 5.4.2 to 5.4.3 in /tools
dependabot[bot] Aug 27, 2024
adbdf50
Bump github.com/prometheus/client_golang from 1.19.1 to 1.20.2
dependabot[bot] Aug 27, 2024
c73e44a
Bump github.com/prometheus-operator/prometheus-operator/pkg/apis/moni…
dependabot[bot] Aug 27, 2024
9c6104e
Bump github.com/operator-framework/api from 0.26.0 to 0.27.0
dependabot[bot] Aug 28, 2024
b1a7345
add support for configuring tolerations in upgrade-crd hook
tariq1890 Aug 28, 2024
9178a3a
config custom metrics
chipzoller Aug 22, 2024
c4ca83b
bump k8s-operator-libs dep to the latest version
tariq1890 Aug 27, 2024
5410a2a
add support configuring tolerations in cleanupCRD JOB
tariq1890 Aug 30, 2024
6ffe2e4
Bump sigs.k8s.io/controller-tools from 0.16.1 to 0.16.2 in /tools
dependabot[bot] Aug 30, 2024
17930d2
sync generated assets with controller-tools v0.16.2
tariq1890 Aug 30, 2024
8ef0ee5
add make target to automate sync'ing of generated crds into helm and …
tariq1890 Aug 30, 2024
1fb5c57
add gpu driver 560.35.03
tariq1890 Sep 5, 2024
307b8a3
Bump golang.org/x/mod from 0.20.0 to 0.21.0
dependabot[bot] Sep 5, 2024
430db3d
Bump github.com/Masterminds/sprig/v3 from 3.2.3 to 3.3.0
dependabot[bot] Sep 5, 2024
c5ef6d1
Bump github.com/onsi/ginkgo/v2 from 2.20.1 to 2.20.2
dependabot[bot] Sep 5, 2024
c083842
Bump github.com/onsi/gomega from 1.34.1 to 1.34.2
dependabot[bot] Sep 6, 2024
dd61e24
Bump github.com/prometheus/client_golang from 1.20.2 to 1.20.3
dependabot[bot] Sep 6, 2024
d58d9e9
Bump github.com/prometheus-operator/prometheus-operator/pkg/apis/moni…
dependabot[bot] Sep 6, 2024
8179910
disable privileged mode for toolkit-validation init containers
tariq1890 Sep 6, 2024
646d821
update K8s version used by holodeck to v1.31
tariq1890 Sep 6, 2024
9240187
Bump github.com/prometheus-operator/prometheus-operator/pkg/apis/moni…
dependabot[bot] Sep 11, 2024
d0ddc1f
drop the DIST (ubi9) suffix in image tags
tariq1890 Aug 29, 2024
bcc47fb
[RBAC cleanup] move namespaced resources to Role from ClusterRole
tariq1890 Sep 11, 2024
75a441f
Bump nvidia/cuda from 12.6.0-base-ubi9 to 12.6.1-base-ubi9 in /validator
dependabot[bot] Sep 17, 2024
3418ae9
Bump nvidia/cuda from 12.6.0-base-ubi9 to 12.6.1-base-ubi9 in /docker
dependabot[bot] Sep 17, 2024
eaee532
downgrade go from 1.23.0 to 1.22.7
tariq1890 Sep 17, 2024
9db8f34
[nvidia-ci] drop dist tag suffix when cloning ghcr.io images
tariq1890 Sep 18, 2024
eb144c8
add gpu driver container 550.90.12
tariq1890 Sep 18, 2024
792a59e
drop dist tag suffix when referencing images in scan and sign jobs
tariq1890 Sep 18, 2024
35a359a
bump dcgm and dcg-exporter to versions 3.3.8 and 3.3.8-3.6.0
tariq1890 Sep 20, 2024
33c0bbb
Bump project version to 24.6.2
cdesiniotis Sep 23, 2024
f4b89ac
Bump toolkit to 1.16.2
cdesiniotis Sep 24, 2024
439f4d2
Bump github.com/mittwald/go-helm-client from 0.12.13 to 0.12.14
dependabot[bot] Sep 24, 2024
a397398
Bump github.com/NVIDIA/nvidia-container-toolkit from 1.16.1 to 1.16.2
dependabot[bot] Sep 26, 2024
8dffbd8
Bump k8s.io/code-generator from 0.31.0 to 0.31.1 in /tools
dependabot[bot] Sep 12, 2024
8b87a93
set RUNTIME_CONFIG and RUNTIME_SOCKET envars to support new toolkit v…
tariq1890 Sep 28, 2024
934e08f
bump cuda base img to 12.6.1 and go to 1.22.8
tariq1890 Oct 2, 2024
985add4
Bump github.com/prometheus/client_golang from 1.20.3 to 1.20.4
dependabot[bot] Sep 18, 2024
096c680
[v24.6.2] add RHOCP certified OLM bundle
tariq1890 Oct 2, 2024
3111a0b
bump NFD to v0.16.4
tariq1890 Oct 4, 2024
920bb9d
Bump sigs.k8s.io/controller-tools from 0.16.2 to 0.16.3 in /tools
dependabot[bot] Sep 26, 2024
0ba057b
update CRDs
tariq1890 Oct 4, 2024
17e1b64
[state-driver] add downward API envars to fetch node name and IP
tariq1890 Oct 7, 2024
90780d4
enable gpu-operator and NFD CRD updates by default
tariq1890 Oct 4, 2024
54f2613
Bump github.com/NVIDIA/go-nvlib from 0.6.1 to 0.7.0
dependabot[bot] Oct 10, 2024
39cda1e
Bump sigs.k8s.io/kustomize/kustomize/v5 from 5.4.3 to 5.5.0 in /tools
dependabot[bot] Oct 10, 2024
8a52409
sync go.mod with latest kustomize dependency
tariq1890 Oct 10, 2024
a95856a
Revert "disable privileged mode for toolkit-validation init containers"
tariq1890 Oct 11, 2024
c729500
[gpu-operator-validator] minor code cleanup
tariq1890 Oct 10, 2024
3f20df3
Bump sigs.k8s.io/controller-tools from 0.16.3 to 0.16.4 in /tools
dependabot[bot] Oct 11, 2024
9891713
update CRDs
tariq1890 Oct 11, 2024
c99e9f1
Bump github.com/urfave/cli/v2 from 2.27.4 to 2.27.5
dependabot[bot] Oct 14, 2024
ee86289
Bump nvidia/cuda from 12.6.1-base-ubi9 to 12.6.2-base-ubi9 in /docker
dependabot[bot] Oct 14, 2024
9a778f0
Bump nvidia/cuda from 12.6.1-base-ubi9 to 12.6.2-base-ubi9 in /validator
dependabot[bot] Oct 14, 2024
b9502a2
bump cuda base image in helm chart and OLM bundle
tariq1890 Oct 14, 2024
0ab0fa5
bump gdrcopy, vgpu-device-mgr and k8s-driver-mgr versions
tariq1890 Oct 15, 2024
90640ea
bump node-feature-discovery to v0.16.5
tariq1890 Oct 15, 2024
f944ef7
bump kubevirt-gpu-device-plugin to v1.2.10
tariq1890 Oct 15, 2024
606a0cf
bump k8s-kata-manager to v0.2.2
tariq1890 Oct 15, 2024
759420a
bump openshift-cllient-go to the latest main
tariq1890 Oct 15, 2024
3542a47
add OCP 4.17 to the supported Openshift versions list
tariq1890 Oct 21, 2024
9a07f94
Make the IMEX nodes config file available to GFD
cdesiniotis Oct 22, 2024
434cedb
Add GH200 144G HBM3e (x234810DE) to default-mig-parted-config
cdesiniotis Oct 23, 2024
ed5a8f6
bump R550 and R535 drivers to 550.127.05 and 535.216.01
tariq1890 Oct 23, 2024
e13db2e
Add missing 1g.36gb config to default-mig-parted-config
cdesiniotis Oct 24, 2024
c85f478
Add copu-pr-bot
ArangoGutierrez Oct 24, 2024
7c7a3bb
Bump github.com/NVIDIA/k8s-kata-manager from 0.2.0 to 0.2.2
dependabot[bot] Oct 25, 2024
7ed95ad
Use NV-GHA runners
ArangoGutierrez Oct 24, 2024
3276ab8
Revert "Use NV-GHA runners"
tariq1890 Oct 25, 2024
fa4a666
Revert "Add copu-pr-bot"
tariq1890 Oct 25, 2024
082e84c
bump mig-manager to v0.10.0
tariq1890 Oct 25, 2024
0be7f37
Add init container to GFD for handling imex nodes config mount
cdesiniotis Oct 25, 2024
c34f398
Use HostToContainer mount propagation for validations directory in GFD
cdesiniotis Oct 25, 2024
65804f0
Always add 'config' emptyDir volume to GFD and device-plugin daemonsets
cdesiniotis Oct 25, 2024
aadeec2
Bump k8s-device-plugin to v0.17.0-rc.1
cdesiniotis Oct 28, 2024
59b0aef
Bump nvidia-container-toolkit to v0.17.0-rc.2
cdesiniotis Oct 28, 2024
cd739d6
bump GDS image to 2.20.5
tariq1890 Oct 23, 2024
c76ca6a
Always add 'config' emptyDir volume to MPS daemonset
cdesiniotis Oct 28, 2024
68be4f5
Add copu-pr-bot
ArangoGutierrez Oct 29, 2024
aa3974d
Bump github.com/onsi/gomega from 1.34.2 to 1.35.0
dependabot[bot] Oct 30, 2024
620d684
Bump github.com/onsi/ginkgo from 2.20.2 to 2.21.0
tariq1890 Oct 30, 2024
b2b990b
add support for driver 565.57.01
tariq1890 Oct 31, 2024
facd2d0
Bump github.com/NVIDIA/nvidia-container-toolkit
dependabot[bot] Oct 31, 2024
72c57e4
bump container-toolkit in helm values yaml and OLM bundle
tariq1890 Oct 31, 2024
3e92957
bump k8s-device-plugin and gpu-feature-discovery to v0.17.0
tariq1890 Oct 31, 2024
5b8de19
bump node-feature-discovery to v0.16.6
tariq1890 Oct 31, 2024
07403f0
Bump project version to 24.9.0
cdesiniotis Oct 29, 2024
bcc5d50
add 24.9.0 OLM bundle
tariq1890 Oct 31, 2024
470379b
cleanup redundant sign:ngc jobs and fix bug in release:ngc job
tariq1890 Oct 31, 2024
ff4959b
move permissions for events from Role to ClusterRole
tariq1890 Nov 4, 2024
b24835b
drop the distro-specific tag suffix from the device-plugin image
tariq1890 Nov 5, 2024
a4b605f
Revert "Revert "Use NV-GHA runners""
cdesiniotis Oct 31, 2024
45ba079
Update NV-GHA IP Ranges
ArangoGutierrez Nov 8, 2024
3c0adb5
bump golang to 1.23.3
tariq1890 Nov 12, 2024
7f35f63
Bump github.com/prometheus-operator/prometheus-operator/pkg/apis/moni…
dependabot[bot] Nov 18, 2024
df1fadb
bump min go module version 1.23
tariq1890 Nov 18, 2024
47199a8
enable hostPID in mps-control-daemon
tariq1890 Nov 9, 2024
e605e94
Bump github.com/NVIDIA/nvidia-container-toolkit from 1.17.0 to 1.17.2
dependabot[bot] Nov 19, 2024
8d71805
bump toolkit to v1.17.2 in helm chart and OLM bundle
tariq1890 Nov 19, 2024
4943091
Bump github.com/onsi/gomega from 1.35.0 to 1.35.1
dependabot[bot] Nov 5, 2024
8f6fce0
[mig-manager] add support for H200 NVL
tariq1890 Nov 15, 2024
72b5de2
Bump github.com/prometheus/client_golang from 1.20.4 to 1.20.5
dependabot[bot] Nov 18, 2024
ce30bad
bump dcgm to 3.3.9 and dcgm-exporter to 3.3.9-3.6.1 versions
tariq1890 Nov 19, 2024
2d38ca0
Bump golang.org/x/mod from 0.21.0 to 0.22.0
dependabot[bot] Nov 19, 2024
dc88417
Bump github.com/regclient/regclient from 0.7.1 to 0.7.2
dependabot[bot] Nov 20, 2024
6f9a659
bump regctl in gitlab CI
tariq1890 Nov 20, 2024
d794e25
update golangci-lint version to v1.62.0 and pass in version via GITHU…
tariq1890 Nov 20, 2024
2a97ed2
bump driver versions to 550.127.08 and 535.216.03
tariq1890 Nov 20, 2024
fc9c464
Bump sigs.k8s.io/controller-runtime from 0.19.0 to 0.19.1
dependabot[bot] Nov 20, 2024
80b62f3
Bump sigs.k8s.io/controller-tools from 0.16.4 to 0.16.5 in /tools
dependabot[bot] Nov 21, 2024
d1c619b
check in CRD changes from controller-tools update
tariq1890 Nov 21, 2024
5e3c24d
[GitHub Actions] inject golangci lint version at the right stage
tariq1890 Nov 21, 2024
42b659c
Bump github.com/onsi/ginkgo/v2 from 2.21.0 to 2.22.0
dependabot[bot] Nov 21, 2024
ea2bb2a
use apimachinery/pkg/util/json instead of sigs.k8s.io/json
tariq1890 Nov 22, 2024
b93f0b3
unified define k8s-driver-manager image info in values.yaml
lengrongfu Oct 11, 2024
The diff you're trying to view is too large. We only load the first 3000 changed files.
21 changes: 11 additions & 10 deletions .common-ci.yml
@@ -38,13 +38,13 @@ workflow:
     - if: $CI_COMMIT_BRANCH
     - if: $CI_COMMIT_TAG
     - if: $CI_PIPELINE_SOURCE == "web"
-    - if: $CI_COMMIT_BRANCH == "master"
+    - if: $CI_COMMIT_BRANCH == "main"
     - if: $CI_COMMIT_BRANCH =~ /^release-.*$/
     - if: $CI_COMMIT_TAG && $CI_COMMIT_TAG != ""

 .main-or-manual:
   rules:
-    - if: $CI_COMMIT_BRANCH == "master"
+    - if: $CI_COMMIT_BRANCH == "main"
     - if: $CI_COMMIT_BRANCH =~ /^release-.*$/
     - if: $CI_COMMIT_TAG && $CI_COMMIT_TAG != ""
     - if: $CI_PIPELINE_SOURCE == "schedule"
@@ -71,7 +71,7 @@ trigger-pipeline:

 .buildx-setup:
   before_script:
-    - export BUILDX_VERSION=v0.6.3
+    - export BUILDX_VERSION=v0.15.1
     - apk add --no-cache curl
     - mkdir -p ~/.docker/cli-plugins
     - curl -sSLo ~/.docker/cli-plugins/docker-buildx "https://github.com/docker/buildx/releases/download/${BUILDX_VERSION}/buildx-${BUILDX_VERSION}.linux-amd64"
@@ -82,9 +82,9 @@ trigger-pipeline:
     - '[[ -n "${SKIP_QEMU_SETUP}" ]] || docker run --rm --privileged multiarch/qemu-user-static --reset -p yes'

 # Define targets for the gpu-operator and gpu-operator-validator images
-.dist-ubi8:
+.dist-ubi9:
   variables:
-    DIST: ubi8
+    DIST: ubi9
     CVE_UPDATES: "cyrus-sasl-lib"

 .target-gpu-operator:
@@ -99,6 +99,7 @@ trigger-pipeline:
     IMAGE_NAME: "${CI_REGISTRY_IMAGE}/gpu-operator-validator"
     IN_IMAGE_NAME: "gpu-operator-validator"
     IMAGE_ARCHIVE: "gpu-operator-validator.tar"
+    IN_REGISTRY: "${STAGING_REGISTRY}/gpu-operator"

 # .release forms the base of the deployment jobs which push images to the CI registry.
 # This is extended with the version to be deployed (e.g. the SHA or TAG) and the
@@ -149,7 +150,7 @@ trigger-pipeline:
 # Download the regctl binary for use in the release steps
 .regctl-setup:
   before_script:
-    - export REGCTL_VERSION=v0.3.10
+    - export REGCTL_VERSION=v0.7.2
     - apk add --no-cache curl
     - mkdir -p bin
     - curl -sSLo bin/regctl https://github.com/regclient/regclient/releases/download/${REGCTL_VERSION}/regctl-linux-amd64
@@ -181,23 +182,23 @@ trigger-pipeline:
 release:staging-gpu-operator:
   extends:
     - .release:staging
-    - .dist-ubi8
+    - .dist-ubi9
     - .target-gpu-operator
   variables:
     OUT_IMAGE_NAME: "${CI_REGISTRY_IMAGE}/staging/gpu-operator"

 release:staging-gpu-operator-validator:
   extends:
     - .release:staging
-    - .dist-ubi8
+    - .dist-ubi9
     - .target-gpu-operator-validator
   variables:
     OUT_IMAGE_NAME: "${CI_REGISTRY_IMAGE}/staging/gpu-operator-validator"

 release:staging-latest-gpu-operator:
   extends:
     - .release:staging
-    - .dist-ubi8
+    - .dist-ubi9
     - .target-gpu-operator
   variables:
     OUT_IMAGE_NAME: "${CI_REGISTRY_IMAGE}/staging/gpu-operator"
@@ -208,7 +209,7 @@ release:staging-latest-gpu-operator:
 release:staging-latest-gpu-operator-validator:
   extends:
     - .release:staging
-    - .dist-ubi8
+    - .dist-ubi9
     - .target-gpu-operator-validator
   variables:
     OUT_IMAGE_NAME: "${CI_REGISTRY_IMAGE}/staging/gpu-operator-validator"
4 changes: 2 additions & 2 deletions .github/ISSUE_TEMPLATE.md
@@ -39,10 +39,10 @@ _Detailed steps to reproduce the issue._
 Collecting full debug bundle (optional):

 ```
-curl -o must-gather.sh -L https://raw.githubusercontent.com/NVIDIA/gpu-operator/master/hack/must-gather.sh
+curl -o must-gather.sh -L https://raw.githubusercontent.com/NVIDIA/gpu-operator/main/hack/must-gather.sh
 chmod +x must-gather.sh
 ./must-gather.sh
 ```
-**NOTE**: please refer to the [must-gather](https://raw.githubusercontent.com/NVIDIA/gpu-operator/master/hack/must-gather.sh) script for debug data collected.
+**NOTE**: please refer to the [must-gather](https://raw.githubusercontent.com/NVIDIA/gpu-operator/main/hack/must-gather.sh) script for debug data collected.

 This bundle can be submitted to us via email: **operator_feedback@nvidia.com**
3 changes: 3 additions & 0 deletions .github/copy-pr-bot.yaml
@@ -0,0 +1,3 @@
+# https://docs.gha-runners.nvidia.com/apps/copy-pr-bot/#configuration
+
+enabled: true
43 changes: 43 additions & 0 deletions .github/dependabot.yml
@@ -0,0 +1,43 @@
+# Please see the documentation for all configuration options:
+# https://docs.github.com/github/administering-a-repository/configuration-options-for-dependency-updates
+
+version: 2
+updates:
+  - package-ecosystem: "gomod"
+    target-branch: main
+    directory: "/"
+    schedule:
+      interval: "daily"
+    labels:
+      - dependencies
+    groups:
+      k8sio:
+        patterns:
+          - k8s.io/*
+        exclude-patterns:
+          - k8s.io/klog/*
+
+  - package-ecosystem: "gomod"
+    target-branch: main
+    directory: "/tools"
+    schedule:
+      interval: "daily"
+    labels:
+      - dependencies
+
+  # Update GPU Operator base images.
+  - package-ecosystem: "docker"
+    directory: "/docker"
+    schedule:
+      interval: "daily"
+
+  # Update GPU Operator Validator base images.
+  - package-ecosystem: "docker"
+    directory: "/validator"
+    schedule:
+      interval: "daily"
+
+  - package-ecosystem: "github-actions"
+    directory: "/"
+    schedule:
+      interval: "daily"
5 changes: 0 additions & 5 deletions .github/pull_request_template.md

This file was deleted.

113 changes: 0 additions & 113 deletions .github/workflows/blossom-ci.yml

This file was deleted.
