Update libnvidia-container and nvidia-container-toolkit #88

Merged 3 commits on Aug 15, 2024

8 changes: 4 additions & 4 deletions packages/libnvidia-container/Cargo.toml
@@ -12,12 +12,12 @@ path = "../packages.rs"
releases-url = "https://github.com/NVIDIA/libnvidia-container/releases"

[[package.metadata.build-package.external-files]]
-url = "https://github.com/NVIDIA/libnvidia-container/archive/v1.13.5/libnvidia-container-1.13.5.tar.gz"
-sha512 = "00de15c2a0168b0c131eae21e10d186053be7f78021fe28785130ea541f1a592f44042697f01b3bf20717d9a93a85b34c7b510028adcc265cc0ac6f97be2bf0e"
+url = "https://github.com/NVIDIA/libnvidia-container/archive/v1.16.1/libnvidia-container-1.16.1.tar.gz"
+sha512 = "b304c284c5ab0c3544362307dc16ffcca8d34497e4356a520dc6da81a86a62b2a262b528cba559bb0d7a3addf018c3b50b6cb78669c82c1b4acae159e5922548"

[[package.metadata.build-package.external-files]]
-url = "https://github.com/NVIDIA/nvidia-modprobe/archive/495.44/nvidia-modprobe-495.44.tar.gz"
-sha512 = "67486ed1b17c8962786e13880910bb2b1938206a0fd76b360ddef7faf80ee0c941a2e3fbc73fa92a92009e2c54130dce17a466c8079537a981a2fed09c07e4c9"
+url = "https://github.com/NVIDIA/nvidia-modprobe/archive/550.54.14/nvidia-modprobe-550.54.14.tar.gz"
+sha512 = "279228aa315ff5fd1a23df23527aff58b2319f11f9fc7d939fa285ea933b4cc6d223451e20ecf7f50baba9f6c9c100e57cb77675d0d17fa77f19d3fea2ccc193"

[build-dependencies]
glibc = { path = "../glibc" }
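
Each `external-files` entry above pairs a source URL with the `sha512` digest of the archive. When bumping a version, one way to double-check a digest by hand is sketched below. `verify_sha512` is a hypothetical helper, not part of this repository's tooling, and it assumes GNU coreutils' `sha512sum` is installed.

```shell
#!/usr/bin/env bash

# Hypothetical helper (not from this repository): succeed only if the
# SHA-512 digest of FILE equals EXPECTED (lowercase hex).
verify_sha512() {
    local file="$1" expected="$2" actual
    # sha512sum prints "<digest>  <filename>"; keep the digest only.
    actual="$(sha512sum "$file" | awk '{print $1}')"
    [ "$actual" = "$expected" ]
}

# Example (assumes the tarball was downloaded beforehand):
#   verify_sha512 libnvidia-container-1.16.1.tar.gz "<sha512 from Cargo.toml>" \
#       && echo "digest matches"
```

The function only compares digests; fetching the archive is left to the caller so the check can run offline.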
5 changes: 3 additions & 2 deletions packages/libnvidia-container/libnvidia-container.spec
@@ -1,7 +1,7 @@
-%global nvidia_modprobe_version 495.44
+%global nvidia_modprobe_version 550.54.14
Contributor

Is this supposed to align with the driver version? We were on 495, which implies no, but the branches in the GitHub repo imply some level of alignment with driver branches. I can't find anything that indicates whether there is versioning we need to worry about here.

Contributor

I had a similar question because I noticed nvidia-modprobe 550.54.14 is not the latest 550 release (it's from February). However, this is the version defined for building libnvidia-container 1.16.1: https://github.com/NVIDIA/libnvidia-container/blob/v1.16.1/mk/nvidia-modprobe.mk#L9

As for the previously used version, 495.44, libnvidia-container 1.13.3 defines that nvidia-modprobe version:
https://github.com/NVIDIA/libnvidia-container/blob/v1.13.3/mk/nvidia-modprobe.mk#L9

I'm still not sure there is a definitive answer to "Is this supposed to align with the driver version?", but from what I can tell we simply take the version from the place where libnvidia-container builds nvidia-modprobe.
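
For future bumps, the lookup described in this comment could be scripted roughly as follows. This is a sketch under an assumption: that `mk/nvidia-modprobe.mk` sets the version with a plain `VERSION := <value>` assignment, which should be confirmed against the linked file. `extract_make_version` is a hypothetical helper name.

```shell
#!/usr/bin/env bash

# Hypothetical helper: read a makefile on stdin and print the value of the
# first "VERSION := <value>" (or "VERSION = <value>") assignment.
# Assumption: the upstream makefile uses this exact variable name.
extract_make_version() {
    sed -n 's/^VERSION[[:space:]]*:\{0,1\}=[[:space:]]*\([^[:space:]]*\).*$/\1/p' \
        | head -n 1
}

# Example (requires network access; the raw URL mirrors the blob link above):
#   curl -fsSL https://raw.githubusercontent.com/NVIDIA/libnvidia-container/v1.16.1/mk/nvidia-modprobe.mk \
#       | extract_make_version
```

Reading from stdin keeps the helper independent of how the makefile is fetched.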

Contributor Author

Correct, this modprobe version doesn't have to align with any driver version. As you can see, we were on 495 and yet we shipped two different driver versions, both far from that version. They are not tightly coupled; this is only required to make libnvidia-container happy. Nonetheless, I'll test aws-ecs-1-nvidia, which uses the older NVIDIA driver.

Contributor Author

Confirmed that an instance created with aws-ecs-1-nvidia, which uses the older kernel module, joins a cluster and runs a task:

bash-5.1# docker exec -it 824b77b3d1c8 nvidia-smi
Thu Aug 15 20:23:01 2024
+-----------------------------------------------------------------------------+
| NVIDIA-SMI 470.256.02   Driver Version: 470.256.02   CUDA Version: 11.4     |
|-------------------------------+----------------------+----------------------+
| GPU  Name        Persistence-M| Bus-Id        Disp.A | Volatile Uncorr. ECC |
| Fan  Temp  Perf  Pwr:Usage/Cap|         Memory-Usage | GPU-Util  Compute M. |
|                               |                      |               MIG M. |
|===============================+======================+======================|
|   0  NVIDIA A10G         Off  | 00000000:00:1E.0 Off |                    0 |
|  0%   32C    P0    56W / 300W |      0MiB / 22731MiB |      1%      Default |
|                               |                      |                  N/A |
+-------------------------------+----------------------+----------------------+

+-----------------------------------------------------------------------------+
| Processes:                                                                  |
|  GPU   GI   CI        PID   Type   Process name                  GPU Memory |
|        ID   ID                                                   Usage      |
|=============================================================================|
|  No running processes found                                                 |
+-----------------------------------------------------------------------------+
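
If this check needs to be repeated across variants, the driver version can be pulled out of the banner shown above with a small filter. This is only a sketch; `driver_version_from_smi` is a hypothetical helper that reads `nvidia-smi` output on stdin.

```shell
#!/usr/bin/env bash

# Hypothetical helper: print the value that follows "Driver Version:" in the
# nvidia-smi banner read from stdin.
driver_version_from_smi() {
    sed -n 's/.*Driver Version: \([0-9][0-9.]*\).*/\1/p' | head -n 1
}

# Example, reusing the container ID from the session above:
#   docker exec 824b77b3d1c8 nvidia-smi | driver_version_from_smi
```

`nvidia-smi --query-gpu=driver_version --format=csv,noheader` reports the same value without any parsing, which is likely the more robust choice where that query is available.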


Name: %{_cross_os}libnvidia-container
-Version: 1.13.5
+Version: 1.16.1
Release: 1%{?dist}
Summary: NVIDIA container runtime library
# The COPYING and COPYING.LESSER files in the sources don't apply to libnvidia-container
@@ -59,6 +59,7 @@ export WITH_TIRPC=yes \\\
export WITH_NVCGO=yes \\\
export prefix=%{_cross_prefix} \\\
export DESTDIR=%{buildroot} \\\
+export LIB_VERSION=%{version} \\\
%{nil}

%build
4 changes: 2 additions & 2 deletions packages/nvidia-container-toolkit/Cargo.toml
@@ -12,8 +12,8 @@ path = "../packages.rs"
releases-url = "https://github.com/NVIDIA/nvidia-container-toolkit/releases"

[[package.metadata.build-package.external-files]]
-url = "https://github.com/NVIDIA/nvidia-container-toolkit/archive/v1.13.5/nvidia-container-toolkit-1.13.5.tar.gz"
-sha512 = "7266e779abf27f2bc1b7c801e5eb4720b82be22bed3ec90171e4f5499b2bc7376f1369e4931d4db55edc8f5fd5e44d5e817eb258ec39bf55f16424fe725188d6"
+url = "https://github.com/NVIDIA/nvidia-container-toolkit/archive/v1.16.1/nvidia-container-toolkit-1.16.1.tar.gz"
+sha512 = "691d4fc47ea60b730ec491b333aa8118bcfd62cdab20a42b84155c6a13484d920e758435b5029bbae4fbefce82352aa5764f1554992682f689c95615809fb83c"

[build-dependencies]
glibc = { path = "../glibc" }
@@ -2,7 +2,7 @@
%global gorepo nvidia-container-toolkit
%global goimport %{goproject}/%{gorepo}

-%global gover 1.13.5
+%global gover 1.16.1
%global rpmver %{gover}

Name: %{_cross_os}nvidia-container-toolkit
@@ -50,6 +50,12 @@ Conflicts: %{name}-ecs

%build
%cross_go_configure %{goimport}

+# We don't set `-Wl,-z,now`, because the binary uses lazy loading
+# to load the NVIDIA libraries in the host
+export CGO_LDFLAGS="-Wl,-z,relro -Wl,--export-dynamic"
+export GOLDFLAGS="-compressdwarf=false -linkmode=external -extldflags '${CGO_LDFLAGS}'"

go build -ldflags="${GOLDFLAGS}" -o nvidia-container-runtime-hook ./cmd/nvidia-container-runtime-hook
go build -ldflags="${GOLDFLAGS}" -o nvidia-ctk ./cmd/nvidia-ctk
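
The comment in this hunk says `-Wl,-z,now` is deliberately left out so the binaries keep lazy binding. One way to confirm that on a built binary is to look for the immediate-binding markers in `readelf -d` output, as in the sketch below. `requests_bind_now` is a hypothetical helper, and the patterns assume the usual binutils `readelf` formatting of the dynamic section.

```shell
#!/usr/bin/env bash

# Hypothetical helper: read `readelf -d` output on stdin and succeed if the
# binary requests immediate binding (i.e. it was linked with -z now).
# Checks the DT_BIND_NOW tag, DF_BIND_NOW in DT_FLAGS, and NOW in DT_FLAGS_1.
requests_bind_now() {
    grep -Eq '\(BIND_NOW\)|\(FLAGS\).*BIND_NOW|\(FLAGS_1\).*NOW'
}

# Example (assumes binutils is installed and the binary has been built):
#   if readelf -d nvidia-ctk | requests_bind_now; then
#       echo "immediate binding requested"
#   else
#       echo "lazy binding permitted"
#   fi
```

With the flags exported in this hunk, the expectation is that both binaries take the "lazy binding permitted" branch.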

6 changes: 3 additions & 3 deletions packages/nvidia-k8s-device-plugin/Cargo.toml
@@ -12,9 +12,9 @@ path = "../packages.rs"
releases-url = "https://github.com/NVIDIA/k8s-device-plugin/releases"

[[package.metadata.build-package.external-files]]
-url = "https://github.com/NVIDIA/k8s-device-plugin/archive/v0.14.4/v0.14.4.tar.gz"
-path = "k8s-device-plugin-0.14.4.tar.gz"
-sha512 = "055439c2aac797b2d594846d9fb572f2f46ad5caeb9f44107a2fc05211904823c01a8fd8a2329c13a47ef440fd017086067f7ec55d482970cdbc1663b36d714c"
+url = "https://github.com/NVIDIA/k8s-device-plugin/archive/v0.16.2/v0.16.2.tar.gz"
+path = "k8s-device-plugin-0.16.2.tar.gz"
+sha512 = "0be166ba3f2ae51882e62e71dc625f6e83c4c18321e9e6beb05b7f2f6b3628e5ca7f480576f422faba0e6ad232085dff200b474f2453aeef307f9a6a5d13e1b6"

[build-dependencies]
glibc = { path = "../glibc" }
@@ -2,7 +2,7 @@
%global gorepo k8s-device-plugin
%global goimport %{goproject}/%{gorepo}

-%global gover 0.14.4
+%global gover 0.16.2
%global rpmver %{gover}

Name: %{_cross_os}nvidia-k8s-device-plugin
@@ -46,6 +46,7 @@ Conflicts: (%{_cross_os}image-feature(no-fips) or %{name}-bin)
%cross_go_setup %{gorepo}-%{gover} %{goproject} %{goimport}

%build
+export GO_MAJOR="1.22"
%cross_go_configure %{goimport}
# We don't set `-Wl,-z,now`, because the binary uses lazy loading
# to load the NVIDIA libraries in the host