fix(remount): relocate libraries along with their symlinks #255
Thank you for this contribution!
I validated this fix on a Fedora 40 system with the NVIDIA runtime etc. installed (so no /usr/lib/x86_64-linux-gnu), using both Docker (v27.0.2) and K3s (v1.29.6).
(Edit: for posterity, also verified working on an AL2 EKS cluster.)
In both cases, I was able to successfully build the image nvcr.io/nvidia/k8s/cuda-sample:vectoradd-cuda10.2, and run nvidia-smi and /tmp/vectorAdd.
The only changes I would like to see are:
- More commenting for future readers/code-spelunkers
- Remove references to PPC arch; I don't know if we will ever support this.
There may also be a similar workaround needed for AMD/Vulkan cards, but this can be tested separately.
0997cbd to 07d0f1c
👍 👍
(cherry picked from commit 46a78fb)
This PR adds:
After that, the container should behave like a regular container created by the NVIDIA container runtime. Unfortunately, mounting and unmounting require GPU containers to run with privileges:
https://www.man7.org/linux/man-pages/man2/mount.2.html
https://www.man7.org/linux/man-pages/man2/umount.2.html
The logic is not generalized to arbitrary symlinks or directories; for now, it only aims at providing compatibility with the NVIDIA container runtime.
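To illustrate the idea in the PR title (relocating a library together with its symlink), here is a minimal Python sketch. It is not the code from this PR: the function name `plan_relocation`, the directory names, and the library name `libexample.so.1.0` are hypothetical, and the sketch only computes a plan inside a temporary directory. It performs no mount or umount calls, so it needs no privileges.

```python
import os
import tempfile


def plan_relocation(src_dir, dst_dir):
    """Return (copies, links): real files to place in dst_dir and symlinks to recreate there."""
    copies = {}  # destination path -> source path of the real file
    links = {}   # destination symlink path -> link target (file name in dst_dir)
    for name in sorted(os.listdir(src_dir)):
        path = os.path.join(src_dir, name)
        if os.path.islink(path):
            # Resolve the whole chain so the real library travels with its symlink.
            real = os.path.realpath(path)
            real_name = os.path.basename(real)
            copies[os.path.join(dst_dir, real_name)] = real
            links[os.path.join(dst_dir, name)] = real_name
        elif os.path.isfile(path):
            copies[os.path.join(dst_dir, name)] = path
    return copies, links


def demo():
    with tempfile.TemporaryDirectory() as root:
        # Assumed layout: libraries live in lib64 and are expected in x86_64-linux-gnu.
        src = os.path.join(root, "lib64")
        dst = os.path.join(root, "x86_64-linux-gnu")
        os.mkdir(src)
        # Hypothetical library name, used only for illustration.
        open(os.path.join(src, "libexample.so.1.0"), "w").close()
        os.symlink("libexample.so.1.0", os.path.join(src, "libexample.so.1"))
        copies, links = plan_relocation(src, dst)
        return (
            sorted(os.path.basename(p) for p in copies),
            {os.path.basename(k): v for k, v in links.items()},
        )
```

Calling `demo()` returns the file names that would be relocated and the symlinks that would be recreated next to them. Keeping the link target as a bare file name means the symlink stays valid wherever the pair ends up, which is the property a relocation like this depends on.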
More context can be found in this comment: #143 (comment)
Tested with the following images:
docker.io/library/debian:bookworm
docker.io/library/fedora:40
nvcr.io/nvidia/pytorch:24.05-py3
Closes #143