cuda-dev #158
Conversation
@tfoote I hope to see this merged, or receive comments if you think it's not ready :)
Overall this looks like a good direction.
For this to go in it definitely needs some unit tests. And I'd like to have a better story about how the different layers of base, runtime, and development can be used together, i.e. get this working with #126. The lack of tests and validation of cooperation is why I haven't pushed #126 further forward.
It also should not collide/conflict with the generic --nvidia option.
# nvidia-container-runtime
ENV NVIDIA_VISIBLE_DEVICES all
ENV NVIDIA_DRIVER_CAPABILITIES compute,utility
These options need to mutate/extend the driver capability, not just overwrite it, i.e. it could be used for graphics too.
Changed to:
ENV NVIDIA_VISIBLE_DEVICES ${NVIDIA_VISIBLE_DEVICES:-all}
ENV NVIDIA_DRIVER_CAPABILITIES ${NVIDIA_DRIVER_CAPABILITIES:+$NVIDIA_DRIVER_CAPABILITIES,}compute,utility
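For context on why that works: Dockerfile ENV supports the shell-style `${var:-default}` and `${var:+word}` expansions, so the second form appends rather than overwrites. A minimal sketch of the behavior, assuming an arbitrary base image:

```Dockerfile
FROM ubuntu:20.04
# Simulate a value set by an earlier layer or build argument.
ENV NVIDIA_DRIVER_CAPABILITIES graphics
# ${var:+word} expands to "graphics," because the variable is set,
# yielding "graphics,compute,utility"; if it were unset, the result
# would be just "compute,utility".
ENV NVIDIA_DRIVER_CAPABILITIES ${NVIDIA_DRIVER_CAPABILITIES:+$NVIDIA_DRIVER_CAPABILITIES,}compute,utility
```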
ENV CUDA_VERSION 11.2.1

ENV NV_CUDA_LIB_VERSION "11.2.1-1"
Having all of these versions embedded seems very fragile. Is there a way to do this more generically? What's the upgrade path? How can we let people adjust the version, looking forward to future releases and other platforms?
The latest version gets updated constantly. It's now at 11.6.0 (https://gitlab.com/nvidia/container-images/cuda/blob/master/dist/11.6.0/ubuntu2004/base/Dockerfile)
How could we exploit https://gitlab.com/nvidia/container-images/cuda/-/blob/master/manifests/cuda.yaml? If we could load this during the build and have the user specify the desired CUDA version (e.g. 11.6.0), it could be much more general.
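A rough Python sketch of that idea (rocker is Python); the raw-file URL and especially the key names inside cuda.yaml are assumptions here and would need to be checked against the real manifest:

```python
import urllib.request

import yaml  # PyYAML

CUDA_MANIFEST_URL = (
    'https://gitlab.com/nvidia/container-images/cuda/-/raw/master/manifests/cuda.yaml'
)


def load_cuda_manifest(url=CUDA_MANIFEST_URL):
    """Fetch and parse NVIDIA's CUDA container manifest at build time."""
    with urllib.request.urlopen(url) as response:
        return yaml.safe_load(response.read())


def resolve_cuda_version(manifest, requested='11.6.0'):
    """Look up the requested CUDA release in the manifest.

    The 'versions' key is hypothetical; the real schema must be
    inspected before relying on any particular layout.
    """
    versions = manifest.get('versions', {})
    if requested not in versions:
        raise ValueError('CUDA %s not found in manifest' % requested)
    return versions[requested]
```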
As you mention, the latest gets updated frequently. How can we use the latest version without hard-coding all these many versions in our codebase? If they're changing the version constantly, do we have to check that it works and update our codebase too? How do we get notified and know what to update when? Will people's usage break without updates? Will we break users if we update too much? How do our users know/declare what versions they get/want?
For tests, we need tests that provide coverage of the code paths. They need to run examples that exercise the code paths and the CUDA capabilities so that we know it's still working when we deploy it. We have 84% code coverage on the non-nvidia stuff; if you run it locally with the nvidia tests it's higher. But your content has zero unit tests. For the story, we need to be able to explain it to users: if you want this feature do this, if you want that feature do that, and if you want this and that, do they work together? In particular, since this is taking the capabilities of #126 and going further, it should probably be done as a series of layers that first enable the base and then install the specific tools, so that you have core nvidia, then extend it for the runtime, and then extend that for dev tools, instead of having each plugin potentially installing different separate versions of the CUDA tools.
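To make the layering idea concrete, here's a rough sketch of how it could look as rocker extensions; `RockerExtension` and `get_snippet` come from rocker's extensions module, while the `invoke_after` ordering hook, the extension names, and the snippet contents are assumptions that would need checking against the rocker version in use:

```python
from rocker.extensions import RockerExtension


class CudaRuntime(RockerExtension):
    """Layer CUDA runtime libraries on top of the base nvidia support."""

    @staticmethod
    def get_name():
        return 'cuda_runtime'

    def get_snippet(self, cliargs):
        # A real implementation would render a Dockerfile template;
        # the package name here is illustrative only.
        return ('RUN apt-get update && apt-get install -y '
                '--no-install-recommends cuda-runtime-11-2')

    def invoke_after(self, cliargs):
        # Order after the base nvidia extension so each layer
        # extends the previous one rather than duplicating it.
        return {'nvidia'}


class CudaDev(CudaRuntime):
    """Extend the runtime layer with the development tools."""

    @staticmethod
    def get_name():
        return 'cuda_dev'

    def get_snippet(self, cliargs):
        return (super().get_snippet(cliargs) +
                '\nRUN apt-get install -y --no-install-recommends '
                'cuda-toolkit-11-2')

    def invoke_after(self, cliargs):
        return {'cuda_runtime'}
```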
Hmm... I'm very much unfamiliar with the code coverage concept. Let me get back to you when I understand what it's about :D
There's an overview of how to run the tests, and you'll see reports here: https://github.com/osrf/rocker#testing
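Roughly, assuming standard pytest-cov conventions (the README's Testing section is the authoritative reference):

```console
python3 -m pytest --cov=rocker test/
```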
Referencing rocker/src/rocker/templates/cuda_dev_snippet.Dockerfile.em, lines 10 to 12 at 0dea4de:
@tfoote I've made a change to use the package manager. It now avoids specifying versions for dependent packages. I've tested it in WSL Ubuntu 20.04. It should also work for a generic Ubuntu installation.
I've spent a few hours converging #182 and this into the branch for #126, following the installation instructions from https://developer.nvidia.com/cuda-downloads?target_os=Linux&target_arch=x86_64&Distribution=Ubuntu&target_version=22.04&target_type=deb_network as closely as possible.
There seems to be an issue with installing cuda and nvidia-cuda-dev simultaneously, but I've run out of time to iterate on that right now, so I've pushed it into the draft.
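For reference, the deb (network) steps from that page looked roughly like this at the time (the keyring version moves over time, so the current page should be consulted):

```console
wget https://developer.download.nvidia.com/compute/cuda/repos/ubuntu2204/x86_64/cuda-keyring_1.0-1_all.deb
sudo dpkg -i cuda-keyring_1.0-1_all.deb
sudo apt-get update
sudo apt-get -y install cuda
```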
@@ -0,0 +1,23 @@
RUN apt-get update && apt-get install -y --no-install-recommends \
I also found that nvidia-cuda-dev is available all the way back to bionic (https://packages.ubuntu.com/bionic/nvidia-cuda-dev), so by default we shouldn't necessarily use the NVIDIA-hosted packages but rather leverage the system ones.
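A hedged Dockerfile sketch of that alternative, using only distro-packaged names from packages.ubuntu.com (nvidia-cuda-toolkit is the Ubuntu package that ships nvcc; whether these suffice for a given workload is untested here):

```Dockerfile
# Install the Ubuntu-packaged CUDA dev files instead of pinning
# versions from NVIDIA's own repositories.
RUN apt-get update && apt-get install -y --no-install-recommends \
    nvidia-cuda-dev \
    nvidia-cuda-toolkit \
    && rm -rf /var/lib/apt/lists/*
```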
I've tried with and without the nvidia-cuda-dev installation block. Both worked for me. There's no need to include nvidia-cuda-dev for pre-caching purposes.
@woensug-choi I think that I've merged all of this into #126. Could you verify that it meets your needs now? Note: I changed the argument to
Great, thanks for verifying.
The rocker --cuda PR does not include the executables of the CUDA libraries, because that PR only copied the apt-get installation from the NVIDIA cudagl base Dockerfile. For our purposes, all of those executables are required to compile the .cu source code of the multibeam sonar plugin.
I believe the --cuda flag without the CUDA executables is still useful for cases that don't require compiling .cu source files. I've made a new --cuda-dev flag that includes all the required installations.
I've tested the functionality with the Dave project's Multibeam Sonar Plugin.
https://github.com/Field-Robotics-Lab/DAVE
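For the usage difference, a hypothetical pair of invocations under this PR's flags (flag names as proposed here, not necessarily what was finally merged):

```console
rocker --cuda ubuntu:20.04       # runtime libraries only; no nvcc
rocker --cuda-dev ubuntu:20.04   # adds the toolchain needed to compile .cu sources
```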