Update run.bash to use --gpus all when possible #18

Merged
clalancette merged 2 commits into osrf:main
May 25, 2023

AndrejOrsula
Contributor

This PR changes docker/run.bash to use --gpus=all instead of --runtime=nvidia to enable NVIDIA GPUs when the installed Docker version is newer than 19.3. It is based on one of the discussions in #14, but it has not been confirmed that this is the better option. What do you think about this change?

For more context, the following comment describes the difference between nvidia-docker2 and nvidia-container-toolkit in depth: NVIDIA/nvidia-docker#1268 (comment)

Regarding this PR, the condition utilizes dpkg for version comparison. If dpkg is not detected on the system, then --runtime=nvidia is used as a fallback option. Similarly, --runtime=nvidia is used if Docker with version <=19.3.0 is installed.
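
The selection logic described above could look roughly like the following. This is only a minimal sketch, not the code from this PR: the function name select_gpu_flag, its arguments, and the default threshold of 19.3 are assumptions made for illustration. It relies on dpkg --compare-versions for the comparison and falls back to --runtime=nvidia when dpkg is missing or no version is given.

```shell
# Hypothetical helper (not the actual docker/run.bash from this PR).
# Prints the GPU flag to pass to "docker run" for a given Docker version.
select_gpu_flag() {
  local docker_version="$1"
  # Assumed cut-off, taken from the ">19.3" wording in the PR description.
  local threshold="${2:-19.3}"

  # Fall back when the version is unknown or dpkg is not installed.
  if [ -z "$docker_version" ] || ! command -v dpkg >/dev/null 2>&1; then
    echo "--runtime=nvidia"
    return 0
  fi

  # dpkg --compare-versions exits with 0 when the relation holds.
  if dpkg --compare-versions "$docker_version" gt "$threshold" 2>/dev/null; then
    echo "--gpus=all"
  else
    echo "--runtime=nvidia"
  fi
}

# Example call (requires Docker, so it is left commented out here):
#   select_gpu_flag "$(docker version --format '{{.Client.Version}}')"
```

Keeping the version as an argument makes the decision easy to check without Docker installed; how the real script obtains the version is not shown in this conversation.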


On a side note, I often need to export these environment variables inside Docker containers to use the NVIDIA GPU for certain applications. Do you think it is worth adding them to be on the safe side? (I don't know whether setting these variables is too permissive or whether it brings some disadvantages.)

NVIDIA_VISIBLE_DEVICES="all"
NVIDIA_DRIVER_CAPABILITIES="all"
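
For illustration, one way these variables could be passed at container start instead of being exported inside the container is through the --env flag of docker run. The sketch below is an assumption about how that might be wired up, not part of this PR; the helper name nvidia_env_args is hypothetical. Its arguments allow a narrower setting than "all" (for example a capability list such as graphics,utility) in case "all" turns out to be too permissive.

```shell
# Hypothetical helper: prints --env arguments for "docker run".
# Both values default to "all", matching the variables quoted above.
nvidia_env_args() {
  local devices="${1:-all}"
  local capabilities="${2:-all}"
  echo "--env NVIDIA_VISIBLE_DEVICES=${devices} --env NVIDIA_DRIVER_CAPABILITIES=${capabilities}"
}

# Example call (requires Docker; the image name is a placeholder):
#   docker run $(nvidia_env_args) <image>
```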

Collaborator

@mabelzhang mabelzhang left a comment

That NVIDIA/nvidia-docker ticket you linked to was useful. Thanks! The summary of what I learned there is that the --gpus flag works with nvidia-container-toolkit (presumably for generic containers), which is recommended for Docker 19.03+. The toolkit doesn't come with nvidia-container-runtime, so it doesn't work with the --runtime=nvidia flag. nvidia-docker2, by contrast, is specifically for Docker and provides the nvidia runtime. (It is actually required even for Docker 19.03 if you want to use Kubernetes, which we don't, so that point doesn't matter.)

I tested this on Ubuntu 22.04 with an NVIDIA card. It didn't break anything for me, in that my .gz/rendering/ogre2.log says the same GPU Vendor as the previous flag.
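
A check like the one described above could be scripted roughly as follows. This is a sketch under assumptions: the helper name gpu_vendor_lines is hypothetical, the default path assumes the log lives under the home directory, and the exact wording of the log line is assumed to contain "GPU Vendor" as quoted in the comment.

```shell
# Hypothetical helper: prints the lines of the rendering log that mention the GPU vendor.
gpu_vendor_lines() {
  # Assumed default location; pass a different path as the first argument if needed.
  local log_file="${1:-$HOME/.gz/rendering/ogre2.log}"

  if [ ! -r "$log_file" ]; then
    echo "cannot read $log_file" >&2
    return 1
  fi

  # Case-insensitive match, since the exact capitalization in the log is an assumption.
  grep -i "GPU Vendor" "$log_file"
}
```

Running it once with each flag and comparing the output would show whether the same vendor is reported in both cases.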

@clalancette could you try this out to make sure it doesn't break your setup? I've approved this. If it works for you too, we can merge.

AndrejOrsula and others added 2 commits May 25, 2023 14:09
- Substitutes --runtime nvidia if Docker version >19.3 is detected

Signed-off-by: Andrej Orsula <orsula.andrej@gmail.com>
Signed-off-by: Chris Lalancette <clalancette@gmail.com>
Collaborator

@clalancette clalancette left a comment

This looks good, and should improve the situation for users. I added a comment just to explain a bit more. I'm going to go ahead and merge this, thanks!

@clalancette clalancette merged commit 9c21c20 into osrf:main May 25, 2023