run_dev.sh: using --gpus instead of --runtime nvidia #101

Open
Interpause opened this issue Nov 6, 2023 · 3 comments
Labels: coming soon! (Expected in upcoming release), needs info (Needs more information)

Comments

@Interpause

docker run -it --rm \
--privileged \
--network host \
${DOCKER_ARGS[@]} \
-v $ISAAC_ROS_DEV_DIR:/workspaces/isaac_ros-dev \
-v /dev/*:/dev/* \
-v /etc/localtime:/etc/localtime:ro \
--name "$CONTAINER_NAME" \
--runtime nvidia \

I noticed run_dev.sh's Docker container works (torch.cuda.is_available() returns True) if I replace --runtime nvidia with --gpus all. I also noticed in the dev environment setup guide (https://nvidia-isaac-ros.github.io/getting_started/dev_env_setup.html) that nvidia-container-runtime is deprecated. Is using --gpus all more suitable on newer versions of Docker?
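One way a launch script could pick between the two flags discussed here is sketched below. This is only an illustration: `choose_gpu_args` is a hypothetical helper that is not part of run_dev.sh, and preferring `--runtime nvidia` whenever a runtime named `nvidia` is registered is an assumption, not a recommendation from the maintainers.

```shell
#!/usr/bin/env bash
# Hypothetical helper (not part of run_dev.sh): choose a GPU flag for `docker run`.
# Argument 1: whitespace-separated list of runtime names registered with the Docker daemon.
# Assumption: keep the script's current `--runtime nvidia` when that runtime is
# registered, and fall back to `--gpus all` otherwise.
choose_gpu_args() {
    local runtimes="$1"
    local name
    for name in $runtimes; do
        if [ "$name" = "nvidia" ]; then
            echo "--runtime nvidia"
            return 0
        fi
    done
    echo "--gpus all"
}
```

The runtime list could be taken from the daemon, for example with `docker info --format '{{range $k, $v := .Runtimes}}{{$k}} {{end}}'`, and the result passed to the helper before the `docker run` call is assembled.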

@hemalshahNV hemalshahNV self-assigned this Nov 6, 2023
@hemalshahNV
Contributor

--gpus all should enable the same runtime behavior, but I need to confirm this with the nvidia-container-runtime engineers. Thanks for the heads up.

@hemalshahNV hemalshahNV added the coming soon! Expected in upcoming release label Nov 6, 2023
@YuminosukeSato

YuminosukeSato commented Nov 17, 2023

Has this been changed?
If I use --gpus instead of --runtime in run_dev.sh, I get an error like this:

docker: Error response from daemon: failed to create task for container: failed to create shim task: OCI runtime create failed: runc create failed: unable to start container process: error during container init: error running hook #0: error running hook: exit status 1, stdout: , stderr: Auto-detected mode as 'legacy'
nvidia-container-cli: initialization error: load library failed: libnvidia-ml.so.1: cannot open shared object file: no such file or directory: unknown.

@jaiveersinghNV jaiveersinghNV added the needs info Needs more information label Nov 20, 2023
@jaiveersinghNV
Contributor

@Buddies-as-you-know, could you confirm which version of the CUDA drivers you have installed? The missing libnvidia-ml.so.1 library should be included as part of a proper CUDA installation.
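A quick host-side check for the library named in the error might look like the sketch below. `has_nvml` and `report_nvml` are hypothetical helper names introduced for illustration. They only inspect the dynamic linker cache listing passed on stdin and do not diagnose why the library is absent.

```shell
#!/usr/bin/env bash
# Hypothetical helpers: read `ldconfig -p` output on stdin and report whether
# libnvidia-ml.so.1 appears in the dynamic linker cache.
has_nvml() {
    # Succeeds if any input line mentions libnvidia-ml.so.1.
    grep -q 'libnvidia-ml\.so\.1'
}

report_nvml() {
    if has_nvml; then
        echo "libnvidia-ml.so.1 found"
    else
        echo "libnvidia-ml.so.1 missing"
    fi
}
```

On the host this could be run as `ldconfig -p | report_nvml`. To answer the driver version question, `nvidia-smi --query-gpu=driver_version --format=csv,noheader` prints the installed driver version when the driver is working.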
