Docker missing requirement to use GPUs #50

Closed
SuhasSrinivasan opened this issue Jun 1, 2022 · 15 comments

Comments

@SuhasSrinivasan
Collaborator

After executing the provided Docker command, the following error occurs.

$ docker run -it --rm --memory=100g --gpus device=0 kundajelab/chrombpnet:dev
docker: Error response from daemon: could not select device driver "" with capabilities: [[gpu]].

Docker requires the NVIDIA Container Toolkit to make GPUs accessible.
Installation instructions are here:
https://docs.nvidia.com/datacenter/cloud-native/container-toolkit/install-guide.html
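For anyone who hits the same message, here is a minimal sketch of how a wrapper script could recognize it and print a hint. The function name `explain_docker_error` and the hint text are hypothetical, for illustration only; they are not part of Docker or chrombpnet. The pattern is taken from the error text quoted above.

```shell
#!/bin/sh
# Sketch: map a Docker error message to a human-readable hint.
# explain_docker_error is a hypothetical helper (not a Docker or chrombpnet command).
explain_docker_error() {
    case "$1" in
        # Matches the daemon error reported in this issue.
        *'could not select device driver'*'capabilities: [[gpu]]'*)
            echo 'GPU request rejected: NVIDIA Container Toolkit is probably missing on the host'
            ;;
        *)
            echo 'unrecognized error'
            ;;
    esac
}

# Example with the message from this issue.
explain_docker_error 'docker: Error response from daemon: could not select device driver "" with capabilities: [[gpu]].'
```

The hint is deliberately worded as "probably": the same message could presumably also appear if the toolkit is installed but the Docker daemon has not been restarted since.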

@panushri25
Collaborator

I don't think the issue you posted is related to the chrombpnet Docker image. Here is a related discussion: NVIDIA/nvidia-docker#1034 (comment).

@SuhasSrinivasan
Collaborator Author

The chrombpnet container will not run; it fails with the above error message about GPUs.
These requirements are not documented.

Agreed. As per the comment in the GitHub link, the NVIDIA Container Toolkit is required.
The instructions for installing it are provided in NVIDIA's documentation:
https://docs.nvidia.com/datacenter/cloud-native/container-toolkit/install-guide.html

@panushri25
Collaborator

docker run --gpus all nvidia/cuda:9.0-base nvidia-smi

docker run --gpus all -it --rm tensorflow/tensorflow:latest-gpu \
    python -c "import tensorflow as tf; print(tf.reduce_sum(tf.random.normal([1000, 1000])))"

@panushri25
Collaborator

What is the output for the above two commands?

@panushri25
Collaborator

It is documented that you need to finish NVIDIA setup on the GPUs. I can maybe provide the link for the setup to help the users. What more are you talking about?

@SuhasSrinivasan
Collaborator Author

docker run --gpus all nvidia/cuda:9.0-base nvidia-smi

This command works only after installing the NVIDIA Container Toolkit.

@SuhasSrinivasan
Collaborator Author

SuhasSrinivasan commented Jun 2, 2022

It is documented that you need to finish NVIDIA setup on the GPUs. I can maybe provide the link for the setup to help the users. What more are you talking about?

Indeed, the tutorial states:

Firstly, it is recommended that you use a GPU for model training and have the necessary NVIDIA drivers already installed

The NVIDIA drivers and CUDA toolkit are set up, but that alone is not sufficient to make the GPUs accessible to Docker.
The NVIDIA Container Toolkit is separate software that works with Docker.
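To make the distinction between these layers concrete, here is a rough preflight sketch that checks for each one separately on the host. The helper names `have_tool` and `report_layers` are hypothetical, and the sketch assumes the toolkit packages put `nvidia-container-cli` on the PATH, which may differ between distributions and toolkit versions. It only checks that the commands exist, not that they work.

```shell
#!/bin/sh
# have_tool NAME: succeed if NAME resolves to a command on PATH.
have_tool() {
    command -v "$1" >/dev/null 2>&1
}

# report_layers: print one line per layer (hypothetical helper).
#   nvidia-smi            -> NVIDIA driver utilities on the host
#   docker                -> Docker CLI
#   nvidia-container-cli  -> assumed to ship with the NVIDIA Container Toolkit
report_layers() {
    for tool in nvidia-smi docker nvidia-container-cli; do
        if have_tool "$tool"; then
            echo "found: $tool"
        else
            echo "missing: $tool"
        fi
    done
}

report_layers
```

On a host where the drivers and Docker are installed but the toolkit is not, this would be expected to report the first two as found and the last as missing, which matches the situation described here.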

@panushri25
Collaborator

So you have the NVIDIA Container Toolkit on your system, correct? And nvidia-smi works?

Does this work for you now? docker run -it --rm --memory=100g --gpus device=0 kundajelab/chrombpnet:dev

@SuhasSrinivasan
Collaborator Author

Yes, the container worked after the NVIDIA Container Toolkit was installed.

@panushri25
Collaborator

okay good

@panushri25
Collaborator

It is documented that you need to finish NVIDIA setup on the GPUs. I can maybe provide the link for the setup to help the users. What more are you talking about?

Indeed, the tutorial states:

Firstly, it is recommended that you use a GPU for model training and have the necessary NVIDIA drivers already installed

The NVIDIA drivers and CUDA toolkit are set up, but that alone is not sufficient to make the GPUs accessible to Docker. The NVIDIA Container Toolkit is separate software that works with Docker.

Yeah, I thought users would be aware of this setup when using GPUs with Docker. I guess not, so we will document this. Thank you!

@panushri25
Collaborator

Let me know if you are able to train using this setup.

@panushri25
Collaborator

If you want, I can jump on a call sometime to make sure you get through all the steps smoothly.

@annashcherbina
Contributor

Thanks, added the info about https://docs.nvidia.com/datacenter/cloud-native/container-toolkit/install-guide.html as a note in the README under installing with Docker.

@SuhasSrinivasan
Collaborator Author

Thank you! Wish there were Tags for filers. This is just a Documentation issue, under Requirements for the Docker.
