This repository has been archived by the owner on Jan 22, 2024. It is now read-only.

[SOLVED] How to run nvidia-docker with TensorFlow GPU docker #45

Closed
bcordo opened this issue Feb 4, 2016 · 29 comments

Comments


bcordo commented Feb 4, 2016

Thanks for releasing the nvidia-docker repo, this is a really great idea and very useful!

What I've Done

I have setup an equivalent of a Nvidia DIGITS machine (running Ubuntu 14.04 server), and am attempting to run everything in docker containers.

  1. I have docker installed, and have run nvidia-docker run nvidia/cuda nvidia-smi as described here, and I see my four Titan X graphics cards.
  2. I have also run the nvidia-docker-plugin described here as sudo -u nvidia-docker nvidia-docker-plugin -s /var/lib/nvidia-docker and I get the output:
nvidia-docker-plugin | 2016/02/04 12:54:02 Loading NVIDIA management library
nvidia-docker-plugin | 2016/02/04 12:54:04 Loading NVIDIA unified memory
nvidia-docker-plugin | 2016/02/04 12:54:04 Discovering GPU devices
nvidia-docker-plugin | 2016/02/04 12:54:05 Provisioning volumes at /var/lib/nvidia-docker/volumes
nvidia-docker-plugin | 2016/02/04 12:54:05 Serving plugin API at /var/lib/nvidia-docker
nvidia-docker-plugin | 2016/02/04 12:54:05 Serving remote API at localhost:3476

which signifies to me that it's working.

  3. I ran the tests here and they all passed.

My Problem

My problem occurs when I try to run the TensorFlow GPU docker image using nvidia-docker.

I first run sudo -u nvidia-docker nvidia-docker-plugin -s /var/lib/nvidia-docker in a tmux session.

Then I run nvidia-docker run -it -p 8888:8888 b.gcr.io/tensorflow/tensorflow-devel-gpu, which downloads everything and starts the docker container. Next I run ipython and try to import tensorflow, but I get the following errors:

In [1]: import tensorflow as tf
I tensorflow/stream_executor/dso_loader.cc:92] LD_LIBRARY_PATH: /usr/local/cuda/lib64
I tensorflow/stream_executor/cuda/cuda_diagnostics.cc:121] hostname: 16b84b6e71f9
I tensorflow/stream_executor/cuda/cuda_diagnostics.cc:146] libcuda reported version is: Not found: was unable to find libcuda.so DSO loaded into this program
I tensorflow/stream_executor/cuda/cuda_diagnostics.cc:257] driver version file contents: """NVRM version: NVIDIA UNIX x86_64 Kernel Module  352.79  Wed Jan 13 16:17:53 PST 2016
GCC version:  gcc version 4.8.4 (Ubuntu 4.8.4-2ubuntu1~14.04)
"""
I tensorflow/stream_executor/cuda/cuda_diagnostics.cc:150] kernel reported version is: 352.79
I tensorflow/stream_executor/cuda/cuda_gpu_executor.cc:1054] LD_LIBRARY_PATH: /usr/local/cuda/lib64
I tensorflow/stream_executor/cuda/cuda_gpu_executor.cc:1055] failed to find libcuda.so on this system: Failed precondition: could not dlopen DSO: libcuda.so; dlerror: libcuda.so: cannot open shared object file: No such file or directory

I think I just have a gap in understanding about how I should run the TensorFlow container, or maybe I have to build the container using nvidia-docker.

Any ideas about how to do this, or general advice about what I'm doing wrong, would be amazing.

Thanks so much.

Brad

Member

3XX0 commented Feb 4, 2016

This looks somewhat related to #44.
Unfortunately, as @flx42 mentioned, the TensorFlow image on the container registry is outdated.
Your best bet is to rebuild the TensorFlow image manually (i.e. don't use the one on b.gcr.io).

How you build it doesn't really matter; either nvidia-docker or docker will do.

Member

flx42 commented Feb 4, 2016

Yes, I highly recommend building the images manually, especially since the TensorFlow code moves fast and the Docker images are now a bit old.

Author

bcordo commented Feb 4, 2016

@3XX0 and @flx42 thanks so much for the really quick reply.

I went ahead with your advice and tried to build the docker images in https://github.com/tensorflow/tensorflow/tree/master/tensorflow/tools/docker. I ran the command:
docker build -t $USER/tensorflow-suffix -f Dockerfile.gpu .

This fails with the error:

Step 7 : RUN pip --no-cache-dir install     https://storage.googleapis.com/tensorflow/linux/gpu/tensorflow-${TENSORFLOW_VERSION}-cp27-none-linux_x86_64.whl
 ---> Running in 91e99a6e00b5
Collecting tensorflow==0.6.0 from https://storage.googleapis.com/tensorflow/linux/gpu/tensorflow-0.6.0-cp27-none-linux_x86_64.whl
/usr/local/lib/python2.7/dist-packages/pip/_vendor/requests/packages/urllib3/util/ssl_.py:315: SNIMissingWarning: An HTTPS request has been made, but the SNI (Subject Name Indication) extension to TLS is not available on this platform. This may cause the server to present an incorrect TLS certificate, which can cause validation failures. For more information, see https://urllib3.readthedocs.org/en/latest/security.html#snimissingwarning.
  SNIMissingWarning
/usr/local/lib/python2.7/dist-packages/pip/_vendor/requests/packages/urllib3/util/ssl_.py:120: InsecurePlatformWarning: A true SSLContext object is not available. This prevents urllib3 from configuring SSL appropriately and may cause certain SSL connections to fail. For more information, see https://urllib3.readthedocs.org/en/latest/security.html#insecureplatformwarning.
  InsecurePlatformWarning
Exception:
Traceback (most recent call last):
  File "/usr/local/lib/python2.7/dist-packages/pip/basecommand.py", line 209, in main
    status = self.run(options, args)
  File "/usr/local/lib/python2.7/dist-packages/pip/commands/install.py", line 299, in run
    requirement_set.prepare_files(finder)
  File "/usr/local/lib/python2.7/dist-packages/pip/req/req_set.py", line 359, in prepare_files
    ignore_dependencies=self.ignore_dependencies))
  File "/usr/local/lib/python2.7/dist-packages/pip/req/req_set.py", line 576, in _prepare_file
    session=self.session, hashes=hashes)
  File "/usr/local/lib/python2.7/dist-packages/pip/download.py", line 809, in unpack_url
    hashes=hashes
  File "/usr/local/lib/python2.7/dist-packages/pip/download.py", line 648, in unpack_http_url
    hashes)
  File "/usr/local/lib/python2.7/dist-packages/pip/download.py", line 841, in _download_http_url
    stream=True,
  File "/usr/local/lib/python2.7/dist-packages/pip/_vendor/requests/sessions.py", line 480, in get
    return self.request('GET', url, **kwargs)
  File "/usr/local/lib/python2.7/dist-packages/pip/download.py", line 377, in request
    return super(PipSession, self).request(method, url, *args, **kwargs)
  File "/usr/local/lib/python2.7/dist-packages/pip/_vendor/requests/sessions.py", line 468, in request
    resp = self.send(prep, **send_kwargs)
  File "/usr/local/lib/python2.7/dist-packages/pip/_vendor/requests/sessions.py", line 576, in send
    r = adapter.send(request, **kwargs)
  File "/usr/local/lib/python2.7/dist-packages/pip/_vendor/requests/adapters.py", line 447, in send
    raise SSLError(e, request=request)
SSLError: [Errno 1] _ssl.c:510: error:14090086:SSL routines:SSL3_GET_SERVER_CERTIFICATE:certificate verify failed
The command '/bin/sh -c pip --no-cache-dir install     https://storage.googleapis.com/tensorflow/linux/gpu/tensorflow-${TENSORFLOW_VERSION}-cp27-none-linux_x86_64.whl' returned a non-zero code: 2

I got the same error running docker build -t $USER/tensorflow-suffix -f Dockerfile.devel-gpu . and docker build -t $USER/tensorflow-suffix -f Dockerfile .

This may be a general docker error, but I've been searching and thinking about solutions and have found none.

Thanks for your time. Hopefully, this will be helpful for others as well.

Member

3XX0 commented Feb 4, 2016

Weird, it looks like the SSL CAs are outdated or something.
Can you try adding update-ca-certificates at the beginning of the RUN command in the Dockerfile:

RUN update-ca-certificates && pip --no-cache-dir install ...
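A slightly more thorough variant (an assumption on my part: a Debian/Ubuntu base image, so apt-get is available) reinstalls the ca-certificates package first, to rule out a stale or missing CA bundle:

```dockerfile
# Assumes a Debian/Ubuntu base image: refresh the CA bundle package
# itself before updating certificates and running the original pip install.
RUN apt-get update && apt-get install -y ca-certificates && \
    update-ca-certificates && \
    pip --no-cache-dir install ...
```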

Author

bcordo commented Feb 4, 2016

Good idea.

Unfortunately, I still get the error:

Step 6 : ENV TENSORFLOW_VERSION 0.6.0
 ---> Using cache
 ---> 11eba5b56bca
Step 7 : RUN update-ca-certificates && pip --no-cache-dir install     https://storage.googleapis.com/tensorflow/linux/gpu/tensorflow-${TENSORFLOW_VERSION}-cp27-none-linux_x86_64.whl
 ---> Running in 2344eaf2e522
Updating certificates in /etc/ssl/certs... 0 added, 0 removed; done.
Running hooks in /etc/ca-certificates/update.d....done.
Collecting tensorflow==0.6.0 from https://storage.googleapis.com/tensorflow/linux/gpu/tensorflow-0.6.0-cp27-none-linux_x86_64.whl
  Retrying (Retry(total=4, connect=None, read=None, redirect=None)) after connection broken by 'NewConnectionError('<pip._vendor.requests.packages.urllib3.connection.VerifiedHTTPSConnection object at 0x7f02730f6b50>: Failed to establish a new connection: [Errno -2] Name or service not known',)': /tensorflow/linux/gpu/tensorflow-0.6.0-cp27-none-linux_x86_64.whl
  Retrying (Retry(total=3, connect=None, read=None, redirect=None)) after connection broken by 'NewConnectionError('<pip._vendor.requests.packages.urllib3.connection.VerifiedHTTPSConnection object at 0x7f02730f6090>: Failed to establish a new connection: [Errno -2] Name or service not known',)': /tensorflow/linux/gpu/tensorflow-0.6.0-cp27-none-linux_x86_64.whl
  Retrying (Retry(total=2, connect=None, read=None, redirect=None)) after connection broken by 'NewConnectionError('<pip._vendor.requests.packages.urllib3.connection.VerifiedHTTPSConnection object at 0x7f02730f6a10>: Failed to establish a new connection: [Errno -2] Name or service not known',)': /tensorflow/linux/gpu/tensorflow-0.6.0-cp27-none-linux_x86_64.whl
  Retrying (Retry(total=1, connect=None, read=None, redirect=None)) after connection broken by 'NewConnectionError('<pip._vendor.requests.packages.urllib3.connection.VerifiedHTTPSConnection object at 0x7f02730f68d0>: Failed to establish a new connection: [Errno -2] Name or service not known',)': /tensorflow/linux/gpu/tensorflow-0.6.0-cp27-none-linux_x86_64.whl
  Retrying (Retry(total=0, connect=None, read=None, redirect=None)) after connection broken by 'NewConnectionError('<pip._vendor.requests.packages.urllib3.connection.VerifiedHTTPSConnection object at 0x7f02730f6a90>: Failed to establish a new connection: [Errno -2] Name or service not known',)': /tensorflow/linux/gpu/tensorflow-0.6.0-cp27-none-linux_x86_64.whl
/usr/local/lib/python2.7/dist-packages/pip/_vendor/requests/packages/urllib3/util/ssl_.py:315: SNIMissingWarning: An HTTPS request has been made, but the SNI (Subject Name Indication) extension to TLS is not available on this platform. This may cause the server to present an incorrect TLS certificate, which can cause validation failures. For more information, see https://urllib3.readthedocs.org/en/latest/security.html#snimissingwarning.
  SNIMissingWarning
/usr/local/lib/python2.7/dist-packages/pip/_vendor/requests/packages/urllib3/util/ssl_.py:120: InsecurePlatformWarning: A true SSLContext object is not available. This prevents urllib3 from configuring SSL appropriately and may cause certain SSL connections to fail. For more information, see https://urllib3.readthedocs.org/en/latest/security.html#insecureplatformwarning.
  InsecurePlatformWarning
Exception:
Traceback (most recent call last):
  File "/usr/local/lib/python2.7/dist-packages/pip/basecommand.py", line 209, in main
    status = self.run(options, args)
  File "/usr/local/lib/python2.7/dist-packages/pip/commands/install.py", line 299, in run
    requirement_set.prepare_files(finder)
  File "/usr/local/lib/python2.7/dist-packages/pip/req/req_set.py", line 359, in prepare_files
    ignore_dependencies=self.ignore_dependencies))
  File "/usr/local/lib/python2.7/dist-packages/pip/req/req_set.py", line 576, in _prepare_file
    session=self.session, hashes=hashes)
  File "/usr/local/lib/python2.7/dist-packages/pip/download.py", line 809, in unpack_url
    hashes=hashes
  File "/usr/local/lib/python2.7/dist-packages/pip/download.py", line 648, in unpack_http_url
    hashes)
  File "/usr/local/lib/python2.7/dist-packages/pip/download.py", line 841, in _download_http_url
    stream=True,
  File "/usr/local/lib/python2.7/dist-packages/pip/_vendor/requests/sessions.py", line 480, in get
    return self.request('GET', url, **kwargs)
  File "/usr/local/lib/python2.7/dist-packages/pip/download.py", line 377, in request
    return super(PipSession, self).request(method, url, *args, **kwargs)
  File "/usr/local/lib/python2.7/dist-packages/pip/_vendor/requests/sessions.py", line 468, in request
    resp = self.send(prep, **send_kwargs)
  File "/usr/local/lib/python2.7/dist-packages/pip/_vendor/requests/sessions.py", line 576, in send
    r = adapter.send(request, **kwargs)
  File "/usr/local/lib/python2.7/dist-packages/pip/_vendor/requests/adapters.py", line 447, in send
    raise SSLError(e, request=request)
SSLError: [Errno 1] _ssl.c:510: error:14090086:SSL routines:SSL3_GET_SERVER_CERTIFICATE:certificate verify failed
The command '/bin/sh -c update-ca-certificates && pip --no-cache-dir install     https://storage.googleapis.com/tensorflow/linux/gpu/tensorflow-${TENSORFLOW_VERSION}-cp27-none-linux_x86_64.whl' returned a non-zero code: 2

It looks like the certs didn't get updated: "Updating certificates in /etc/ssl/certs... 0 added, 0 removed; done."

Thanks for the suggestions.

Author

bcordo commented Feb 4, 2016

I wonder if it could be something to do with HTTPS on the host server. I booted up a droplet, tried again, and got the same SSL error.

Member

3XX0 commented Feb 5, 2016

Not sure, it looks like a DNS problem now: Name or service not known.
Are you behind a proxy?

Author

bcordo commented Feb 5, 2016

Interesting.

I ran env | grep -i proxy and cat /etc/environment with no output, so I don't think the server is behind a proxy. It's running on university wifi, so I can try running it over ethernet (I'll need to go across the street) and see if that helps.
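For anyone hitting the same thing, these checks can be scripted as a quick diagnostic; this is just a sketch (the hostname is the one from the failing download, and getent assumes a glibc system):

```shell
# Look for proxy settings in the current environment; grep exits
# non-zero when nothing matches, so the echo marks the "no proxy" case.
env | grep -i proxy || echo "no proxy variables set"

# Then test DNS resolution for the host the build was failing on.
getent hosts storage.googleapis.com || echo "DNS lookup failed"
```

If the lookup fails on the host too, the problem is outside Docker; if it only fails inside a build container, the container's DNS configuration is the suspect.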

Author

bcordo commented Feb 5, 2016

Also, if I run update-ca-certificates && pip install https://storage.googleapis.com/tensorflow/linux/gpu/tensorflow-0.6.0-cp27-none-linux_x86_64.whl on the host machine, it downloads just fine.

Member

3XX0 commented Feb 5, 2016

I just tried, same issue here.
Looks like an issue with the Tensorflow image.

Contributor

ruffsl commented Feb 5, 2016

I just inserted a RUN curl -O https://storage.googleapis.com/tensorflow/linux/gpu/tensorflow-${TENSORFLOW_VERSION}-cp27-none-linux_x86_64.whl to download it and pointed pip at the local file. I'm not sure what's up with the SSL, but they hint at it here in a readme.
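As a sketch, that workaround amounts to two RUN lines (TENSORFLOW_VERSION is assumed to be set by an earlier ENV line, as in the stock Dockerfile.gpu):

```dockerfile
# Download the wheel with curl (whose TLS stack succeeds here where
# pip's does not), then point pip at the local file.
RUN curl -O https://storage.googleapis.com/tensorflow/linux/gpu/tensorflow-${TENSORFLOW_VERSION}-cp27-none-linux_x86_64.whl
RUN pip --no-cache-dir install tensorflow-${TENSORFLOW_VERSION}-cp27-none-linux_x86_64.whl
```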

Author

bcordo commented Feb 5, 2016

@ruffsl really great idea! That worked: I built the image using docker build -t $USER/tensorflow-gpu2 -f Dockerfile.gpu ., then ran a container from it with nvidia-docker run -it -p 8888:8888 brad/tensorflow-gpu2. For some reason it still doesn't find libcuda.so, but loading libcublas.so, libcudnn.so, libcufft.so, and libcurand.so works just fine.

brad@truegpu:~/docker_test/docker$ nvidia-docker run -it -p 8888:8888 brad/tensorflow-gpu2
root@d5b7b996d9c6:~# ipython
Python 2.7.6 (default, Jun 22 2015, 17:58:13)
Type "copyright", "credits" or "license" for more information.

IPython 4.1.1 -- An enhanced Interactive Python.
?         -> Introduction and overview of IPython's features.
%quickref -> Quick reference.
help      -> Python's own help system.
object?   -> Details about 'object', use 'object??' for extra details.

In [1]: import tensorflow as tf
I tensorflow/stream_executor/dso_loader.cc:101] successfully opened CUDA library libcublas.so.7.0 locally
I tensorflow/stream_executor/dso_loader.cc:101] successfully opened CUDA library libcudnn.so.6.5 locally
I tensorflow/stream_executor/dso_loader.cc:101] successfully opened CUDA library libcufft.so.7.0 locally
I tensorflow/stream_executor/dso_loader.cc:93] Couldn't open CUDA library libcuda.so. LD_LIBRARY_PATH: /usr/local/nvidia/lib:/usr/local/nvidia/lib64:
I tensorflow/stream_executor/cuda/cuda_diagnostics.cc:121] hostname: d5b7b996d9c6
I tensorflow/stream_executor/cuda/cuda_diagnostics.cc:146] libcuda reported version is: Not found: was unable to find libcuda.so DSO loaded into this program
I tensorflow/stream_executor/cuda/cuda_diagnostics.cc:257] driver version file contents: """NVRM version: NVIDIA UNIX x86_64 Kernel Module  352.79  Wed Jan 13 16:17:53 PST 2016
GCC version:  gcc version 4.8.4 (Ubuntu 4.8.4-2ubuntu1~14.04)
"""
I tensorflow/stream_executor/cuda/cuda_diagnostics.cc:150] kernel reported version is: 352.79
I tensorflow/stream_executor/cuda/cuda_gpu_executor.cc:1060] LD_LIBRARY_PATH: /usr/local/nvidia/lib:/usr/local/nvidia/lib64:
I tensorflow/stream_executor/cuda/cuda_gpu_executor.cc:1061] failed to find libcuda.so on this system: Failed precondition: could not dlopen DSO: libcuda.so; dlerror: libcuda.so: cannot open shared object file: No such file or directory
I tensorflow/stream_executor/dso_loader.cc:101] successfully opened CUDA library libcurand.so.7.0 locally

The interesting thing is that if I run

root@87d1c13a3db8:~# ls /usr/local/nvidia/lib64/ | grep libcuda
libcuda.so.1
libcuda.so.352.79

These two libcuda libraries exist, but neither is named "libcuda.so" exactly; maybe this is just a naming issue?

Hmmm...
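The naming hypothesis is easy to demonstrate in a scratch directory (paths here are stand-ins for the real driver volume, not the actual fix): dlopen("libcuda.so") matches only that exact filename, so the versioned files alone are not enough until an unversioned symlink exists.

```shell
# Stand-ins for the two versioned driver files seen above.
demo=$(mktemp -d)
touch "$demo/libcuda.so.1" "$demo/libcuda.so.352.79"
ls "$demo" | grep -c 'libcuda\.so$' || true   # prints 0: no bare "libcuda.so"
# Adding an unversioned symlink provides the exact name dlopen asks for.
ln -s "$demo/libcuda.so.1" "$demo/libcuda.so"
ls "$demo" | grep -c 'libcuda\.so$'           # prints 1
```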

Author

bcordo commented Feb 5, 2016

Just found your comment here: tensorflow/tensorflow#808 (comment)

Which solved it! Thanks so much for the help @3XX0, @flx42, and @ruffsl. Really appreciate it.

Brad

Author

bcordo commented Feb 5, 2016

For posterity here is the Dockerfile that eventually worked for me (based on the above feedback):

FROM nvidia/cuda:7.0-cudnn2-runtime

MAINTAINER Craig Citro <craigcitro@google.com>

# Pick up some TF dependencies
RUN apt-get update && apt-get install -y \
        curl \
        libfreetype6-dev \
        libpng12-dev \
        libzmq3-dev \
        pkg-config \
        python-numpy \
        python-pip \
        python-scipy \
        wget \
        && \
    apt-get clean && \
    rm -rf /var/lib/apt/lists/*

RUN curl -O https://bootstrap.pypa.io/get-pip.py && \
    python get-pip.py && \
    rm get-pip.py

RUN pip --no-cache-dir install \
        ipykernel \
        jupyter \
        matplotlib \
        && \
    python -m ipykernel.kernelspec

# Install TensorFlow GPU version.
ENV TENSORFLOW_VERSION 0.6.0
RUN curl -O https://storage.googleapis.com/tensorflow/linux/gpu/tensorflow-${TENSORFLOW_VERSION}-cp27-none-linux_x86_64.whl
RUN pip --no-cache-dir install \
    tensorflow-${TENSORFLOW_VERSION}-cp27-none-linux_x86_64.whl

# Set up our notebook config.
COPY jupyter_notebook_config.py /root/.jupyter/

# Jupyter has issues with being run directly:
#   https://github.com/ipython/ipython/issues/7062
# We just add a little wrapper script.
COPY run_jupyter.sh /

# Create the correct path for libcuda so TensorFlow can open it
RUN ln -s /usr/local/nvidia/lib64/libcuda.so.1 /usr/lib/x86_64-linux-gnu/libcuda.so

# TensorBoard
EXPOSE 6006
# IPython
EXPOSE 8888

WORKDIR "/root"

CMD ["/bin/bash"]

In particular the lines

# Create the correct path for libcuda so TensorFlow can open it
RUN ln -s /usr/local/nvidia/lib64/libcuda.so.1 /usr/lib/x86_64-linux-gnu/libcuda.so

and

RUN curl -O https://storage.googleapis.com/tensorflow/linux/gpu/tensorflow-${TENSORFLOW_VERSION}-cp27-none-linux_x86_64.whl
RUN pip --no-cache-dir install \
    tensorflow-${TENSORFLOW_VERSION}-cp27-none-linux_x86_64.whl

were modified.

@bcordo bcordo changed the title How to run nvidia-docker with TensorFlow GPU docker [SOLVED] How to run nvidia-docker with TensorFlow GPU docker Feb 5, 2016
@3XX0 3XX0 closed this as completed Feb 5, 2016
Contributor

ruffsl commented Feb 5, 2016

No, thank you @bcordo, those are some small but working fixes. I just tested the Dockerfile below (to avoid another compile) with this example and it's working fine.

FROM b.gcr.io/tensorflow/tensorflow-devel-gpu
RUN ln -s /usr/local/nvidia/lib64/libcuda.so.1 /usr/lib/x86_64-linux-gnu/libcuda.so
LABEL com.nvidia.volumes.needed="nvidia_driver"
LABEL com.nvidia.cuda.version="7.0"

Member

3XX0 commented Feb 5, 2016

Hopefully, TensorFlow will fix these issues.
In the meantime, that's a convenient workaround.

Member

flx42 commented Feb 17, 2016

TensorFlow 0.7 was released yesterday, and the new images have the proper volume labels. But the libcuda.so problem is apparently still there.
Unfortunately, they also missed our image refresh this morning after the security fix for glibc.

Their GitHub is referencing old images:
https://github.com/tensorflow/tensorflow/tree/master/tensorflow/tools/docker

Those are the correct images:
https://www.tensorflow.org/versions/r0.7/get_started/os_setup.html#docker-installation

@cancan101

Agreed, the issue still exists. I had to run ln -s /usr/local/nvidia/lib64/libcuda.so.1 /usr/lib/x86_64-linux-gnu/libcuda.so to get this to work.


jendap commented Feb 18, 2016

@ruffsl How does the LABEL work? Why do you have the following above?
LABEL com.nvidia.volumes.needed="nvidia_driver"
LABEL com.nvidia.cuda.version="7.0"

Isn't it already coming from the parent image "FROM nvidia/cuda"?

Contributor

ruffsl commented Feb 18, 2016

@jendap, the first TensorFlow images on b.gcr.io that used the nvidia/cuda images were built before NVIDIA added the labels used by the nvidia-docker plugin. TensorFlow has since rebuilt these images with an updated parent image, inheriting the needed labels, so what I wrote above should no longer be necessary. Hopefully the CUDA linking issue in the TensorFlow GPU images will also be resolved, so we can go back to using stock builds from Google.


jendap commented Feb 18, 2016

Cool, thanks. CUDA linking? Do you mean the "ln -s ..."?

BTW: Are the labels supposed to tell me I have the wrong CUDA version before it even starts the container?

Contributor

ruffsl commented Feb 18, 2016

Linking, or ln -s ..., yes.

I think the label is only used to make sure your host's driver is compatible with the version of CUDA in the container: /tools/src/nvidia-docker/utils.go#L56
But one of the devs could correct me on that.


jendap commented Feb 18, 2016

It would be great if it complained about the host driver being too old! I'll have to try it.

Member

3XX0 commented Feb 18, 2016

Yes, it helps us prevent running CUDA containers that are not supported by the driver.
It's particularly useful when deploying something remotely.


philipz commented Jul 21, 2016

Copy libcudnn.so.XXXX to /var/lib/nvidia-docker/volumes/nvidia_driver/3xx.xx, then sudo ln -s libcudnn.so.5.0.5 libcudnn.so. After that, cuDNN will work in the TensorFlow container.

Member

flx42 commented Jul 22, 2016

@philipz it's not a good idea to clobber the volume directory; not all containers are based on cuDNN v5. TensorFlow, for instance, uses cuDNN v4.

By the way, the TensorFlow images now work just fine (partly because the gpu image now depends on devel instead of runtime):

$ nvidia-docker run --rm tensorflow/tensorflow:nightly-devel-gpu python -c 'import tensorflow as tf ; print tf.__version__'
I tensorflow/stream_executor/dso_loader.cc:108] successfully opened CUDA library libcublas.so locally
I tensorflow/stream_executor/dso_loader.cc:108] successfully opened CUDA library libcudnn.so locally
I tensorflow/stream_executor/dso_loader.cc:108] successfully opened CUDA library libcufft.so locally
I tensorflow/stream_executor/dso_loader.cc:108] successfully opened CUDA library libcuda.so.1 locally
I tensorflow/stream_executor/dso_loader.cc:108] successfully opened CUDA library libcurand.so locally
0.9.0
$ nvidia-docker run --rm tensorflow/tensorflow:nightly-gpu python -c 'import tensorflow as tf ; print tf.__version__'
[same]

So, no symlinks needed, I believe.

@tobegit3hub

Hi @3XX0 and @flx42. Is it possible to run this with docker instead of nvidia-docker? We are running GPU containers with Kubernetes and have a similar problem to the one discussed above. It works for me with nvidia-docker but not with docker.

Member

flx42 commented Aug 15, 2016

@tobegit3hub Sure, there is a way: our nvidia-docker-plugin daemon (which works as a Docker volume plugin) has a REST API:

$ curl -s http://localhost:3476/docker/cli
--device=/dev/nvidiactl --device=/dev/nvidia-uvm --device=/dev/nvidia3 --device=/dev/nvidia2 --device=/dev/nvidia1 --device=/dev/nvidia0 --volume-driver=nvidia-docker --volume=nvidia_driver_361.48:/usr/local/nvidia:ro
$ docker run -ti --rm `curl -s http://localhost:3476/docker/cli` nvidia/cuda nvidia-smi

See our wiki

@tobegit3hub

Thanks @flx42. I found another way to do it, by mounting the devices and CUDA libraries. nvidia-docker is still the easiest way, though; thanks for all your contributions.
