This repository has been archived by the owner on Jan 22, 2024. It is now read-only.

libcuda.so can not be found in /usr/local/cuda/lib64 when building mxnet in nvidia/cuda docker #37

Closed
ps-account opened this issue Jan 21, 2016 · 10 comments

@ps-account

I am trying to write a Dockerfile that compiles mxnet with nvidia-docker, based on the nvidia/cuda image. mxnet uses the variable

USE_CUDA_PATH

in its make script to set the location of the CUDA libraries; it seems to ignore LD_LIBRARY_PATH.
Usually you would set this to /usr/local/cuda/lib64/; the libcudnn.so library can indeed be found there, for example.
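For context, a sketch of how such a build is typically invoked (assuming the usual mxnet make flags; the path is the common default):

```shell
# USE_CUDA_PATH points make at the CUDA install directory;
# LD_LIBRARY_PATH is not consulted by the makefile.
make -j4 USE_CUDA=1 USE_CUDA_PATH=/usr/local/cuda
```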

In the nvidia/cuda Docker image, however, there is no libcuda.so in /usr/local/cuda/lib64; instead it seems to be located in /usr/local/nvidia/lib64/.

Funnily enough, when I ln -s this libcuda.so.1 into /usr/local/cuda/lib64, the build does succeed from within nvidia-docker run nvidia/cuda, but the same command fails with a "-lcuda not found" error during "nvidia-docker build ..."

Is there a way to get libcuda.so in the /usr/local/cuda/lib64 directory during the nvidia-docker build?

@ps-account
Author

I made a workaround by using the stub libcuda.so during the build.

At runtime I copy the libraries from /usr/local/nvidia/lib64/ before calling mxnet from R.

Is this the correct way to do it? Are there alternative approaches?
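A minimal sketch of that workaround as Dockerfile instructions (assuming the nvidia/cuda 7.5 layout; the config.mk edit and the R command are placeholders):

```dockerfile
# Build time: link against the driver stub shipped with the CUDA toolkit.
# The stub only satisfies -lcuda at link time; it must not be used at run time.
RUN echo "ADD_LDFLAGS = -L/usr/local/cuda/lib64/stubs" >> /mxnet/config.mk && \
    make -C /mxnet

# Run time: nvidia-docker mounts the real driver under /usr/local/nvidia/lib64,
# so refresh the linker cache (or rely on LD_LIBRARY_PATH) before calling R.
CMD ldconfig && Rscript /mxnet/run_model.R
```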

@3XX0
Member

3XX0 commented Jan 21, 2016

Not sure why, but it looks like mxnet uses both the CUDA runtime API (libcudart.so) and the CUDA driver API (libcuda.so). libcudart.so is linked automatically by nvcc, so you're fine on the runtime side.
The CUDA driver, though, is only present in the container at runtime (in /usr/local/nvidia/lib64), so, as you figured out, you need to compile the code against the libcuda.so stub (/usr/local/cuda/lib64/stubs) when you build the container.

At runtime, you have two solutions:

  1. If nothing has overridden LD_LIBRARY_PATH, there is nothing to do, because the nvidia/cuda image sets it properly.
  2. If something tampered with LD_LIBRARY_PATH, the easiest fix is to execute ldconfig before your command:
CMD ldconfig && <MXNET_COMMAND>

@3XX0
Member

3XX0 commented Jan 21, 2016

So after further review, we are missing the CUDA driver stubs in our CUDA images.
Not sure why; that's something we need to fix.

@ps-account
Author

Thanks for the quick reply!

In the 7.5 image I could only find the CUDA driver stub at:

/usr/local/cuda-7.5/targets/x86_64-linux/lib/stubs/libcuda.so

I suppose there should be symbolic links under /usr/local/cuda etc.

I couldn't find any documentation on how to compile and then run code within the image; maybe it would be an idea to put that somewhere in the README.md file?

I will try out the CMD ldconfig && approach.

@3XX0
Member

3XX0 commented Jan 21, 2016

My bad, my image was corrupted; we do include it.

Compiling/running code is done through your Dockerfile (see the documentation).
In your case, I'm guessing it would look like this:

FROM nvidia/cuda:cudnn

RUN git clone <MXNET_REPO>

RUN sed <MXNET_CONFIG>
# Something along these lines
# ADD_LDFLAGS = -L /usr/local/cuda/lib64/stubs
# USE_CUDA = 1
# USE_CUDNN = 1

RUN make

CMD <MXNET_COMMAND>

@ps-account
Author

Thanks for the helpful pointers!

The nvidia-docker wrapper works pretty great!

@ljstrnadiii

@3XX0, I am having a related problem. I use

FROM nvidia/cuda:8.0-cudnn5-devel-ubuntu16.04

but there is no libcuda.so file to be found anywhere. I searched with:

sudo find /usr/ -name 'libcuda.so.1'

but no luck. Any idea what I am doing wrong? tensorflow 1.0.0 used to import but just said it couldn't find the library; now with 1.2.0, it will not even import.

@flx42
Member

flx42 commented Jun 20, 2017

@ljstrnadiii is it during a docker build or a docker run?
During a docker build, you can't use GPUs (nvidia-docker does nothing), but you can compile code against libcuda.so by using the stubs from the CUDA toolkit in /usr/local/cuda/lib64/stubs/
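As a sketch of that build-time pattern, a stage that links a hypothetical driver-API program against the stub (the source file and output names are placeholders):

```dockerfile
FROM nvidia/cuda:8.0-cudnn5-devel-ubuntu16.04

# Hypothetical C source that calls the driver API (cuInit, cuDeviceGet, ...)
COPY query_device.c /src/

# The stub satisfies -lcuda at link time during `docker build`;
# the real libcuda.so.1 is only mounted in by nvidia-docker at run time.
RUN gcc /src/query_device.c -o /usr/local/bin/query_device \
        -I/usr/local/cuda/include \
        -L/usr/local/cuda/lib64/stubs -lcuda
```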

@ljstrnadiii

ljstrnadiii commented Jun 20, 2017

@flx42,
During a docker run. For now, I am working inside the docker image until I debug everything. When I removed a WORKDIR from the dockerfile and rebuilt, the file was suddenly found here:

/usr/local/nvidia/lib64/libcuda.so.1

After exiting the GCP server and ssh'ing back in, I ran the same container again, and suddenly nvidia-smi does not even work and libcuda.so.1 is nowhere to be found.

I am pretty confused. I wish there was tighter integration between nvidia and tensorflow.

I really just want to be able to build an image to run tf apps.

EDIT: I guess I should start by calling nvidia-docker...

@flx42
Member

flx42 commented Jun 20, 2017

Yes, you need to use nvidia-docker run.
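For example (the image tag and command are illustrative; this requires a host with the NVIDIA driver installed):

```shell
# Plain docker run: the driver is not mounted, so libcuda.so.1 and
# nvidia-smi are missing inside the container and this fails.
docker run --rm nvidia/cuda:8.0-cudnn5-devel-ubuntu16.04 nvidia-smi

# nvidia-docker run mounts the host driver files (libcuda.so.1, nvidia-smi)
# under /usr/local/nvidia, so the same command works.
nvidia-docker run --rm nvidia/cuda:8.0-cudnn5-devel-ubuntu16.04 nvidia-smi
```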
