libcuda.so can not be found in /usr/local/cuda/lib64 when building mxnet in nvidia/cuda docker #37
I made a workaround by using the stub libcuda.so during the build. At runtime I copy the contents of /usr/local/nvidia/lib64/ before calling mxnet from R. Did I do it the correct way like this? Are there alternative ways to do this?
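The build-time half of that workaround can be sketched as a Dockerfile fragment. The stubs path comes from the standard CUDA toolkit layout; `ADD_LDFLAGS` is mxnet's hook for extra linker flags, and the exact make invocation here is an assumption:

```dockerfile
# Link against the driver stub shipped with the CUDA toolkit.
# The stub satisfies -lcuda at link time but must NOT be used at run time;
# nvidia-docker mounts the real libcuda.so.1 when the container starts.
RUN make USE_CUDA=1 USE_CUDA_PATH=/usr/local/cuda \
    ADD_LDFLAGS="-L/usr/local/cuda/lib64/stubs"
```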
Not sure why, but it looks like mxnet is using both the CUDA runtime API (libcudart) and the CUDA driver API (libcuda). At runtime, you have two solutions; one is to rebuild the loader cache before launching so the mounted driver library is found:

```dockerfile
CMD ldconfig && <MXNET_COMMAND>
```
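The second solution is cut off above; presumably (an assumption on my part) it is to point `LD_LIBRARY_PATH` at the directory where nvidia-docker mounts the driver libraries, instead of rebuilding the loader cache:

```dockerfile
# Alternative to running ldconfig: let the dynamic loader search the
# driver directory mounted by nvidia-docker (path as given in this thread).
ENV LD_LIBRARY_PATH=/usr/local/nvidia/lib64:$LD_LIBRARY_PATH
CMD <MXNET_COMMAND>
```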
So after further review, we are missing the CUDA driver stubs in our CUDA images.
Thanks for the quick reply! In the 7.5 image I could only find the CUDA driver stubs at /usr/local/cuda-7.5/targets/x86_64-linux/lib/stubs/libcuda.so; I suppose there should be symbolic links under /usr/local/cuda as well. I couldn't find any documentation on how to compile and then run code within the image; maybe it's an idea to put that somewhere in the README.md file? I will try out the `CMD ldconfig &&` suggestion.
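To see which driver stubs an image actually ships, a quick scan of the usual toolkit locations works; the paths searched here are assumptions based on the 7.5 layout quoted above:

```shell
# Look for the CUDA driver stub in the common toolkit locations.
found=0
for d in /usr/local/cuda*/targets/*/lib/stubs /usr/local/cuda/lib64/stubs; do
    if [ -e "$d/libcuda.so" ]; then
        echo "stub: $d/libcuda.so"
        found=1
    fi
done
[ "$found" -eq 1 ] || echo "no driver stub found in the usual locations"
```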
My bad, my image was corrupted; we do include it. Compiling/running code is done through your Dockerfile (see the documentation):

```dockerfile
FROM nvidia/cuda:cudnn
RUN git clone <MXNET_REPO>
RUN sed <MXNET_CONFIG>
# Something along these lines:
# ADD_LDFLAGS = -L /usr/local/cuda/lib64/stubs
# USE_CUDA = 1
# USE_CUDNN = 1
RUN make
CMD <MXNET_COMMAND>
```
Thanks for the helpful pointers! The nvidia-docker wrapper works pretty great!
@3XX0, I am having a related problem. I use
but there is no libcuda.so file to be found anywhere. I searched:
but no luck. Any idea what I am doing wrong? TensorFlow 1.0.0 used to import but would just say it couldn't find the library; now in 1.2.0 it will not even import.
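One way to check whether the driver library is visible at all inside the container is to ask the loader cache; a minimal sketch (`check_libcuda` is a hypothetical helper name):

```shell
# Print "found" if the dynamic loader knows about libcuda, else "missing".
check_libcuda() {
    if ldconfig -p 2>/dev/null | grep -q 'libcuda\.so'; then
        echo "found"
    else
        echo "missing"
    fi
}
check_libcuda
```

Under plain `docker run` this will typically print `missing`, since the driver library is only made available by the nvidia-docker wrapper.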
@ljstrnadiii is it during a
@flx42,
After exiting the GCP server and ssh'ing back in, I ran the same container again and suddenly nvidia-smi does not even work and libcuda.so.1 is nowhere to be found. I am pretty confused. I wish there was tighter integration between NVIDIA and TensorFlow; I really just want to be able to build an image to run TF apps. EDIT: I guess I should start by calling nvidia-docker...
Yes, you need to use nvidia-docker.
I am trying to make a Dockerfile that compiles mxnet using nvidia-docker, based on the nvidia/cuda image. mxnet uses the variable USE_CUDA_PATH in its make script to set the location of the CUDA driver; it seems to ignore LD_LIBRARY_PATH.

Usually you would set this to /usr/local/cuda/lib64/; the libcudnn.so library, for example, can indeed be found there. In the nvidia/cuda Docker image, however, there is no libcuda.so in /usr/local/cuda/lib64; instead it seems to be located in /usr/local/nvidia/lib64/.

Funnily enough, when I `ln -s` this libcuda.so.1 into /usr/local/cuda/lib64 it does build from within `nvidia-docker run nvidia/cuda`, but it gives me a "-lcuda not found" error when performing the same command in `nvidia-docker build ...`

Is there a way to get libcuda.so into the /usr/local/cuda/lib64 directory during the nvidia-docker build?
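One way to make that symlink trick work during the build as well (since no driver is mounted at build time) is to symlink the toolkit's stub into place instead of the real driver; a sketch, assuming the default stubs path and mxnet's make-based build:

```dockerfile
# At build time no real driver is present, so point the linker at the stub.
# Remove it afterwards so the runtime loader picks up the real libcuda.so.1
# that nvidia-docker mounts when the container runs.
RUN ln -s /usr/local/cuda/lib64/stubs/libcuda.so /usr/local/cuda/lib64/libcuda.so
RUN make USE_CUDA=1 USE_CUDA_PATH=/usr/local/cuda
RUN rm /usr/local/cuda/lib64/libcuda.so
```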