ImportError: .../jaxlib/xla_extension.so: symbol cudnnSetCTCLossDescriptorEx version libcudnn.so.7 not defined in file libcudnn.so.7 with link time reference #2494

Closed
jacobjinkelly opened this issue Mar 24, 2020 · 7 comments

@jacobjinkelly
Contributor

I followed the instructions in the README to install, with the following versions:

PYTHON_VERSION=cp36  # alternatives: cp36, cp37, cp38
CUDA_VERSION=cuda101  # alternatives: cuda92, cuda100, cuda101, cuda102
PLATFORM=linux_x86_64  # alternatives: linux_x86_64
BASE_URL='https://storage.googleapis.com/jax-releases'
pip install --upgrade $BASE_URL/$CUDA_VERSION/jaxlib-0.1.42-$PYTHON_VERSION-none-$PLATFORM.whl

pip install --upgrade jax

I set the following environment variables:

export LD_LIBRARY_PATH=/pkgs/cuda-10.1/lib64:/pkgs/cudnn-10.0-v7.4.2/lib64:$LD_LIBRARY_PATH
export XLA_FLAGS=--xla_gpu_cuda_data_dir=/pkgs/cuda-10.1/

I get the error message listed in the title as soon as I import jax.
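Since the error names libcudnn.so.7, it can help to see which cuDNN files the dynamic loader would actually find on LD_LIBRARY_PATH. The following is a minimal sketch, not part of the original report: the helper name find_cudnn_libraries is hypothetical, and it only lists files by name, it does not load them.

```python
import os
from pathlib import Path


def find_cudnn_libraries(ld_library_path):
    """Return (directory, filename) pairs for every libcudnn.so* file,
    in the order the directories appear in the search path.

    Hypothetical helper for illustration; it inspects file names only.
    """
    found = []
    for entry in ld_library_path.split(":"):
        if not entry:
            continue  # skip empty entries such as a trailing ':'
        directory = Path(entry)
        if not directory.is_dir():
            continue  # ignore paths that do not exist on this machine
        for lib in sorted(directory.glob("libcudnn.so*")):
            found.append((str(directory), lib.name))
    return found


if __name__ == "__main__":
    search_path = os.environ.get("LD_LIBRARY_PATH", "")
    for directory, name in find_cudnn_libraries(search_path):
        print(f"{directory}: {name}")
```

The first directory in the output that contains libcudnn.so.7 is typically the one that gets loaded, so an older cuDNN earlier in the path can shadow a newer one later in the path.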

Related issues include #989

@mattjj added the build label and removed the bug label on Mar 24, 2020
@mattjj
Collaborator

mattjj commented Mar 24, 2020

Is this a fresh install of CUDA / cuDNN?

@jacobjinkelly
Contributor Author

Ah yes that appears to have been the issue. I changed to cudnn-10.2-v7.6.5 and this solved it.

@mattjj
Collaborator

mattjj commented Mar 24, 2020

Woo! We did it without having to ask Peter for help!

@py4

py4 commented Mar 27, 2020

I have CUDA 10 and cuDNN 7.4.1 and have the same issue (not a fresh CUDA/cuDNN installation). Do I necessarily need to install another CUDA/cuDNN version? @mattjj

@jacobjinkelly
Contributor Author

@py4 So the original reason for the error was actually me not understanding the compatibility between versions of CUDA and cuDNN correctly (as described in this table). I got it to work with CUDA 10.1.243, driver 430.50, and cudnn-10.1-v7.6.3.30 (I think the 10.1 means it's meant to work with CUDA 10.1, and the cuDNN version is 7.6.3.30). To my understanding, according to this table, you'd need a driver version of at least 410.48 for CUDA 10.0. Note that you may still get some warning messages. Even after getting it to work, I still got the following warnings:

2020-03-26 15:12:11.901833: W tensorflow/stream_executor/platform/default/dso_loader.cc:55] Could not load dynamic library 'libnvinfer.so.6'; dlerror: libnvinfer.so.6: cannot open shared object file: No such file or directory; LD_LIBRARY_PATH: /pkgs/cuda-10.1/lib64:/pkgs/cudnn-10.1-v7.6.3.30/lib64:
2020-03-26 15:12:11.902503: W tensorflow/stream_executor/platform/default/dso_loader.cc:55] Could not load dynamic library 'libnvinfer_plugin.so.6'; dlerror: libnvinfer_plugin.so.6: cannot open shared object file: No such file or directory; LD_LIBRARY_PATH: /pkgs/cuda-10.1/lib64:/pkgs/cudnn-10.1-v7.6.3.30/lib64:
2020-03-26 15:12:11.902514: W tensorflow/compiler/tf2tensorrt/utils/py_utils.cc:30] Cannot dlopen some TensorRT libraries. If you would like to use Nvidia GPU with TensorRT, please make sure the missing libraries mentioned above are installed properly.
2020-03-26 15:12:31.352429: W external/org_tensorflow/tensorflow/compiler/xla/service/hlo_pass_fix.h:49] Unexpectedly high number of iterations in HLO passes, exiting fixed point loop.
2020-03-26 15:12:45.680246: W external/org_tensorflow/tensorflow/compiler/xla/service/hlo_pass_fix.h:49] Unexpectedly high number of iterations in HLO passes, exiting fixed point loop.
The first three seem to be about the optional TensorRT libraries (libnvinfer) not being installed, rather than about cuDNN itself.
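The driver requirement mentioned above (at least 410.48 for CUDA 10.0) can be checked mechanically. This is only a sketch under stated assumptions: the function names are hypothetical, and the table holds just the one entry quoted in this thread, so other CUDA releases need to be filled in from NVIDIA's compatibility table.

```python
# Minimum driver version per CUDA release.
# Only the CUDA 10.0 entry comes from this thread; anything else must be
# added from NVIDIA's compatibility table (not reproduced here).
MIN_DRIVER_FOR_CUDA = {"10.0": "410.48"}


def version_tuple(version):
    """Turn '430.50' into (430, 50) so versions compare numerically."""
    return tuple(int(part) for part in version.split("."))


def driver_is_new_enough(driver_version, cuda_version):
    """Return True or False, or None if the CUDA release is not in the table."""
    major_minor = ".".join(cuda_version.split(".")[:2])
    minimum = MIN_DRIVER_FOR_CUDA.get(major_minor)
    if minimum is None:
        return None  # unknown release: no claim either way
    return version_tuple(driver_version) >= version_tuple(minimum)
```

Comparing tuples of integers avoids the string-comparison trap where "96.0" would sort after "410.48".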

P.S.

You can figure out the driver version via nvidia-smi, and you can find the specific version of CUDA (i.e. 10.1.243 rather than just 10.1) by checking /usr/local/cuda/version.txt (or wherever CUDA is installed on your machine).
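Both lookups can be scripted. The sketch below assumes that version.txt contains a line of the form "CUDA Version 10.1.243", as it does in CUDA 10.x installs, and that nvidia-smi is on the PATH; the function names are hypothetical and each one returns None when the information is not available.

```python
import re
import shutil
import subprocess
from pathlib import Path


def parse_cuda_version(text):
    """Extract '10.1.243' from text such as 'CUDA Version 10.1.243'."""
    match = re.search(r"CUDA Version\s+(\d+(?:\.\d+)+)", text)
    return match.group(1) if match else None


def read_cuda_version(cuda_home="/usr/local/cuda"):
    """Read <cuda_home>/version.txt; return None if the file is missing."""
    version_file = Path(cuda_home) / "version.txt"
    if not version_file.is_file():
        return None
    return parse_cuda_version(version_file.read_text())


def read_driver_version():
    """Ask nvidia-smi for the driver version; return None if unavailable."""
    if shutil.which("nvidia-smi") is None:
        return None
    result = subprocess.run(
        ["nvidia-smi", "--query-gpu=driver_version", "--format=csv,noheader"],
        capture_output=True,
        text=True,
        check=False,
    )
    lines = result.stdout.strip().splitlines()
    if result.returncode != 0 or not lines:
        return None
    return lines[0].strip()  # one line per GPU; the driver is the same for all


if __name__ == "__main__":
    print("CUDA:", read_cuda_version())
    print("Driver:", read_driver_version())
```

Pass a different cuda_home (for example /pkgs/cuda-10.1, as used earlier in this thread) if CUDA is not installed under /usr/local/cuda.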

@hawkinsp
Collaborator

@py4 yes, that means you need to install a newer cuDNN. Can you give that a go? Hope that helps!
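One way to confirm whether the installed cuDNN is new enough is to load it directly and look for the symbol named in the issue title. This is a rough sketch, not an official jaxlib check: the function names are hypothetical, it assumes the cuDNN 7 version encoding (major*1000 + minor*100 + patch) returned by cudnnGetVersion, and loading may fail if the CUDA runtime libraries are not on the loader path.

```python
import ctypes


def decode_cudnn_version(number):
    """Split cuDNN 7's integer version, e.g. 7605 -> (7, 6, 5)."""
    return number // 1000, (number % 1000) // 100, number % 100


def inspect_cudnn(library="libcudnn.so.7",
                  symbol="cudnnSetCTCLossDescriptorEx"):
    """Return ((major, minor, patch), has_symbol), or None if loading fails."""
    try:
        lib = ctypes.CDLL(library)
    except OSError:
        return None  # library (or one of its dependencies) was not found
    lib.cudnnGetVersion.restype = ctypes.c_size_t
    version = decode_cudnn_version(lib.cudnnGetVersion())
    # ctypes raises AttributeError for a missing symbol, so hasattr works.
    return version, hasattr(lib, symbol)


if __name__ == "__main__":
    print(inspect_cudnn())
```

If the second element of the result is False, the loaded cuDNN does not export the symbol jaxlib was linked against, which matches the ImportError in the title.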

@refraction-ray

Just for reference, I have the same issue, and the reason is also a version mismatch or an out-of-date component among the driver, CUDA, cuDNN and jaxlib. It's hard to determine which combinations will fail, since this list doesn't cover the whole story.
Anyway, a combination that works for me:
GPU driver 418/430 + jaxlib 0.1.47 + CUDA 10.1.243 + cuDNN 7.6.5.32
A combination that fails for me:
GPU driver 418/430 + jaxlib 0.1.47 + CUDA 10.0 + cuDNN 7.5.1, though this combination works for GPU TensorFlow.
