lib64/libm.so GLIBC issue with ONNX GPU backend on Linux #826

Open
Spartee opened this issue Jul 27, 2021 · 0 comments

Spartee commented Jul 27, 2021

Describe the bug
The pre-built ONNX backend provided by RedisAI requires the GLIBC_2.27 symbol version to be available on the system. Many systems, especially in High Performance Computing (HPC), do not provide it.

To Reproduce
Steps to reproduce the behavior:

  1. GIT_LFS_SKIP_SMUDGE=1 git clone --recursive https://github.com/RedisAI/RedisAI.git --branch v1.2.3 --depth=1
  2. CC=gcc CXX=g++ WITH_PT=0 WITH_TF=0 WITH_TFLITE=0 WITH_ORT=1 bash get_deps.sh gpu
  3. CC=gcc CXX=g++ GPU=1 WITH_PT=0 WITH_TF=0 WITH_TFLITE=0 WITH_ORT=1 WITH_UNIT_TESTS=0 make -C opt clean build
  4. Start RedisAI and set/run any ONNX model.

Or just run ldd on redisai_onnxruntime.so, and you get:

(tf-test) [spartee@horizon 17:44:07 redisai_onnxruntime]$ ldd redisai_onnxruntime.so
./redisai_onnxruntime.so: /lib64/libm.so.6: version `GLIBC_2.27' not found (required by /lus/cls01029/spartee/poseidon/backend-test/smartsim/lib/backends/redisai_onnxruntime/./lib/libonnxruntime.so.1.7.1)
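ldd only reports the first version it cannot resolve. To see every glibc symbol version a library asks for, one option is to scan its dynamic symbol table. This is a minimal sketch, assuming binutils `objdump` and GNU `sort -V` are available; the function name `extract_glibc_versions` is made up for illustration.

```shell
# Hypothetical helper: read any text on stdin (for example `objdump -T` output)
# and print the distinct GLIBC_x.y symbol versions it mentions, oldest first.
# GLIBC_PRIVATE is skipped because it carries no version number.
extract_glibc_versions() {
    grep -oE 'GLIBC_[0-9]+(\.[0-9]+)*' | sort -u -V
}

# Usage sketch (path taken from the ldd output in this report):
#   objdump -T ./lib/libonnxruntime.so.1.7.1 | extract_glibc_versions
```

The last line printed is the newest glibc symbol version the library needs.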

Looking at libm on our systems, it seems we are laughably close (one minor version away):

(tf-test) [spartee@horizon 19:49:38 on_wlm]$ strings /lib64/libm.so.6 | grep GLIBC
GLIBC_2.2.5
GLIBC_2.4
GLIBC_2.15
GLIBC_2.18
GLIBC_2.23
GLIBC_2.24
GLIBC_2.25
GLIBC_2.26
GLIBC_PRIVATE
GLIBC_2.15
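The same comparison can be scripted instead of read by eye. This is a rough sketch, assuming GNU `sort -V`; the helper names `newest_glibc` and `version_le` are hypothetical.

```shell
# Hypothetical helper: print the newest GLIBC_x.y version named in text on stdin
# (for example the output of `strings /lib64/libm.so.6`), without the prefix.
newest_glibc() {
    grep -oE 'GLIBC_[0-9]+(\.[0-9]+)*' | sed 's/^GLIBC_//' | sort -V | tail -n 1
}

# Hypothetical helper: succeed if version $1 is less than or equal to version $2.
version_le() {
    [ "$(printf '%s\n%s\n' "$1" "$2" | sort -V | head -n 1)" = "$1" ]
}

# Usage sketch:
#   have="$(strings /lib64/libm.so.6 | newest_glibc)"
#   version_le 2.27 "$have" || echo "libm provides up to $have, but 2.27 is required"
```

Version sort matters here: a plain lexical sort would rank 2.4 above 2.26.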

But the odd thing is... the TensorFlow shared library, when compiled for GPU, does not have the same problem...

# ldd on tensorflow
libm.so.6 => /lib64/libm.so.6 (0x00007fccb629a000)

I'm guessing this is because TensorFlow is the one y'all are downloading directly from the vendor (i.e. Google)?

Expected (wanted?) behavior
Ideally RedisAI could build and audit the shared libraries the backends depend on, to ensure that they will work on systems that lack newer glibc versions. My guess is that the GPU builds for the backends use some specific Docker container that has extra goodies for ease of use but that are not actually needed. @chayim is this the case?
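As a rough idea of what such an audit could look like (not how RedisAI's build actually works), the sketch below runs ldd over a list of libraries and flags any with unresolved dependencies or symbol versions. The function name `audit_libs` and the `LDD` override variable are assumptions made for illustration.

```shell
# Hypothetical audit: run ldd on each library given as an argument and report
# any "not found" lines. Returns 1 if at least one library has a problem.
# The LDD variable (an assumption of this sketch) lets a different command
# stand in for ldd. Variables are global because POSIX sh has no `local`.
audit_libs() {
    audit_status=0
    for audit_lib in "$@"; do
        audit_problems="$("${LDD:-ldd}" "$audit_lib" 2>&1 | grep 'not found' || true)"
        if [ -n "$audit_problems" ]; then
            printf '%s:\n%s\n' "$audit_lib" "$audit_problems"
            audit_status=1
        fi
    done
    return "$audit_status"
}

# Usage sketch, run on the oldest system you intend to support:
#   audit_libs redisai_onnxruntime.so lib/libonnxruntime.so.1.7.1
```

Running this on a machine with an old glibc would catch a GLIBC_2.27 requirement before a release ships.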

I realize that #785 is currently being worked on, but this particular problem is a big issue for us. We have also seen a similar problem with PyTorch, which is why we switched to compiling our own PyTorch (see #822).

Environment (please complete the following information):

  • OS: SUSE Linux
  • Version [e.g. 1.2.2]: 15.2
  • Platform [e.g. x86, Jetson, ARM]: Intel x86
  • Runtime [e.g. CPU, CUDA]: CUDA 11.2 (tested 11.3 as well)