Undefined symbol xla::HloComputation::CollectUnreachableRoots() const with tf-nightly pip package #58281

Closed
maxhgerlach opened this issue Oct 24, 2022 · 4 comments

maxhgerlach commented Oct 24, 2022

Issue Type

Bug

Source

binary

Tensorflow Version

tf-nightly-cpu==2.12.0.dev20221019 and newer

Custom Code

Yes

OS Platform and Distribution

Ubuntu 20.04

Mobile device

No response

Python version

3.9

Bazel version

No response

GCC/Compiler version

No response

CUDA/cuDNN version

No response

GPU model and memory

No response

Current Behaviour?

Starting with version 2.12.0.dev20221019, Horovod can no longer be built correctly against tf-nightly. tf-nightly-cpu==2.12.0.dev20221018 and earlier are fine.

Importing horovod.tensorflow fails with an undefined-symbol error:

tensorflow.python.framework.errors_impl.NotFoundError: /usr/local/lib/python3.8/dist-packages/horovod/tensorflow/mpi_lib.cpython-38-x86_64-linux-gnu.so: undefined symbol: _ZNK3xla14HloComputation23CollectUnreachableRootsEv

In demangled form that is xla::HloComputation::CollectUnreachableRoots() const.
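
For readers without `c++filt` at hand, the mangled name can be decoded by hand. The following is a minimal sketch that handles only the small subset of the Itanium C++ ABI mangling needed here (nested plain identifiers, an optional const qualifier, and an empty parameter list). The function name `demangle_simple` is made up for this illustration and is not part of any library.

```python
def demangle_simple(mangled: str) -> str:
    """Demangle a tiny subset of Itanium C++ ABI names.

    Handles nested names made of plain identifiers, an optional `K`
    (const member function) and a `v` (empty) parameter list.
    Anything else raises ValueError.
    """
    if not mangled.startswith("_ZN"):
        raise ValueError("only nested names (_ZN...) are handled")
    pos = 3

    # `K` directly after `_ZN` marks a const member function.
    is_const = mangled[pos:pos + 1] == "K"
    if is_const:
        pos += 1

    parts = []
    # `E` terminates the nested name; each component is <length><identifier>.
    while mangled[pos:pos + 1] != "E":
        start = pos
        while mangled[pos:pos + 1].isdigit():
            pos += 1
        if pos == start:
            raise ValueError(f"unsupported construct at offset {pos}")
        length = int(mangled[start:pos])
        parts.append(mangled[pos:pos + length])
        pos += length

    # After `E` comes the parameter list; `v` means no parameters.
    if mangled[pos + 1:] != "v":
        raise ValueError("only an empty parameter list (v) is handled")

    return "::".join(parts) + "()" + (" const" if is_const else "")


print(demangle_simple("_ZNK3xla14HloComputation23CollectUnreachableRootsEv"))
# prints: xla::HloComputation::CollectUnreachableRoots() const
```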

Has the implementation been moved to a different dynamic library recently? Horovod currently links to libtensorflow_framework.so.2 and to _pywrap_tensorflow_internal.so.
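
One way to probe which library a symbol can be resolved through is to load the library with `ctypes` and look the name up. This is a rough sketch, not a definitive diagnostic: `exports_symbol` is a helper invented for this example, and a positive result only means the dynamic loader can resolve the symbol through that library or its dependencies, not that the library itself defines it. Loading TensorFlow's shared objects this way may also require importing `tensorflow` first so that their own dependencies are already in the process.

```python
import ctypes


def exports_symbol(library_path, symbol):
    """Return True if `symbol` can be resolved through `library_path`.

    Passing None as `library_path` searches the symbols already loaded
    into the current process instead of opening a specific file.
    """
    lib = ctypes.CDLL(library_path)
    try:
        # ctypes resolves attribute access via the platform's symbol
        # lookup and raises AttributeError when the symbol is missing.
        getattr(lib, symbol)
    except AttributeError:
        return False
    return True
```

It could then be called as `exports_symbol(path, "_ZNK3xla14HloComputation23CollectUnreachableRootsEv")` for each candidate library, where `path` points at the corresponding file inside the installed tensorflow package.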

Standalone code to reproduce the issue

$ pip install tf-nightly-cpu
Collecting tf-nightly-cpu
  Downloading tf_nightly_cpu-2.12.0.dev20221024-cp39-cp39-manylinux_2_17_x86_64.manylinux2014_x86_64.whl (225.0 MB)
     ━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━ 225.0/225.0 MB 1.2 MB/s eta 0:00:00
# ...
$ pip install -v horovod
# ...
  Tensorflow_LIBRARIES := -L.../horovod-tf-nightly-venv/lib/python3.9/site-packages/tensorflow -l:libtensorflow_framework.so.2 .../horovod-tf-nightly-venv/lib/python3.9/site-packages/tensorflow/python/_pywrap_tensorflow_internal.so
  -- Found Tensorflow: -L.../horovod-tf-nightly-venv/lib/python3.9/site-packages/tensorflow -l:libtensorflow_framework.so.2 .../horovod-tf-nightly-venv/lib/python3.9/site-packages/tensorflow/python/_pywrap_tensorflow_internal.so (found suitable version "2.12.0-dev20221024", minimum required is "1.15.0")
# ...
$ python -c 'import horovod.tensorflow'

Error message quoted below under "relevant log output".

Relevant log output

2022-10-24 16:12:49.274846: I tensorflow/core/platform/cpu_feature_guard.cc:193] This TensorFlow binary is optimized with oneAPI Deep Neural Network Library (oneDNN) to use the following CPU instructions in performance-critical operations:  AVX2 FMA
To enable them in other operations, rebuild TensorFlow with the appropriate compiler flags.
Traceback (most recent call last):
  File "<string>", line 1, in <module>
  File ".../horovod-tf-nightly-venv/lib/python3.9/site-packages/horovod/tensorflow/__init__.py", line 27, in <module>
    from horovod.tensorflow import elastic
  File ".../horovod-tf-nightly-venv/lib/python3.9/site-packages/horovod/tensorflow/elastic.py", line 24, in <module>
    from horovod.tensorflow.functions import broadcast_object, broadcast_object_fn, broadcast_variables
  File ".../horovod-tf-nightly-venv/lib/python3.9/site-packages/horovod/tensorflow/functions.py", line 24, in <module>
    from horovod.tensorflow.mpi_ops import allgather, broadcast, broadcast_
  File ".../horovod-tf-nightly-venv/lib/python3.9/site-packages/horovod/tensorflow/mpi_ops.py", line 53, in <module>
    raise e
  File ".../horovod-tf-nightly-venv/lib/python3.9/site-packages/horovod/tensorflow/mpi_ops.py", line 50, in <module>
    MPI_LIB = _load_library('mpi_lib' + get_ext_suffix())
  File ".../horovod-tf-nightly-venv/lib/python3.9/site-packages/horovod/tensorflow/mpi_ops.py", line 45, in _load_library
    library = load_library.load_op_library(filename)
  File ".../horovod-tf-nightly-venv/lib/python3.9/site-packages/tensorflow/python/framework/load_library.py", line 54, in load_op_library
    lib_handle = py_tf.TF_LoadLibrary(library_filename)
tensorflow.python.framework.errors_impl.NotFoundError: .../horovod-tf-nightly-venv/lib/python3.9/site-packages/horovod/tensorflow/mpi_lib.cpython-39-x86_64-linux-gnu.so: undefined symbol: _ZNK3xla14HloComputation23CollectUnreachableRootsEv

bhack commented Oct 24, 2022

Can you check whether this is related to #58287?


maxhgerlach commented Oct 25, 2022

Thank you for the pointer, @bhack!

So #55941 was probably merged shortly before tf-nightly-cpu==2.12.0.dev2022101. It moves implementations from _pywrap_tensorflow_internal.so to libtensorflow_cc.so, which quite likely caused the issue with Horovod. Presumably we can fix the error in Horovod by also linking to that newer library.

The PR has been reverted in 970c3b4 for now, though, so the acute problem might go away for a while once wheels for tf-nightly-cpu==2.12.0.dev20221025 are available.

Anyway, I will look into linking to libtensorflow_cc.so as well.

maxhgerlach commented

Linking to libtensorflow_cc.so.2 works for me, horovod/horovod#3755
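
As an illustration of the idea (not the actual change in the Horovod PR), the link line from the build log above could be extended so that `libtensorflow_cc.so.2` is added only when it is present. The sketch below assumes that the library sits next to `libtensorflow_framework.so.2` in the tensorflow package directory; `tensorflow_link_flags` is a hypothetical helper name.

```python
from pathlib import Path


def tensorflow_link_flags(tf_dir):
    """Build linker flags for an extension that links against TensorFlow.

    `tf_dir` is the installed tensorflow package directory. The
    libtensorflow_cc.so.2 flag is added only if that file exists there
    (its location is an assumption of this sketch), so older wheels
    that do not ship it keep the original link line.
    """
    tf_dir = Path(tf_dir)
    flags = [f"-L{tf_dir}", "-l:libtensorflow_framework.so.2"]
    if (tf_dir / "libtensorflow_cc.so.2").exists():
        flags.append("-l:libtensorflow_cc.so.2")
    flags.append(str(tf_dir / "python" / "_pywrap_tensorflow_internal.so"))
    return flags
```

Guarding on the file's existence keeps the same build logic usable across nightly versions with and without the moved implementations.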
