
ImportError: libarrow.so.14: cannot open shared object file: No such file or directory when running on Dataflow #83

Closed
andrewsmartin opened this issue Sep 18, 2019 · 5 comments


@andrewsmartin

Hi,

We are trying to upgrade to TFX 0.14.0 for our pipelines, but when running Statistics Gen on Dataflow, we frequently run into this error:

  File "/usr/local/lib/python3.6/site-packages/apache_beam/internal/pickler.py", line 258, in loads
    return dill.loads(s)
  File "/usr/local/lib/python3.6/site-packages/dill/_dill.py", line 317, in loads
    return load(file, ignore)
  File "/usr/local/lib/python3.6/site-packages/dill/_dill.py", line 305, in load
    obj = pik.load()
  File "/usr/local/lib/python3.6/site-packages/dill/_dill.py", line 474, in find_class
    return StockUnpickler.find_class(self, module, name)
  File "/usr/local/lib/python3.6/site-packages/tensorflow_data_validation/statistics/stats_impl.py", line 29, in <module>
    from tensorflow_data_validation import constants
  File "/usr/local/lib/python3.6/site-packages/tensorflow_data_validation/__init__.py", line 27, in <module>
    from tensorflow_data_validation.api.validation_api import infer_schema
  File "/usr/local/lib/python3.6/site-packages/tensorflow_data_validation/api/validation_api.py", line 29, in <module>
    from tensorflow_data_validation.pywrap import pywrap_tensorflow_data_validation
  File "/usr/local/lib/python3.6/site-packages/tensorflow_data_validation/pywrap/pywrap_tensorflow_data_validation.py", line 28, in <module>
    _pywrap_tensorflow_data_validation = swig_import_helper()
  File "/usr/local/lib/python3.6/site-packages/tensorflow_data_validation/pywrap/pywrap_tensorflow_data_validation.py", line 24, in swig_import_helper
    _mod = imp.load_module('_pywrap_tensorflow_data_validation', fp, pathname, description)
  File "/usr/local/lib/python3.6/imp.py", line 243, in load_module
    return load_dynamic(name, filename, file)
  File "/usr/local/lib/python3.6/imp.py", line 343, in load_dynamic
    return _load(spec)
ImportError: libarrow.so.14: cannot open shared object file: No such file or directory

We are using Beam 2.14 (as per the TFX compatibility matrix) and Python 3.6. The strange part is that the error is intermittent, which makes me think it could be an environment issue on some of the Dataflow workers, but I'm not entirely sure.

We install tfx==0.14.0 on the Beam workers by providing a custom setup file.
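
One way to test the environment hypothesis is to ask the dynamic loader directly whether it can find the library on a given machine. The following is only a minimal sketch using the standard-library `ctypes` module; `can_load_shared_library` is a hypothetical helper name, not part of TFX, TFDV, or Beam, and how you get it to run on a Dataflow worker is left open.

```python
import ctypes


def can_load_shared_library(name):
    """Try to load a shared library by name and report the outcome.

    Hypothetical diagnostic helper. Returns a (ok, detail) tuple where
    `detail` is the loader's error message when loading fails.
    """
    try:
        # ctypes.CDLL asks the platform's dynamic loader to open the library.
        ctypes.CDLL(name)
    except OSError as exc:
        return False, str(exc)
    return True, "loaded"


if __name__ == "__main__":
    # The library name is taken from the ImportError above.
    ok, detail = can_load_shared_library("libarrow.so.14")
    print("libarrow.so.14:", "OK" if ok else "MISSING", "-", detail)
```

Note that a failure here only shows that the default loader search path does not contain the library; it does not by itself explain why the failure is intermittent.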

@andrewsmartin changed the title from "ImportError: libarrow.so.14: cannot open shared object file: No such file or directory when running on Datalow" to "ImportError: libarrow.so.14: cannot open shared object file: No such file or directory when running on Dataflow" on Sep 18, 2019
@rmothukuru self-assigned this on Sep 19, 2019
@andrewsmartin
Author

This also happens when trying to run TF Transform (since that pulls in TFDV). I experienced this again this morning. The first time I ran it, the Dataflow job failed with this error; when I re-ran it without changing any code, it completed successfully.

rmothukuru commented Sep 19, 2019

@andrewsmartin,
As this might be an environment issue, can you please provide details about the platform you are using (operating system, architecture)? Please also include the versions of TensorFlow, TFDV and TF Transform.

If possible, also include the exact command that produces the output shown in your report. If you are unsure what to include, see the GitHub new issue template.

We ask for this in the issue submission template, because it is really difficult to help without that information. Thanks!

andrewsmartin commented Sep 19, 2019

Hi @rmothukuru, thanks for the reply.

Here are the library versions I am using:

TensorFlow: 1.14.0
TFDV: 0.14.1
TFT: 0.14.0

This might be more of an issue for https://github.com/tensorflow/tfx, since I am submitting the job to Dataflow through their orchestration layer (https://github.com/tensorflow/tfx/blob/master/tfx/components/base/base_executor.py#L80).

The custom setup file I'm using for Beam looks like this:

from setuptools import setup


VERSION = "0.14.0"

setup(
    name="custom_tfx",
    version=VERSION,
    install_requires=["tfx=={version}".format(version=VERSION)],
)
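
Since the same code sometimes fails and sometimes succeeds, it may help to log which package versions actually ended up installed, so a failing run can be compared with a successful one. The sketch below is an assumption-laden diagnostic, not something from TFX or Beam: `installed_versions` is a hypothetical helper, and including `pyarrow` in the list assumes that `libarrow.so` is provided by the pyarrow wheel on the workers.

```python
try:
    # Available in the standard library from Python 3.8.
    from importlib import metadata as importlib_metadata
except ImportError:
    # Older interpreters (such as the Python 3.6 used in this issue)
    # fall back to pkg_resources from setuptools.
    importlib_metadata = None


def _lookup_version(name):
    """Return the installed version of `name`, or None if it is absent."""
    if importlib_metadata is not None:
        try:
            return importlib_metadata.version(name)
        except importlib_metadata.PackageNotFoundError:
            return None
    import pkg_resources
    try:
        return pkg_resources.get_distribution(name).version
    except pkg_resources.DistributionNotFound:
        return None


def installed_versions(package_names):
    """Map each package name to its installed version (hypothetical helper)."""
    return {name: _lookup_version(name) for name in package_names}


if __name__ == "__main__":
    # Package list is illustrative; adjust it to whatever the pipeline uses.
    names = ["tfx", "tensorflow-data-validation", "tensorflow-transform",
             "apache-beam", "pyarrow"]
    for name, version in installed_versions(names).items():
        print(name, version if version is not None else "not installed")
```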

I am doing this in order to work around tensorflow/tfx#649.

I will try to follow up shortly with a way to reproduce this using the example taxi dataset. I understand it's hard to diagnose without that.

@andrewsmartin
Author

If it makes more sense to discuss this over at https://github.com/tensorflow/tfx, I'm happy to close this!

@rmothukuru

@andrewsmartin,
Thank you for the information. Since you have already raised the issue in TFX, I'm closing this issue.
