ImportError: libarrow.so.14: cannot open shared object file: No such file or directory when running on Dataflow #83

Closed
@andrewsmartin

Description

Hi,

We are trying to upgrade to TFX 0.14.0 for our pipelines, but when running Statistics Gen on Dataflow, we frequently run into this error:

 File "/usr/local/lib/python3.6/site-packages/apache_beam/internal/pickler.py", line 258, in loads
    return dill.loads(s)
  File "/usr/local/lib/python3.6/site-packages/dill/_dill.py", line 317, in loads
    return load(file, ignore)
  File "/usr/local/lib/python3.6/site-packages/dill/_dill.py", line 305, in load
    obj = pik.load()
  File "/usr/local/lib/python3.6/site-packages/dill/_dill.py", line 474, in find_class
    return StockUnpickler.find_class(self, module, name)
  File "/usr/local/lib/python3.6/site-packages/tensorflow_data_validation/statistics/stats_impl.py", line 29, in <module>
    from tensorflow_data_validation import constants
  File "/usr/local/lib/python3.6/site-packages/tensorflow_data_validation/__init__.py", line 27, in <module>
    from tensorflow_data_validation.api.validation_api import infer_schema
  File "/usr/local/lib/python3.6/site-packages/tensorflow_data_validation/api/validation_api.py", line 29, in <module>
    from tensorflow_data_validation.pywrap import pywrap_tensorflow_data_validation
  File "/usr/local/lib/python3.6/site-packages/tensorflow_data_validation/pywrap/pywrap_tensorflow_data_validation.py", line 28, in <module>
    _pywrap_tensorflow_data_validation = swig_import_helper()
  File "/usr/local/lib/python3.6/site-packages/tensorflow_data_validation/pywrap/pywrap_tensorflow_data_validation.py", line 24, in swig_import_helper
    _mod = imp.load_module('_pywrap_tensorflow_data_validation', fp, pathname, description)
  File "/usr/local/lib/python3.6/imp.py", line 243, in load_module
    return load_dynamic(name, filename, file)
  File "/usr/local/lib/python3.6/imp.py", line 343, in load_dynamic
    return _load(spec)
ImportError: libarrow.so.14: cannot open shared object file: No such file or directory

We are using Beam 2.14 (as per the TFX compatibility matrix) and Python 3.6. It's very strange because this error doesn't happen all the time, which makes me think it could be an environment issue on some of the Dataflow workers, but I'm not entirely sure.

We install tfx==0.14.0 on the Beam workers by providing a custom setup file.
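For context, the setup file is along these lines (a minimal sketch, not the exact file; the package name and version are hypothetical, only the tfx==0.14.0 pin is real):

```python
# setup.py -- shipped to Dataflow workers via Beam's --setup_file pipeline option,
# so each worker pip-installs this package (and its dependencies) at startup.
import setuptools

setuptools.setup(
    name='my-tfx-pipeline',   # hypothetical package name
    version='0.0.1',          # hypothetical version
    install_requires=[
        # Pulls in tensorflow_data_validation and its pyarrow dependency.
        'tfx==0.14.0',
    ],
    packages=setuptools.find_packages(),
)
```

The pipeline is then launched with `--setup_file=./setup.py` so that Dataflow builds and installs the package on every worker.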
