
Dataflow workers not able to install tfx from requirements file due to no-binary option from beam stager #649

Closed
andrewsmartin opened this issue Sep 19, 2019 · 18 comments

Comments

@andrewsmartin

When no Beam packaging arguments are provided by the user, TFX generates a requirements file with the tfx package inside.

This ends up failing on Dataflow, because the Beam stager uses pip's --no-binary flag: https://github.com/apache/beam/blob/v2.15.0/sdks/python/apache_beam/runners/portability/stager.py#L483.

Indeed, in a fresh virtualenv (Python 3.6.3):

pip download tfx==0.14.0 --no-binary :all:
Collecting tfx==0.14.0
  ERROR: Could not find a version that satisfies the requirement tfx==0.14.0 (from versions: none)
ERROR: No matching distribution found for tfx==0.14.0

Whereas if I remove the --no-binary flag, it works just fine.

I'm not all that knowledgeable about Python packaging, but is this because TFX is built as a wheel? Is there some Beam option I can pass to make this work?
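For context, the command Beam's stager runs when populating its requirements cache can be sketched like this. This is a reconstruction based on the tracebacks quoted later in this thread, not Beam's actual source; the paths are placeholders:

```python
import sys

def requirements_cache_cmd(requirements_file, cache_dir):
    """Approximate shape of the pip invocation Beam's stager builds."""
    return [
        sys.executable, "-m", "pip", "download",
        "--dest", cache_dir,
        "-r", requirements_file,
        "--exists-action", "i",
        "--no-binary", ":all:",  # rejects wheels; tfx publishes no sdist, so this fails
    ]
```

Because --no-binary :all: tells pip to ignore wheels entirely, any requirement that exists on PyPI only as a wheel (like tfx) cannot be resolved.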

@charlesccychen
Contributor

Thanks @andrewsmartin. I think we can consider this mainly a bug in Beam. We currently do not upload the source package to PyPI, and it is not trivial to set up the correct environment to build the package from source.

@angoenka: is there a particular reason we use --no-binary at this line (https://github.com/apache/beam/blob/v2.15.0/sdks/python/apache_beam/runners/portability/stager.py#L483)? Should we remove it?

As a workaround, you can try downloading the wheel file from PyPI (https://pypi.org/project/tfx/#files) and specify it as an --extra_package to Beam.
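A hedged sketch of what those pipeline arguments might look like (all paths, project names, and bucket names below are placeholders, not values from this thread):

```python
# Hypothetical Dataflow pipeline arguments passing the tfx wheel explicitly.
beam_pipeline_args = [
    "--runner=DataflowRunner",
    "--project=my-gcp-project",                              # placeholder
    "--temp_location=gs://my-bucket/tmp",                    # placeholder
    "--extra_package=/path/to/tfx-0.14.0-py3-none-any.whl",  # wheel downloaded from PyPI
]
```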

CC: @zhitaoli

@andrewsmartin
Author

Hi @charlesccychen, thanks for the response! Agreed that this seems like more of an issue in Beam itself. I raised it here just because the default behaviour in TFX does not work.

I was able to work around this by providing a minimal setup file with install_requires=["tfx==0.14.0"], so this isn't a blocker or anything. It would just be nice to be able to use a requirements file, and let TFX take care of it.
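For anyone else trying this, the minimal setup file might look roughly like the following. The project name and version are placeholders; only install_requires matters for the workaround:

```python
# setup.py -- minimal sketch of the workaround described above.
import setuptools

setuptools.setup(
    name="my-tfx-pipeline",   # placeholder
    version="0.1.0",          # placeholder
    install_requires=["tfx==0.14.0"],
)
```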

@angoenka

A similar issue is tracked at https://jira.apache.org/jira/browse/BEAM-4032.
The reason is that binary packages are environment-dependent: the packages are downloaded on the client machine and then shipped to the worker machines, so they might not be compatible with the workers.

We can look into it, but at the moment it's not prioritized.
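The environment-dependence point can be illustrated with wheel filenames, which encode the Python version, ABI, and platform a wheel was built for (per the standard wheel naming convention; the parsing below is a simplified sketch):

```python
def wheel_tags(filename):
    """Return the (python, abi, platform) tags from a wheel filename (simplified)."""
    stem = filename[: -len(".whl")]
    return tuple(stem.split("-")[-3:])

# A pure-Python wheel runs on any interpreter and OS:
wheel_tags("tfx-0.14.0-py3-none-any.whl")  # ('py3', 'none', 'any')

# A compiled wheel is tied to one interpreter ABI and platform, so a wheel
# downloaded on a client machine may not run on a worker:
wheel_tags("tensorflow-2.3.1-cp36-cp36m-manylinux2010_x86_64.whl")
```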

@tejaslodaya

tejaslodaya commented Sep 28, 2019

Hi @andrewsmartin and @charlesccychen

I was not able to get either of your workarounds working. I am still trying to get the Chicago Beam pipeline running on Spark.

I tried using the --extra_package argument and pointed it to the wheel file like this:

additional_pipeline_args={
    'beam_pipeline_args': [
        '--runner=PortableRunner',
        '--extra_package=/Users/tejas.lodaya/Downloads/tfx-0.14.0-py3-none-any.whl',
        ....
        ....

I got this error:
Output from execution of subprocess: b'Collecting tfx==0.14.0 ERROR: Could not find a version that satisfies the requirement tfx==0.14.0 (from versions: none)\nERROR: No matching distribution found for tfx==0.14.0

I feel that the wheel file was ignored.

I then used the second suggestion, with

additional_pipeline_args={
    'beam_pipeline_args': [
        '--runner=PortableRunner',
        '--setup_file=/Users/tejas.lodaya/setup.py',
        ....
        ....

and setup.py contains:

install_requires=["tfx==0.14.0"]

It throws the error below:
'File %s not found.' % os.path.join(temp_dir, '*.tar.gz'))

Please help me with the correct way of installing the tfx package on Beam executors.

@andrewsmartin
Author

Hi @tejaslodaya, for the second case (trying with a provided setup.py file), do you have a more detailed stacktrace? Can you also share the full contents of your setup.py?

@tejaslodaya

Hi @andrewsmartin and @charlesccychen

I managed to solve this issue by doing these steps:

  1. Go to site-packages inside your virtual environment and open the apache_beam/runners/portability/stager.py file.
  2. In the _populate_requirements_cache function, remove these two lines:
    '--no-binary',
    ':all:'
  3. Reload the package inside your Jupyter notebook / main call.

In my case, I had created a conda environment and changed this file: ~/miniconda3/envs/tfx_test/lib/python3.7/site-packages/apache_beam/runners/portability/stager.py, where my environment name is tfx_test.

This solves the issue.
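The effect of steps 1 and 2 is just to drop the flag and its value from pip's argument list, which can be sketched as (an illustrative helper, not Beam code):

```python
def strip_no_binary(cmd_args):
    """Remove a '--no-binary' flag and the value that follows it."""
    out, skip = [], False
    for arg in cmd_args:
        if skip:
            skip = False       # drop the flag's value (e.g. ':all:')
        elif arg == "--no-binary":
            skip = True        # drop the flag itself
        else:
            out.append(arg)
    return out
```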

@tejaslodaya

@andrewsmartin please close this issue

@andrewsmartin
Author

Hi @tejaslodaya - glad you found a workaround, but it is just that: a workaround. That said, I'm going to keep this open.

@yantriks-edi-bice

yantriks-edi-bice commented Mar 25, 2020

@andrewsmartin I believe I ran into this issue running a TFX pipeline on Kubeflow (based on the Taxi template). I could not tell exactly what was happening from the logs I could find; for one thing, I could not find the worker-startup log mentioned here:
"A setup error was detected in beamapp-root-0325160453-2-03250905-ygwy-harness-n7qk. Please refer to the worker-startup log for detailed information."

Eventually I found the worker-startup log in Stackdriver by filtering logs (there's a googleapis worker-startup choice). The error was "Failed to install packages: failed to install workflow".

The job still failed, but I made it a lot further once I applied the change @tejaslodaya described above.

I ran into this error again when running a similar pipeline, this time from my Mac. I'm not sure whether commenting out the no-binary option is appropriate in this case, given the differences between my laptop and the Dataflow workers.

@ucdmkt
Contributor

ucdmkt commented Apr 1, 2020

I hit the same issue when trying to run DataflowRunner from locally running BeamDagRunner.

(error from BigQueryExampleGen on DataflowRunner)

  File "/usr/local/google/home/muchida/miniconda3/envs/tfx-kfp-2/lib/python3.7/site-packages/apache_beam/utils/processes.py", line 83, in check_output
    out = subprocess.check_output(*args, **kwargs)
  File "/usr/local/google/home/muchida/miniconda3/envs/tfx-kfp-2/lib/python3.7/subprocess.py", line 411, in check_output
    **kwargs).stdout
  File "/usr/local/google/home/muchida/miniconda3/envs/tfx-kfp-2/lib/python3.7/subprocess.py", line 512, in run
    output=stdout, stderr=stderr)
subprocess.CalledProcessError: Command '['/usr/local/google/home/muchida/miniconda3/envs/tfx-kfp-2/bin/python', '-m', 'pip', 'download', '--dest', '/tmp/dataflow-requirements-cache', '-r', '/tmp/tmp6s75wqpi/requirement.txt', '--exists-action', 'i', '--no-binary', ':all:']' returned non-zero exit status 1.
$ pip list | grep -P 'beam|tensorflow|tfx'
apache-beam                2.17.0             
tensorflow                 2.1.0              
tensorflow-data-validation 0.21.5             
tensorflow-estimator       2.1.0              
tensorflow-metadata        0.21.1             
tensorflow-model-analysis  0.21.6             
tensorflow-serving-api     2.1.0              
tensorflow-transform       0.21.2             
tfx                        0.21.2             
tfx-bsl                    0.21.4             

@Ark-kun
Contributor

Ark-kun commented Apr 28, 2020

I'm hitting the same issues with tfx==0.21.2 and tfx==0.21.4.

@Ark-kun
Contributor

Ark-kun commented Apr 28, 2020

Here is the log I'm getting:

  File "/tfx-src/tfx/components/example_gen/base_example_gen_executor.py", line 235, in Do
    artifact_utils.get_split_uri(output_dict['examples'], split_name)))
  File "/opt/venv/lib/python3.6/site-packages/apache_beam/pipeline.py", line 426, in __exit__
    self.run().wait_until_finish()
  File "/opt/venv/lib/python3.6/site-packages/apache_beam/pipeline.py", line 406, in run
    self._options).run(False)
  File "/opt/venv/lib/python3.6/site-packages/apache_beam/pipeline.py", line 419, in run
    return self.runner.run_pipeline(self, self._options)
  File "/opt/venv/lib/python3.6/site-packages/apache_beam/runners/dataflow/dataflow_runner.py", line 488, in run_pipeline
    self.dataflow_client.create_job(self.job), self)
  File "/opt/venv/lib/python3.6/site-packages/apache_beam/utils/retry.py", line 206, in wrapper
    return fun(*args, **kwargs)
  File "/opt/venv/lib/python3.6/site-packages/apache_beam/runners/dataflow/internal/apiclient.py", line 530, in create_job
    self.create_job_description(job)
  File "/opt/venv/lib/python3.6/site-packages/apache_beam/runners/dataflow/internal/apiclient.py", line 560, in create_job_description
    resources = self._stage_resources(job.options)
  File "/opt/venv/lib/python3.6/site-packages/apache_beam/runners/dataflow/internal/apiclient.py", line 490, in _stage_resources
    staging_location=google_cloud_options.staging_location)
  File "/opt/venv/lib/python3.6/site-packages/apache_beam/runners/portability/stager.py", line 168, in stage_job_resources
    requirements_cache_path)
  File "/opt/venv/lib/python3.6/site-packages/apache_beam/utils/retry.py", line 206, in wrapper
    return fun(*args, **kwargs)
  File "/opt/venv/lib/python3.6/site-packages/apache_beam/runners/portability/stager.py", line 487, in _populate_requirements_cache
    processes.check_output(cmd_args, stderr=processes.STDOUT)
  File "/opt/venv/lib/python3.6/site-packages/apache_beam/utils/processes.py", line 91, in check_output
    .format(traceback.format_exc(), args[0][6], error.output))
RuntimeError: Full traceback: Traceback (most recent call last):
  File "/opt/venv/lib/python3.6/site-packages/apache_beam/utils/processes.py", line 83, in check_output
    out = subprocess.check_output(*args, **kwargs)
  File "/usr/lib/python3.6/subprocess.py", line 356, in check_output
    **kwargs).stdout
  File "/usr/lib/python3.6/subprocess.py", line 438, in run
    output=stdout, stderr=stderr)
subprocess.CalledProcessError: Command '['/opt/venv/bin/python3', '-m', 'pip', 'download', '--dest', '/tmp/dataflow-requirements-cache', '-r', '/tmp/tmpogyhgwkv/requirement.txt', '--exists-action', 'i', '--no-binary', ':all:']' returned non-zero exit status 1.
 
 Pip install failed for package: -r         
 Output from execution of subprocess: b"ERROR: Could not find a version that satisfies the requirement tfx==0.21.4 (from -r /tmp/tmpogyhgwkv/requirement.txt (line 1)) (from versions: none)\nERROR: No matching distribution found for tfx==0.21.4 (from -r /tmp/tmpogyhgwkv/requirement.txt (line 1))\nWARNING: You are using pip version 20.0.2; however, version 20.1 is available.\nYou should consider upgrading via the '/opt/venv/bin/python3 -m pip install --upgrade pip' command.\n"

@gfournier

gfournier commented Oct 25, 2020

I have the same issue here with onnxruntime==1.4.0 and tensorflow==2.3.1.
Is there a way to bypass installing dependencies from source and use wheels instead?

Runner: Dataflow

Error message:

ERROR: Could not find a version that satisfies the requirement tensorflow==2.3.1 (from -r /app/requirements-dataflow.txt (line 12)) (from versions: none)
ERROR: No matching distribution found for tensorflow==2.3.1 (from -r /app/requirements-dataflow.txt (line 12))

@pindinagesh
Contributor

@andrewsmartin

Could you please confirm whether this is still an issue; otherwise, we will move this to closed status. Thanks

@andrewsmartin
Author

andrewsmartin commented Oct 20, 2021

This is no longer an issue for us, but only because we are using a different workaround, so unfortunately I cannot confirm whether it is in fact still an issue. That said, it may be less relevant going forward, as there is now better support for custom containers on Dataflow workers. I'd personally be OK with this being closed, but other users may still be hitting it.

@pindinagesh pindinagesh self-assigned this Oct 28, 2021
@pindinagesh
Contributor

@andrewsmartin

Closing this issue; please feel free to reopen if it still exists. Thanks

@pindinagesh pindinagesh removed their assignment Nov 12, 2021
@davidcavazos

This still happens with tensorflow 2.8.0. I have tensorflow as a requirement and I still get this error. Is there any reason to keep the --no-binary option?

I've had many issues with it, including that it takes a really long time to recompile every single direct and indirect requirement. I've also had timeouts at startup on Flex Templates, because most Google client libraries depend on pyarrow, which takes a very long time to compile from source.

I think removing --no-binary and using the pre-compiled packages would be faster, less wasteful of resources, and would get rid of these kinds of errors.
