Excluding a transitive package that is available on the target environment #2082

Closed
khaledh opened this issue Mar 6, 2023 · 3 comments

khaledh commented Mar 6, 2023

Related issues: #219, #737

We're using pex to package PySpark applications with their dependencies and deploy them to Google Dataproc. The target environment already has the required version of PySpark, so we don't want to include the pyspark package in the resulting pex file, especially since it's ~280MB in size.

Normally this works well, as long as we don't include pyspark in the requirements.txt file used to build the pex file and no other package has pyspark as a transitive dependency. However, we are now starting to use packages that require pyspark, so it is being resolved and included in the resulting pex file.

I tried using pip-compile with --unsafe-package pyspark to pre-resolve all dependencies and exclude pyspark, and then passing the output requirements.txt to pex with the --no-transitive flag:

$ cat requirements.txt
sparkql==0.5.2
$ pex --inherit-path=fallback --no-transitive -r requirements.txt -o app.pex

But the resulting pex gives the following error:

Failed to find compatible interpreter on path /usr/local/bin:/usr/local/sbin:/usr/bin:/usr/sbin:/bin:/sbin:..

Examined the following interpreters:
1.) /opt/conda/miniconda3/bin/python3.8 CPython==3.8.13
2.)                  /usr/bin/python3.7 CPython==3.7.3

No interpreter compatible with the requested constraints was found:

  Failed to resolve requirements from PEX environment @ /tmp/tmpdjsmz0uk/unzipped_pexes/7be55d3b97db32a3c2d47821bddfa887a73d2a08.
  Needed cp38-cp38-manylinux_2_28_x86_64 compatible dependencies for:
   1: pyspark<4.0,>=3.0
      Required by:
        sparkql 0.5.2
      But this pex had no ProjectName(raw='pyspark', normalized='pyspark') distributions.

  Failed to resolve requirements from PEX environment @ /tmp/tmpdjsmz0uk/unzipped_pexes/7be55d3b97db32a3c2d47821bddfa887a73d2a08.
  Needed cp37-cp37m-manylinux_2_28_x86_64 compatible dependencies for:
   1: pyspark<4.0,>=3.0
      Required by:
        sparkql 0.5.2
      But this pex had no ProjectName(raw='pyspark', normalized='pyspark') distributions.

How do I tell pex to rely on the package installed on the target environment instead of requiring it to be embedded in the pex file?
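
One way to sanity-check the premise of this question, namely that the target environment already provides a suitable pyspark, is a small preflight script run with the cluster's interpreter. The sketch below is not part of pex. The helper names are hypothetical, the version range is copied from the error above, and `importlib.metadata` needs Python 3.8+, so it would not run on the 3.7 interpreter listed.

```python
from importlib import metadata
from typing import Optional, Tuple


def parse_release(version: str) -> Tuple[int, ...]:
    """Turn '3.3.0' into (3, 3, 0), stopping at the first non-numeric part."""
    parts = []
    for piece in version.split("."):
        if not piece.isdigit():
            break
        parts.append(int(piece))
    # Pad so that a bare '3' compares like '3.0'.
    while len(parts) < 2:
        parts.append(0)
    return tuple(parts)


def in_required_range(version: str) -> bool:
    """True when the version satisfies pyspark>=3.0,<4.0 (range from the error)."""
    release = parse_release(version)
    return (3, 0) <= release < (4, 0)


def environment_pyspark_version() -> Optional[str]:
    """Version of pyspark visible to this interpreter, or None if not installed."""
    try:
        return metadata.version("pyspark")
    except metadata.PackageNotFoundError:
        return None


# Example preflight use on the target machine:
#   found = environment_pyspark_version()
#   if found is None or not in_required_range(found):
#       raise SystemExit(f"pyspark>=3.0,<4.0 not provided by environment: {found}")
```

This is a deliberately coarse comparison that ignores pre-release suffixes; a real check would more likely use a proper version-specifier library.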

jsirois (Member) commented Mar 6, 2023

@khaledh did you try either --ignore-errors (build time: https://pex.readthedocs.io/en/v2.1.126/buildingpex.html#ignore-errors) or PEX_IGNORE_ERRORS=1 (runtime: https://pex.readthedocs.io/en/v2.1.126/api/vars.html#PEX_IGNORE_ERRORS)?
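
For anyone scripting the build, the two options could be wired up roughly as follows. This is only a sketch: the function names are hypothetical, the pex flags are the ones from the original command plus `--ignore-errors`, and the thread does not say which of the two options was the one that ended up being used.

```python
import os
from typing import Dict, List, Optional


def pex_build_command(
    requirements: str, output: str, ignore_errors: bool = True
) -> List[str]:
    """Argv for the build from the issue, optionally adding --ignore-errors."""
    cmd = [
        "pex",
        "--inherit-path=fallback",  # let the PEX fall back to the environment's packages
        "--no-transitive",          # requirements file is already fully pre-resolved
        "-r", requirements,
        "-o", output,
    ]
    if ignore_errors:
        cmd.append("--ignore-errors")  # build-time option
    return cmd


def pex_runtime_env(base: Optional[Dict[str, str]] = None) -> Dict[str, str]:
    """Copy of the environment with PEX_IGNORE_ERRORS=1 set (runtime option)."""
    env = dict(os.environ if base is None else base)
    env["PEX_IGNORE_ERRORS"] = "1"
    return env


# Example use (requires pex on PATH, so it is left commented out here):
#   import subprocess
#   subprocess.run(pex_build_command("requirements.txt", "app.pex"), check=True)
#   subprocess.run(["./app.pex"], env=pex_runtime_env(), check=True)
```

Either option on its own may be enough; setting the environment variable is the one that works with an already-built pex file.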

khaledh (Author) commented Mar 6, 2023

@jsirois That did the trick! Thanks :)

jsirois (Member) commented Mar 6, 2023

Great. @khaledh this does make me realize that, all these years later, Pex now has the infrastructure to support excludes as a first-class feature. The initial resolve phase would still have to pull down the full transitive set of distributions, since resolution is delegated to Pip, but the PEX assembly phase could exclude a given set of roots together with their transitive deps, or exclude a given set of roots intransitively. If that would be a useful feature, please file a separate feature request issue.
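
To make the transitive versus intransitive distinction concrete, here is a toy model of the assembly-time filtering described above. It is not Pex code: the function names are hypothetical, the dependency graph is a hand-written illustration, and the transitive mode naively drops everything reachable from an excluded root, even if another retained package also needs it.

```python
from typing import Dict, Iterable, List, Set

# Illustrative graph only: project -> direct dependencies.
DEPS: Dict[str, List[str]] = {
    "sparkql": ["pyspark"],
    "pyspark": ["py4j"],
    "py4j": [],
}


def closure(roots: Iterable[str], deps: Dict[str, List[str]]) -> Set[str]:
    """All projects reachable from the roots, including the roots themselves."""
    seen: Set[str] = set()
    stack = list(roots)
    while stack:
        name = stack.pop()
        if name in seen:
            continue
        seen.add(name)
        stack.extend(deps.get(name, []))
    return seen


def assemble(
    resolved: Iterable[str],
    deps: Dict[str, List[str]],
    excludes: Iterable[str],
    transitive: bool = True,
) -> Set[str]:
    """Projects that would be packaged after applying the excludes."""
    dropped = closure(excludes, deps) if transitive else set(excludes)
    return set(resolved) - dropped


# The resolve phase still sees the full transitive set.
resolved = closure(["sparkql"], DEPS)
print(sorted(assemble(resolved, DEPS, ["pyspark"])))                    # roots and their deps dropped
print(sorted(assemble(resolved, DEPS, ["pyspark"], transitive=False)))  # only the roots dropped
```

In the Dataproc case from this issue, the transitive form would presumably be the one wanted, since the environment's PySpark install brings its own dependencies.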
