We're using pex to package PySpark applications with their dependencies and deploy them to Google Dataproc. The target environment already has the required version of PySpark, so we don't want to include the pyspark package in the resulting pex file, especially since it's ~280MB in size.
Normally this works well, as long as we don't list pyspark in the requirements.txt file used to build the pex file and no other package has pyspark as a transitive dependency. However, we are now starting to use packages that require pyspark, so it gets resolved and included in the resulting pex file.
I tried to use pip-compile with --unsafe-package pyspark to pre-resolve all dependencies and exclude pyspark, and then passed the output requirements.txt to pex with the --no-transitive flag. But the resulting pex gives the following error:
Failed to find compatible interpreter on path /usr/local/bin:/usr/local/sbin:/usr/bin:/usr/sbin:/bin:/sbin:..
Examined the following interpreters:
1.) /opt/conda/miniconda3/bin/python3.8 CPython==3.8.13
2.) /usr/bin/python3.7 CPython==3.7.3
No interpreter compatible with the requested constraints was found:
Failed to resolve requirements from PEX environment @ /tmp/tmpdjsmz0uk/unzipped_pexes/7be55d3b97db32a3c2d47821bddfa887a73d2a08.
Needed cp38-cp38-manylinux_2_28_x86_64 compatible dependencies for:
1: pyspark<4.0,>=3.0
Required by:
sparkql 0.5.2
But this pex had no ProjectName(raw='pyspark', normalized='pyspark') distributions.
Failed to resolve requirements from PEX environment @ /tmp/tmpdjsmz0uk/unzipped_pexes/7be55d3b97db32a3c2d47821bddfa887a73d2a08.
Needed cp37-cp37m-manylinux_2_28_x86_64 compatible dependencies for:
1: pyspark<4.0,>=3.0
Required by:
sparkql 0.5.2
But this pex had no ProjectName(raw='pyspark', normalized='pyspark') distributions.
How do I tell pex to rely on the package installed on the target environment instead of requiring it to be embedded in the pex file?
Great. @khaledh this does make me realize that, all these years later, Pex now has the infrastructure to support excludes as a first-class feature. The initial resolve phase will still have to pull down the transitive set of distributions (this is delegated to Pip), but the PEX assembly phase could exclude either a given set of roots together with their transitive deps, or just the given set of roots intransitively. If that would be a useful feature, please file a separate feature request issue.
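To make the two exclude modes mentioned above concrete, here is a minimal Python sketch over a toy dependency graph. It is not Pex's implementation: the `GRAPH` dictionary and the `reachable` and `assemble` helpers are hypothetical names made up for illustration, and the transitive mode here assumes that a dependency shared with a non-excluded project should be kept.

```python
from collections import deque

# Hypothetical resolve result: project name -> direct dependencies.
# The names mirror this issue; the edges are illustrative, not real metadata.
GRAPH = {
    "sparkql": ["pyspark"],
    "pyspark": ["py4j"],
    "requests": ["urllib3"],
    "py4j": [],
    "urllib3": [],
}


def reachable(graph, roots, blocked=frozenset()):
    """Return every project reachable from roots without entering a blocked one."""
    seen = set()
    queue = deque(name for name in roots if name not in blocked)
    while queue:
        name = queue.popleft()
        if name in seen:
            continue
        seen.add(name)
        queue.extend(dep for dep in graph.get(name, ()) if dep not in blocked)
    return seen


def assemble(graph, roots, excluded, transitive):
    """Pick the projects to package after the full resolve has happened.

    transitive=True: drop the excluded projects and anything that is only
    needed through them (assumption: shared dependencies are kept).
    transitive=False: drop only the excluded projects themselves.
    """
    excluded = frozenset(excluded)
    if transitive:
        return reachable(graph, roots, blocked=excluded)
    return reachable(graph, roots) - excluded


roots = ["sparkql", "requests"]
print(sorted(assemble(GRAPH, roots, {"pyspark"}, transitive=True)))
# ['requests', 'sparkql', 'urllib3']
print(sorted(assemble(GRAPH, roots, {"pyspark"}, transitive=False)))
# ['py4j', 'requests', 'sparkql', 'urllib3']
```

In the transitive case py4j disappears along with pyspark, which is what you would want when the target environment already provides PySpark and everything it needs. In the intransitive case py4j stays in the package even though nothing packaged still uses it.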
Related issues: #219, #737