-
-
Notifications
You must be signed in to change notification settings - Fork 636
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
introduce --generate-ipex to (v1) python binary creation to lazy-load requirements #8793
introduce --generate-ipex to (v1) python binary creation to lazy-load requirements #8793
Conversation
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Cool!
619e602
to
4cce241
Compare
660d253
to
87f7a32
Compare
filename_base, ext = tuple(os.path.splitext(self)) | ||
if ext == '.pex': | ||
raise ValueError("Could not hydrate ipex: {}.pex must not be named with the .pex extension.".format(filename_base)) | ||
hydrated_pex_file = '{}.pex'.format(filename_base) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I'd reconsider this. The hydrated pex is not a regular pex as the example showed. If you read through that test, in the last lines it nukes the local ~/.pex cache which destroys the functioning of the "hydrated" pex. That's why I chose .ipex as the suffix for the hydrated pex and .ptex for the suffix of the pex building robot that filled in the gaps (ski repair reference).
src/python/pants/backend/python/subsystems/pex_bootstrap/_ptex_launcher.py
Outdated
Show resolved
Hide resolved
**resolver_settings | ||
) | ||
|
||
for resolved_dist in resolved_distributions: |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This should be un-needed. If the resolution does not match that in ipex_info, the contract of the robot is broken. The resolve must be the same.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
The .ipex will fail at runtime claiming it can't satisfy some of its requirements for the current platform if I try to rely on using the same .distributions
from IPEX-INFO. What's working right now is to avoid adding any distributions when creating the .ptex, then adding the resolved distributions to ipex builder, after resolving them here. I can try to dig into why this is, but on the face of it, I don't see how we can successfully create a .ipex without adding the resolved distributions to it, unless we also set PEX_INHERIT_PATH='fallback'
or something as well.
src/python/pants/backend/python/subsystems/pex_bootstrap/_ptex_launcher.py
Outdated
Show resolved
Hide resolved
src/python/pants/backend/python/subsystems/pex_bootstrap/_ptex_launcher.py
Outdated
Show resolved
Hide resolved
src/python/pants/backend/python/subsystems/pex_bootstrap/_ptex_launcher.py
Outdated
Show resolved
Hide resolved
src/python/pants/backend/python/subsystems/pex_bootstrap/_ptex_launcher.py
Outdated
Show resolved
Hide resolved
src/python/pants/backend/python/subsystems/pex_bootstrap/_ptex_launcher.py
Outdated
Show resolved
Hide resolved
) When creating `.ipex` files as per pantsbuild#8793, we don't add resolved distributions to `self._distributions` in `PexBuilderWrapper`. This leads to the annoying `_sanitize_requirements()` hack introduced into `ipex_launcher.py` in that PR (having to remove duplicate `setuptools` requirements). - Add all resolved distributions to `self._distributions` in `PexBuilderWrapper`, even when `generate_ipex=True`. - Remove the separate list of transitively-resolved requirements passed through several methods in `PexBuilderWrapper`.
) When creating `.ipex` files as per pantsbuild#8793, we don't add resolved distributions to `self._distributions` in `PexBuilderWrapper`. This leads to the annoying `_sanitize_requirements()` hack introduced into `ipex_launcher.py` in that PR (having to remove duplicate `setuptools` requirements). - Add all resolved distributions to `self._distributions` in `PexBuilderWrapper`, even when `generate_ipex=True`. - Remove the separate list of transitively-resolved requirements passed through several methods in `PexBuilderWrapper`.
See pex-tool/pex#789 for a description of the issue, and https://docs.google.com/document/d/1B_g0Ofs8aQsJtrePPR1PCtSAKgBG1o59AhS_NwfFnbI/edit for a google doc with pros and cons of different approaches. @jsirois was extremely helpful throughout the development of this feature, and pex-tool/pex#819 and pex-tool/pex#821 in pex `2.0.3` will help to optimize several other aspects of this process when we can unrevert pantsbuild#8787. **Note:** `src/python/pants/backend/python/subsystems/pex_build_util.py` was removed in this PR, along with all floating references to it. With `--binary-py-generate-ipex`, a `.ipex` file will be created when `./pants binary` is run against a `python_binary()` target. This `.ipex` archive will create a `.pex` file and run it when first executed. The `.ipex` archive contains: - in `IPEX-INFO`: the source files to inject into the resulting `.pex`, and pypi indices to resolve requirements from. - in `BOOSTRAP-PEX-INFO`: the `PEX-INFO` of the pex file that *would* have been generated if `--generate-ipex` was False. - in `ipex.py`: A bootstrap script which will generate a `.pex` file when the `.ipex` file is first executed. For a `.ipex` file which hydrates the `tensorflow==1.14.0` dependency when it is first run, this translates to a >100x decrease in file size: ```bash X> ls dist total 145M -rwxr-xr-x 1 dmcclanahan staff 267k Dec 10 21:11 dehydrated.ipex* -rwxr-xr-x 1 dmcclanahan staff 134M Dec 10 21:11 dehydrated.pex* ```
) When creating `.ipex` files as per pantsbuild#8793, we don't add resolved distributions to `self._distributions` in `PexBuilderWrapper`. This leads to the annoying `_sanitize_requirements()` hack introduced into `ipex_launcher.py` in that PR (having to remove duplicate `setuptools` requirements). - Add all resolved distributions to `self._distributions` in `PexBuilderWrapper`, even when `generate_ipex=True`. - Remove the separate list of transitively-resolved requirements passed through several methods in `PexBuilderWrapper`.
See pex-tool/pex#789 for a description of the issue, and https://docs.google.com/document/d/1B_g0Ofs8aQsJtrePPR1PCtSAKgBG1o59AhS_NwfFnbI/edit for a google doc with pros and cons of different approaches. @jsirois was extremely helpful throughout the development of this feature, and pex-tool/pex#819 and pex-tool/pex#821 in pex `2.0.3` will help to optimize several other aspects of this process when we can unrevert #8787. **Note:** `src/python/pants/backend/python/subsystems/pex_build_util.py` was removed in this PR, along with all floating references to it. With `--binary-py-generate-ipex`, a `.ipex` file will be created when `./pants binary` is run against a `python_binary()` target. This `.ipex` archive will create a `.pex` file and run it when first executed. The `.ipex` archive contains: - in `IPEX-INFO`: the source files to inject into the resulting `.pex`, and pypi indices to resolve requirements from. - in `BOOSTRAP-PEX-INFO`: the `PEX-INFO` of the pex file that *would* have been generated if `--generate-ipex` was False. - in `ipex.py`: A bootstrap script which will generate a `.pex` file when the `.ipex` file is first executed. For a `.ipex` file which hydrates the `tensorflow==1.14.0` dependency when it is first run, this translates to a >100x decrease in file size: ```bash X> ls dist total 145M -rwxr-xr-x 1 dmcclanahan staff 267k Dec 10 21:11 dehydrated.ipex* -rwxr-xr-x 1 dmcclanahan staff 134M Dec 10 21:11 dehydrated.pex* ```
When creating `.ipex` files as per #8793, we don't add resolved distributions to `self._distributions` in `PexBuilderWrapper`. This leads to the annoying `_sanitize_requirements()` hack introduced into `ipex_launcher.py` in that PR (having to remove duplicate `setuptools` requirements). - Add all resolved distributions to `self._distributions` in `PexBuilderWrapper`, even when `generate_ipex=True`. - Remove the separate list of transitively-resolved requirements passed through several methods in `PexBuilderWrapper`.
See pex-tool/pex#789 for a description of the issue, and https://docs.google.com/document/d/1B_g0Ofs8aQsJtrePPR1PCtSAKgBG1o59AhS_NwfFnbI/edit for a google doc with pros and cons of different approaches. @jsirois was extremely helpful throughout the development of this feature, and pex-tool/pex#819 and pex-tool/pex#821 in pex `2.0.3` will help to optimize several other aspects of this process when we can unrevert #8787. **Note:** `src/python/pants/backend/python/subsystems/pex_build_util.py` was removed in this PR, along with all floating references to it. With `--binary-py-generate-ipex`, a `.ipex` file will be created when `./pants binary` is run against a `python_binary()` target. This `.ipex` archive will create a `.pex` file and run it when first executed. The `.ipex` archive contains: - in `IPEX-INFO`: the source files to inject into the resulting `.pex`, and pypi indices to resolve requirements from. - in `BOOSTRAP-PEX-INFO`: the `PEX-INFO` of the pex file that *would* have been generated if `--generate-ipex` was False. - in `ipex.py`: A bootstrap script which will generate a `.pex` file when the `.ipex` file is first executed. For a `.ipex` file which hydrates the `tensorflow==1.14.0` dependency when it is first run, this translates to a >100x decrease in file size: ```bash X> ls dist total 145M -rwxr-xr-x 1 dmcclanahan staff 267k Dec 10 21:11 dehydrated.ipex* -rwxr-xr-x 1 dmcclanahan staff 134M Dec 10 21:11 dehydrated.pex* ```
When creating `.ipex` files as per #8793, we don't add resolved distributions to `self._distributions` in `PexBuilderWrapper`. This leads to the annoying `_sanitize_requirements()` hack introduced into `ipex_launcher.py` in that PR (having to remove duplicate `setuptools` requirements). - Add all resolved distributions to `self._distributions` in `PexBuilderWrapper`, even when `generate_ipex=True`. - Remove the separate list of transitively-resolved requirements passed through several methods in `PexBuilderWrapper`.
See pex-tool/pex#789 for a description of the issue, and https://docs.google.com/document/d/1B_g0Ofs8aQsJtrePPR1PCtSAKgBG1o59AhS_NwfFnbI/edit for a google doc with pros and cons of different approaches. @jsirois was extremely helpful throughout the development of this feature, and pex-tool/pex#819 and pex-tool/pex#821 in pex `2.0.3` will help to optimize several other aspects of this process when we can unrevert #8787. **Note:** `src/python/pants/backend/python/subsystems/pex_build_util.py` was removed in this PR, along with all floating references to it. With `--binary-py-generate-ipex`, a `.ipex` file will be created when `./pants binary` is run against a `python_binary()` target. This `.ipex` archive will create a `.pex` file and run it when first executed. The `.ipex` archive contains: - in `IPEX-INFO`: the source files to inject into the resulting `.pex`, and pypi indices to resolve requirements from. - in `BOOSTRAP-PEX-INFO`: the `PEX-INFO` of the pex file that *would* have been generated if `--generate-ipex` was False. - in `ipex.py`: A bootstrap script which will generate a `.pex` file when the `.ipex` file is first executed. For a `.ipex` file which hydrates the `tensorflow==1.14.0` dependency when it is first run, this translates to a >100x decrease in file size: ```bash X> ls dist total 145M -rwxr-xr-x 1 dmcclanahan staff 267k Dec 10 21:11 dehydrated.ipex* -rwxr-xr-x 1 dmcclanahan staff 134M Dec 10 21:11 dehydrated.pex* ```
### Problem A followup to #8793, after realizing that there are actually many separate patch versions of python interpreters going around. `git tag | wc -l` in the [cpython repo](https://github.com/python/cpython/) results in 421 separate tags, many just differing by the patch version. The terminology being used here is: `CPython==2.7.5` has a *major version* of 2, *minor version* of 7, and a *patch version* of 5. It is my impression that interpreter patch versions do not affect the libraries that can be used by the interpreter, just the major and minor versions (as in, pip will resolve the same libraries for Python `2.7.5` as `2.7.13`). When `.ipex` files are created (see #8793 for background), they need to specify a *precise* interpreter version, in order to ensure that the pip resolve that runs when the .ipex is first executed matches exactly the resolve that occurred at build time. However, specifying an interpreter constraint *including* the patch version doesn't affect this invariant, and *does* make .ipex files impossible to run on machines that don't have the exact right patch version of the specified interpreter! ### Solution - Make the `PEX-INFO` of a .ipex file point to a single interpreter constraint `{major}.{minor}.*`, which matches any patch version for the given major and minor versions. - `ipex_launcher.py` will use this interpreter directly to resolve 3rdparty requirements when the .ipex is first executed, so the interpreter constraint in `PEX-INFO` is how .ipex files ensure they have the correct interpreter to resolve for. ### Result .ipex files are easier to deploy!!! ### Possible Future Work In addition to multi-platform PEX files, could we also consider multi-interpreter PEX files? Alternatively (and this seems like a *fantastic* longer-term goal) we could follow up on #7369 and wrap the interpreter up with the PEX!
### Problem When creating `.ipex` files as per #8793, we don't add resolved distributions to `self._distributions` in `PexBuilderWrapper`. This leads to the annoying `_sanitize_requirements()` hack introduced into `ipex_launcher.py` in that PR (having to remove duplicate `setuptools` requirements). ### Solution - Add all resolved distributions to `self._distributions` in `PexBuilderWrapper`, even when `generate_ipex=True`. - Remove the separate list of transitively-resolved requirements passed through several methods in `PexBuilderWrapper`.
See pex-tool/pex#789 for a description of the issue, and https://docs.google.com/document/d/1B_g0Ofs8aQsJtrePPR1PCtSAKgBG1o59AhS_NwfFnbI/edit for a google doc with pros and cons of different approaches. @jsirois was extremely helpful throughout the development of this feature, and pex-tool/pex#819 and pex-tool/pex#821 in pex `2.0.3` will help to optimize several other aspects of this process when we can unrevert #8787. **Note:** `src/python/pants/backend/python/subsystems/pex_build_util.py` was removed in this PR, along with all floating references to it. With `--binary-py-generate-ipex`, a `.ipex` file will be created when `./pants binary` is run against a `python_binary()` target. This `.ipex` archive will create a `.pex` file and run it when first executed. The `.ipex` archive contains: - in `IPEX-INFO`: the source files to inject into the resulting `.pex`, and pypi indices to resolve requirements from. - in `BOOSTRAP-PEX-INFO`: the `PEX-INFO` of the pex file that *would* have been generated if `--generate-ipex` was False. - in `ipex.py`: A bootstrap script which will generate a `.pex` file when the `.ipex` file is first executed. For a `.ipex` file which hydrates the `tensorflow==1.14.0` dependency when it is first run, this translates to a >100x decrease in file size: ```bash X> ls dist total 145M -rwxr-xr-x 1 dmcclanahan staff 267k Dec 10 21:11 dehydrated.ipex* -rwxr-xr-x 1 dmcclanahan staff 134M Dec 10 21:11 dehydrated.pex* ```
A followup to #8793, after realizing that there are actually many separate patch versions of python interpreters going around. `git tag | wc -l` in the [cpython repo](https://github.com/python/cpython/) results in 421 separate tags, many just differing by the patch version. The terminology being used here is: `CPython==2.7.5` has a *major version* of 2, *minor version* of 7, and a *patch version* of 5. It is my impression that interpreter patch versions do not affect the libraries that can be used by the interpreter, just the major and minor versions (as in, pip will resolve the same libraries for Python `2.7.5` as `2.7.13`). When `.ipex` files are created (see #8793 for background), they need to specify a *precise* interpreter version, in order to ensure that the pip resolve that runs when the .ipex is first executed matches exactly the resolve that occurred at build time. However, specifying an interpreter constraint *including* the patch version doesn't affect this invariant, and *does* make .ipex files impossible to run on machines that don't have the exact right patch version of the specified interpreter! - Make the `PEX-INFO` of a .ipex file point to a single interpreter constraint `{major}.{minor}.*`, which matches any patch version for the given major and minor versions. - `ipex_launcher.py` will use this interpreter directly to resolve 3rdparty requirements when the .ipex is first executed, so the interpreter constraint in `PEX-INFO` is how .ipex files ensure they have the correct interpreter to resolve for. .ipex files are easier to deploy!!! In addition to multi-platform PEX files, could we also consider multi-interpreter PEX files? Alternatively (and this seems like a *fantastic* longer-term goal) we could follow up on #7369 and wrap the interpreter up with the PEX!
When creating `.ipex` files as per #8793, we don't add resolved distributions to `self._distributions` in `PexBuilderWrapper`. This leads to the annoying `_sanitize_requirements()` hack introduced into `ipex_launcher.py` in that PR (having to remove duplicate `setuptools` requirements). - Add all resolved distributions to `self._distributions` in `PexBuilderWrapper`, even when `generate_ipex=True`. - Remove the separate list of transitively-resolved requirements passed through several methods in `PexBuilderWrapper`.
…SKIP_EXECUTION env var (#10530) ### Problem The "ipex" execution method for pex files from #8793 wastes a lot of time zipping up the 3rdparty modules it just hydrated when it is first executed. This is unnecessary, as pex files can be executed as unzipped directories. ### Solution - Call `.freeze()` instead of `.build()` in `ipex_launcher.py`. - Add the capability to skip actually executing the pex after hydration if `PEX_IPEX_SKIP_EXECUTION` is set in the subprocess environment. ### Result On `1.25.x-twtr`: ```bash > ./pants binary.py --generate-ipex examples/src/python/example/tensorflow_custom_op:show-tf-version && \ time PEX_PROFILE_FILENAME=ipex.prof ./dist/show-tf-version.ipex ... 41.51s user 12.65s system 86% cpu 1:02.38 total ``` On this PR: ```bash > ./pants binary.py --generate-ipex examples/src/python/example/tensorflow_custom_op:show-tf-version && \ time PEX_PROFILE_FILENAME=ipex.prof ./dist/show-tf-version.ipex ... 16.29s user 8.48s system 77% cpu 32.000 total ```
Problem
See pex-tool/pex#789 for a description of the issue, and https://docs.google.com/document/d/1B_g0Ofs8aQsJtrePPR1PCtSAKgBG1o59AhS_NwfFnbI/edit for a google doc with pros and cons of different approaches.
@jsirois was extremely helpful throughout the development of this feature, and pex-tool/pex#819 and pex-tool/pex#821 in pex
2.0.3
will help to optimize several other aspects of this process when we can unrevert #8787.Note:
src/python/pants/backend/python/subsystems/pex_build_util.py
was removed in this PR, along with all floating references to it.Solution
With
--binary-py-generate-ipex
, a.ipex
file will be created when./pants binary
is run against apython_binary()
target. This.ipex
archive will create a.pex
file and run it when first executed. The.ipex
archive contains:IPEX-INFO
: the source files to inject into the resulting.pex
, and pypi indices to resolve requirements from.BOOSTRAP-PEX-INFO
: thePEX-INFO
of the pex file that would have been generated if--generate-ipex
was False.ipex.py
: A bootstrap script which will generate a.pex
file when the.ipex
file is first executed.Result
For a
.ipex
file which hydrates thetensorflow==1.14.0
dependency when it is first run, this translates to a >100x decrease in file size: