Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

introduce --generate-ipex to (v1) python binary creation to lazy-load requirements #8793

Merged
merged 1 commit into from
Mar 12, 2020

Conversation

cosmicexplorer
Copy link
Contributor

@cosmicexplorer cosmicexplorer commented Dec 11, 2019

Problem

See pex-tool/pex#789 for a description of the issue, and https://docs.google.com/document/d/1B_g0Ofs8aQsJtrePPR1PCtSAKgBG1o59AhS_NwfFnbI/edit for a google doc with pros and cons of different approaches.

@jsirois was extremely helpful throughout the development of this feature, and pex-tool/pex#819 and pex-tool/pex#821 in pex 2.0.3 will help to optimize several other aspects of this process when we can unrevert #8787.

Note: src/python/pants/backend/python/subsystems/pex_build_util.py was removed in this PR, along with all floating references to it.

Solution

With --binary-py-generate-ipex, a .ipex file will be created when ./pants binary is run against a python_binary() target. This .ipex archive will create a .pex file and run it when first executed. The .ipex archive contains:

  • in IPEX-INFO: the source files to inject into the resulting .pex, and pypi indices to resolve requirements from.
  • in BOOSTRAP-PEX-INFO: the PEX-INFO of the pex file that would have been generated if --generate-ipex was False.
  • in ipex.py: A bootstrap script which will generate a .pex file when the .ipex file is first executed.

Result

For a .ipex file which hydrates the tensorflow==1.14.0 dependency when it is first run, this translates to a >100x decrease in file size:

X> ls dist
total 145M
-rwxr-xr-x 1 dmcclanahan staff 267k Dec 10 21:11 dehydrated.ipex*
-rwxr-xr-x 1 dmcclanahan staff 134M Dec 10 21:11 dehydrated.pex*

Copy link
Contributor

@Eric-Arellano Eric-Arellano left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Cool!

@cosmicexplorer cosmicexplorer force-pushed the ipex-robot-impl branch 2 times, most recently from 660d253 to 87f7a32 Compare December 12, 2019 10:24
filename_base, ext = tuple(os.path.splitext(self))
if ext == '.pex':
raise ValueError("Could not hydrate ipex: {}.pex must not be named with the .pex extension.".format(filename_base))
hydrated_pex_file = '{}.pex'.format(filename_base)
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I'd reconsider this. The hydrated pex is not a regular pex as the example showed. If you read through that test, in the last lines it nukes the local ~/.pex cache which destroys the functioning of the "hydrated" pex. That's why I chose .ipex as the suffix for the hydrated pex and .ptex for the suffix of the pex building robot that filled in the gaps (ski repair reference).

@cosmicexplorer cosmicexplorer changed the title introduce --generate-ipex to python binary creation to lazy-load requirements introduce --generate-ptex to python binary creation to lazy-load requirements Dec 12, 2019
**resolver_settings
)

for resolved_dist in resolved_distributions:
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This should be un-needed. If the resolution does not match that in ipex_info, the contract of the robot is broken. The resolve must be the same.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The .ipex will fail at runtime claiming it can't satisfy some of its requirements for the current platform if I try to rely on using the same .distributions from IPEX-INFO. What's working right now is to avoid adding any distributions when creating the .ptex, then adding the resolved distributions to ipex builder, after resolving them here. I can try to dig into why this is, but on the face of it, I don't see how we can successfully create a .ipex without adding the resolved distributions to it, unless we also set PEX_INHERIT_PATH='fallback' or something as well.

cosmicexplorer added a commit to cosmicexplorer/pants that referenced this pull request Mar 31, 2020
)

When creating `.ipex` files as per pantsbuild#8793, we don't add resolved distributions to `self._distributions` in `PexBuilderWrapper`. This leads to the annoying `_sanitize_requirements()` hack introduced into `ipex_launcher.py` in that PR (having to remove duplicate `setuptools` requirements).

- Add all resolved distributions to `self._distributions` in `PexBuilderWrapper`, even when `generate_ipex=True`.
- Remove the separate list of transitively-resolved requirements passed through several methods in `PexBuilderWrapper`.
cosmicexplorer added a commit to cosmicexplorer/pants that referenced this pull request Mar 31, 2020
)

When creating `.ipex` files as per pantsbuild#8793, we don't add resolved distributions to `self._distributions` in `PexBuilderWrapper`. This leads to the annoying `_sanitize_requirements()` hack introduced into `ipex_launcher.py` in that PR (having to remove duplicate `setuptools` requirements).

- Add all resolved distributions to `self._distributions` in `PexBuilderWrapper`, even when `generate_ipex=True`.
- Remove the separate list of transitively-resolved requirements passed through several methods in `PexBuilderWrapper`.
cosmicexplorer added a commit to cosmicexplorer/pants that referenced this pull request Mar 31, 2020
See pex-tool/pex#789 for a description of the issue, and https://docs.google.com/document/d/1B_g0Ofs8aQsJtrePPR1PCtSAKgBG1o59AhS_NwfFnbI/edit for a google doc with pros and cons of different approaches.

@jsirois was extremely helpful throughout the development of this feature, and pex-tool/pex#819 and pex-tool/pex#821 in pex `2.0.3` will help to optimize several other aspects of this process when we can unrevert pantsbuild#8787.

**Note:** `src/python/pants/backend/python/subsystems/pex_build_util.py` was removed in this PR, along with all floating references to it.

With `--binary-py-generate-ipex`, a `.ipex` file will be created when `./pants binary` is run against a `python_binary()` target. This `.ipex` archive will create a `.pex` file and run it when first executed. The `.ipex` archive contains:
- in `IPEX-INFO`: the source files to inject into the resulting `.pex`, and pypi indices to resolve requirements from.
- in `BOOSTRAP-PEX-INFO`: the `PEX-INFO` of the pex file that *would* have been generated if `--generate-ipex` was False.
- in `ipex.py`: A bootstrap script which will generate a `.pex` file when the `.ipex` file is first executed.

For a `.ipex` file which hydrates the `tensorflow==1.14.0` dependency when it is first run, this translates to a >100x decrease in file size:
```bash
X> ls dist
total 145M
-rwxr-xr-x 1 dmcclanahan staff 267k Dec 10 21:11 dehydrated.ipex*
-rwxr-xr-x 1 dmcclanahan staff 134M Dec 10 21:11 dehydrated.pex*
```
cosmicexplorer added a commit to cosmicexplorer/pants that referenced this pull request Mar 31, 2020
)

When creating `.ipex` files as per pantsbuild#8793, we don't add resolved distributions to `self._distributions` in `PexBuilderWrapper`. This leads to the annoying `_sanitize_requirements()` hack introduced into `ipex_launcher.py` in that PR (having to remove duplicate `setuptools` requirements).

- Add all resolved distributions to `self._distributions` in `PexBuilderWrapper`, even when `generate_ipex=True`.
- Remove the separate list of transitively-resolved requirements passed through several methods in `PexBuilderWrapper`.
stuhood pushed a commit that referenced this pull request Mar 31, 2020
See pex-tool/pex#789 for a description of the issue, and https://docs.google.com/document/d/1B_g0Ofs8aQsJtrePPR1PCtSAKgBG1o59AhS_NwfFnbI/edit for a google doc with pros and cons of different approaches.

@jsirois was extremely helpful throughout the development of this feature, and pex-tool/pex#819 and pex-tool/pex#821 in pex `2.0.3` will help to optimize several other aspects of this process when we can unrevert #8787.

**Note:** `src/python/pants/backend/python/subsystems/pex_build_util.py` was removed in this PR, along with all floating references to it.

With `--binary-py-generate-ipex`, a `.ipex` file will be created when `./pants binary` is run against a `python_binary()` target. This `.ipex` archive will create a `.pex` file and run it when first executed. The `.ipex` archive contains:
- in `IPEX-INFO`: the source files to inject into the resulting `.pex`, and pypi indices to resolve requirements from.
- in `BOOSTRAP-PEX-INFO`: the `PEX-INFO` of the pex file that *would* have been generated if `--generate-ipex` was False.
- in `ipex.py`: A bootstrap script which will generate a `.pex` file when the `.ipex` file is first executed.

For a `.ipex` file which hydrates the `tensorflow==1.14.0` dependency when it is first run, this translates to a >100x decrease in file size:
```bash
X> ls dist
total 145M
-rwxr-xr-x 1 dmcclanahan staff 267k Dec 10 21:11 dehydrated.ipex*
-rwxr-xr-x 1 dmcclanahan staff 134M Dec 10 21:11 dehydrated.pex*
```
stuhood pushed a commit that referenced this pull request Mar 31, 2020
When creating `.ipex` files as per #8793, we don't add resolved distributions to `self._distributions` in `PexBuilderWrapper`. This leads to the annoying `_sanitize_requirements()` hack introduced into `ipex_launcher.py` in that PR (having to remove duplicate `setuptools` requirements).

- Add all resolved distributions to `self._distributions` in `PexBuilderWrapper`, even when `generate_ipex=True`.
- Remove the separate list of transitively-resolved requirements passed through several methods in `PexBuilderWrapper`.
stuhood pushed a commit that referenced this pull request Mar 31, 2020
See pex-tool/pex#789 for a description of the issue, and https://docs.google.com/document/d/1B_g0Ofs8aQsJtrePPR1PCtSAKgBG1o59AhS_NwfFnbI/edit for a google doc with pros and cons of different approaches.

@jsirois was extremely helpful throughout the development of this feature, and pex-tool/pex#819 and pex-tool/pex#821 in pex `2.0.3` will help to optimize several other aspects of this process when we can unrevert #8787.

**Note:** `src/python/pants/backend/python/subsystems/pex_build_util.py` was removed in this PR, along with all floating references to it.

With `--binary-py-generate-ipex`, a `.ipex` file will be created when `./pants binary` is run against a `python_binary()` target. This `.ipex` archive will create a `.pex` file and run it when first executed. The `.ipex` archive contains:
- in `IPEX-INFO`: the source files to inject into the resulting `.pex`, and pypi indices to resolve requirements from.
- in `BOOSTRAP-PEX-INFO`: the `PEX-INFO` of the pex file that *would* have been generated if `--generate-ipex` was False.
- in `ipex.py`: A bootstrap script which will generate a `.pex` file when the `.ipex` file is first executed.

For a `.ipex` file which hydrates the `tensorflow==1.14.0` dependency when it is first run, this translates to a >100x decrease in file size:
```bash
X> ls dist
total 145M
-rwxr-xr-x 1 dmcclanahan staff 267k Dec 10 21:11 dehydrated.ipex*
-rwxr-xr-x 1 dmcclanahan staff 134M Dec 10 21:11 dehydrated.pex*
```
stuhood pushed a commit that referenced this pull request Mar 31, 2020
When creating `.ipex` files as per #8793, we don't add resolved distributions to `self._distributions` in `PexBuilderWrapper`. This leads to the annoying `_sanitize_requirements()` hack introduced into `ipex_launcher.py` in that PR (having to remove duplicate `setuptools` requirements).

- Add all resolved distributions to `self._distributions` in `PexBuilderWrapper`, even when `generate_ipex=True`.
- Remove the separate list of transitively-resolved requirements passed through several methods in `PexBuilderWrapper`.
@stuhood stuhood modified the milestones: 1.25.x, 1.26.x Apr 1, 2020
stuhood pushed a commit that referenced this pull request Apr 1, 2020
See pex-tool/pex#789 for a description of the issue, and https://docs.google.com/document/d/1B_g0Ofs8aQsJtrePPR1PCtSAKgBG1o59AhS_NwfFnbI/edit for a google doc with pros and cons of different approaches.

@jsirois was extremely helpful throughout the development of this feature, and pex-tool/pex#819 and pex-tool/pex#821 in pex `2.0.3` will help to optimize several other aspects of this process when we can unrevert #8787.

**Note:** `src/python/pants/backend/python/subsystems/pex_build_util.py` was removed in this PR, along with all floating references to it.

With `--binary-py-generate-ipex`, a `.ipex` file will be created when `./pants binary` is run against a `python_binary()` target. This `.ipex` archive will create a `.pex` file and run it when first executed. The `.ipex` archive contains:
- in `IPEX-INFO`: the source files to inject into the resulting `.pex`, and pypi indices to resolve requirements from.
- in `BOOSTRAP-PEX-INFO`: the `PEX-INFO` of the pex file that *would* have been generated if `--generate-ipex` was False.
- in `ipex.py`: A bootstrap script which will generate a `.pex` file when the `.ipex` file is first executed.

For a `.ipex` file which hydrates the `tensorflow==1.14.0` dependency when it is first run, this translates to a >100x decrease in file size:
```bash
X> ls dist
total 145M
-rwxr-xr-x 1 dmcclanahan staff 267k Dec 10 21:11 dehydrated.ipex*
-rwxr-xr-x 1 dmcclanahan staff 134M Dec 10 21:11 dehydrated.pex*
```
stuhood pushed a commit that referenced this pull request Apr 1, 2020
### Problem

A followup to #8793, after realizing that there are actually many separate patch versions of python interpreters going around. `git tag | wc -l` in the [cpython repo](https://github.com/python/cpython/) results in 421 separate tags, many just differing by the patch version.

The terminology being used here is: `CPython==2.7.5` has a *major version* of 2, *minor version* of 7, and a *patch version* of 5.

It is my impression that interpreter patch versions do not affect the libraries that can be used by the interpreter, just the major and minor versions (as in, pip will resolve the same libraries for Python `2.7.5` as `2.7.13`).

When `.ipex` files are created (see #8793 for background), they need to specify a *precise* interpreter version, in order to ensure that the pip resolve that runs when the .ipex is first executed matches exactly the resolve that occurred at build time. However, specifying an interpreter constraint *including* the patch version doesn't affect this invariant, and *does* make .ipex files impossible to run on machines that don't have the exact right patch version of the specified interpreter!

### Solution

- Make the `PEX-INFO` of a .ipex file point to a single interpreter constraint `{major}.{minor}.*`, which matches any patch version for the given major and minor versions.
  - `ipex_launcher.py` will use this interpreter directly to resolve 3rdparty requirements when the .ipex is first executed, so the interpreter constraint in `PEX-INFO` is how .ipex files ensure they have the correct interpreter to resolve for.

### Result

.ipex files are easier to deploy!!!

### Possible Future Work

In addition to multi-platform PEX files, could we also consider multi-interpreter PEX files? Alternatively (and this seems like a *fantastic* longer-term goal) we could follow up on #7369 and wrap the interpreter up with the PEX!
stuhood pushed a commit that referenced this pull request Apr 1, 2020
### Problem

When creating `.ipex` files as per #8793, we don't add resolved distributions to `self._distributions` in `PexBuilderWrapper`. This leads to the annoying `_sanitize_requirements()` hack introduced into `ipex_launcher.py` in that PR (having to remove duplicate `setuptools` requirements).

### Solution

- Add all resolved distributions to `self._distributions` in `PexBuilderWrapper`, even when `generate_ipex=True`.
- Remove the separate list of transitively-resolved requirements passed through several methods in `PexBuilderWrapper`.
cosmicexplorer added a commit that referenced this pull request May 4, 2020
See pex-tool/pex#789 for a description of the issue, and https://docs.google.com/document/d/1B_g0Ofs8aQsJtrePPR1PCtSAKgBG1o59AhS_NwfFnbI/edit for a google doc with pros and cons of different approaches.

@jsirois was extremely helpful throughout the development of this feature, and pex-tool/pex#819 and pex-tool/pex#821 in pex `2.0.3` will help to optimize several other aspects of this process when we can unrevert #8787.

**Note:** `src/python/pants/backend/python/subsystems/pex_build_util.py` was removed in this PR, along with all floating references to it.

With `--binary-py-generate-ipex`, a `.ipex` file will be created when `./pants binary` is run against a `python_binary()` target. This `.ipex` archive will create a `.pex` file and run it when first executed. The `.ipex` archive contains:
- in `IPEX-INFO`: the source files to inject into the resulting `.pex`, and pypi indices to resolve requirements from.
- in `BOOSTRAP-PEX-INFO`: the `PEX-INFO` of the pex file that *would* have been generated if `--generate-ipex` was False.
- in `ipex.py`: A bootstrap script which will generate a `.pex` file when the `.ipex` file is first executed.

For a `.ipex` file which hydrates the `tensorflow==1.14.0` dependency when it is first run, this translates to a >100x decrease in file size:
```bash
X> ls dist
total 145M
-rwxr-xr-x 1 dmcclanahan staff 267k Dec 10 21:11 dehydrated.ipex*
-rwxr-xr-x 1 dmcclanahan staff 134M Dec 10 21:11 dehydrated.pex*
```
cosmicexplorer added a commit that referenced this pull request May 4, 2020
A followup to #8793, after realizing that there are actually many separate patch versions of python interpreters going around. `git tag | wc -l` in the [cpython repo](https://github.com/python/cpython/) results in 421 separate tags, many just differing by the patch version.

The terminology being used here is: `CPython==2.7.5` has a *major version* of 2, *minor version* of 7, and a *patch version* of 5.

It is my impression that interpreter patch versions do not affect the libraries that can be used by the interpreter, just the major and minor versions (as in, pip will resolve the same libraries for Python `2.7.5` as `2.7.13`).

When `.ipex` files are created (see #8793 for background), they need to specify a *precise* interpreter version, in order to ensure that the pip resolve that runs when the .ipex is first executed matches exactly the resolve that occurred at build time. However, specifying an interpreter constraint *including* the patch version doesn't affect this invariant, and *does* make .ipex files impossible to run on machines that don't have the exact right patch version of the specified interpreter!

- Make the `PEX-INFO` of a .ipex file point to a single interpreter constraint `{major}.{minor}.*`, which matches any patch version for the given major and minor versions.
  - `ipex_launcher.py` will use this interpreter directly to resolve 3rdparty requirements when the .ipex is first executed, so the interpreter constraint in `PEX-INFO` is how .ipex files ensure they have the correct interpreter to resolve for.

.ipex files are easier to deploy!!!

In addition to multi-platform PEX files, could we also consider multi-interpreter PEX files? Alternatively (and this seems like a *fantastic* longer-term goal) we could follow up on #7369 and wrap the interpreter up with the PEX!
cosmicexplorer added a commit that referenced this pull request May 4, 2020
When creating `.ipex` files as per #8793, we don't add resolved distributions to `self._distributions` in `PexBuilderWrapper`. This leads to the annoying `_sanitize_requirements()` hack introduced into `ipex_launcher.py` in that PR (having to remove duplicate `setuptools` requirements).

- Add all resolved distributions to `self._distributions` in `PexBuilderWrapper`, even when `generate_ipex=True`.
- Remove the separate list of transitively-resolved requirements passed through several methods in `PexBuilderWrapper`.
cosmicexplorer added a commit that referenced this pull request Aug 6, 2020
…SKIP_EXECUTION env var (#10530)

### Problem

The "ipex" execution method for pex files from #8793 wastes a lot of time zipping up the 3rdparty modules it just hydrated when it is first executed. This is unnecessary, as pex files can be executed as unzipped directories.

### Solution

- Call `.freeze()` instead of `.build()` in `ipex_launcher.py`.
- Add the capability to skip actually executing the pex after hydration if `PEX_IPEX_SKIP_EXECUTION` is set in the subprocess environment.

### Result

On `1.25.x-twtr`:
```bash
> ./pants binary.py --generate-ipex examples/src/python/example/tensorflow_custom_op:show-tf-version && \
  time PEX_PROFILE_FILENAME=ipex.prof ./dist/show-tf-version.ipex
...
41.51s user 12.65s system 86% cpu 1:02.38 total
```

On this PR:
```bash
> ./pants binary.py --generate-ipex examples/src/python/example/tensorflow_custom_op:show-tf-version && \
  time PEX_PROFILE_FILENAME=ipex.prof ./dist/show-tf-version.ipex
...
16.29s user 8.48s system 77% cpu 32.000 total
```
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

6 participants