-
-
Notifications
You must be signed in to change notification settings - Fork 258
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Unify PEX buildtime and runtime wheel caches. #821
Conversation
Previously these caches were seperate. Downloaded wheels and sdists were cached in `~/.pex/build` and wheels unzipped from zipped pexes at runtime were cached to `~/.pex/install`. Now the caches are unified by the resolver such that any wheel installs performed by it can be seen by zipped PEXes on the same machine when they go to potentially unzip wheel distributions stored within at PEX boot time. N.B.: The cache is not unified in the other direction. If a zipped PEX is executed on the same machine a PEX build resolve later happens on, any intersecting wheels will be re-downloaded, built and installed by the resolve finally unifying the caches from that point forward. Fixes pex-tool#820
The test demonstrates how to create a dehydrated pex that, upon first execution, produces a hydrated pex that it hands control to from then forward. This is the motivating use case for the cache unification change which prevents dehydrated pexes from performing wheel unzipping twice - once during pex hydration (resolve) and once during hydrated pex run.
Reviewers, feel free to ignore the second commit unless it's of particular interest to you or otherwise request it be reverted. It is fairly unrelated to the main change except that it spells out the motivating case and that may be worth preserving. I'm at your command here. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Going to start playing around with replicating test_issues_789_demo()
to pants!
I would not have a problem with it being reverted! I think it may be worth preserving if it doesn't seem too brittle, but I also plan to hit the ground running with this concept immediately in pants, so we may not need to keep around a reference for the technique. It may be helpful for other build tools wishing to employ the "dehydrated" pex technique however, at least in giving them a place to start from. |
Co-Authored-By: Danny McClanahan <1305167+cosmicexplorer@users.noreply.github.com>
### Problem See pex-tool/pex#789 for a description of the issue, and https://docs.google.com/document/d/1B_g0Ofs8aQsJtrePPR1PCtSAKgBG1o59AhS_NwfFnbI/edit for a google doc with pros and cons of different approaches. @jsirois was extremely helpful throughout the development of this feature, and pex-tool/pex#819 and pex-tool/pex#821 in pex `2.0.3` will help to optimize several other aspects of this process when we can unrevert #8787. **Note:** `src/python/pants/backend/python/subsystems/pex_build_util.py` was removed in this PR, along with all floating references to it. ### Solution With `--binary-py-generate-ipex`, a `.ipex` file will be created when `./pants binary` is run against a `python_binary()` target. This `.ipex` archive will create a `.pex` file and run it when first executed. The `.ipex` archive contains: - in `IPEX-INFO`: the source files to inject into the resulting `.pex`, and pypi indices to resolve requirements from. - in `BOOSTRAP-PEX-INFO`: the `PEX-INFO` of the pex file that *would* have been generated if `--generate-ipex` was False. - in `ipex.py`: A bootstrap script which will generate a `.pex` file when the `.ipex` file is first executed. ### Result For a `.ipex` file which hydrates the `tensorflow==1.14.0` dependency when it is first run, this translates to a >100x decrease in file size: ```bash X> ls dist total 145M -rwxr-xr-x 1 dmcclanahan staff 267k Dec 10 21:11 dehydrated.ipex* -rwxr-xr-x 1 dmcclanahan staff 134M Dec 10 21:11 dehydrated.pex* ```
See pex-tool/pex#789 for a description of the issue, and https://docs.google.com/document/d/1B_g0Ofs8aQsJtrePPR1PCtSAKgBG1o59AhS_NwfFnbI/edit for a google doc with pros and cons of different approaches. @jsirois was extremely helpful throughout the development of this feature, and pex-tool/pex#819 and pex-tool/pex#821 in pex `2.0.3` will help to optimize several other aspects of this process when we can unrevert pantsbuild#8787. **Note:** `src/python/pants/backend/python/subsystems/pex_build_util.py` was removed in this PR, along with all floating references to it. With `--binary-py-generate-ipex`, a `.ipex` file will be created when `./pants binary` is run against a `python_binary()` target. This `.ipex` archive will create a `.pex` file and run it when first executed. The `.ipex` archive contains: - in `IPEX-INFO`: the source files to inject into the resulting `.pex`, and pypi indices to resolve requirements from. - in `BOOSTRAP-PEX-INFO`: the `PEX-INFO` of the pex file that *would* have been generated if `--generate-ipex` was False. - in `ipex.py`: A bootstrap script which will generate a `.pex` file when the `.ipex` file is first executed. For a `.ipex` file which hydrates the `tensorflow==1.14.0` dependency when it is first run, this translates to a >100x decrease in file size: ```bash X> ls dist total 145M -rwxr-xr-x 1 dmcclanahan staff 267k Dec 10 21:11 dehydrated.ipex* -rwxr-xr-x 1 dmcclanahan staff 134M Dec 10 21:11 dehydrated.pex* ```
See pex-tool/pex#789 for a description of the issue, and https://docs.google.com/document/d/1B_g0Ofs8aQsJtrePPR1PCtSAKgBG1o59AhS_NwfFnbI/edit for a google doc with pros and cons of different approaches. @jsirois was extremely helpful throughout the development of this feature, and pex-tool/pex#819 and pex-tool/pex#821 in pex `2.0.3` will help to optimize several other aspects of this process when we can unrevert pantsbuild#8787. **Note:** `src/python/pants/backend/python/subsystems/pex_build_util.py` was removed in this PR, along with all floating references to it. With `--binary-py-generate-ipex`, a `.ipex` file will be created when `./pants binary` is run against a `python_binary()` target. This `.ipex` archive will create a `.pex` file and run it when first executed. The `.ipex` archive contains: - in `IPEX-INFO`: the source files to inject into the resulting `.pex`, and pypi indices to resolve requirements from. - in `BOOSTRAP-PEX-INFO`: the `PEX-INFO` of the pex file that *would* have been generated if `--generate-ipex` was False. - in `ipex.py`: A bootstrap script which will generate a `.pex` file when the `.ipex` file is first executed. For a `.ipex` file which hydrates the `tensorflow==1.14.0` dependency when it is first run, this translates to a >100x decrease in file size: ```bash X> ls dist total 145M -rwxr-xr-x 1 dmcclanahan staff 267k Dec 10 21:11 dehydrated.ipex* -rwxr-xr-x 1 dmcclanahan staff 134M Dec 10 21:11 dehydrated.pex* ```
See pex-tool/pex#789 for a description of the issue, and https://docs.google.com/document/d/1B_g0Ofs8aQsJtrePPR1PCtSAKgBG1o59AhS_NwfFnbI/edit for a google doc with pros and cons of different approaches. @jsirois was extremely helpful throughout the development of this feature, and pex-tool/pex#819 and pex-tool/pex#821 in pex `2.0.3` will help to optimize several other aspects of this process when we can unrevert #8787. **Note:** `src/python/pants/backend/python/subsystems/pex_build_util.py` was removed in this PR, along with all floating references to it. With `--binary-py-generate-ipex`, a `.ipex` file will be created when `./pants binary` is run against a `python_binary()` target. This `.ipex` archive will create a `.pex` file and run it when first executed. The `.ipex` archive contains: - in `IPEX-INFO`: the source files to inject into the resulting `.pex`, and pypi indices to resolve requirements from. - in `BOOSTRAP-PEX-INFO`: the `PEX-INFO` of the pex file that *would* have been generated if `--generate-ipex` was False. - in `ipex.py`: A bootstrap script which will generate a `.pex` file when the `.ipex` file is first executed. For a `.ipex` file which hydrates the `tensorflow==1.14.0` dependency when it is first run, this translates to a >100x decrease in file size: ```bash X> ls dist total 145M -rwxr-xr-x 1 dmcclanahan staff 267k Dec 10 21:11 dehydrated.ipex* -rwxr-xr-x 1 dmcclanahan staff 134M Dec 10 21:11 dehydrated.pex* ```
See pex-tool/pex#789 for a description of the issue, and https://docs.google.com/document/d/1B_g0Ofs8aQsJtrePPR1PCtSAKgBG1o59AhS_NwfFnbI/edit for a google doc with pros and cons of different approaches. @jsirois was extremely helpful throughout the development of this feature, and pex-tool/pex#819 and pex-tool/pex#821 in pex `2.0.3` will help to optimize several other aspects of this process when we can unrevert #8787. **Note:** `src/python/pants/backend/python/subsystems/pex_build_util.py` was removed in this PR, along with all floating references to it. With `--binary-py-generate-ipex`, a `.ipex` file will be created when `./pants binary` is run against a `python_binary()` target. This `.ipex` archive will create a `.pex` file and run it when first executed. The `.ipex` archive contains: - in `IPEX-INFO`: the source files to inject into the resulting `.pex`, and pypi indices to resolve requirements from. - in `BOOSTRAP-PEX-INFO`: the `PEX-INFO` of the pex file that *would* have been generated if `--generate-ipex` was False. - in `ipex.py`: A bootstrap script which will generate a `.pex` file when the `.ipex` file is first executed. For a `.ipex` file which hydrates the `tensorflow==1.14.0` dependency when it is first run, this translates to a >100x decrease in file size: ```bash X> ls dist total 145M -rwxr-xr-x 1 dmcclanahan staff 267k Dec 10 21:11 dehydrated.ipex* -rwxr-xr-x 1 dmcclanahan staff 134M Dec 10 21:11 dehydrated.pex* ```
See pex-tool/pex#789 for a description of the issue, and https://docs.google.com/document/d/1B_g0Ofs8aQsJtrePPR1PCtSAKgBG1o59AhS_NwfFnbI/edit for a google doc with pros and cons of different approaches. @jsirois was extremely helpful throughout the development of this feature, and pex-tool/pex#819 and pex-tool/pex#821 in pex `2.0.3` will help to optimize several other aspects of this process when we can unrevert #8787. **Note:** `src/python/pants/backend/python/subsystems/pex_build_util.py` was removed in this PR, along with all floating references to it. With `--binary-py-generate-ipex`, a `.ipex` file will be created when `./pants binary` is run against a `python_binary()` target. This `.ipex` archive will create a `.pex` file and run it when first executed. The `.ipex` archive contains: - in `IPEX-INFO`: the source files to inject into the resulting `.pex`, and pypi indices to resolve requirements from. - in `BOOSTRAP-PEX-INFO`: the `PEX-INFO` of the pex file that *would* have been generated if `--generate-ipex` was False. - in `ipex.py`: A bootstrap script which will generate a `.pex` file when the `.ipex` file is first executed. For a `.ipex` file which hydrates the `tensorflow==1.14.0` dependency when it is first run, this translates to a >100x decrease in file size: ```bash X> ls dist total 145M -rwxr-xr-x 1 dmcclanahan staff 267k Dec 10 21:11 dehydrated.ipex* -rwxr-xr-x 1 dmcclanahan staff 134M Dec 10 21:11 dehydrated.pex* ```
See pex-tool/pex#789 for a description of the issue, and https://docs.google.com/document/d/1B_g0Ofs8aQsJtrePPR1PCtSAKgBG1o59AhS_NwfFnbI/edit for a google doc with pros and cons of different approaches. @jsirois was extremely helpful throughout the development of this feature, and pex-tool/pex#819 and pex-tool/pex#821 in pex `2.0.3` will help to optimize several other aspects of this process when we can unrevert #8787. **Note:** `src/python/pants/backend/python/subsystems/pex_build_util.py` was removed in this PR, along with all floating references to it. With `--binary-py-generate-ipex`, a `.ipex` file will be created when `./pants binary` is run against a `python_binary()` target. This `.ipex` archive will create a `.pex` file and run it when first executed. The `.ipex` archive contains: - in `IPEX-INFO`: the source files to inject into the resulting `.pex`, and pypi indices to resolve requirements from. - in `BOOSTRAP-PEX-INFO`: the `PEX-INFO` of the pex file that *would* have been generated if `--generate-ipex` was False. - in `ipex.py`: A bootstrap script which will generate a `.pex` file when the `.ipex` file is first executed. For a `.ipex` file which hydrates the `tensorflow==1.14.0` dependency when it is first run, this translates to a >100x decrease in file size: ```bash X> ls dist total 145M -rwxr-xr-x 1 dmcclanahan staff 267k Dec 10 21:11 dehydrated.ipex* -rwxr-xr-x 1 dmcclanahan staff 134M Dec 10 21:11 dehydrated.pex* ```
Previously these caches were seperate. Downloaded wheels and sdists were
cached in
~/.pex/build
and wheels unzipped from zipped pexes atruntime were cached to
~/.pex/install
.Now the caches are unified by the resolver such that any wheel installs
performed by it can be seen by zipped PEXes on the same machine when
they go to potentially unzip wheel distributions stored within at PEX
boot time.
N.B.: The cache is not unified in the other direction. If a zipped PEX
is executed on the same machine a PEX build resolve later happens on,
any intersecting wheels will be re-downloaded, built and installed by
the resolve finally unifying the caches from that point forward.
Fixes #820