Run PEXes as normal applications #962
Comments
Would it be feasible to extend this proposal to allow building PEX files unzipped when they aren't intended to be run on another host? The result of something like
Rationale
The lazy-loading ipex functionality added to pants in pantsbuild/pants#8793 (which just turns your code into a Python application that knows how to build the real pex file) currently has to make a single trip across the zipped/unzipped bridge when the ipex is first executed: downloading wheels, then stuffing them into a pex file. If pex files were then unconditionally unzipped on top of that, it would involve a second crossing of that bridge. Stuffing 3rdparty dependencies into the resulting pex file appears to take more time than actually resolving dependencies (more time than running pants, actually, since the 3rdparty dependencies are not zipped into the ipex file when it is created), especially when the resolve is mostly cached. The resulting pex will always be run on the same host, because ipex will exec the built pex immediately after it is produced. |
The Pex API has supported this for a long time and Pants has correspondingly used it for a long time: |
I had forgotten the difference between |
I suspect an inverted flag such as "--force-zip" (or "--zip-safe=False" becoming the default) would be nice to keep around for tight-quartered execution of large pex envs where expansive space consumption may not be desirable (like a pex that contains a large but otherwise zip-safe library executing on a PySpark worker, etc.). Otherwise, this sounds great to me as a default mode. Pex execution overhead is a major UX issue for us, particularly with O(GB) pex envs for DS/ML and for local tool use cases. |
A speed hack note when implementing this: If we find sys.argv[0] (the PEX zip file) is writeable, we could re-write its shebang to point to the selected venv interpreter to fully eliminate re-exec overhead on subsequent runs. Given the mechanism here:
$ cat __main__.py
#!/usr/bin/env python
from __future__ import print_function
def _maybe_reexec():
    import sys
    _BINARY = sys.argv[0]
    # Here we would extract the app to a venv, abbreviated at extracted_app.py for demonstration.
    _NEW_SHEBANG = b"#!" + sys.executable.encode("utf-8") + b" extracted_app.py\n"
    with open(_BINARY, "rb") as fp:
        shebang = fp.readline()
        if shebang == _NEW_SHEBANG:
            return
        import os
        import shutil
        new_binary = "{}.rewrite".format(_BINARY)
        with open(new_binary, "wb") as new_fp:
            new_fp.write(_NEW_SHEBANG)
            new_fp.write(fp.read())
        shutil.copymode(_BINARY, new_binary)
        os.rename(new_binary, _BINARY)
        os.execv(_BINARY, sys.argv)
_maybe_reexec()
del _maybe_reexec
import sys
print("ERROR: should have never gotten here!")
sys.exit(1)
Which relies on the to-be-written PEX -> venv extraction code represented by a pre-extracted single file for demonstration here:
$ cat extracted_app.py
import sys
print("Hello. ARGV={}".format(sys.argv))
We get:
$ zip main.zip __main__.py && cat <(echo '#!/usr/bin/env python') main.zip > main.pex && chmod +x main.pex
adding: __main__.py (deflated 53%)
$ head -1 main.pex
#!/usr/bin/env python
$ time ./main.pex
Hello. ARGV=['extracted_app.py', './main.pex']
real 0m0.051s
user 0m0.039s
sys 0m0.007s
$ head -1 main.pex
#!/usr/bin/python
$ time ./main.pex
#!/usr/bin/python extracted_app.py
Hello. ARGV=['extracted_app.py', './main.pex']
real 0m0.024s
user 0m0.016s
sys 0m0.004s
$ time python -c 'print("Hello")'
Hello
real 0m0.022s
user 0m0.018s
sys 0m0.004s
This self-modifying executable approach could be made robust by writing down the values of any |
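For comparison, the shebang rewrite in the demonstration above can also be sketched with the standard library's `zipapp` module, which can read an archive's interpreter line and copy an archive with a new one. This is only an illustration of the idea, not how Pex implements anything: `build_demo_archive` and `rewrite_shebang` are hypothetical helper names, and the demo archive stands in for a real PEX file.

```python
import os
import sys
import tempfile
import zipapp


def build_demo_archive(directory):
    """Create a tiny zip application whose shebang is '/usr/bin/env python'."""
    src = os.path.join(directory, "src")
    os.mkdir(src)
    with open(os.path.join(src, "__main__.py"), "w") as fp:
        fp.write('print("Hello")\n')
    target = os.path.join(directory, "main.pyz")
    zipapp.create_archive(src, target, interpreter="/usr/bin/env python")
    return target


def rewrite_shebang(archive, interpreter):
    """Point the archive's shebang at `interpreter`; return True if it changed."""
    if zipapp.get_interpreter(archive) == interpreter:
        return False  # Already rewritten, nothing to do.
    rewritten = archive + ".rewrite"
    # When the source is an existing archive, zipapp copies it and swaps the shebang.
    zipapp.create_archive(archive, rewritten, interpreter=interpreter)
    # Replace the original in one step so readers never see a partial file.
    os.replace(rewritten, archive)
    return True


if __name__ == "__main__":
    with tempfile.TemporaryDirectory() as tmp:
        demo = build_demo_archive(tmp)
        print(zipapp.get_interpreter(demo))
        rewrite_shebang(demo, sys.executable)
        print(zipapp.get_interpreter(demo))
```

The second call to `rewrite_shebang` with the same interpreter is a no-op, which mirrors the `shebang == _NEW_SHEBANG` early return in the comment's `__main__.py`.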
Add a new `--include-tools` option to include any pex.tools in generated PEX files. These tools are activated by running PEX files with PEX_TOOLS=1. The `Info` tool seeds the tool set and simply dumps the effective PEX-INFO for the given PEX. Work towards pex-tool#962 and pex-tool#1115
This fixes binary canonicalization to handle virtual environments created with virtualenv instead of pyvenv. It also adds support for resolving the base interpreter used to build a virtual environment. The ability to resolve a virtual environment interpreter will be used to fix #1031 where virtual environments created with `--system-site-packages` leak those packages through as regular sys.path entries otherwise undetectable by PEX. Work towards #962 and #1115.
Add a `venv` tool to create a virtual environment from a PEX file. The virtual environment is seeded with just the PEX user code and distributions applicable to the selected interpreter for the local machine. The virtual environment does not have Pip installed by default although that can be requested with `--pip`. The virtual environment comes with a `__main__.py` at the root of the venv to emulate a loose pex that can be run with `python venv.dir` just like a loose pex. This entry point supports all the behavior of the original PEX file not related to interpreter selection, namely support for PEX_SCRIPT, PEX_MODULE, PEX_INTERPRETER and PEX_EXTRA_SYS_PATH. A sibling `pex` script is linked to `__main__.py` to provide the maximum performance entrypoint that always avoids interpreter re-execing and thus yields equivalent performance to a pure virtual environment. Work towards #962 and #1115.
The new --venv execution mode builds a PEX file that includes pex.tools and extracts itself into a venv under PEX_ROOT upon first execution or any execution that might select a different interpreter than the default. In order to speed up the local build and execute case, --seed mode is added to seed the PEX_ROOT caches that will be used at runtime. This is important for --venv mode since venv seeding depends on the selected interpreter and one is already selected during the PEX file build process. Fixes #962 Fixes #1097 Fixes #1115
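The "extract once on first execution, reuse afterwards" pattern described in this commit message can be sketched roughly as follows. This is a simplified illustration under stated assumptions, not Pex's actual code: `ensure_unzipped` and the hash-keyed cache layout are hypothetical, and the sketch only unzips an archive rather than building a venv.

```python
import hashlib
import os
import shutil
import tempfile
import zipfile


def ensure_unzipped(archive, cache_root):
    """Extract `archive` under `cache_root` once.

    Returns (directory, extracted_now). The directory name is the archive's
    SHA-256 digest, so a changed archive gets a fresh cache entry.
    """
    digest = hashlib.sha256()
    with open(archive, "rb") as fp:
        for chunk in iter(lambda: fp.read(1 << 16), b""):
            digest.update(chunk)
    final_dir = os.path.join(cache_root, digest.hexdigest())
    if os.path.isdir(final_dir):
        return final_dir, False  # Warm cache: skip extraction entirely.

    os.makedirs(cache_root, exist_ok=True)
    # Extract into a scratch directory, then rename it into place so a
    # concurrent reader never observes a half-extracted tree.
    work_dir = tempfile.mkdtemp(dir=cache_root, prefix=".extract-")
    with zipfile.ZipFile(archive) as zf:
        zf.extractall(work_dir)
    try:
        os.rename(work_dir, final_dir)
    except OSError:
        # Most likely another process finished the same extraction first.
        shutil.rmtree(work_dir, ignore_errors=True)
        if not os.path.isdir(final_dir):
            raise
        return final_dir, False
    return final_dir, True
```

Only the first call for a given archive pays the extraction cost; later calls return the cached directory after hashing the file.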
The PEX runtime presents several differences from the typical python runtime and this can lead to problems PEXing various python programs:
Although 1 has a workaround with `--not-zip-safe`, it's often surprising to users and not easy to discover as a problem solution. Item 2 doesn't have a solution and this leads to the inability to PEX certain applications, in particular those that use namespace packages inconsistently (see #331 for examples).
Recent work that added the `--unzip` option showed that first unzipping PEXes led to better cold and warm cache startup latency. If we were to push on this, one solution to both of the problems listed above would be for PEXes to always unzip / re-package themselves in the pex cache with standard site-packages layout (i.e., in one `sys.path` chroot). This should improve PEX performance even more, since `sys.path` would have fewer entries, allowing imports provided by the PEX to be found in a single fs search instead of needing to search 1 (the pex zip itself) + N (pex dependencies) locations.
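To make the "1 + N locations versus a single search" point concrete, here is a deliberately simplified model that counts how many path entries are examined before a module file is found. It is an illustration only: the real import system uses path finders and caches, and `make_layouts` and `probes_until_found` are hypothetical names introduced for this sketch.

```python
import os
import tempfile


def make_layouts(root, n_deps):
    """Build two on-disk layouts holding the same modules dep0..dep{n-1}.

    Returns (isolated_path, merged_path): the first has one entry for the
    app plus one per dependency; the second has a single shared directory.
    """
    app_dir = os.path.join(root, "app")
    os.makedirs(app_dir)
    isolated_path = [app_dir]
    merged_dir = os.path.join(root, "site-packages")
    os.makedirs(merged_dir)
    for i in range(n_deps):
        name = "dep{}.py".format(i)
        dep_dir = os.path.join(root, "deps", "dep{}".format(i))
        os.makedirs(dep_dir)
        for directory in (dep_dir, merged_dir):
            with open(os.path.join(directory, name), "w") as fp:
                fp.write("")
        isolated_path.append(dep_dir)
    return isolated_path, [merged_dir]


def probes_until_found(search_path, module_name):
    """Count the path entries examined before `module_name`.py is found."""
    for probes, entry in enumerate(search_path, start=1):
        if os.path.isfile(os.path.join(entry, module_name + ".py")):
            return probes
    raise ImportError(module_name)


if __name__ == "__main__":
    with tempfile.TemporaryDirectory() as root:
        isolated, merged = make_layouts(root, 5)
        print(probes_until_found(isolated, "dep4"))
        print(probes_until_found(merged, "dep4"))
```

In this model the last dependency costs 1 + N probes in the per-distribution layout and a single probe in the merged site-packages layout.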