Run PEXes as normal applications #962

jsirois · 2020-05-04T20:07:29Z

The PEX runtime differs from the typical Python runtime in several ways, and this can lead to problems when PEXing various Python programs:

1. The need to mark a PEX as not zip-safe for application code that uses filesystem APIs to find code and resources in the application.
2. The merging of the sys.path from individual pre-installed wheel chroots can expose bugs in underlying distributions that are normally masked by being installed in the same chroot as their enclosing dependency set.

Although item 1 has a workaround (--not-zip-safe), it is often surprising to users and hard to discover as the solution. Item 2 has no solution, and this leads to the inability to PEX certain applications, in particular those that use namespace packages inconsistently (see #331 for examples).
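To make the first problem concrete, here is a minimal sketch in plain Python (not Pex code) of why filesystem-based resource lookup breaks when code is imported from a zip, as it is from a zipped PEX. The package name zipdemo_pkg and both helper functions are hypothetical; pkgutil.get_data is the standard-library, loader-based alternative that works whether or not the code is zipped.

```python
import os
import pkgutil
import sys
import tempfile
import zipfile


def _write_demo_zip(directory):
    """Write a zip holding a package plus a data file, much as a zipped PEX would."""
    zip_path = os.path.join(directory, "app.zip")
    with zipfile.ZipFile(zip_path, "w") as zf:
        zf.writestr("zipdemo_pkg/__init__.py", "")
        zf.writestr("zipdemo_pkg/data.txt", "hello from inside the zip\n")
    return zip_path


def load_resource_both_ways():
    """Return (found_via_filesystem, bytes_via_loader) for zipdemo_pkg/data.txt."""
    with tempfile.TemporaryDirectory() as tmp:
        zip_path = _write_demo_zip(tmp)
        sys.path.insert(0, zip_path)
        try:
            import zipdemo_pkg

            # Filesystem lookup: __file__ points *inside* app.zip, which is a
            # file rather than a directory, so this path does not exist on disk.
            fs_path = os.path.join(os.path.dirname(zipdemo_pkg.__file__), "data.txt")
            found_via_filesystem = os.path.exists(fs_path)

            # Loader lookup: pkgutil.get_data asks the package's loader (here a
            # zipimporter) for the bytes, so it works zipped or unzipped.
            data = pkgutil.get_data("zipdemo_pkg", "data.txt")
            return found_via_filesystem, data
        finally:
            sys.path.remove(zip_path)
            sys.path_importer_cache.pop(zip_path, None)
            sys.modules.pop("zipdemo_pkg", None)
```

Calling load_resource_both_ways() returns False for the filesystem lookup and the file's bytes for the loader lookup. Application code written like the first lookup is what needs the PEX to be marked not zip-safe.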

Recent work that added the --unzip option showed that unzipping PEXes before running them led to better startup latency with both cold and warm caches. If we were to push on this, one solution to both of the problems listed above would be for PEXes to always unzip and re-package themselves in the pex cache with a standard site-packages layout (i.e., in a single sys.path chroot). This should improve PEX performance even further, since the sys.path would have fewer entries, allowing imports provided by the PEX to be found with a single filesystem search instead of needing to search 1 (the pex zip itself) + N (the pex dependencies) locations.
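A minimal sketch of the "unzip once into the cache" half of this idea, assuming a cache directory keyed by a hash of the PEX file's bytes. The function name extract_to_cache and the cache layout are hypothetical (this is not Pex's actual PEX_ROOT layout), and the sketch does not attempt the re-packaging into a site-packages layout.

```python
import hashlib
import os
import shutil
import tempfile
import zipfile


def extract_to_cache(zip_path, cache_root):
    """Extract zip_path once into cache_root/<sha1 of its bytes>; return that directory."""
    digest = hashlib.sha1()
    with open(zip_path, "rb") as fp:
        for chunk in iter(lambda: fp.read(65536), b""):
            digest.update(chunk)
    target = os.path.join(cache_root, digest.hexdigest())
    if os.path.isdir(target):
        return target  # Warm cache: nothing to extract.

    os.makedirs(cache_root, exist_ok=True)
    # Extract into a private work dir, then rename it into place so that
    # concurrent runs never observe a half-extracted directory.
    work_dir = tempfile.mkdtemp(dir=cache_root)
    try:
        with zipfile.ZipFile(zip_path) as zf:
            zf.extractall(work_dir)
        os.rename(work_dir, target)
    except OSError:
        # Most likely another process renamed its copy into place first.
        if not os.path.isdir(target):
            raise
    finally:
        # No-op after a successful rename; cleans up after any failure.
        shutil.rmtree(work_dir, ignore_errors=True)
    return target
```

Keying on content means a rebuilt PEX gets a fresh directory while an unchanged one is reused; the extract-then-rename step is a common way to make a shared cache safe under concurrent first runs.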


cosmicexplorer · 2020-05-06T08:27:53Z

Would it be feasible to consider extending this proposal to allow building PEX files unzipped when they aren't intended to be run on another host? The result of something like PEXBuilder.build_unzipped() could be just the unzipped chroot path copied into the pex cache, which a Python application depending on pex could use to know how to invoke the unzipped PEX.

Rationale

The lazy-loading ipex functionality added to Pants in pantsbuild/pants#8793 (which just turns your code into a Python application that knows how to build the real pex file) currently has to make a single trip across the zipped/unzipped bridge when the ipex is first executed: downloading wheels, then stuffing them into a pex file. If pex files were then unconditionally unzipped on top of that, it would involve a second crossing of that bridge. Stuffing 3rdparty dependencies into the resulting pex file appears to take more time than actually resolving them (more time than running Pants, actually, since the 3rdparty dependencies are not zipped into the ipex file when it is created), especially when the resolve is mostly cached. And the resulting pex will always be run on the same host, because ipex execs the built pex immediately after producing it.

jsirois · 2020-05-06T13:08:09Z

The Pex API has supported this for a long time and Pants has correspondingly used it for a long time:
https://github.com/pantsbuild/pex/blob/916e61e04634c60d09ee25859c76ba4f27282e51/pex/pex_builder.py#L462-L477

cosmicexplorer · 2020-05-07T15:35:28Z

I had forgotten the difference between .build() and .freeze()! Thank you so much!!

kwlzn · 2020-05-19T09:27:35Z

I suspect an inverted flag, e.g. "--force-zip" (or "--zip-safe=False" becoming the default), might be nice to keep around for space-constrained execution of large pex envs where the extra disk space consumed by unzipping may not be acceptable (like a pex that contains a large but otherwise zip-safe library executing on a PySpark worker, etc.). Otherwise, this sounds great to me as the default mode.

pex execution overhead is a major UX issue for us, particularly with O(GB) pex envs for DS/ML, and also for local tool use cases.

jsirois · 2020-12-02T16:38:54Z

A speed hack note when implementing this:

If we find sys.argv[0] (the PEX zip file) is writable, we could rewrite its shebang to point to the selected venv interpreter and fully eliminate re-exec overhead on subsequent runs:

Given the mechanism here:

$ cat __main__.py 
#!/usr/bin/env python
from __future__ import print_function

def _maybe_reexec():
    import sys

    _BINARY = sys.argv[0]

    # Here we would extract the app to a venv, abbreviated as extracted_app.py for demonstration.
    _NEW_SHEBANG = b"#!" + sys.executable.encode("utf-8") + b" extracted_app.py\n"

    with open(_BINARY, "rb") as fp:
        shebang = fp.readline()
        if shebang == _NEW_SHEBANG:
            return

        import os
        import shutil

        new_binary = "{}.rewrite".format(_BINARY)
        with open(new_binary, "wb") as new_fp:
            new_fp.write(_NEW_SHEBANG)
            new_fp.write(fp.read())

        shutil.copymode(_BINARY, new_binary)
        os.rename(new_binary, _BINARY)
        os.execv(_BINARY, sys.argv)


_maybe_reexec()
del _maybe_reexec


import sys


print("ERROR: should have never gotten here!")
sys.exit(1)

Which relies on the to-be-written PEX -> venv extraction code, represented here for demonstration by a single pre-extracted file:

$ cat extracted_app.py 
import sys

print("Hello. ARGV={}".format(sys.argv))

We get:

$ zip main.zip __main__.py && cat <(echo '#!/usr/bin/env python') main.zip > main.pex && chmod +x main.pex
  adding: __main__.py (deflated 53%)
$ head -1 main.pex
#!/usr/bin/env python
$ time ./main.pex
Hello. ARGV=['extracted_app.py', './main.pex']

real	0m0.051s
user	0m0.039s
sys	0m0.007s
$ head -1 main.pex
#!/usr/bin/python extracted_app.py
$ time ./main.pex
Hello. ARGV=['extracted_app.py', './main.pex']

real	0m0.024s
user	0m0.016s
sys	0m0.004s
$ time python -c 'print("Hello")'
Hello

real	0m0.022s
user	0m0.018s
sys	0m0.004s
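A slightly hardened sketch of just the rewrite step from the __main__.py above, assuming the same shebang-then-payload layout. It checks that both the file and its directory are writable (the final rename needs the latter), writes the new copy next to the original, and swaps it in with os.replace so a concurrent reader sees either the old or the new file. rewrite_shebang is a hypothetical name, and os.replace is Python 3 only, whereas the demo above also runs under Python 2.

```python
import os
import shutil
import tempfile


def rewrite_shebang(binary, new_shebang):
    """Replace the first line of `binary` with `new_shebang` (bytes ending in a newline).

    Returns True if the file was rewritten and False if it already carried
    the new shebang or cannot be rewritten in place, in which case a caller
    would fall back to the normal re-exec path.
    """
    directory = os.path.dirname(os.path.abspath(binary))
    # The file must be writable, and so must its directory, since the final
    # step renames a sibling file over it.
    if not (os.access(binary, os.W_OK) and os.access(directory, os.W_OK)):
        return False

    with open(binary, "rb") as fp:
        if fp.readline() == new_shebang:
            return False
        payload = fp.read()  # Everything after the old shebang line.

    fd, tmp_path = tempfile.mkstemp(dir=directory, prefix=".rewrite-")
    try:
        with os.fdopen(fd, "wb") as out:
            out.write(new_shebang)
            out.write(payload)
        shutil.copymode(binary, tmp_path)  # Keep the executable bit.
        os.replace(tmp_path, binary)  # Atomic on POSIX within one filesystem.
    except BaseException:
        os.unlink(tmp_path)
        raise
    return True
```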

This self-modifying executable approach could be made robust by recording the values of any PEX_* environment variables that affect interpreter selection. If those values don't match on a subsequent run, interpreter selection would be re-run, and if it calls for a new interpreter, the application install and shebang rewrite would be re-run as well.
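A minimal sketch of that bookkeeping, assuming a stamp file stored alongside the extracted application. It conservatively fingerprints every PEX_* variable rather than only those that affect interpreter selection (such as PEX_PYTHON and PEX_PYTHON_PATH); the function names and the stamp file are hypothetical.

```python
import hashlib
import json
import os
import tempfile


def interpreter_selection_fingerprint(environ=None):
    """Hash every PEX_* variable (a superset of those that steer interpreter selection)."""
    environ = os.environ if environ is None else environ
    relevant = {k: v for k, v in environ.items() if k.startswith("PEX_")}
    blob = json.dumps(relevant, sort_keys=True).encode("utf-8")
    return hashlib.sha256(blob).hexdigest()


def is_stale(stamp_path, environ=None):
    """True if no fingerprint was recorded or the PEX_* environment changed since."""
    try:
        with open(stamp_path) as fp:
            recorded = fp.read().strip()
    except FileNotFoundError:
        return True
    return recorded != interpreter_selection_fingerprint(environ)


def record_fingerprint(stamp_path, environ=None):
    """Call only after re-selection and re-install succeed; writes atomically."""
    directory = os.path.dirname(os.path.abspath(stamp_path))
    fd, tmp_path = tempfile.mkstemp(dir=directory)
    with os.fdopen(fd, "w") as fp:
        fp.write(interpreter_selection_fingerprint(environ))
    os.replace(tmp_path, stamp_path)
```

Recording only after a successful re-install avoids a half-finished run leaving a stamp that claims the shebang is up to date.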


Add a new `--include-tools` option to include any pex.tools in generated PEX files. These tools are activated by running PEX files with PEX_TOOLS=1. The `Info` tool seeds the tool set and simply dumps the effective PEX-INFO for the given PEX. Work towards #962 and #1115


This fixes binary canonicalization to handle virtual environments created with virtualenv instead of pyvenv. It also adds support for resolving the base interpreter used to build a virtual environment. The ability to resolve a virtual environment interpreter will be used to fix #1031 where virtual environments created with `--system-site-packages` leak those packages through as regular sys.path entries otherwise undetectable by PEX. Work towards #962 and #1115.

Add a `venv` tool to create a virtual environment from a PEX file. The virtual environment is seeded with just the PEX user code and distributions applicable to the selected interpreter for the local machine. The virtual environment does not have Pip installed by default although that can be requested with `--pip`. The virtual environment comes with a `__main__.py` at the root of the venv to emulate a loose pex that can be run with `python venv.dir` just like a loose pex. This entry point supports all the behavior of the original PEX file not related to interpreter selection, namely support for PEX_SCRIPT, PEX_MODULE, PEX_INTERPRETER and PEX_EXTRA_SYS_PATH. A sibling `pex` script is linked to `__main__.py` to provide the maximum performance entrypoint that always avoids interpreter re-execing and thus yields equivalent performance to a pure virtual environment. Work towards #962 and #1115.
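A rough sketch of how a venv-root __main__.py could honor these variables. The precedence (PEX_INTERPRETER, then PEX_SCRIPT, then PEX_MODULE, then the PEX's own entry point), the os.pathsep splitting of PEX_EXTRA_SYS_PATH, and all function names are assumptions for illustration, not the code the venv tool actually generates.

```python
import code
import importlib
import os
import runpy
import sys

_FALSY = ("", "0", "false", "False")


def resolve_entry_point(environ, default_entry_point):
    """Return (kind, target) for what to run; the precedence here is assumed."""
    if environ.get("PEX_INTERPRETER", "") not in _FALSY:
        return ("interpreter", None)
    if environ.get("PEX_SCRIPT"):
        return ("script", environ["PEX_SCRIPT"])
    if environ.get("PEX_MODULE"):
        return ("module", environ["PEX_MODULE"])
    return ("module", default_entry_point)


def extra_sys_path(environ):
    """Split PEX_EXTRA_SYS_PATH (assumed os.pathsep-separated), dropping empties."""
    value = environ.get("PEX_EXTRA_SYS_PATH", "")
    return [entry for entry in value.split(os.pathsep) if entry]


def main(default_entry_point):
    """What a hypothetical venv-root __main__.py would call."""
    sys.path.extend(extra_sys_path(os.environ))
    kind, target = resolve_entry_point(os.environ, default_entry_point)
    if kind == "interpreter":
        code.interact()
    elif kind == "script":
        # Assumes console scripts sit in the venv's bin/ next to its python.
        script = os.path.join(os.path.dirname(sys.executable), target)
        runpy.run_path(script, run_name="__main__")
    else:
        module_name, _, func_name = target.partition(":")
        if func_name:
            # "pkg.mod:func" style entry point, called like a console script.
            sys.exit(getattr(importlib.import_module(module_name), func_name)())
        runpy.run_module(module_name, run_name="__main__", alter_sys=True)
```

Keeping the decision in a pure function (resolve_entry_point) makes the precedence easy to test separately from actually running anything.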

The new --venv execution mode builds a PEX file that includes pex.tools and extracts itself into a venv under PEX_ROOT upon first execution or any execution that might select a different interpreter than the default. In order to speed up the local build and execute case, --seed mode is added to seed the PEX_ROOT caches that will be used at runtime. This is important for --venv mode since venv seeding depends on the selected interpreter and one is already selected during the PEX file build process. Fixes #962 Fixes #1097 Fixes #1115
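One way to reason about "any execution that might select a different interpreter": key the venv directory on both the PEX and the interpreter, so a different interpreter simply maps to a different, possibly not-yet-seeded, venv. The layout below (PEX_ROOT/venvs/<pex hash>/<interpreter digest>) and the helper names are hypothetical, not Pex's real layout.

```python
import hashlib
import os
import sys


def venv_dir(pex_root, pex_hash, python=None, version=None):
    """Choose a venv directory for one (PEX, interpreter) pair.

    Hypothetical layout: <pex_root>/venvs/<pex_hash>/<interpreter digest>.
    """
    python = python or sys.executable
    version = version or ".".join(str(part) for part in sys.version_info[:3])
    identity = "{}\0{}".format(os.path.realpath(python), version)
    interpreter_digest = hashlib.sha1(identity.encode("utf-8")).hexdigest()
    return os.path.join(pex_root, "venvs", pex_hash, interpreter_digest)


def needs_seeding(path):
    """Simplified check: a POSIX venv is usable once bin/python exists."""
    return not os.path.exists(os.path.join(path, "bin", "python"))
```

Under this scheme, "seed on first run" and "re-seed when the interpreter changes" become the same check: compute venv_dir(...) and seed it if needs_seeding(...) is True. Seeding ahead of time, as --seed does for the PEX_ROOT caches, would then just be running that check at build time with the interpreter already selected.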

jsirois added the enhancement label May 4, 2020

cosmicexplorer mentioned this issue May 6, 2020

avoid zipping up the pex file built by ipex at first run pantsbuild/pants#9704 (Closed)

jsirois added this to the 3.0 milestone May 11, 2020

jsirois mentioned this issue Dec 2, 2020

Removing "activation" time from PEX #1115 (Closed)

jsirois self-assigned this Dec 5, 2020

jsirois added the in progress label Dec 5, 2020

jsirois mentioned this issue Dec 7, 2020

Add support for PEX runtime tools & an info tool. #1127 (Merged)

jsirois mentioned this issue Dec 8, 2020

Add a venv tool. #1128 (Merged)

This was referenced Dec 11, 2020

Improve PythonInterpreter venv support. #1129 (Merged)

Release 2.1.22 #1111 (Closed)

This was referenced Dec 14, 2020

Release 2.1.24 #1138 (Closed)

Release 2.1.25 #1144 (Closed)

Add option to extend PATH with additional console scripts #1097 (Closed)

Prototype: Expose scripts to path #1097 #1117 (Closed)

jsirois mentioned this issue Dec 22, 2020

Support a --venv mode similar to --unzip mode. #1153 (Merged)

jsirois closed this as completed in #1153 Dec 24, 2020

jsirois removed the in progress label Dec 24, 2020
