Add option to extend PATH with additional console scripts #1097

jneuff · 2020-11-06T09:23:54Z

Background
I deploy Apache Airflow as a PEX file built with -c airflow. Now, when I run airflow.pex webserver I get FileNotFoundError: [Errno 2] No such file or directory: 'gunicorn': 'gunicorn'. That's because the airflow webserver command calls gunicorn to start the webserver. Since gunicorn is part of the PEX, I can call it with PEX_SCRIPT=gunicorn airflow.pex, but it is not on the PATH.

Workaround
My current solution is to have a script called gunicorn:

#!/bin/sh

PEX_SCRIPT=gunicorn airflow.pex "$@"

Adding this script to PATH before running airflow webserver works.

Proposal
It would come in handy to have an option for PEX to add additional console scripts contained in the PEX to the PATH. Creating the PEX file could look like this:

pex -r requirements.txt -c airflow --extend-path-with-scripts gunicorn -o airflow.pex

And of course a related runtime environment variable would make sense too.

What's your opinion on this? I'd be happy to contribute this feature!


jsirois · 2020-11-06T18:39:00Z

This makes sense.

To be clear though, it's not as trivial as it may sound. Since a PEX is a zipfile, you can't add items within it to the PATH directly. You need to extract those items and add the extracted locations to the PATH (and, in some cases, to sys.path). It turns out PEX files already extract all contained distributions to a location under ~/.pex at runtime (if not run and extracted previously) though, so it's likely that most of the heavy lifting needed to do this is already in place.

Digging deeper, more problems surface. Pretend this is all implemented and consider a typical gunicorn script that will now be on the PATH:

$ cat /home/jsirois/.pex/installed_wheels/5b9580f6c90af9b2d97488e3d17143cca0b6de2a/gunicorn-20.0.4-py2.py3-none-any.whl/bin/gunicorn 
#!/usr/bin/python3.8
# -*- coding: utf-8 -*-
import re
import sys
from gunicorn.app.wsgiapp import run
if __name__ == '__main__':
    sys.argv[0] = re.sub(r'(-script\.pyw|\.exe)?$', '', sys.argv[0])
    sys.exit(run())

That shebang will be a problem for a multiplatform PEX. Say the PEX was built with --interpreter-constraint=>=3.6,<4 or --python=python3.8 --python=python3.9. Since gunicorn is a universal distribution, the various pythons the PEX is built for will all use the same gunicorn-20.0.4-py2.py3-none-any.whl. As such, one of those interpreters will be the one to install the gunicorn distribution into the PEX file and that interpreter will set the script shebang. In this example, when the PEX file is then shipped to a machine with only Python 3.9, the script won't work.
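The failure mode can be checked for a given script with a few lines of Python. This is a minimal sketch, not part of Pex: the helper names shebang_interpreter and shebang_is_runnable are hypothetical, and the check only looks at the first word after #! (so for a #!/usr/bin/env python3 shebang it would test /usr/bin/env, not python3).

```python
import shutil


def shebang_interpreter(script_path):
    """Return the interpreter path named on the script's shebang line, or None."""
    with open(script_path, "rb") as fp:
        first_line = fp.readline().decode("utf-8", errors="replace").strip()
    if not first_line.startswith("#!"):
        return None
    parts = first_line[2:].split()
    return parts[0] if parts else None


def shebang_is_runnable(script_path):
    """True if the shebang's interpreter exists and is executable on this machine."""
    interpreter = shebang_interpreter(script_path)
    # shutil.which returns None when the named file is missing or not executable.
    return interpreter is not None and shutil.which(interpreter) is not None
```

For the gunicorn script above, shebang_interpreter would return /usr/bin/python3.8, and shebang_is_runnable would be False on a machine that only has Python 3.9 installed.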

jneuff · 2020-11-10T07:25:36Z

Thanks for pointing out these difficulties. My first idea was to just add the existing scripts to PATH, but as you showed, this will not work for multiplatform PEX.

Another approach would be to expose the scripts in a central bin directory somewhere in the unzipped PEX. Either in a similar fashion to my workaround, or directly using the mechanics PEX uses to run a console script. I'll have to look into the details of what happens when executing a PEX file.

Regarding the extraction: Are the contents of a PEX file always exposed under $PEX_ROOT/unzipped_pexes?

jsirois · 2020-11-12T16:43:47Z

Another approach would be to expose the scripts in a central bin directory somewhere in the unzipped PEX. Either in a similar fashion to my workaround, or directly using the mechanics PEX uses to run a console script. I'll have to look into the details of what happens when executing a PEX file.

You may have missed the significance of the shebang in scripts. Those shebangs are pinned to one interpreter - python3.8 in the gunicorn example above. That's fine if you've built a single-interpreter PEX but many folks use PEX to produce multi-interpreter / multi-platform PEXes. In that case the script will only work for one of the targeted interpreters. If the PEX is shipped to a machine without that interpreter but with another compatible interpreter the PEX will work up until user code tries to run that script at which point it will fail.

Regarding the extraction: Are the contents of a PEX file always exposed under $PEX_ROOT/unzipped_pexes?

No, only PEX files created with --unzip which is not the default. For those other pexes, their installed wheels are always unzipped prior to execution in $PEX_ROOT/installed_wheels; thus the /home/jsirois/.pex/installed_wheels/5b9580f6c90af9b2d97488e3d17143cca0b6de2a/gunicorn-20.0.4-py2.py3-none-any.whl/bin/gunicorn example above.

jneuff · 2020-11-24T12:35:25Z

You may have missed the significance of the shebang in scripts. Those shebangs are pinned to one interpreter - python3.8 in the gunicorn example above. That's fine if you've built a single-interpreter PEX but many folks use PEX to produce multi-interpreter / multi-platform PEXes. In that case the script will only work for one of the targeted interpreters. If the PEX is shipped to a machine without that interpreter but with another compatible interpreter the PEX will work up until user code tries to run that script at which point it will fail.

So basically, if I understand this correctly, the -c feature is broken for multiplatform PEX. I thought there'd be some mechanics to deal with the shebangs (but then of course, we could just use these mechanics here). In my opinion, fixing the -c behavior for multiplatform PEX is orthogonal to this issue at hand.

jsirois · 2020-11-24T13:00:16Z

You indeed don't understand correctly.

When you build a PEX file using -c, that script is validated to exist and then the script name is stored in the PEX file (in PEX-INFO). When the PEX file is executed, it first reads PEX-INFO and learns it should hand control to a script. It then finds that script, reads its contents, and executes them via what is effectively a Python eval:
https://github.com/pantsbuild/pex/blob/16a4b3a4980008fe47a509afc3b24381a6649a95/pex/pex.py#L573-L605

N.B.: Since the script code is directly executed in the runtime interpreter, the shebang is discarded since it's just a comment at the top of the python script file.

The key difference here is Pex executes the script from -c in-process whereas it sounds like you want to execute additional scripts via user or 3rdparty code via subprocess (i.e.: via the os which requires the shebang).
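The in-process case can be illustrated with a short sketch. This is not the Pex code linked above; run_script_in_process is a hypothetical helper that only shows why the shebang is irrelevant when the script source is compiled and executed by the interpreter that is already running.

```python
def run_script_in_process(script_path):
    """Execute a Python script's source inside the current interpreter.

    The shebang line starts with '#', so the compiler treats it as a comment
    and the interpreter it names is never consulted.
    """
    with open(script_path) as fp:
        source = fp.read()
    code = compile(source, script_path, "exec")
    # Run as if the file were the main module, like a console script expects.
    namespace = {"__name__": "__main__", "__file__": script_path}
    exec(code, namespace)
    return namespace
```

Launching the same file through the operating system instead (for example with subprocess.run([script_path])) makes the kernel read the shebang and look for that exact interpreter, which is where a script pinned to a missing python3.8 fails. Note that a real console script usually ends in sys.exit(...), so SystemExit would propagate out of this helper.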

jneuff · 2020-11-24T14:28:21Z

Thanks for the clarification. I just dug into the code and found the two functions you quoted. So PEX makes sure all scripts work as expected.

My proposal from above is to execute the designated additional scripts either by wrapping them in a shell script that actually calls the PEX or directly using PEX mechanics (maybe similar to the __main__.py found in PEX files).

As far as I know, executables on the PATH must be files. Thus, there is no way around representing the scripts as such. Now, I think rendering a shell script for each additional script that just calls PEX is a simple and good solution.

The question is, where to put these scripts. For additional scripts that are specified at build time, we could have a bin directory under $PEX_ROOT/bin/<PEX hash>. That contains files like these:

#!/bin/sh

PEX_SCRIPT=gunicorn exec /path/to/my.pex "$@"

Then we extend the PATH with this directory.
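Rendering such wrappers and extending PATH could look roughly like the following. This is a sketch of the idea in this comment only, not an existing Pex API: write_wrappers and prepend_to_path are hypothetical names, and the bin directory location is whatever the caller passes in.

```python
import os
import shlex

# Same shape as the wrapper shown above; "$@" is quoted so arguments
# containing spaces are forwarded unchanged.
WRAPPER_TEMPLATE = """#!/bin/sh

PEX_SCRIPT={script} exec {pex} "$@"
"""


def write_wrappers(bin_dir, pex_path, script_names):
    """Write one executable shell wrapper per script name into bin_dir."""
    os.makedirs(bin_dir, exist_ok=True)
    for name in script_names:
        wrapper = os.path.join(bin_dir, name)
        with open(wrapper, "w") as fp:
            fp.write(
                WRAPPER_TEMPLATE.format(
                    script=shlex.quote(name), pex=shlex.quote(pex_path)
                )
            )
        os.chmod(wrapper, 0o755)
    return bin_dir


def prepend_to_path(bin_dir, environ=None):
    """Put bin_dir at the front of PATH in the given environment mapping."""
    environ = os.environ if environ is None else environ
    current = environ.get("PATH", "")
    environ["PATH"] = bin_dir + (os.pathsep + current if current else "")
    return environ["PATH"]
```

Because the wrapper re-enters the PEX via PEX_SCRIPT, the target script runs in-process under whichever interpreter the PEX selects, so its own shebang never matters.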

But what happens if you specify additional scripts at runtime, like PEX_ADDITIONAL_SCRIPTS=gunicorn? We cannot just put their wrappers into $PEX_ROOT/bin/<PEX hash>, because next time you might run the PEX without PEX_ADDITIONAL_SCRIPTS. We probably have to ensure the correct state of $PEX_ROOT/bin/<PEX hash> on every call; that would solve this problem.

So, to make this fly, I need to provide for:

Build-time arguments to set additional scripts.
Storage of this information in PEX-INFO.
Run-time environment variables.
Combination of run-time and build-time information.
Creation (or restoration) of the $PEX_ROOT/bin/<PEX hash> directory.
Extension of PATH.
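The combination and restoration steps in this list can be sketched as follows. Everything here is illustrative: PEX_ADDITIONAL_SCRIPTS is only the variable name proposed in this comment, the comma separator is an assumption, and desired_scripts and sync_bin_dir are hypothetical helpers rather than Pex functions.

```python
import os
import shlex


def desired_scripts(build_time_scripts, environ=None):
    """Union of build-time script names and those named at run time."""
    environ = os.environ if environ is None else environ
    # Assumption: the run-time variable holds a comma-separated list of names.
    runtime = environ.get("PEX_ADDITIONAL_SCRIPTS", "")
    names = set(build_time_scripts)
    names.update(name.strip() for name in runtime.split(",") if name.strip())
    return names


def sync_bin_dir(bin_dir, pex_path, names):
    """Make bin_dir contain exactly one wrapper per requested script name."""
    os.makedirs(bin_dir, exist_ok=True)
    # Drop wrappers left over from an earlier run with different settings.
    for stale in set(os.listdir(bin_dir)) - set(names):
        os.remove(os.path.join(bin_dir, stale))
    for name in sorted(names):
        wrapper = os.path.join(bin_dir, name)
        with open(wrapper, "w") as fp:
            fp.write(
                '#!/bin/sh\n\nPEX_SCRIPT={} exec {} "$@"\n'.format(
                    shlex.quote(name), shlex.quote(pex_path)
                )
            )
        os.chmod(wrapper, 0o755)
    return sorted(os.listdir(bin_dir))
```

One consequence of re-syncing a shared directory on every call is that two concurrent runs of the same PEX with different run-time settings would overwrite each other's wrappers, so a real implementation would likely need a per-configuration directory or locking.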

And of course we could extend this feature to additional -m entry points as well.

Does this sound reasonable to you?

jneuff · 2020-12-21T09:08:24Z

@jsirois From my point of view the new pex-tools venv feature covers the requirements outlined in this issue. Sorry, I couldn't give more feedback on that PR. I see you kept this issue on the release docket - do you think this feature is still relevant now? Or shall we close this issue?

jsirois · 2020-12-21T16:14:04Z

I don't consider the --include-tools/PEX_TOOLS=1 my.pex venv ... a completed solution so I left this open. I have a branch implementing a new --venv [prepend|append] build-time flag, similar to the existing --unzip build-time flag, that will package a PEX file such that when it runs it will automatically create itself a venv (under ~/.pex/venvs/...) and re-execute from there. That will close #962 and this issue as well.

The new --venv execution mode builds a PEX file that includes pex.tools and extracts itself into a venv under PEX_ROOT upon first execution or any execution that might select a different interpreter than the default. In order to speed up the local build and execute case, --seed mode is added to seed the PEX_ROOT caches that will be used at runtime. This is important for --venv mode since venv seeding depends on the selected interpreter and one is already selected during the PEX file build process. Fixes #962 Fixes #1097 Fixes #1115

jneuff mentioned this issue Dec 3, 2020

Prototype: Expose scripts to path #1097 #1117

Closed

This was referenced Dec 11, 2020

Release 2.1.22 #1111

Closed

Release 2.1.24 #1138

Closed

Release 2.1.25 #1144

Closed

jsirois self-assigned this Dec 21, 2020

jsirois added the "in progress" and "feature request" labels Dec 21, 2020

jsirois mentioned this issue Dec 22, 2020

Support a --venv mode similar to --unzip mode. #1153

Merged

jsirois closed this as completed in #1153 Dec 24, 2020

jsirois removed the in progress label Dec 24, 2020
