Proposal: Add --platform flag to pip install and pip wheel #5453

Open
chriskuehl opened this issue May 29, 2018 · 27 comments
Labels
type: feature request Request for a new feature

Comments

@chriskuehl
Contributor

chriskuehl commented May 29, 2018

What's the problem this feature will solve?

By default, the wheel platform detection pip does on Linux isn’t very advanced: it produces a platform tag like linux_x86_64, which is the same across nearly all Linux installations.
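As a rough illustration of why the default tag carries so little information, the sketch below derives a generic platform tag from the standard library's `sysconfig.get_platform()` by normalising separators. This mirrors the general idea only; the helper name `generic_platform_tag` is made up for this example, and pip's real detection logic is more involved.

```python
import sysconfig


def generic_platform_tag() -> str:
    """Return a wheel-style platform tag for the running interpreter.

    Illustrative only: the characters "-" and "." are replaced with
    underscores, since dashes separate the components of a wheel filename.
    """
    return sysconfig.get_platform().replace("-", "_").replace(".", "_")


if __name__ == "__main__":
    # On a typical 64-bit Linux machine this is a tag such as linux_x86_64,
    # with no trace of the distribution or its release.
    print(generic_platform_tag())
```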

For packages with compiled bits that link against system-installed shared objects (.so files), it’s necessary to use different wheels for systems with different versions of the shared libraries. For example, a wheel built on Ubuntu 14.04 will not necessarily work on Ubuntu 16.04.

Being able to specify a custom platform name would allow building the same package version on different systems to produce two wheels with unique filenames that can be served from the same PyPI registry. For example, you could produce two wheels with the (already valid) filenames:

  • numpy-1.9.2-cp36-cp36m-linux_ubuntu_14_04_x86_64.whl
  • numpy-1.9.2-cp36-cp36m-linux_ubuntu_16_04_x86_64.whl

These wheels can be distributed to package consumers (e.g. internally inside a company), who get the benefits of quick wheel installation (no compile-on-install) and the possibility to work from several different platforms.
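To make the role of the platform tag concrete, here is a minimal sketch that splits a wheel filename into its dash-separated components. The `WheelName` type and `split_wheel_filename` helper are names invented for this example; the optional build tag is handled only by position, with no further validation.

```python
from typing import NamedTuple, Optional


class WheelName(NamedTuple):
    distribution: str
    version: str
    build: Optional[str]
    python_tag: str
    abi_tag: str
    platform_tag: str


def split_wheel_filename(filename: str) -> WheelName:
    """Split a wheel filename into its dash-separated components."""
    if not filename.endswith(".whl"):
        raise ValueError(f"not a wheel filename: {filename!r}")
    parts = filename[: -len(".whl")].split("-")
    if len(parts) == 5:
        name, version, python_tag, abi_tag, platform_tag = parts
        build = None
    elif len(parts) == 6:
        # A sixth component means an optional build tag follows the version.
        name, version, build, python_tag, abi_tag, platform_tag = parts
    else:
        raise ValueError(f"unexpected number of components: {filename!r}")
    return WheelName(name, version, build, python_tag, abi_tag, platform_tag)


if __name__ == "__main__":
    for filename in (
        "numpy-1.9.2-cp36-cp36m-linux_ubuntu_14_04_x86_64.whl",
        "numpy-1.9.2-cp36-cp36m-linux_ubuntu_16_04_x86_64.whl",
    ):
        # Only the last component differs between the two wheels.
        print(split_wheel_filename(filename).platform_tag)
```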

Describe the solution you'd like

We would like to add a --platform flag to the pip install and pip wheel commands, with identical behavior to the existing --platform flag on the pip download command. This change would allow building and installing wheels with a custom, user-provided platform name.

At $COMPANY, we’ve been building wheels with a custom platform name (just like the numpy example above) for several years. We upload these wheels to our internal PyPI registry, and install them using pip-custom-platform, a wrapper around pip that adds in the --platform flag being requested in this issue.

Because we have hundreds of different internal Python projects, upgrading from one OS version to another is a long process during which we need to support development (and wheels!) on multiple OSes.

Overall, we’ve been very happy with the custom platform approach. The burden on projects and friction for developers is very low, and conceptually it fits in very nicely with the “platform” concept of a wheel (similar to how macOS wheels specify versions in the wheel’s platform tag). The downside for us right now, of course, is that we have to use a monkey-patched version of pip instead of the real thing.

pip-custom-platform has also started to be used by others in interesting use cases such as building packages for AWS Lambda to specify their own platform names.

The scope of the proposed change to pip is identical to the --platform flag for pip download: it’s just a flag that allows users to manually specify a platform to use. pip-custom-platform does have functionality to automatically generate platform names based on the Linux distribution, but we think it’s better to leave this complexity out of pip, and instead have users construct the platform name themselves and either pass in this flag manually or as an environment variable (possibly set at the system level by a system administrator, e.g. in /etc/environment).
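For readers wondering what "constructing the platform name themselves" might look like, here is a small sketch that builds a name from the contents of an os-release style file. This is an assumption about one reasonable scheme, not a description of how pip-custom-platform works; the function name `custom_platform_tag` and the exact naming convention are hypothetical.

```python
import re


def custom_platform_tag(os_release_text: str, machine: str) -> str:
    """Build a name like linux_ubuntu_16_04_x86_64 from os-release text.

    Hypothetical scheme: linux_<ID>_<VERSION_ID>_<machine>, with every run
    of non-alphanumeric characters collapsed to a single underscore.
    """
    fields = {}
    for line in os_release_text.splitlines():
        line = line.strip()
        if not line or line.startswith("#") or "=" not in line:
            continue
        key, _, value = line.partition("=")
        # Values may be quoted, e.g. VERSION_ID="16.04".
        fields[key] = value.strip().strip("\"'")
    raw = f"linux_{fields['ID']}_{fields['VERSION_ID']}_{machine}"
    return re.sub(r"[^A-Za-z0-9]+", "_", raw).lower()


if __name__ == "__main__":
    sample = 'NAME="Ubuntu"\nID=ubuntu\nVERSION_ID="16.04"\n'
    print(custom_platform_tag(sample, "x86_64"))
```

A system administrator could compute such a value once per machine image and store it wherever the flag or environment variable is configured.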

Alternative solutions

We considered several potential alternative solutions to this problem:

  • Abandon wheels entirely and compile on installation. This works but makes installation unrealistically slow (20+ minutes for some projects) and requires installing build tooling and library headers (gcc, fortran, libxxx-dev, cmake, etc.) everywhere.
  • Run one PyPI registry per distribution, where each serves wheels with the linux_x86_64 platform tag that have been compiled on the corresponding platform. This could work, but is pretty unwieldy: every project would need some hackery in their build scripts to select the correct server, plus we’d have to run a ton of these registries (we currently support 3 Ubuntu versions, plus about half a dozen random platforms which are used in specialty cases but are still important to support, e.g. random Amazon Linux AMIs used for EMR). Additionally, if you were to accidentally use the wrong PyPI server and download wheels built for the wrong platform, pip would happily install them without problem and you wouldn’t notice until you get runtime errors in your code.
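The last point in the list above can be sketched in a few lines: when every registry serves the same generic tag, a filename-based check cannot tell the builds apart, whereas distinct custom tags let a mismatch be rejected up front. This is a deliberately simplified model (it looks only at the platform tag and ignores the Python and ABI tags); `platform_tag` and `is_accepted` are names made up for the example.

```python
def platform_tag(wheel_filename: str) -> str:
    """Return the last dash-separated component of a wheel filename."""
    return wheel_filename[: -len(".whl")].rsplit("-", 1)[1]


def is_accepted(wheel_filename: str, accepted_platforms: set) -> bool:
    """Simplified check: accept the wheel if its platform tag is allowed."""
    return platform_tag(wheel_filename) in accepted_platforms


if __name__ == "__main__":
    # With a generic tag, a wheel built on any distribution looks acceptable.
    print(is_accepted("numpy-1.9.2-cp36-cp36m-linux_x86_64.whl", {"linux_x86_64"}))

    # With custom tags, a 14.04 wheel is rejected on a host configured for 16.04.
    host = {"linux_ubuntu_16_04_x86_64"}
    print(is_accepted("numpy-1.9.2-cp36-cp36m-linux_ubuntu_14_04_x86_64.whl", host))
```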

Additional context

Why can’t we use manylinux wheels?

manylinux wheels are a method of providing a single built wheel that works across most Linux installations. In general they do this job pretty well, but unfortunately we’re unable to use them for many packages due to the security concerns associated with them.

Specifically, for packages which need to depend on C libraries outside of the manylinux1 profile (for example, anything that depends on libssl, or almost all of the scientific Python libraries), the choices for producing a manylinux wheel are to either statically link in the dependencies, or to bundle the .so files in the wheel. In both of these cases, it is very difficult to roll out security updates across hundreds of different services, as it may involve patching and rebuilding the affected wheels, then building and deploying all of the services that consume them.

By contrast, rolling out security updates to shared libraries when services don’t bundle or statically link them is typically as easy as your distro’s equivalent of apt-get upgrade to pull in the latest patched version.

@asottile
Contributor

(full disclosure: I helped with this writeup) I'd be happy to assist with implementation as well if this proposal seems good

@lorencarvalho
Contributor

hey @chriskuehl

Curious if you think my open PR #5404 would satisfy your use case? It basically ports all the dist options from being exclusive to download to working with install (only when used in conjunction with --target) though it would make it easy to consume in the wheel subcommand as well, possibly.

@asottile
Contributor

It's close, though we would want to be able to use those options even without --target -- is there a reason you've restricted them to --target? (There's already the requirement on either --no-deps or --only-binary, which should make it "safe".)

@lorencarvalho
Contributor

That was a request from @pfmoore in my initial issue #5355 -- and I tend to agree with their reasoning... it would become very easy to fundamentally break folks' Python installations if a locally incompatible wheel was installed. That's the whole reason for the local PEP 425 checks in the first place, as I understand it.

If you did want to install a locally incompatible wheel, you could with my PR; you'd just have to specify --target /path/to/your/local/site-packages. I'd be open to something more explicit too, like --skip-compatibility-check or something, but restricting to --target (for install at least) makes sense.

@chriskuehl
Contributor Author

Thanks for linking to that PR @sixninetynine, this looks great!

I totally agree that we want to be explicit and obvious so that nobody accidentally installs incompatible wheels. I think that the --platform flag does that pretty well, though -- I feel like a command like pip install --platform linux_ubuntu_16_04_x86_64 is pretty explicit already that you're installing wheels for a specific platform.

I can definitely see why something like PIP_IGNORE_PEP425 could be a concern (easy to get copy-pasted around without understanding what's going on), but I wonder if the same concerns still apply when explicitly naming one specific platform to be allowed (as opposed to disabling the safety mechanism)?

@lorencarvalho
Contributor

That's a really good point, merely specifying the platform stuff is making your intention pretty explicit. I don't hold too strong of an opinion on that, I'm definitely open to changing the PR based on what the community and maintainers feel is best 🙂

@asottile
Contributor

Requiring --target doesn't really help our usecases described above so I'd love for an amend on that 😆

Our goal would be to do something like:

echo 'PIP_PLATFORM=linux_ubuntu_16_04_x86_64' >> /etc/environment
echo 'PIP_INDEX_URL=https://pypi.mycompany.com/simple' >> /etc/environment
echo 'PIP_ONLY_BINARY=:all:' >> /etc/environment

and then pip install x would just work
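The variable names in the snippet above follow pip's documented convention for setting command-line options through the environment: `PIP_` followed by the upper-cased long option name, with dashes replaced by underscores. A minimal sketch of that mapping (the helper name `pip_env_var` is made up for this example):

```python
def pip_env_var(long_option: str) -> str:
    """Map a long option such as --only-binary to its PIP_* variable name."""
    if not long_option.startswith("--"):
        raise ValueError(f"expected a long option, got {long_option!r}")
    return "PIP_" + long_option[2:].upper().replace("-", "_")


if __name__ == "__main__":
    for option in ("--platform", "--index-url", "--only-binary"):
        print(option, "->", pip_env_var(option))
```

Note that a variable only has an effect for commands that actually define the corresponding option, which is why `PIP_PLATFORM` depends on the flag proposed in this issue.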

@chadrik

chadrik commented Aug 25, 2018

I tested the changes in PR #5404 and I agree that it would be more convenient to be able to use --platform, etc. without needing to use --target. For example, how can I install entry-point scripts when using --target?

@lorencarvalho
Contributor

@chadrik what do you mean by

how can I install entry-point scripts when using --target?

@asottile you can still accomplish what you want, you just need to also set PIP_TARGET to the site-packages dir of your choice (including the system's site-packages dir, if you wish).

(tmp_venv) darwin ~ $ PIP_PLATFORM="manylinux1_x86_64" PIP_ONLY_BINARY=":all:" PIP_TARGET="local/path" python3 -m pip install numpy
Collecting numpy
  Using cached https://files.pythonhosted.org/packages/fe/94/7049fed8373c52839c8cde619acaf2c9b83082b935e5aa8c0fa27a4a8bcc/numpy-1.15.1-cp36-cp36m-manylinux1_x86_64.whl
Installing collected packages: numpy
Successfully installed numpy-1.15.1
(tmp_venv) darwin ~ $ tar cf site-packages.tar local/
(tmp_venv) darwin ~ $ scp site-packages.tar xx@xx:~/
site-packages.tar                                                                                                                           100%   57MB   1.3MB/s   00:45
(tmp_venv) darwin ~ $ ssh root@159.65.110.140
Welcome to Ubuntu 18.04.1 LTS (GNU/Linux 4.15.0-30-generic x86_64)
hemera:~# tar xf site-packages.tar
hemera:~# PYTHONPATH=local/path/ python3
Python 3.6.5 (default, Apr  1 2018, 05:46:30)
[GCC 7.3.0] on linux
Type "help", "copyright", "credits" or "license" for more information.
>>> import numpy
>>>

@asottile
Contributor

asottile commented Aug 25, 2018

@sixninetynine I can't set PIP_TARGET statically and use pip. Each caller of pip needs to know about it which doesn't actually help at all (I would need a wrapper script that essentially undoes the --target requirement). (EDIT: in that it's not better than what pip-custom-platform has to offer)

It also doesn't help with putting things into the ./bin directory which is what @chadrik is asking about.

@lorencarvalho
Contributor

I can't set PIP_TARGET statically and use pip

@asottile I don't understand this, why not? Setting it to your interpreter's site-packages means it's set explicitly (thus satisfying the existing requirement to use --target) but it will behave like a normal pip invocation. I'm thinking something along the lines of

PIP_TARGET=`python3 -c "import sys;print(sys.path.pop())"`

I'm not saying that I'm right and you are wrong, I just want to make sure I fully grok the use cases you are describing. I'm still confused on what you and @chadrik mean by scripts or ./bin too... not sure I'm following there, an example would be super helpful!

That said, @ncoghlan mentioned on the pypa mailing list:

The basic idea seems like a good one to me, and starting out with the --target restriction doesn't hurt - it's much easier to relax restrictions like that later than it is to figure out whether or not you've correctly handled all the edge cases that arise without them.

So relaxing that requirement is definitely not out of the question.

@asottile
Contributor

If you look at my example here I want to be able to have pip install on my managed machines just work. I'm so close to having what I want but I am unable to because --target is required. Note here that /etc/environment is added to the environment of all users of that machine.

Additionally let me expand on @chadrik's issue, taking into account what you're suggesting with PIP_TARGET (I'll show you what you're suggesting and why it doesn't work for @chadrik's usecase, I hope it's clear why it doesn't work for mine but if not I can expand on that too).

#!/usr/bin/env bash
set -e
rm -rf venv
virtualenv venv >& /dev/null
. venv/bin/activate

set -x

: show the normal behaviour
pip install astpretty
python -c 'import astpretty; print(astpretty.__file__)'
astpretty /dev/stdin <<< '1 + 1'

pip uninstall -y astpretty
hash -r

: now with PIP_TARGET
export PIP_TARGET="$(venv/bin/python -c 'import sysconfig; print(sysconfig.get_path("purelib"))')"
pip install astpretty
python -c 'import astpretty; print(astpretty.__file__)'
astpretty /dev/stdin <<< '1 + 1'
$ bash t.sh 
+ : show the normal behaviour
+ pip install astpretty
Collecting astpretty
  Using cached https://files.pythonhosted.org/packages/5e/6a/3630d505aa6ea8aa478fcb0059e674fbcf7e02ade23789a13cd86bf87864/astpretty-1.3.0-py2.py3-none-any.whl
Installing collected packages: astpretty
Successfully installed astpretty-1.3.0
+ python -c 'import astpretty; print(astpretty.__file__)'
/tmp/t/venv/lib/python3.6/site-packages/astpretty.py
+ astpretty /dev/stdin
Module(
    body=[
        Expr(
            lineno=1,
            col_offset=0,
            value=BinOp(
                lineno=1,
                col_offset=0,
                left=Num(lineno=1, col_offset=0, n=1),
                op=Add(),
                right=Num(lineno=1, col_offset=4, n=1),
            ),
        ),
    ],
)
+ pip uninstall -y astpretty
Uninstalling astpretty-1.3.0:
  Successfully uninstalled astpretty-1.3.0
+ hash -r
+ : now with PIP_TARGET
++ venv/bin/python -c 'import sysconfig; print(sysconfig.get_path("purelib"))'
+ export PIP_TARGET=/tmp/t/venv/lib/python3.6/site-packages
+ PIP_TARGET=/tmp/t/venv/lib/python3.6/site-packages
+ pip install astpretty
Collecting astpretty
  Using cached https://files.pythonhosted.org/packages/5e/6a/3630d505aa6ea8aa478fcb0059e674fbcf7e02ade23789a13cd86bf87864/astpretty-1.3.0-py2.py3-none-any.whl
Installing collected packages: astpretty
Successfully installed astpretty-1.3.0
Target directory /tmp/t/venv/lib/python3.6/site-packages/__pycache__ already exists. Specify --upgrade to force replacement.
+ python -c 'import astpretty; print(astpretty.__file__)'
/tmp/t/venv/lib/python3.6/site-packages/astpretty.py
+ astpretty /dev/stdin
t.sh: line 21: astpretty: command not found
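The transcript above comes down to the fact that an environment's library directory and its scripts directory are separate install locations. The sketch below, using only the standard library's `sysconfig` module, prints both for whichever interpreter runs it; the helper name `install_locations` is made up for this example. Pointing `PIP_TARGET` at the first location says nothing about the second, which is consistent with the `command not found` result shown.

```python
import sysconfig


def install_locations() -> dict:
    """Return the library and scripts directories of the running interpreter."""
    return {
        "purelib": sysconfig.get_path("purelib"),  # where importable modules go
        "scripts": sysconfig.get_path("scripts"),  # where console scripts go
    }


if __name__ == "__main__":
    for name, path in install_locations().items():
        print(f"{name}: {path}")
```

Both values also change with the interpreter or virtualenv in use, so a single statically configured target cannot be right for all of them.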

@chriskuehl
Contributor Author

@sixninetynine @ncoghlan what would you want to see before considering relaxing the --target restriction? Anything we can help with?

@lorencarvalho
Contributor

@chriskuehl I have no personal qualms with that restriction, it was requested by @pfmoore in my initial issue (#5355, prior to submitting the PR #5404). I think the logic is very sound:

The PEP 425 mechanisms are there precisely to protect users against installing incompatible code into their Python installation, and we should maintain that

Perhaps some other safeguards could be employed to retain that protection while not including the strict --target requirement. I had no issue using --target (in fact my tool uses it explicitly), so I selfishly had no objection.

@lorencarvalho
Contributor

@asottile I did look at that example, and I guess that's why I didn't understand your objection to setting PIP_TARGET in that same /etc/environment file (and just have it set to the interpreter's default site-packages, or a PYTHONPATH or somesuch). I definitely concede that it'd be a lot easier to manage without the --target requirement.

thanks for taking the time to illuminate @chadrik's issue, definitely makes sense to me now

@chriskuehl
Contributor Author

@sixninetynine thanks for the reply!

Definitely agree about wanting to protect users. It sounds like the concern is around whether passing --platform is explicit enough? Admittedly I find it pretty explicit already -- the user is pretty much saying "I want packages for platform X" -- but would definitely be interested in alternative ideas.

@asottile
Contributor

@asottile I did look at that example, and I guess that's why I didn't understand your objection to setting PIP_TARGET in that same /etc/environment file (and just have it set to the interpreter's default site-packages, or a PYTHONPATH or somesuch). I definitely concede that it'd be a lot easier to manage without the --target requirement.

Because you simply can't: you set a single value once at startup and you can't change it. It can't adjust as you source / unsource virtualenvs. You can't make it work for different interpreters (python2.7 / python3.5 / python3.6 / python3.7). /etc/environment also isn't a shell script; it's a simple static file containing k=v assignments.

Definitely agree about wanting to protect users. It sounds like the concern is around whether passing --platform is explicit enough? Admittedly I find it pretty explicit already -- the user is pretty much saying "I want packages for platform X" -- but would definitely be interested in alternative ideas.

@chriskuehl There's also a restriction that --only-binary :all: (or --no-deps) must be passed -- I imagine you'd have to try really hard to do this wrong :)

@lorencarvalho
Contributor

lorencarvalho commented Aug 26, 2018

@asottile ahhaa thanks! I'm definitely on the same page now.

Re: explicitness. I think the key distinction is that it would be counter-intuitive to run pip install --platform <non local platform> dep, since the default behavior is to install locally. @asottile's case makes sense because of custom platform tags that are locally compatible (if I'm understanding correctly). --target implies that you are installing a non-locally compatible thing for some alternative purpose. I think cases can be made for both sides of the argument. This reminds me a lot of my original feature request which was just a way to disable the pep425 checks entirely.

Regardless, I think #5404 is close to a good solution, but clearly this level of "power usage" is required by multiple folks -- and given that pip is not able to be used as a library (from which someone could add their own custom functionality) I think that lifting the --target restriction should be pursued, maybe with some warning output about potential issues with non-local compatibility?

@chadrik

chadrik commented Jan 8, 2019

any movement on this?

between this issue, #6121, and #6117 I can't come up with any solution that will work to install packages for multiple platforms/python versions.

@lorencarvalho
Contributor

@chadrik since #5404 is committed now, all it would take is removing the --target restriction. I wrote it in such a way that it's easy to remove the restriction. You'd simply need to remove the check_target kwarg from

def check_dist_restriction(options, check_target=False):

and remove the conditional in the function body

    if check_target:
        if dist_restriction_set and not options.target_dir:
            raise CommandError(
                "Can not use any platform or abi specific options unless "
                "installing via '--target'"
            )
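For anyone who wants to experiment with the effect of that conditional outside of pip, here is a self-contained approximation. It is not pip's actual code: `CommandError` is a local stand-in, the function is renamed `check_dist_restriction_sketch`, and the option attribute names other than `target_dir` (`python_version`, `platform`, `abi`, `implementation`) are assumptions about which options count as a dist restriction.

```python
from types import SimpleNamespace


class CommandError(Exception):
    """Local stand-in for pip's error type, so the sketch is self-contained."""


def check_dist_restriction_sketch(options, check_target=False):
    """Approximate the --target check from the snippet quoted above."""
    # Assumed set of options that restrict which distributions are eligible.
    dist_restriction_set = any([
        options.python_version,
        options.platform,
        options.abi,
        options.implementation,
    ])
    if check_target:
        if dist_restriction_set and not options.target_dir:
            raise CommandError(
                "Can not use any platform or abi specific options unless "
                "installing via '--target'"
            )


if __name__ == "__main__":
    options = SimpleNamespace(
        platform="linux_ubuntu_16_04_x86_64",
        python_version=None,
        abi=None,
        implementation=None,
        target_dir=None,
    )
    # Without the target check, a custom platform is accepted on its own.
    check_dist_restriction_sketch(options)
    # With the check enabled, the same options are rejected.
    try:
        check_dist_restriction_sketch(options, check_target=True)
    except CommandError as exc:
        print("rejected:", exc)
```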

@asottile
Contributor

asottile commented Jan 8, 2019

I started working on this! but haven't gotten too far. in my branch I lifted the restriction just for --platform, but then ran into other places in the codebase assuming based on args.target

I'll hopefully have something more working tomorrow, I was just poking at this while on an airplane without internet so I was a little strapped for figuring out what was and wasn't working.

also turns out --find-links does wheel validation differently than via an index so I burned a bunch of time trying to figure out why my test was passing before I wrote the code that makes it work >.<

@asottile
Contributor

asottile commented Jan 9, 2019

Here's my start on this: https://github.com/pypa/pip/compare/master...asottile:relax_target_requirement_5453?expand=1

the tests are passing, but they should not be, I need to make them fail first and then fix them:

$ pip-custom-platform wheel simplejson -w wheels
Collecting simplejson
  Using cached https://files.pythonhosted.org/packages/e3/24/c35fb1c1c315fc0fffe61ea00d3f88e85469004713dab488dee4f35b0aff/simplejson-3.16.0.tar.gz
Building wheels for collected packages: simplejson
  Building wheel for simplejson (setup.py) ... done
  Stored in directory: /tmp/tmpmdd9gjrc
Successfully built simplejson
$ ls wheels/
simplejson-3.16.0-cp36-cp36m-linux_ubuntu_18_04_x86_64.whl
$ pip install --platform linux_ubuntu_18_04_x86_64 --only-binary :all: wheels/simplejson-3.16.0-cp36-cp36m-linux_ubuntu_18_04_x86_64.whl 
simplejson-3.16.0-cp36-cp36m-linux_ubuntu_18_04_x86_64.whl is not a supported wheel on this platform.

Yet, puzzlingly, this works:

$ pip install --platform linux_ubuntu_18_04_x86_64 --only-binary :all: --find-links "file://$PWD/wheels" simplejson
Looking in links: file:///home/asottile/workspace/pip/wheels
Collecting simplejson
Installing collected packages: simplejson
Successfully installed simplejson-3.16.0

should be a simple fix once I get some more time to look at it

@larstiq

larstiq commented Jan 21, 2020

@asottile @chriskuehl we're also struggling with the scripts not being installed with --target when we have a custom platform. Do you have another way of dealing with this or should this issue still be advanced?

@asottile
Contributor

yelp was using pip-custom-platform -- though I don't work there any more

another approach is multiple pypi servers, but that's a lot of work

I haven't really touched this in a year unfortunately, sorry :( I do still want to see it happen though

@benbariteau

Can confirm that Yelp still does this, but we are considering alternatives.

@McSinyx
Contributor

McSinyx commented Jun 6, 2020

Not-so-up-to-date update: there's been a draft for a PEP standardizing this. Slightly related, as written in the draft,

Note in the case of manylinux, a cached local wheel will override the use of a manylinux wheel uploaded later, which would not have been the case for a previously created linux wheel (other OSes would have always used the cached, locally build wheel, even when a different wheel became available).

Which, IIUC, describes the fact that pip currently does not make use of cached locally built wheels on most GNU/Linux distributions (as well as non-GNU/non-glibc Linux distributions like Alpine or Void[citation needed]) since they are not manylinux-compliant.

@pudquick

pudquick commented May 6, 2021

I would seem to have a real world use case that's impacted by this currently on the macOS platform.

The pyobjc project has recently published version 7.2 to PyPI. If you're not familiar with the project, the top level module is dynamic in adjusting what its dependencies are and as such is mostly virtual / pure python while it's the sub-modules that contain architecture specific compiled code, the base minimum required of which is actually called pyobjc-core.

If you look at how version 7.0.1 was published, you'll see:
https://pypi.org/project/pyobjc-core/7.0.1/#files

pyobjc_core-7.0.1-cp37-cp37m-macosx_10_9_x86_64.whl
pyobjc_core-7.0.1-cp38-cp38-macosx_10_9_x86_64.whl
pyobjc_core-7.0.1-cp39-cp39-macosx_10_9_x86_64.whl
pyobjc_core-7.0.1-cp39-cp39-macosx_11_0_universal2.whl
pyobjc-core-7.0.1.tar.gz

Running an install on an Intel Mac with Big Sur macOS 11 with python 3.9.2/3.9.4/3.9.5 with pip install --only-binary=:all: pyobjc-core==7.0.1, you'll get pyobjc_core-7.0.1-cp39-cp39-macosx_11_0_universal2.whl installed - a Universal 2 copy of the module that's safely redistributable to both Intel and Apple silicon Macs running python 3.9.2+

If we look at 7.2, though, it was published slightly differently:
https://pypi.org/project/pyobjc-core/7.2/#files

pyobjc_core-7.2-cp310-cp310-macosx_10_9_universal2.whl
pyobjc_core-7.2-cp37-cp37m-macosx_10_9_x86_64.whl
pyobjc_core-7.2-cp38-cp38-macosx_10_9_x86_64.whl
pyobjc_core-7.2-cp39-cp39-macosx_10_9_universal2.whl
pyobjc_core-7.2-cp39-cp39-macosx_10_9_x86_64.whl
pyobjc-core-7.2.tar.gz

Notably the Universal 2 wheels are now tagged with macOS 10.9 instead of 11.

But the end effect is vastly different.

Running an install on an Intel Mac with Big Sur macOS 11 with python 3.9.2/3.9.4/3.9.5 with pip install --only-binary=:all: pyobjc-core==7.2, you'll get pyobjc_core-7.2-cp39-cp39-macosx_10_9_x86_64.whl installed - an Intel-only copy of the module that's not safely redistributable to every CPU architecture that can potentially be running macOS Big Sur.

I went through pip install verbose and didn't find the logical reason why x86_64 was preferred over universal2 in this case, but I'm assuming it's that "more specific is better" and that "a more specific OS version" outweighed "a more specific CPU architecture".

For 7.0.1, there was a macOS 11 choice - but it was only universal2.
For 7.2, all macOS version choices were equal (10.9+) - and both an x86_64 and universal2 option were available.
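One possible explanation that fits both observations is that candidate wheels are ranked by their position in an ordered list of supported tags, with newer OS versions first and, within one OS version, the single-architecture tag ahead of universal2. The sketch below models only that idea. The ordering in `ASSUMED_TAG_PRIORITY` is an assumption, heavily abbreviated and written by hand for illustration; it is not output from pip, and `pick_best` is a made-up helper.

```python
# Assumed, abbreviated priority order for an Intel Mac on macOS 11
# (most preferred first). Hand-written for illustration only.
ASSUMED_TAG_PRIORITY = [
    "macosx_11_0_x86_64",
    "macosx_11_0_universal2",
    "macosx_10_9_x86_64",
    "macosx_10_9_universal2",
]


def platform_tag(wheel_filename: str) -> str:
    """Return the last dash-separated component of a wheel filename."""
    return wheel_filename[: -len(".whl")].rsplit("-", 1)[1]


def pick_best(wheel_filenames, priority=ASSUMED_TAG_PRIORITY):
    """Pick the wheel whose platform tag appears earliest in the priority list."""
    candidates = [w for w in wheel_filenames if platform_tag(w) in priority]
    if not candidates:
        return None
    return min(candidates, key=lambda w: priority.index(platform_tag(w)))


if __name__ == "__main__":
    print(pick_best([
        "pyobjc_core-7.0.1-cp39-cp39-macosx_10_9_x86_64.whl",
        "pyobjc_core-7.0.1-cp39-cp39-macosx_11_0_universal2.whl",
    ]))
    print(pick_best([
        "pyobjc_core-7.2-cp39-cp39-macosx_10_9_universal2.whl",
        "pyobjc_core-7.2-cp39-cp39-macosx_10_9_x86_64.whl",
    ]))
```

Under this assumed ordering the 7.0.1 case selects the universal2 wheel and the 7.2 case selects the x86_64 wheel, matching the behaviour described above.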

For the macOS platform going forward, at least, it highlights that there may be multiple possible valid choices - all completely compatible - but without a version of install that supports -some- level of platform filtering when it comes to choices, I can't trivially pick the one that's correct for my use case (in this case: putting together a self-contained environment that can run on any Mac running macOS 11 / Big Sur) and am left to a.) the somewhat mysterious logic of pip combined with b.) the python module publishing choices made by someone trying to fully support Mac.

+1 for a desire to have some sort of filtering capability when there are multiple things I could install and which one is 'right' is really a factor of what it is you're trying to do.

I'll also point out that the --target restriction seems pretty arbitrary, considering that with one extra step I can effectively achieve the desired outcome:

# make some place to put wheels
mkdir ./pyobjc-universal-7.2
# download them, filtered against what I want
python3 -m pip download --only-binary=:all: --platform=macosx_10_9_universal2 --platform=macosx_11_0_universal2 --no-cache-dir pyobjc-core==7.2 --dest ./pyobjc-universal-7.2
# only install using what I downloaded
python3 -m pip install --only-binary=:all: --no-cache-dir --no-index --find-links ./pyobjc-universal-7.2 pyobjc-core==7.2

If I can't have the ability to pick exactly what I do want (even if it's in the possible choices pip has at hand but doesn't decide to use) - can I at least have the option to filter what I don't want?
