Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Add support for multi-platform pip download to pip_parse #510

Closed
wants to merge 2 commits into from

Conversation

meastham
Copy link

@meastham meastham commented Aug 14, 2021

PR Checklist

Please check if your PR fulfills the following requirements:

  • Does not include precompiled binaries, eg. .par files. See CONTRIBUTING.md for info
  • Tests for the changes have been added (for bug fixes / features)
  • Docs have been added / updated (for bug fixes / features)

PR Type

What kind of change does this PR introduce?

  • Bugfix
  • Feature (please, look at the "Scope of the project" section in the README.md file)
  • Code style update (formatting, local variables)
  • Refactoring (no functional changes, no api changes)
  • Build related changes
  • CI related changes
  • Documentation content changes
  • Other... Please describe:

What is the current behavior?

There is no multi-platform support: #260

What is the new behavior?

Adds a new argument, pip_platform_definitions, to pip_parse. The value of the argument looks like this:

    pip_platform_definitions = {
        "//platforms:linux_x86_64_build": "manylinux2014_x86_64-37-cp-cp37m",
        "//platforms:macos_x86_64_build": "macosx_10_14_x86_64-37-cp-cp37m",
    },

Each key is a config_setting, and each value is a string of the fom <platform>-<python_version>-<implementation>-<abi>.

When pip_platform_definitions is specified, instead of running pip wheel, the rules run pip download with constraints extracted from the values in the map values. This makes it possible to do cross-platform Python builds, e.g. on MacOS run bazel with --platforms=//platforms:macos_x86_64 and have linux dependencies fetched properly. In the above example, we would end with 3 targets for a pip package called foo:

  • @pip_pypi__macosx_10_14_x86_64_37_cp_cp37m__foo//:pkg is the py_library for the MacOS version of the package.
  • @pip_pypi__manylinux2014_x86_64_37_cp_cp37m__foo//:pkg is the py_library for the Linux version of the package.
  • @pip_pypi__foo is an alias with its actual determined by a select over the provided config_settings pointing at the above two targets. This is the target that the requirement macro points at.

Of course this only works if all of the required packages have wheels available which match the configured platform constraints, either in PyPI or another configured repository.

Does this PR introduce a breaking change?

  • Yes
  • No

Other information

In its current state, this PR is more of a proof of concept that is intended to start a conversation rather than something that is ready to merge. If I get feedback from the maintainers that this general approach is something that could be merged, then I would be happy to do the work to get it into a better state.

We've been successfully been using this at my company to build Docker images with native Python dependencies (e.g. pandas, gRPC) on MacOS. We have a reasonably large collection of Python deps (124), and in practice it has not been a large problem to obtain wheels for all of them. All except 3 are already in PyPI, and for the 3 that are missing it's possible to build universal wheels, which we just do manually outside of Bazel.

Things that I think likely need improvement before this is ready to merge are:

  • The interface to this new functionality is clunky. It's basically the fastest thing I could get working to see if the approach would work at all. I think the right way may be to create a toolchain_type for this purpose, and put the pip download constraints in a toolchain provider, but have not investigated this thoroughly.
  • Documentation
  • Examples/Tests
  • Some general code cleanup. The new interface for repo_names_and_requirements is pretty confusing.
    Feedback on any of these items would be great!

 Please enter the commit message for your changes. Lines starting
@meastham meastham changed the title Add support for multi-platform pip download to pip_parse Add support for multi-platform pip download to pip_parse Aug 14, 2021
@meastham meastham changed the title Add support for multi-platform pip download to pip_parse Add support for multi-platform pip download to pip_parse Aug 14, 2021
@mattem
Copy link
Collaborator

mattem commented Aug 16, 2021

As an overall feature I think this is okay, but I'd be careful in the cross / multi platform wording.

The general issue is that pip tools (the rule that powers pip_compile) doesn't produce a locked requirements file that is cross platform, so while you're getting the Linux wheels for what's listed, any dependencies that have a platform constraint next to them in the input requirements file (or transitive requirements) that don't match the platform the lock file was generated on will be missing from the lock file, at which point you could be missing requirements for a given platform except the one it was locked on.

I guess what I'm trying to say is, this is constrained by the locked requirements file as the input.

@meastham
Copy link
Author

As an overall feature I think this is okay, but I'd be careful in the cross / multi platform wording.

The general issue is that pip tools (the rule that powers pip_compile) doesn't produce a locked requirements file that is cross platform, so while you're getting the Linux wheels for what's listed, any dependencies that have a platform constraint next to them in the input requirements file (or transitive requirements) that don't match the platform the lock file was generated on will be missing from the lock file, at which point you could be missing requirements for a given platform except the one it was locked on.

I guess what I'm trying to say is, this is constrained by the locked requirements file as the input.

Yes, you're 100% correct that that is a gotcha that you have to be on the lookout for. I think this can already affect users of pip_compile if they're using the same lock file on multiple host platforms though right?

Also I see that you're a collaborator, do you know how I can find out if I would be able to get this merged if I fix up the list of issues in the PR description?

@thundergolfer thundergolfer requested review from hrfuller and thundergolfer and removed request for brandjon and lberki September 14, 2021 09:21
@@ -178,6 +183,11 @@ pip_repository_attrs = {
default = False,
doc = "Create the repository in incremental mode.",
),
"pip_platform_definitions": attr.label_keyed_string_dict(
doc = """
A map of select keys to platform definitions in the form <platform>-<python_version>-<implementation>-<abi>"

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Suggested change
A map of select keys to platform definitions in the form <platform>-<python_version>-<implementation>-<abi>"
A map of select keys to Python platform definitions in the form <platform>-<python_version>-<implementation>-<abi>"

Just a nit, but helpful to distinguish from a Bazel platform.

)
pip_args = [sys.executable, "-m", "pip", "--isolated"]
if args.pip_platform_definition:
platform, python_version, implementation, abi = args.pip_platform_definition.split("-")

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Q: why this format instead of the format listed here in PEP 425?

{python tag}-{abi tag}-{platform tag}

The extra bit in this PR's string is {implementation}, cp = CPython, but this is covered by the {python tag} from PEP 425.

The Python tag indicates the implementation and version required by a distribution.

Copy link
Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

It's because pip download takes four flags: --platform, --python-version, --implementation, --abi. I'm not sure why pip doesn't line up with the PEP 425 format.

@thundergolfer
Copy link

thundergolfer commented Sep 14, 2021

Python packaging is hell, isn't it? Appreciate the PR and the context you've provided in "other information". I think the biggest roadblocks to supporting cross-platform this way are the sharp-edges/pitfalls users have to navigate:

  • Package support becomes 'wheel only'. How is this communicated? Ref: "Of course this only works if all of the required packages have wheels available which match the configured platform constraints"
  • A single locked requirements.txt file is only really valid on the platform that produced it. So do we support multiple files? @alexeagle has a draft out for this now https://github.com/bazelbuild/rules_python/pull/531/files
  • Feels strange to allow providing strings which may include requirements (eg. python 3.10) which relate to no toolchain in the workspace. You touch on this here: "I think the right way may be to create a toolchain_type for this purpose". If I'm going to download third-party build artefacts that become inputs to some py_library/binary/test, they ought to only be build artefacts that match up to a Python runtime defined in the workspace.

But generally I think moving to a 'download only' install phase is solid.

@aignas
Copy link
Collaborator

aignas commented Oct 27, 2021

It is interesting that I've found this as this would be useful for me. Currently I am setting up a bazel repository and the setup is that most of the developers are using Macs where as our CI is running on Ubuntu20.04, so wheel only installation would be something that would be really good to be able to use. Otherwise, the non-hermetic setup.py files, which require various CLI tools and/or libraries on the host machine need to be present in my Ubuntu20.04 CI image.

My further comments:

  • pip_parse currently accepts a Python interpreter label. It would be great if we could use the Python platform from that interpreter. This is a nice to have and it might be great to not block on such feature for the first iteration.

@NathanHowell
Copy link

Currently I am setting up a bazel repository and the setup is that most of the developers are using Macs where as our CI is running on Ubuntu20.04, so wheel only installation would be something that would be really good to be able to use.

poetry and pipenv-resolve, if not others, support a single lock file across platform for (most? all?) wheels... worth considering if you have wheel only dependencies. it'd be nice if pip-compile added an option for the same. https://github.com/soniaai/rules_poetry has Poetry rules for Bazel (note: I'm the original author).

@meastham
Copy link
Author

meastham commented Jan 9, 2022

Thanks for the replies and sorry for the long delay on this!

This is a good point, but in practice I've found it not cause very many problems in practice with our repo. For the few cases where we have platform-specific transitive dependencies I've just added them as unconditional top level dependencies in our requirements.in. We haven't run into the case where there's a platform-specific dep that isn't available on other platforms; in that case you would have to do some hacky like manually create a dummy empty wheel for that dep and include it in an extra index.

Do you think coming up with a better solution to this problem would block landing this or could we just note it as a caveat when documenting the feature?

  • Feels strange to allow providing strings which may include requirements (eg. python 3.10) which relate to no toolchain in the workspace. You touch on this here: "I think the right way may be to create a toolchain_type for this purpose". If I'm going to download third-party build artefacts that become inputs to some py_library/binary/test, they ought to only be build artefacts that match up to a Python runtime defined in the workspace.

I dug into how this would look, and I'm not really sure how much of a practical benefit there would be. py_runtime/py_runtime_pair don't contain the necessary information, so I think we would need to use a separate parallel "pip toolchain." Additionally, toolchains aren't available to repository_rules, which is where we need to run pip download. That would mean we would need a list of all of the "pip toolchains" passed in to pip_parse (which is not how toolchains usually work), and then like the current implementation we would download and extract all of the candidate wheels at repository_rule time and later use the resolved toolchain to select the correct version at build time. This seems pretty awkward and introduces new error modes (i.e. not passing all of the registered toolchains to pip_parse), so it doesn't really seem better to me.

Does that analysis seem correct? Is there a better way I'm missing?

@groodt
Copy link
Collaborator

groodt commented Aug 27, 2022

Decided to go ahead with this simpler approach, that has a similar outcome. #773

@groodt groodt closed this Aug 27, 2022
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
Projects
None yet
Development

Successfully merging this pull request may close these issues.

6 participants