Proper (cross) compilation of pip packages #453

Closed
ajbouh opened this issue Apr 13, 2021 · 7 comments
Labels
Can Close? Will close in 30 days if there is no new activity type: feature request

Comments

@ajbouh

ajbouh commented Apr 13, 2021

🚀 feature request

I want to use pip packages that require compilation to install. I'd also like to use Bazel's platform transitions and a hermetic cross-compilation toolchain to easily build pip packages for any (Linux) target. [1]

Relevant Rules

pip_parse

Describe the solution you'd like

I expect we'd need to adjust pip_parse logic to do wheel extraction (or compilation) during Bazel's execution phase.

If we can switch pip_parse to generate BUILD files that do the wheel extraction / pip install in a genrule or other non-repository rule, then we can use the toolchain Bazel has selected for the target platform. Note that in the non-cross-compilation case this would just be the host platform cc toolchain.

There are two ways to get access to the current cc toolchain:

Describe alternatives you've considered

  • Leave package installation / compilation in Bazel's loading phase and use repository-rule workarounds to find and set CC and CFLAGS. This is feasible but slower (since repository rules do not benefit from fine-grained parallelism), harder to maintain, and messy.

  • Continue with the status quo of manually porting non-wheel-based pip packages over to build with Bazel. This is very challenging, time-consuming, and error-prone, doubly so in the context of cross-compilation.

[1] I have been building a hermetic cross-compilation toolchain for Bazel that's relatively small and straightforward to use. It patches rules_docker to use platform transitions for targets that are used inside the cc_image. Here's a preview: https://github.com/ajbouh/bazel-zig-cc.

@groodt
Collaborator

groodt commented Apr 13, 2021

If you've not seen them already, the dbx_build_tools rules from Dropbox are currently the most sophisticated in terms of building third-party Python code from source using toolchains registered with Bazel. If I recall correctly, many of the numerical / scientific Python libraries also require Fortran to be set up, but the Dropbox rules should make this clearer.

When you talk about "cross-compilation", do you mean cross-compilation across CPU architectures (x86 vs. ARM, etc.) or also across operating systems (Windows, macOS, Linux)? I ask because "platform-specific" dependencies are relatively common and should be considered. PEP 508 defines so-called "Environment Markers" that change the dependency graph based on the system where the installation is happening. One example from Jupyter is https://github.com/ipython/ipython/blob/master/setup.py#L212.
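To make the environment-marker point concrete, here is a minimal Python sketch (the function name `current_marker_environment` is hypothetical) that computes a few of the PEP 508 marker variables using only the standard library. Because these values come from the interpreter that runs the install, resolving on a macOS host and resolving for a Linux target can produce different dependency graphs.

```python
import os
import platform
import sys


def current_marker_environment():
    """Return a subset of the PEP 508 marker variables for this interpreter.

    Only a handful of markers are shown; PEP 508 defines several more.
    """
    return {
        "os_name": os.name,
        "sys_platform": sys.platform,
        "platform_system": platform.system(),
        "platform_machine": platform.machine(),
        # PEP 508 defines python_version as "major.minor".
        "python_version": ".".join(platform.python_version_tuple()[:2]),
        "implementation_name": sys.implementation.name,
    }
```

Printing the result on two different machines shows which values a requirement such as `some_pkg; sys_platform == "win32"` would be tested against on each of them.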

Python does not currently have strictly separate resolve, build, and install steps. It is the "resolve + build" step that is currently coupled, but Python is taking some steps towards making as much of this "static" as possible. However, in the vast majority of cases, setup.py still needs to execute, and this is dynamic behavior. Perhaps a workaround would be to run the "resolve" step for all desired target platforms and somehow select the appropriate dependency graph at Bazel build time.

@ajbouh
Author

ajbouh commented Apr 13, 2021

> If you've not seen them already, the dbx_build_tools rules from Dropbox are currently the most sophisticated in terms of building third-party Python code from source using toolchains registered with Bazel. If I recall correctly, many of the numerical / scientific Python libraries also require Fortran to be set up, but the Dropbox rules should make this clearer.

Thanks, I've recently become aware of them but have not yet explored what they're capable of. Thanks for the pointer re: Fortran.

> When you talk about "cross-compilation", do you mean cross-compilation across CPU architectures (x86 vs. ARM, etc.) or also across operating systems (Windows, macOS, Linux)? I ask because "platform-specific" dependencies are relatively common and should be considered. PEP 508 defines so-called "Environment Markers" that change the dependency graph based on the system where the installation is happening. One example from Jupyter is https://github.com/ipython/ipython/blob/master/setup.py#L212.

I mean cross OS and cross architecture. For now I'm installing pre-compiled wheels by specifying a number of additional pip arguments:

  "--only-binary", ":all",
  "--platform=manylinux2014_x86_64",
  "--platform=manylinux2010_x86_64",

> Python does not currently have strictly separate resolve, build, and install steps. It is the "resolve + build" step that is currently coupled, but Python is taking some steps towards making as much of this "static" as possible. However, in the vast majority of cases, setup.py still needs to execute, and this is dynamic behavior. Perhaps a workaround would be to run the "resolve" step for all desired target platforms and somehow select the appropriate dependency graph at Bazel build time.

Right now I work around this by specifying different pip repositories for each platform and then a sort of "meta" requirement macro that looks like this:

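load("@pypi_manylinux2010_x86_64//:requirements.bzl", linux_requirement = "requirement")
load("@pypi_macosx_x86_64//:requirements.bzl", darwin_requirement = "requirement")

def requirement(name):
    """Returns a select() over the per-platform pip labels for `name`."""
    # The result is a select() wrapping a list, so concatenate it onto deps
    # (deps = [...] + requirement("foo")) instead of nesting it inside a list.
    return select({
        "@platforms//os:linux": [linux_requirement(name)],
        "@platforms//os:macos": [darwin_requirement(name)],
        # No dependency on platforms without a matching pip repository.
        "//conditions:default": [],
    })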

My approach seems to work fine for precompiled wheels, but I'm now seeing the need to extend rules_python so that it works for packages that require compilation.

@groodt
Collaborator

groodt commented Apr 14, 2021

The Python packaging ecosystem is always evolving, so in some sense it might be wise to take a "wait and see" approach until things have stabilised. Of course, things may never settle. :)

For example, there is an evolving standard in PEP 517. This clarifies the responsibilities of the so-called "build front-end" and "build back-end".

Any "package author" is able to define and configure how their package builds from the package's source tree through their "build back-end", but package "end-users" are responsible for selecting and configuring the "build front-end".

It might be possible someday for Bazel rules_python to become a compliant PEP 517 "build front-end" if this standard is accepted and the ecosystem adopts it. However, what does that then mean if a package author wishes to use flit or other tools to build the wheel, instead of pip? Interesting times ahead I think.

> My approach seems to work fine for precompiled wheels, but I'm now seeing the need to extend rules_python so that it works for packages that require compilation.

Not sure if you've considered it, but another option for your use case, if it is always third-party dependencies that you are looking to compile, would be to build (precompile) the wheels for your target platforms and host the artifacts in your own mirror (Artifactory, etc.). Then Bazel can always work against prebuilt wheels, which it simply needs to download and unpack.
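One way to picture the prebuilt-wheel approach: the wheel filename convention from PEP 427 encodes the Python, ABI, and platform tags, so a mirror listing can be filtered per target platform before anything is downloaded. The sketch below is a simplified illustration under stated assumptions: the helper names and the example filenames are hypothetical, and matching is by exact platform tag only, whereas pip's real compatibility logic also accepts older compatible tags.

```python
from typing import FrozenSet, NamedTuple


class WheelTags(NamedTuple):
    distribution: str
    version: str
    python_tags: FrozenSet[str]
    abi_tags: FrozenSet[str]
    platform_tags: FrozenSet[str]


def parse_wheel_filename(filename):
    """Split a wheel filename into its PEP 427 components."""
    if not filename.endswith(".whl"):
        raise ValueError("not a wheel: " + filename)
    parts = filename[: -len(".whl")].split("-")
    # name-version[-build]-python-abi-platform gives 5 or 6 fields.
    if len(parts) not in (5, 6):
        raise ValueError("unexpected wheel filename: " + filename)
    python_tag, abi_tag, platform_tag = parts[-3:]
    # Each tag field may be a dot-separated "compressed tag set".
    return WheelTags(
        parts[0],
        parts[1],
        frozenset(python_tag.split(".")),
        frozenset(abi_tag.split(".")),
        frozenset(platform_tag.split(".")),
    )


def wheels_for_platform(filenames, platform_tag):
    """Keep wheels tagged for platform_tag, plus pure-Python 'any' wheels."""
    selected = []
    for name in filenames:
        tags = parse_wheel_filename(name).platform_tags
        if platform_tag in tags or "any" in tags:
            selected.append(name)
    return selected
```

Given a listing that mixes `manylinux2014_x86_64`, `macosx_10_9_x86_64`, and `py3-none-any` wheels, `wheels_for_platform(listing, "manylinux2014_x86_64")` keeps the Linux and pure-Python entries and drops the macOS one.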

@thundergolfer

Related issue: #260 ("Cross compilation" of py_binary/py_image/py_library targets)

@github-actions

github-actions bot commented Nov 5, 2021

This issue has been automatically marked as stale because it has not had any activity for 180 days. It will be closed if no further activity occurs in 30 days.
Collaborators can add an assignee to keep this open indefinitely. Thanks for your contributions to rules_python!

@github-actions github-actions bot added the Can Close? Will close in 30 days if there is no new activity label Nov 5, 2021
@github-actions

github-actions bot commented Dec 6, 2021

This issue was automatically closed because it went 30 days without a reply since it was labeled "Can Close?"

@github-actions github-actions bot closed this as completed Dec 6, 2021
@arunkant

arunkant commented Apr 27, 2023

Hi @ajbouh, I'm trying to replicate your setup in my org. I am getting the error below:

ERROR: /home/arunkant/code/qureai/packages/python/image_manager/BUILD:6:11: //packages/python/image_manager:image_manager: expected value of type 'string' for element 5 of attribute 'deps' in 'py_library' rule, but got select({":cpu": ["@python_deps_cpu_django//:pkg"], "//conditions:default": ["@python_deps_gpu_django//:pkg"]}) (select)

As I understand it, requirement is a macro and it is not being evaluated correctly. Can you share a working example of how you are using it? TIA
