Is there a way to use parallelism when building the wheels? #1088

Closed
Strilanc opened this issue Apr 12, 2022 · 10 comments

Comments

@Strilanc

Strilanc commented Apr 12, 2022

Waiting for cibuildwheel to finish is really painful, even for a single wheel. Is there a flag I can use in order to force it to use multiple threads when building? For example, when building code with make I can specify --jobs 8 to build 8 source files at a time instead of 1 at a time.

For scale, cibuildwheel is one hundred (!!!!!) times slower than all the other checks I do. Being able to reduce that to 10x slower by using parallelism would be hugely helpful.

@dvarrazzo
Contributor

You can use a GitHub Actions build matrix, or some equivalent feature in the CI system you use.

I have done it recently on setproctitle and build went down from hours to 10 minutes.

The result is some 30 parallel jobs.

@Strilanc
Author

Strilanc commented May 10, 2022

That does help, but the individual jobs still take 10 minutes to complete. When I build the wheel locally using bazel, after clearing bazel's cache, the build and test finish in 1 minute. So there's still something making things very slow in cibuildwheel compared to other strategies for building the wheel. A likely candidate is that each worker is only using one thread to build the C++ code, so it goes one... file... at... a... time...

10 minutes on github (the worst one takes 20 minutes but 10 seems to be the common value):

quantumlib/Stim#238

image

1 minute locally (on a 12 core machine):

image

@henryiii
Contributor

henryiii commented May 10, 2022

CI runners have at most 2 cores on most CIs, and run on shared resources. So that's 6x slower than your 12-core local machine, or the equivalent of 6 minutes local time. Plus cibuildwheel is downloading Python, setting up multiple virtual environments, downloading dependencies, running tests, etc., which can easily account for the remaining 4 minutes or so. So that sounds perfectly reasonable. Your build should be using both available cores as long as you've set it up to do that. Running cibuildwheel in parallel would not give you much at all. Though I guess you could do it yourself with CIBW_BUILD=<first half> cibuildwheel & CIBW_BUILD=<second half> cibuildwheel & followed by a wait on the two background jobs just launched. See https://stackoverflow.com/questions/356100/how-to-wait-in-bash-for-several-subprocesses-to-finish-and-return-exit-code-0. I don't think it will be much faster, though.
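
For reference, the "launch two runs in the background, then wait" idea can be sketched in Python instead of bash. This is only an illustration under assumptions: the helper names `run_in_parallel` and `main` are made up here, the split of build selectors is hypothetical, and whether two concurrent cibuildwheel runs on one machine actually save time is untested.

```python
import os
import subprocess
import sys


def run_in_parallel(command, env_overrides):
    """Start one copy of `command` per env override, wait for all, return exit codes."""
    procs = []
    for overrides in env_overrides:
        # Inherit the current environment, then apply this run's overrides.
        env = {**os.environ, **overrides}
        procs.append(subprocess.Popen(command, env=env))
    # wait() blocks until the child exits and returns its exit code.
    return [proc.wait() for proc in procs]


def main():
    # Hypothetical split of the build selectors into two halves.
    halves = [
        {"CIBW_BUILD": "cp37-* cp38-* cp39-*"},
        {"CIBW_BUILD": "cp310-* cp311-*"},
    ]
    codes = run_in_parallel(["cibuildwheel", "--platform", "linux"], halves)
    # Fail the overall run if either half failed.
    sys.exit(1 if any(codes) else 0)
```

Calling `main()` from a script would start both halves at once and exit non-zero if either fails, which is the same bookkeeping the linked Stack Overflow question handles in bash.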

@Strilanc
Author

If you actually look at what is taking time, it is not downloading dependencies or running tests. In one of my "fast" cases, the setup and teardown takes about a minute and the tests also take about a minute. Which leaves 8 minutes of building:

image

I already am only doing one wheel build per worker, so I'll get no benefit from two invocations. What I need is for cibuildwheel to use two cores for one build, so that it is not processing one C++ file at a time.

I did realize that my worst offenders are actually building numpy and pandas, in addition to my wheel, which is why they take like 40 minutes instead of 10-20. The actual time spent building my wheel is 12.5 minutes. Maybe I should go poke numpy and pandas to have wheels for cp39-manylinux_i686, or decide to stop supporting it myself.

@henryiii
Contributor

At least numpy and pandas do build in parallel. Setuptools doesn't directly support it (unless you have multiple extensions), but it's easy to patch in: pybind11 and numpy both have utilities for it (and of course CMake via Scikit-build, etc. all support it). As long as you are doing that, you aren't wasting that much time by not running cibuildwheel in two threads.
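
A rough sketch of the general idea behind such utilities: hand each source file to a pool of workers instead of compiling them one after another. This is not pybind11's or numpy's actual code; `job_count`, `compile_all`, and `compile_one` are hypothetical names, and reading the job count from `NPY_NUM_BUILD_JOBS` is just one convention assumed here.

```python
import os
from concurrent.futures import ThreadPoolExecutor


def job_count(env_var="NPY_NUM_BUILD_JOBS"):
    """Use the env var if it is set, otherwise fall back to the CPU count."""
    value = os.environ.get(env_var)
    if value:
        return max(1, int(value))
    return os.cpu_count() or 1


def compile_all(sources, compile_one, jobs=None):
    """Call compile_one(source) for every source, up to `jobs` at a time.

    Threads are enough when compile_one spawns a compiler subprocess,
    because the interpreter is idle while the subprocess runs.
    Results come back in the same order as `sources`.
    """
    jobs = jobs or job_count()
    with ThreadPoolExecutor(max_workers=jobs) as pool:
        return list(pool.map(compile_one, sources))
```

In a real project the ready-made helper is usually the better choice; for example, pybind11 ships `ParallelCompile` in `pybind11.setup_helpers`, which can be installed at the top of `setup.py`.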

Yes, if you provide wheels for a platform your dependencies don't, I'm not sure it's very helpful. Users will have to build numpy & pandas to use your packaged binary. Also make sure you are using the same manylinux family they are using or better (like manylinux2014 for Python 3.10, which we do default to these days). If you go older, you'll have to build them if you use them.

@joerick
Contributor

joerick commented Jun 19, 2022

I don't think currently that it makes sense to provide build parallelism within cibuildwheel. There are already options to do it at a lower level (compiler flags) or at a higher level (CI build matrices). Adding it in cibuildwheel is probably not going to be worth the complexity.

@joerick closed this as not planned Jun 19, 2022
@joerick
Contributor

joerick commented Jun 19, 2022

One thing that potentially could improve build performance would be to be a bit cleverer about the network IO. E.g. we could download the next docker image or Python version while the previous build is running. That might save a little time. But again, would it be worth the added complexity? Not sure...

@ddelange

ddelange commented Sep 17, 2022

I like the high level (CI build matrices) solution; I can split out arch and linux flavour like below. But since QEMU is painfully slow for aarch64, and cibuildwheel runs a bunch of tests on the wheels after building, the wait for aarch64 to finish with the concurrency of 2 below still results in hours of CI.

Solution is to additionally split out python versions. I would however prefer not to hardcode python versions in CI, rather to let the cibuildwheel python_requires='>=3.6' mechanics determine which wheels to build.

Maybe there is a neat way of introducing the possibility for further concurrency other than implementing parallelism inside cibuildwheel? Most runners only have 2 CPU cores anyway so that would not give a huge speedup...

Edit: found a more or less neat way: updated snippet below to evenly distribute current and future python releases over 5 build jobs. Assuming a package that supports 3.6+ (and stable cibuildwheel currently distributing up to cp311-*), there won't be any runners doing empty work with the setup below. Even neater would be to distribute a single forward-compatible py36+ (ABI3) wheel like giampaolo/psutil#2103, but that isn't currently feasible for my particular project.

# gh actions
jobs:
  build-wheels:
    runs-on: ${{ matrix.os }}
    strategy:
      matrix:
        include:
          - {os: macos-latest, arch: x86_64, build: "*"}
          - {os: macos-latest, arch: arm64, build: "*"}
          - {os: windows-latest, arch: AMD64, build: "*"}
          - {os: windows-latest, arch: x86, build: "*"}
          - {os: ubuntu-latest, arch: x86_64, build: "*"}
          - {os: ubuntu-latest, arch: aarch64, build: "*[61]-manylinux*"}
          - {os: ubuntu-latest, arch: aarch64, build: "*[72]-manylinux*"}
          - {os: ubuntu-latest, arch: aarch64, build: "*[83]-manylinux*"}
          - {os: ubuntu-latest, arch: aarch64, build: "*[94]-manylinux*"}
          - {os: ubuntu-latest, arch: aarch64, build: "*[05]-manylinux*"}
          - {os: ubuntu-latest, arch: aarch64, build: "*[61]-musllinux*"}
          - {os: ubuntu-latest, arch: aarch64, build: "*[72]-musllinux*"}
          - {os: ubuntu-latest, arch: aarch64, build: "*[83]-musllinux*"}
          - {os: ubuntu-latest, arch: aarch64, build: "*[94]-musllinux*"}
          - {os: ubuntu-latest, arch: aarch64, build: "*[05]-musllinux*"}
    steps:
      - uses: docker/setup-qemu-action@v2
        if: matrix.os == 'ubuntu-latest'
      - uses: pypa/cibuildwheel@v2.10.0
        env:
          CIBW_BUILD_VERBOSITY: 1
          CIBW_ARCHS: ${{ matrix.arch }}
          CIBW_BUILD: ${{ matrix.build }}
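
To see why the five bracket patterns spread the Python versions evenly, here is a small check using Python's `fnmatch`. It assumes cibuildwheel's build selectors behave like fnmatch-style globs for these patterns, and the identifier list (cp36 through cp311 on manylinux aarch64) is illustrative, not taken from a real run.

```python
from fnmatch import fnmatchcase

# The five manylinux selectors from the matrix above.
PATTERNS = [
    "*[61]-manylinux*",
    "*[72]-manylinux*",
    "*[83]-manylinux*",
    "*[94]-manylinux*",
    "*[05]-manylinux*",
]


def partition(identifiers, patterns=PATTERNS):
    """Group build identifiers by the selector pattern(s) they match."""
    buckets = {pattern: [] for pattern in patterns}
    for ident in identifiers:
        for pattern in patterns:
            if fnmatchcase(ident, pattern):
                buckets[pattern].append(ident)
    return buckets


# Illustrative identifiers: cp36, cp37, ..., cp311.
identifiers = [f"cp3{minor}-manylinux_aarch64" for minor in range(6, 12)]

for pattern, matched in partition(identifiers).items():
    print(pattern, matched)
```

Each pattern keys on the last digit of the Python tag, so cp36 and cp311 share a job, and later releases (for example one ending in 2) would fall into an existing bucket without editing the workflow.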

@henryiii
Contributor

See https://iscinumpy.dev/post/cibuildwheel-2-10-0/#only-210 or #1261.
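
As a hedged illustration of how the `--only` option from cibuildwheel 2.10 could avoid hardcoding Python versions: a first job could list the build identifiers (for instance with `cibuildwheel --print-build-identifiers`) and turn them into a matrix, with one later job per identifier. The helper below is hypothetical and only covers the text-to-matrix step; the sample identifiers are made up for the demonstration.

```python
import json


def matrix_from_identifiers(output, os_name="ubuntu-latest"):
    """Turn newline-separated build identifiers into a GitHub Actions matrix dict."""
    identifiers = [line.strip() for line in output.splitlines() if line.strip()]
    return {"include": [{"only": ident, "os": os_name} for ident in identifiers]}


# Sample text standing in for the output of the identifier listing.
sample = "cp310-manylinux_aarch64\ncp311-manylinux_aarch64\n"
print(json.dumps(matrix_from_identifiers(sample)))
```

The printed JSON could then be exposed as a job output and read with `fromJson` in a downstream job's `strategy.matrix`, passing each `only` value to cibuildwheel.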

@henryiii
Contributor

(Though that's kind of neat too)
