BLD: Build wheels using cibuildwheel #48283

Merged: 10 commits, Sep 21, 2022
180 changes: 180 additions & 0 deletions .github/workflows/wheels.yml
@@ -0,0 +1,180 @@
# Workflow to build wheels for upload to PyPI.
# Inspired by numpy's cibuildwheel config https://github.com/numpy/numpy/blob/main/.github/workflows/wheels.yml
#
# In an attempt to save CI resources, wheel builds do
# not run on each push but only on a schedule (daily, see the cron below) and for releases.
# Wheel builds can be triggered from the Actions page
# (if you have the perms) on a commit to master.
#
# Alternatively, you can add labels to the pull request in order to trigger wheel
# builds.
# The label(s) that trigger builds are:
# - Build
name: Wheel builder

on:
  schedule:
    #        ┌───────────── minute (0 - 59)
    #        │  ┌───────────── hour (0 - 23)
    #        │  │ ┌───────────── day of the month (1 - 31)
    #        │  │ │   ┌───────────── month (1 - 12 or JAN-DEC)
    #        │  │ │   │ ┌───────────── day of the week (0 - 6 or SUN-SAT)
    #        │  │ │   │ │
    - cron: "27 3 */1 * *"
  push:
  pull_request:
    types: [labeled, opened, synchronize, reopened]
  workflow_dispatch:

concurrency:
  group: ${{ github.workflow }}-${{ github.head_ref || github.run_id }}
  cancel-in-progress: true

jobs:
  build_wheels:
    name: Build wheel for ${{ matrix.python[0] }}-${{ matrix.buildplat[1] }}
    if: >-
      github.event_name == 'schedule' ||
      github.event_name == 'workflow_dispatch' ||
      (github.event_name == 'pull_request' &&
      contains(github.event.pull_request.labels.*.name, 'Build')) ||
      (github.event_name == 'push' && startsWith(github.ref, 'refs/tags/v') && ( ! endsWith(github.ref, 'dev0')))
    runs-on: ${{ matrix.buildplat[0] }}
    strategy:
      # Ensure that a wheel builder finishes even if another fails
      fail-fast: false
      matrix:
        # GitHub Actions doesn't support pairing matrix values together, let's improvise
        # https://github.com/github/feedback/discussions/7835#discussioncomment-1769026
        buildplat:
          - [ubuntu-20.04, manylinux_x86_64]
          - [macos-11, macosx_*]
          - [windows-2019, win_amd64]
          - [windows-2019, win32]
        # TODO: support PyPy?
        python: [["cp38", "3.8"], ["cp39", "3.9"], ["cp310", "3.10"], ["cp311", "3.11-dev"]]  # "pp38", "pp39"]
    env:
      IS_32_BIT: ${{ matrix.buildplat[1] == 'win32' }}
      IS_PUSH: ${{ github.event_name == 'push' && startsWith(github.ref, 'refs/tags/v') }}
      IS_SCHEDULE_DISPATCH: ${{ github.event_name == 'schedule' || github.event_name == 'workflow_dispatch' }}
    steps:
      - name: Checkout pandas
        uses: actions/checkout@v3
        with:
          submodules: true
          # versioneer.py requires the latest tag to be reachable. Here we
          # fetch the complete history to get access to the tags.
          # A shallow clone can work when the following issue is resolved:
          # https://github.com/actions/checkout/issues/338
          fetch-depth: 0

      - name: Build wheels
        uses: pypa/cibuildwheel@v2.9.0
        env:
          CIBW_BUILD: ${{ matrix.python[0] }}-${{ matrix.buildplat[1] }}
          CIBW_ENVIRONMENT: IS_32_BIT='${{ env.IS_32_BIT }}'
          # We can't test directly with cibuildwheel on Windows, since we need the wheel location
          # to mount into the docker image
          CIBW_TEST_COMMAND_LINUX: "python {project}/ci/test_wheels.py"
          CIBW_TEST_COMMAND_MACOS: "python {project}/ci/test_wheels.py"
          CIBW_TEST_REQUIRES: hypothesis==6.52.1 pytest>=6.2.5 pytest-xdist pytest-asyncio>=0.17
          CIBW_REPAIR_WHEEL_COMMAND_WINDOWS: "python ci/fix_wheels.py {wheel} {dest_dir}"
          CIBW_ARCHS_MACOS: x86_64 universal2
          CIBW_BUILD_VERBOSITY: 3

      # Used to push the built wheels
      - uses: actions/setup-python@v3
        with:
          python-version: ${{ matrix.python[1] }}

      - name: Test wheels (Windows 64-bit only)
        if: ${{ matrix.buildplat[1] == 'win_amd64' }}
        shell: cmd
        run: |
          python ci/test_wheels.py wheelhouse

      - uses: actions/upload-artifact@v3
        with:
          name: ${{ matrix.python[0] }}-${{ startsWith(matrix.buildplat[1], 'macosx') && 'macosx' || matrix.buildplat[1] }}
          path: ./wheelhouse/*.whl

      - name: Upload wheels
        if: success()
        shell: bash
        env:
          PANDAS_STAGING_UPLOAD_TOKEN: ${{ secrets.PANDAS_STAGING_UPLOAD_TOKEN }}
          PANDAS_NIGHTLY_UPLOAD_TOKEN: ${{ secrets.PANDAS_NIGHTLY_UPLOAD_TOKEN }}
        run: |
          source ci/upload_wheels.sh
          set_upload_vars
          # trigger an upload to
          # https://anaconda.org/scipy-wheels-nightly/pandas
          # for cron jobs or "Run workflow" (restricted to main branch).
          # Tags will upload to
          # https://anaconda.org/multibuild-wheels-staging/pandas
          # The tokens were originally generated at anaconda.org
          upload_wheels
  build_sdist:
    name: Build sdist
    if: >-
      github.event_name == 'schedule' ||
      github.event_name == 'workflow_dispatch' ||
      (github.event_name == 'pull_request' &&
      contains(github.event.pull_request.labels.*.name, 'Build')) ||
      (github.event_name == 'push' && startsWith(github.ref, 'refs/tags/v') && ( ! endsWith(github.ref, 'dev0')))
    runs-on: ubuntu-latest
    env:
      IS_PUSH: ${{ github.event_name == 'push' && startsWith(github.ref, 'refs/tags/v') }}
      IS_SCHEDULE_DISPATCH: ${{ github.event_name == 'schedule' || github.event_name == 'workflow_dispatch' }}
    steps:
      - name: Checkout pandas
        uses: actions/checkout@v3
        with:
          submodules: true
          # versioneer.py requires the latest tag to be reachable. Here we
          # fetch the complete history to get access to the tags.
          # A shallow clone can work when the following issue is resolved:
          # https://github.com/actions/checkout/issues/338
          fetch-depth: 0
      # Used to push the built wheels
      - uses: actions/setup-python@v3
        with:
          # Build sdist on lowest supported Python
          python-version: '3.8'
      - name: Build sdist
        run: |
          pip install build
          python -m build --sdist
      - name: Test the sdist
        run: |
          # TODO: Don't run test suite, and instead build wheels from sdist
          # by splitting the wheel builders into a two stage job
          # (1. Generate sdist 2. Build wheels from sdist)
          # This tests the sdists, and saves some build time
          python -m pip install dist/*.gz
          # Quote the version specifiers so the shell does not treat ">" as a redirect
          pip install hypothesis==6.52.1 "pytest>=6.2.5" pytest-xdist "pytest-asyncio>=0.17"
          cd .. # Not a good idea to test within the src tree
          python -c "import pandas; print(pandas.__version__);
          pandas.test(extra_args=['-m not clipboard and not single_cpu', '--skip-slow', '--skip-network', '--skip-db', '-n=2']);
          pandas.test(extra_args=['-m not clipboard and single_cpu', '--skip-slow', '--skip-network', '--skip-db'])"
      - uses: actions/upload-artifact@v3
        with:
          name: sdist
          path: ./dist/*

      - name: Upload sdist
        if: success()
        shell: bash
        env:
          PANDAS_STAGING_UPLOAD_TOKEN: ${{ secrets.PANDAS_STAGING_UPLOAD_TOKEN }}
          PANDAS_NIGHTLY_UPLOAD_TOKEN: ${{ secrets.PANDAS_NIGHTLY_UPLOAD_TOKEN }}
        run: |
          source ci/upload_wheels.sh
          set_upload_vars
          # trigger an upload to
          # https://anaconda.org/scipy-wheels-nightly/pandas
          # for cron jobs or "Run workflow" (restricted to main branch).
          # Tags will upload to
          # https://anaconda.org/multibuild-wheels-staging/pandas
          # The tokens were originally generated at anaconda.org
          upload_wheels
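
The `if:` expression shared by both jobs in the workflow above can be read as a small predicate. The sketch below restates it in Python purely for illustration: `should_build` and its parameters are hypothetical names that do not appear in the workflow, and it compares strings exactly, whereas GitHub's expression functions ignore case.

```python
def should_build(event_name, ref="", pr_labels=()):
    """Mirror the jobs' `if:` expression (hypothetical helper, illustration only)."""
    # Scheduled runs and manual "Run workflow" dispatches always build.
    if event_name in ("schedule", "workflow_dispatch"):
        return True
    # Pull requests build only when they carry the "Build" label.
    if event_name == "pull_request":
        return "Build" in pr_labels
    # Pushes build only for version tags that are not dev0 tags.
    if event_name == "push":
        return ref.startswith("refs/tags/v") and not ref.endswith("dev0")
    return False


print(should_build("push", ref="refs/tags/v1.5.0"))
print(should_build("pull_request", pr_labels=["Docs"]))
```

Read this way, an ordinary push to a branch never builds wheels; only tags, labeled pull requests, and scheduled or dispatched runs do.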
2 changes: 2 additions & 0 deletions .pre-commit-config.yaml
@@ -53,6 +53,8 @@ repos:
    rev: 5.0.4
    hooks:
      - id: flake8
        # Need to patch os.remove rule in pandas-dev-flaker
        exclude: ^ci/fix_wheels.py
        additional_dependencies: &flake8_dependencies
          - flake8==5.0.4
          - flake8-bugbear==22.7.1
54 changes: 54 additions & 0 deletions ci/fix_wheels.py
@@ -0,0 +1,54 @@
import os
import shutil
import sys
import zipfile

try:
    _, wheel_path, dest_dir = sys.argv
    # Figure out whether we are building on 32 or 64 bit python
    is_32 = sys.maxsize <= 2**32
    PYTHON_ARCH = "x86" if is_32 else "x64"
except ValueError:
    # Too many/few values to unpack
    raise ValueError(
        "User must pass the path to the wheel and the destination directory."
    )
# Wheels are zip files
if not os.path.isdir(dest_dir):
    os.mkdir(dest_dir)
    print(f"Created directory {dest_dir}")
shutil.copy(wheel_path, dest_dir)  # Remember to delete if process fails
wheel_name = os.path.basename(wheel_path)
success = True
exception = None
repaired_wheel_path = os.path.join(dest_dir, wheel_name)
with zipfile.ZipFile(repaired_wheel_path, "a") as zipf:
Member

Is there a reason for using zipfile versus wheel? Not sure what implementation differences there are, but it is likely safer to use wheel unpack.

Member Author

Wheels are zip files underneath. I don't think using wheel is a good idea since wheel has no public Python API, and I don't want to do a bunch of subprocess calls here.

Member

Yea, that's a bit unfortunate with wheel. The other consideration is that the zip file is mostly an implementation detail of wheel at the moment; it may or may not last forever. There's been some talk about this over the past few years:

https://discuss.python.org/t/making-the-wheel-format-more-flexible-for-better-compression-speed/3810

There are also apparently some file hashes that wheel takes care of that you wouldn't get this way (saw this in the wheel documentation). Not sure of the ultimate impact of that.

Member

Some interesting discussion here as well

pypa/wheel#262

Member Author

I think we can stick with this for now. I'm just not a huge fan of [...]. The file hashes might be important, but given that pip installs the wheel without complaint, I'm going to ignore it for now. If anything fails, it will fail loudly in the future.
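
For context on the file hashes raised in this thread: the wheel format keeps a `RECORD` file in the `.dist-info` directory that lists each file's path, hash, and size. The sketch below shows what one such entry could look like for a file appended to the archive, assuming sha256 with the urlsafe-base64, padding-stripped encoding described in the wheel specification. `record_entry` is a hypothetical helper, and the script in this PR does not update `RECORD`.

```python
import base64
import hashlib


def record_entry(arcname: str, data: bytes) -> str:
    """Build a RECORD-style line: path,sha256=<digest>,<size> (hypothetical helper)."""
    digest = hashlib.sha256(data).digest()
    # urlsafe base64 with the trailing "=" padding removed
    encoded = base64.urlsafe_b64encode(digest).rstrip(b"=").decode("ascii")
    return f"{arcname},sha256={encoded},{len(data)}"


# Placeholder bytes stand in for the real DLL contents.
print(record_entry("pandas/_libs/window/msvcp140.dll", b"placeholder bytes"))
```

Files written with `zipf.write` alone get no such entry, which is the gap the reviewer is pointing at.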

    try:
        # TODO: figure out how licensing works for the redistributables
        base_redist_dir = (
            f"C:/Program Files (x86)/Microsoft Visual Studio/2019/"
Member

Should maybe consider using a package to do this unless we have in house expertise on Windows libraries. delvewheel might be an option

Member Author

Last time I tried (~1 year back), delvewheel was creating extra directories in the repaired wheels.

Note: We test the pandas wheels on the windowsservercore docker images, which do not have the Visual C++ redistributables. Any missing DLLs should fail the job there.

            f"Enterprise/VC/Redist/MSVC/14.29.30133/{PYTHON_ARCH}/"
            f"Microsoft.VC142.CRT/"
        )
        zipf.write(
            os.path.join(base_redist_dir, "msvcp140.dll"),
            "pandas/_libs/window/msvcp140.dll",
        )
        zipf.write(
            os.path.join(base_redist_dir, "concrt140.dll"),
            "pandas/_libs/window/concrt140.dll",
        )
        if not is_32:
            zipf.write(
                os.path.join(base_redist_dir, "vcruntime140_1.dll"),
                "pandas/_libs/window/vcruntime140_1.dll",
            )
    except Exception as e:
        success = False
        exception = e

if not success:
    os.remove(repaired_wheel_path)
    raise exception
else:
    print(f"Successfully repaired wheel was written to {repaired_wheel_path}")
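
The copy-then-append flow in `ci/fix_wheels.py`, including the cleanup on failure, could also be packaged as a function. The following is a sketch under assumptions, not the PR's implementation: `append_files_to_wheel` and its `extra_files` mapping are hypothetical names, and the demo uses a throwaway zip archive rather than a real wheel or the MSVC redistributable paths.

```python
import os
import shutil
import tempfile
import zipfile


def append_files_to_wheel(wheel_path, dest_dir, extra_files):
    """Copy the wheel into dest_dir and append extra_files ({source: arcname}).

    The copied archive is removed again if any file fails to append.
    """
    os.makedirs(dest_dir, exist_ok=True)
    repaired = os.path.join(dest_dir, os.path.basename(wheel_path))
    shutil.copy(wheel_path, repaired)
    try:
        with zipfile.ZipFile(repaired, "a") as zipf:
            for source, arcname in extra_files.items():
                zipf.write(source, arcname)
    except Exception:
        # The archive is already closed here, so it is safe to delete.
        os.remove(repaired)
        raise
    return repaired


# Demo with a throwaway archive standing in for a wheel.
work_dir = tempfile.mkdtemp()
demo_wheel = os.path.join(work_dir, "demo-0.0-py3-none-any.whl")
with zipfile.ZipFile(demo_wheel, "w") as zf:
    zf.writestr("demo/__init__.py", "")
extra = os.path.join(work_dir, "extra.dll")
with open(extra, "wb") as fh:
    fh.write(b"placeholder")

repaired_path = append_files_to_wheel(
    demo_wheel, os.path.join(work_dir, "out"), {extra: "demo/extra.dll"}
)
with zipfile.ZipFile(repaired_path) as zf:
    print(sorted(zf.namelist()))
```

Re-raising from the `except` block keeps the original traceback, so the separate `success` and `exception` variables are not needed in this form.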
58 changes: 58 additions & 0 deletions ci/test_wheels.py
@@ -0,0 +1,58 @@
import glob
import os
import platform
import shutil
import subprocess
import sys

if os.name == "nt":
    py_ver = platform.python_version()
    is_32_bit = os.getenv("IS_32_BIT") == "true"
    try:
        wheel_dir = sys.argv[1]
        wheel_path = glob.glob(f"{wheel_dir}/*.whl")[0]
    except IndexError:
        # Not passed
        wheel_path = None
    print(f"IS_32_BIT is {is_32_bit}")
    print(f"Path to built wheel is {wheel_path}")
    if is_32_bit:
        sys.exit(0)  # No way to test Windows 32-bit (no docker image)
    if wheel_path is None:
        raise ValueError("Wheel path must be passed in if on 64-bit Windows")
    print(f"Pulling docker image to test Windows 64-bit Python {py_ver}")
    subprocess.run(f"docker pull python:{py_ver}-windowsservercore", check=True)
Member

Might also be worth just using the docker python package rather than running all of these in subprocess commands

https://pypi.org/project/docker/

Member Author

I don't know if there will be too much improvement here (I have not used that package myself). Looking at the examples, though (e.g. client.containers.run("ubuntu:latest", "echo hello world")), it seems like the syntax is very similar to a subprocess call.

    pandas_base_dir = os.path.abspath(os.path.join(os.path.dirname(__file__), ".."))
    print(f"pandas project dir is {pandas_base_dir}")
    dist_dir = os.path.join(pandas_base_dir, "dist")
    print(f"Copying wheel into pandas_base_dir/dist ({dist_dir})")
    os.mkdir(dist_dir)
    shutil.copy(wheel_path, dist_dir)
    print(os.listdir(dist_dir))
    subprocess.run(
        rf"docker run -v %cd%:c:\pandas "
        f"python:{py_ver}-windowsservercore /pandas/ci/test_wheels_windows.bat",
        check=True,
        shell=True,
        cwd=pandas_base_dir,
    )
else:
    import pandas as pd

    pd.test(
        extra_args=[
            "-m not clipboard and not single_cpu",
            "--skip-slow",
            "--skip-network",
            "--skip-db",
            "-n=2",
        ]
    )
    pd.test(
        extra_args=[
            "-m not clipboard and single_cpu",
            "--skip-slow",
            "--skip-network",
            "--skip-db",
        ]
    )
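
The two Docker invocations in `ci/test_wheels.py` are built as command strings. As an alternative sketch (not what the PR does), the same commands can be assembled as argument lists, which would avoid `shell=True` and the `%cd%` expansion by passing the project directory explicitly. `docker_test_commands` is a hypothetical helper; the image tag and the batch-file path are taken from the script, and the example directory is made up.

```python
def docker_test_commands(py_ver, project_dir):
    """Return (pull, run) argument lists for the Windows wheel test (hypothetical helper)."""
    image = f"python:{py_ver}-windowsservercore"
    pull = ["docker", "pull", image]
    run = [
        "docker",
        "run",
        "-v",
        f"{project_dir}:c:\\pandas",  # mount the checkout at c:\pandas in the container
        image,
        "/pandas/ci/test_wheels_windows.bat",
    ]
    return pull, run


pull_cmd, run_cmd = docker_test_commands("3.10.6", "C:\\src\\pandas")
print(pull_cmd)
print(run_cmd)
```

Each list could then be passed to `subprocess.run(cmd, check=True)` in place of the formatted strings.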
9 changes: 9 additions & 0 deletions ci/test_wheels_windows.bat
@@ -0,0 +1,9 @@
set test_command=import pandas as pd; print(pd.__version__); ^
pd.test(extra_args=['-m not clipboard and not single_cpu', '--skip-slow', '--skip-network', '--skip-db', '-n=2']); ^
pd.test(extra_args=['-m not clipboard and single_cpu', '--skip-slow', '--skip-network', '--skip-db'])

python --version
pip install pytz six numpy python-dateutil
pip install hypothesis==6.52.1 "pytest>=6.2.5" pytest-xdist "pytest-asyncio>=0.17"
pip install --find-links=pandas/dist --no-index pandas
python -c "%test_command%"
42 changes: 42 additions & 0 deletions ci/upload_wheels.sh
@@ -0,0 +1,42 @@
# Modified from numpy's https://github.com/numpy/numpy/blob/main/tools/wheels/upload_wheels.sh

set_upload_vars() {
    echo "IS_PUSH is $IS_PUSH"
    echo "IS_SCHEDULE_DISPATCH is $IS_SCHEDULE_DISPATCH"
    if [[ "$IS_PUSH" == "true" ]]; then
        echo push and tag event
        export ANACONDA_ORG="multibuild-wheels-staging"
        export TOKEN="$PANDAS_STAGING_UPLOAD_TOKEN"
        export ANACONDA_UPLOAD="true"
    elif [[ "$IS_SCHEDULE_DISPATCH" == "true" ]]; then
        echo scheduled or dispatched event
        export ANACONDA_ORG="scipy-wheels-nightly"
        export TOKEN="$PANDAS_NIGHTLY_UPLOAD_TOKEN"
        export ANACONDA_UPLOAD="true"
    else
        echo non-dispatch event
        export ANACONDA_UPLOAD="false"
    fi
}
upload_wheels() {
    echo ${PWD}
    if [[ ${ANACONDA_UPLOAD} == true ]]; then
        # Quote TOKEN so the emptiness test is explicit rather than accidental
        if [ -z "${TOKEN}" ]; then
            echo no token set, not uploading
        else
            conda install -q -y anaconda-client
            # sdists are located under dist folder when built through setup.py
            if compgen -G "./dist/*.gz"; then
                echo "Found sdist"
                anaconda -q -t ${TOKEN} upload --skip -u ${ANACONDA_ORG} ./dist/*.gz
            elif compgen -G "./wheelhouse/*.whl"; then
                echo "Found wheel"
                anaconda -q -t ${TOKEN} upload --skip -u ${ANACONDA_ORG} ./wheelhouse/*.whl
            else
                echo "Files do not exist"
                return 1
            fi
            echo "PyPI-style index: https://pypi.anaconda.org/$ANACONDA_ORG/simple"
        fi
    fi
}
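
The branch logic in `set_upload_vars` comes down to choosing an anaconda.org organization from two flags. A minimal Python restatement, for illustration only: `select_upload_target` is a hypothetical name, and the flags are compared as the strings `"true"`/`"false"` because that is how the workflow exports them as environment variables.

```python
def select_upload_target(is_push, is_schedule_dispatch):
    """Return the anaconda.org organization to upload to, or None to skip (hypothetical helper)."""
    # Tag pushes take priority and go to the staging organization.
    if is_push == "true":
        return "multibuild-wheels-staging"
    # Cron runs and manual dispatches go to the nightly organization.
    if is_schedule_dispatch == "true":
        return "scipy-wheels-nightly"
    # Anything else (e.g. a labeled pull request) builds but does not upload.
    return None


print(select_upload_target("false", "true"))
```

Builds triggered by the `Build` label on a pull request fall through to the last branch, so they produce artifacts without uploading anything.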