-
Notifications
You must be signed in to change notification settings - Fork 7
Setup cibuildwheel with Linux, macOS, Windows wheels #66
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Conversation
You can see a run of this workflow on my fork: https://github.com/alugowski/python-suitesparse-graphblas/actions/runs/4119959220/jobs/7115370559 Example of what
|
cmake .. -DCMAKE_BUILD_TYPE=Release | ||
make -j$(nproc) | ||
|
||
# Disable optimizing some rarely-used types for significantly faster builds and significantly smaller wheel size. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
If experimenting with the workflow I recommend disabling all the types. That brings the GraphBLAS build times into the 15 minute range on GitHub Actions.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I have a non-documented feature that you can use for this. Turn on the CMAKE_CUDA_DEV flag in the GraphBLAS CmakeLists.txt. That disables all compilation of the kernels in Source/Generated2.
I don't recommend that for production so I don't document it. The resulting GraphBLAS will be pretty slow. But it's nice for development.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
@DrTimothyAldenDavis I thought I was going crazy because -DCMAKE_CUDA_DEV=1
didn't seem to make a difference, but I just noticed CMakeLIsts.txt has set ( CMAKE_CUDA_DEV off )
on lines 56 and 61, so CMAKE_CUDA_DEV
is only usable if one is able to edit CMakeLists.txt directly. I supoose it's possible to do with sed
, but sed
inplace support varies between OSes and such edits are a PITA.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
@DrTimothyAldenDavis Just FYI, compiling that combinatorial explosion of types really does make working with GraphBLAS very difficult. This PR would not happen if GitHub's accounting wasn't broken, as the CI time should have cost me several hundred dollars. With the types disabled as seen here, building just the macOS wheels one time costs about $3.50 according to GitHub's published price list ($5 for all OSes). I'm estimating well over $20/run for all types enabled. I don't mind because GitHub doesn't have my credit card and they aren't counting anything against my free quota, but that's just luck on our part.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
That's intentional. It is not meant for anyone else to use but me. It is supposed be very hard to turn on GBCUDA_DEV, because turning it on slows down GraphBLAS immensely (run time, not compile time).
If it's easy to turn on GBCUDA_DEV, then I worry that it would be compiled that way for production and distribution, as in a linux distro or conda package. That would be awful.
If you like, you could instead compile with
-DGxB_NO_BOOL=1 -DGxB_NO_INT8=1 -DGxB_NO_INT16=1
... (etc).
I do not recommend that for a production build.
See GraphBLAS/Source/GB_control.h:
Note that I already disable quite a few semirings via that file.
GraphBLAS requires a surprisingly large amount of semirings and operators when used in production.
I'm working on a JIT where I can compile these at run-time, however. That will keep the performance up (and even improve it) while at the same time reducing the initial compile time. Distros will have to package my source code for that to work, however.
✨ This is amazing! Thank you so much @alugowski. ✨ I will try to take a closer look this weekend (sooner if I'm able), and I'll try to figure out the version issue and whether this now supports complex on Windows. I would love to learn how you iterated and debugged this. I also think it's interesting you're not using Is everything compiled with OpenMP enabled? |
Sounds good! If a solution is elusive then a fallback fix might be to just use |
Yes. That was an adventure in itself. Linux (GCC) and Windows (MinGW) come with OpenMP and GraphBLAS discovers it automatically. So that's easy. macOS is tricky because for reasons I don't know their clang is famously missing OpenMP. Generally the two options are:
I generally take the first approach because it's simpler for C++ projects, but it didn't work here. First there was some library mismatch when assembling the wheel that I don't remember the details of. Second is a comment from the cibuildhweel folks who push you to use the default compiler on each platform. Apart from lots of possible issues that you're on your own for, they point out that you'll face difficulty with some channels, like conda, that will only use the default compiler. So I had to figure out how to get that |
Excellent.
Oh, I can imagine all to well. You have no idea how happy this PR makes me :) . I really, really, really appreciate the effort and pain you put into this. How would you like to be recognized for this effort? To see if Windows now supports complex, we can simply delete these lines (I don't think we need the ARM-specific file anymore either): python-suitesparse-graphblas/suitesparse_graphblas/build.py Lines 32 to 35 in 86cb2f1
We'll know whether complex works if it compiles and passes tests (which I see also get run in the GitHub Action--hooray!). |
What made it easier is that I had a working workflow from another project I was working on recently (a threaded Matrix Market loader. About 20x faster than SciPy's, maybe I'll make a PR for that). So I started with that. Then you get Linux working (since it should already work), then macos, then Windows. Well, once you're in the steps that require building GraphBLAS you push a change to each OS each time as you have to wait 15 minutes to see what silly thing kills it that time. To speed things up add a Then just tackle build issues as you go. The general flow came from the already working examples, "just" need to modify for this project. This is actually why I took this on, since it's fresh in my mind and you guys need it :) All the options are very well written up in the cibuildwheel docs: https://cibuildwheel.readthedocs.io/en/stable/options/ cibuildwheel has a Tests are very important in this process because they confirm things work, especially with a language like Python that has no static checks. Tests with good coverage can expose many platform-specific issues. This repo's tests took a while to work ( Similarly the necessity of the entire As for why not use the cibuildwheel action, my thoughts are thus.
|
Ah. I can try that. |
Quick question: what version of numpy is being used when building the wheel? When building, we should probably install |
Looks like it's using 1.24.2:
Yeah |
Rebased on updated main. It'll have to be rebased again when #68 is merged, but it'll be easy. |
The new version stuff works, see the results in Test PyPI: https://test.pypi.org/project/suitesparse-graphblas/#files Try it: pip install cffi numpy
pip install --index-url https://test.pypi.org/simple/ suitesparse-graphblas Test PyPI is often missing packages, so install dependencies from regular PyPI The version in Test PyPI is currently 0.0.1 because that's what got assigned when I was working in a branch. I'm now running a build from my main branch which should result in a proper version number. Build takes a few hours, this is the job: https://github.com/alugowski/python-suitesparse-graphblas/actions/runs/4153374973 |
cffi version bump to v1.11 because that's when they started supporting C99 complex types. |
pyproject.toml
Outdated
@@ -45,7 +45,7 @@ classifiers = [ | |||
] | |||
dependencies = [ | |||
# These are super-old; can/should we update them? | |||
"cffi>=1.0.0", | |||
"cffi>=1.11", |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Thanks for updating this. Can we also set the min version for cffi in [build-system.requires]
at the top of the file?
I just tested these wheels in python-graphblas/python-graphblas#385 Linux and Windows work! OS X is complaining about OpenMP, which may not be surprising since everything else is installed from @jim22k, can you take a couple of minutes to try installing and using these OS X wheels on your new Mac? Thanks!
@alugowski I'll keep reviewing this PR tomorrow, and I'll research OpenMP with wheels on OS X. |
Alright, in
I don't know if this is advisable, but CI is now passing! |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Overall this looks incredible. I have a couple more things I want to look at more closely.
Is artifact size going to be a problem with actions/upload-artifact
?
name: Build SDist | ||
runs-on: ubuntu-latest | ||
steps: | ||
- uses: actions/checkout@v3 |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Let's add
with:
fetch-depth: 0
to make sure it can always find a tag to use as the version.
- name: Build SDist | ||
run: pipx run build --sdist | ||
|
||
- name: Check metadata | ||
run: pipx run twine check dist/* |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
What are the versions of pip
, build
, and twine
used here? Should we always update them such as with
python -m pip install --upgrade pip
python -m pip install build twine
or the pipx
equivalent (note: I don't use or really understand pipx
).
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
AFAIK you always get the latest.
run: | | ||
pip install twine | ||
twine upload dist/*-manylinux*.whl | ||
- uses: actions/checkout@v3 |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
And here:
with:
fetch-depth: 0
- uses: actions/upload-artifact@v3 | ||
with: | ||
path: wheelhouse/*.whl | ||
|
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
For sanity check, maybe add
if-no-files-found: error
suitesparse_graphblas/build.py
Outdated
# suitesparse.sh installs GraphBLAS.h here | ||
include_dirs.append(os.path.join("C:\\", "Program Files (x86)", "include")) | ||
# graphblas.lib and graphblas.dll.a | ||
library_dirs.append(os.path.join("C:\\", "Program Files (x86)", "lib")) | ||
# graphblas.dll | ||
library_dirs.append(os.path.join("C:\\", "Program Files (x86)", "bin")) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Can we not set these by default? Can they instead be set by changing environment variables or flag or optional argument? Not every redistribution will install to these locations when building, so I don't want to hard code these.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Good point. CMakeLists installs to ${SUITESPARSE_LIBDIR}
(and inc/bin) but I can't find where those values are set. Maybe they aren't and those are the default paths.
I know SuiteSparse has options for setting the install location, do you know the preferred way for GraphBLAS?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I know SuiteSparse has options for setting the install location, do you know the preferred way for GraphBLAS?
I'm not sure I understand the question. What's the difference between SuiteSparse
and GraphBLAS
here?
When running cmake to build SuiteSparse:GraphBLAS, I think you can set the following variables (I'm not an expert on this):
CMAKE_INSTALL_PREFIX
CMAKE_INSTALL_LIBDIR
CMAKE_PREFIX_PATH
CMAKE_LIBRARY_OUTPUT_DIRECTORY
CMAKE_RUNTIME_OUTPUT_DIRECTORY
CMAKE_ARCHIVE_OUTPUT_DIRECTORY
This is what the conda-forge recipe for Windows does:
https://github.com/conda-forge/graphblas-feedstock/blob/main/recipe/bld.bat
When building python-suitesparse-graphblas
, I think the usual environment variables for searching for libraries may work?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Ah, the SuiteSparse config step has an option to set the install prefix that applies to all the libraries. You don't have to set those 5 variables individually.
The usual variables seem to work fine on Linux and macOS, but the ones you see in build.py
are the defaults on Windows. I don't think Windows has system-wide "lib" and "include" like Unix. If you have a preference let me know.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Gotcha. I'm happy to follow your lead here.
Perhaps we can keep what we have and address it later if there are any issues.
We'll do an alpha release first to give us an opportunity to try everything out.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I'm testing a rebase that installs to "C:\GraphBLAS". Feels cleaner and doesn't make path assumptions.
I think Alpha is a good idea. Would you prefer if this PR didn't override the existing wheel.yml
and instead made an independent one? That way you don't have to commit to this one, and you can switch from Test PyPI when you're comfortable.
We can also coordinate about how to delete the Test PyPI. I think it auto deletes after 30 days, but I can delete earlier so you can claim the package ID. You have to make another account and API token, they don't share with regular PyPI. I uploaded mostly as a proof that this workflow works.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I'm testing a rebase that installs to "C:\GraphBLAS". Feels cleaner and doesn't make path assumptions.
Cool, sounds great.
I say override wheel.yml
. I'm also happy to let you "own" Test PyPI ;) . Can you try adding me (eriknw
) to the project?
I suppose we should probably do what scikit-learn
does in their publish_pypi.yml
which is to have a workflow variable to control to which repo to upload to enable easier testing. Up to you. I think our normal workflow will be for wheels to automatically build and upload to PyPI whenever we to a tag/release, and we'll probably do alpha releases when anything with the build system changes.
I think the current sizes are fine, but this is the second time the artifact upload step has failed (with two different errors). So maybe it's worth trying an auto-retry mechanism. If this is an ongoing issue then we could have each build step upload directly to PyPI. This multi-step approach does have its own benefits though, it keeps you from publishing broken wheels if only one platform has issues. |
Cool. When we're about ready to merge, I would like to re-enable the currently disabled dtypes if possible. I think integer dtypes are important. If this causes timeouts for some platforms, maybe we can disable some types on some platforms such as ARM. @alugowski what do you think? |
IIRC the macOS wheel job would stall with all types enabled. By "current" I mean the 22MB of the config in this PR. I believe the full types are over 70MB. I don't know what the GitHub limits are. Up to you. I don't see the value in the types I left disabled. Python does not have unsigned integers, so a Python user is highly unlikely to ever touch those. It's possible to create an unsigned integer via NumPy ctypes, but someone would have to really go out of their way. And if they do then everything will still work. Similarly any integer created in Python will be an int64, so the smaller ones are also unlikely. In my opinion the smaller size makes the library more accessible and easier to work with, and that's more valuable than optimizing those particular types. Hopefully this will become moot with the JIT. |
I'm glad to hear you'll be around to help get wheels working with the upcoming JIT ;) Data has integers of different sizes and signed/unsigned, so data in Python will too. If you're performing an algorithm on a very large matrix, the dtype size can be the difference between whether it fits in memory or not.
Very possible! I'm happy to go with your preference here. If a particular type becomes important for a user or benchmark, we can add it then. |
One more thing: @alugowski do you want to add yourself as an author in |
For choosing which types to not compile, how about @jim22k, if you were to exclude some dtypes from compilation to improve build time and reduce wheel size, which ones would you choose? Note that the dtypes will still work, but they will be slower. fyi, the size of the (compressed) conda packages we build on conda-forge range between 12-21MB, and these use the default settings: https://anaconda.org/conda-forge/graphblas/files |
You can also be more fine-grained and use a revised GB_control.h file, or add a longer set of -D options. For example, you could disable all of FC32, but keep FC64. Then just enable the plus and times operators on FC64, nothing else (and maybe the ones the extract real / imag parts, and the binary op that constructs a complex z from real x and real y, as z = x + i*y. That would give you most of the operations people might need on FC64. |
I'm running basic tests on the rebase, will hopefully have it pushed later tonight. There is one more macOS issue. It looks like the ARM wheel is missing both libomp.dylib and _graphblas.dylib (notice the size difference between the arm and x86 wheel files). I think that's because the x86 runner only has the x86 versions of the library. We'll have to figure out how to have it include both. GitHub Actions does not have arm macOS runners yet, so we can't just use one and have it done for us (like on Linux). Homebrew has ARM builds for macOS, and I'm sure there's a simple switch for clang to build both architectures. |
@eriknw @alugowski What is the status of this? Does it need more testing or is macOS ARM blocking the ability to move forward? |
@alugowski, we--and eager users--would love to get this in. If we skipped MacOS on ARM for now, is everything else good to go? I'd be happy to do an alpha release or two if/when you think it's ready. Is openmp the problem on macos-arm, or are there issues with not cross-compiling correctly? I don't yet have time to dive deeply on this to help, but maybe @jim22k could if you pointed him in the right direction. Thanks! |
Superseded by #73 which incorporates these changes. Thanks @alugowski for helping us push cibuildwheel forward. |
This is a working
cibuildwheel
workflow for building python-suitesparse-graphblas wheels for:manylinux
andmusllinux
It took many many iterations and much debugging.
It's basically ready for production use except:
_version.py
gets the wrong version on Windows (and only Windows). That kills the workflow at the very end because PyPI won't accept a wheel with an invalid version. Someone familiar with that versioneer code needs to look at it, I'm the wrong person for that.wheel.yml
to point at the real PyPI.