Skip to content

Setup cibuildwheel with Linux, macOS, Windows wheels #66

New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Closed
wants to merge 3 commits into from

Conversation

alugowski
Copy link
Collaborator

This is a working cibuildwheel workflow for building python-suitesparse-graphblas wheels for:

  • Windows
  • macOS
    • both x86 and ARM
  • Linux
    • both x86 and arm64
    • both manylinux and musllinux
  • Both CPython and PyPy on all platforms
  • SDist
  • Upload to PyPI

It took many many iterations and much debugging.

It's basically ready for production use except:

  • I've no idea why _version.py gets the wrong version on Windows (and only Windows). That kills the workflow at the very end because PyPI won't accept a wheel with an invalid version. Someone familiar with that versioneer code needs to look at it, I'm the wrong person for that.
  • The PyPI upload step is currently pointed at the Test PyPI site. When happy with the workflow follow the comments at the bottom of wheel.yml to point at the real PyPI.

@alugowski
Copy link
Collaborator Author

You can see a run of this workflow on my fork: https://github.com/alugowski/python-suitesparse-graphblas/actions/runs/4119959220/jobs/7115370559

Example of what _version.py produces. Notice all the versions are correct except the Windows ones:

Uploading distributions to https://test.pypi.org/legacy/
INFO     dist/suitesparse_graphblas-0+untagged.1.g28c3b04-cp310-cp310-win_amd64.
         whl (23.7 MB)                                                          
INFO     dist/suitesparse_graphblas-0+untagged.1.g28c3b04-cp311-cp311-win_amd64.
         whl (23.7 MB)                                                          
INFO     dist/suitesparse_graphblas-0+untagged.1.g28c3b04-cp38-cp38-win_amd64.wh
         l (23.7 MB)                                                            
INFO     dist/suitesparse_graphblas-0+untagged.1.g28c3b04-cp39-cp39-win_amd64.wh
         l (23.7 MB)                                                            
INFO     dist/suitesparse_graphblas-0+untagged.1.g28c3b04-pp38-pypy38_pp73-win_a
         md64.whl (23.6 MB)                                                     
INFO     dist/suitesparse_graphblas-0+untagged.1.g28c3b04-pp39-pypy39_pp73-win_a
         md64.whl (23.6 MB)                                                     
INFO     dist/suitesparse_graphblas-7.4.3.0-cp310-cp310-macosx_10_9_x86_64.whl  
         (27.6 MB)                                                              
INFO     dist/suitesparse_graphblas-7.4.3.0-cp310-cp310-macosx_11_0_arm64.whl   
         (480.0 KB)                                                             
INFO     dist/suitesparse_graphblas-7.4.3.0-cp310-cp310-manylinux_2_17_x86_64.ma
         nylinux2014_x86_64.whl (26.2 MB)                                       
INFO     dist/suitesparse_graphblas-7.4.3.0-cp310-cp310-musllinux_1_1_x86_64.whl
         (26.0 MB)                                                              
INFO     dist/suitesparse_graphblas-7.4.3.0-cp311-cp311-macosx_10_9_x86_64.whl  
         (27.6 MB)                                                              
INFO     dist/suitesparse_graphblas-7.4.3.0-cp311-cp311-macosx_11_0_arm64.whl   
         (479.9 KB)                                                             
INFO     dist/suitesparse_graphblas-7.4.3.0-cp311-cp311-manylinux_2_17_x86_64.ma
         nylinux2014_x86_64.whl (26.2 MB)                                       
INFO     dist/suitesparse_graphblas-7.4.3.0-cp311-cp311-musllinux_1_1_x86_64.whl
         (26.0 MB)                                                              
INFO     dist/suitesparse_graphblas-7.4.3.0-cp38-cp38-macosx_10_9_x86_64.whl    
         (27.6 MB)                                                              
INFO     dist/suitesparse_graphblas-7.4.3.0-cp38-cp38-macosx_11_0_arm64.whl     
         (480.7 KB)                                                             
INFO     dist/suitesparse_graphblas-7.4.3.0-cp38-cp38-manylinux_2_17_x86_64.many
         linux2014_x86_64.whl (26.2 MB)                                         
INFO     dist/suitesparse_graphblas-7.4.3.0-cp38-cp38-musllinux_1_1_x86_64.whl  
         (26.0 MB)                                                              
INFO     dist/suitesparse_graphblas-7.4.3.0-cp39-cp39-macosx_10_9_x86_64.whl    
         (27.6 MB)                                                              
INFO     dist/suitesparse_graphblas-7.4.3.0-cp39-cp39-macosx_11_0_arm64.whl     
         (481.0 KB)                                                             
INFO     dist/suitesparse_graphblas-7.4.3.0-cp39-cp39-manylinux_2_17_x86_64.many
         linux2014_x86_64.whl (26.2 MB)                                         
INFO     dist/suitesparse_graphblas-7.4.3.0-cp39-cp39-musllinux_1_1_x86_64.whl  
         (26.0 MB)                                                              
INFO     dist/suitesparse_graphblas-7.4.3.0-pp38-pypy38_pp73-macosx_10_9_x86_64.
         whl (27.4 MB)                                                          
INFO     dist/suitesparse_graphblas-7.4.3.0-pp38-pypy38_pp73-manylinux_2_17_x86_
         64.manylinux2014_x86_64.whl (24.9 MB)                                  
INFO     dist/suitesparse_graphblas-7.4.3.0-pp39-pypy39_pp73-macosx_10_9_x86_64.
         whl (27.4 MB)                                                          
INFO     dist/suitesparse_graphblas-7.4.3.0-pp39-pypy39_pp73-manylinux_2_17_x86_
         64.manylinux2014_x86_64.whl (24.9 MB)                                  
INFO     dist/suitesparse-graphblas-7.4.3.0.tar.gz (138.5 KB) 

cmake .. -DCMAKE_BUILD_TYPE=Release
make -j$(nproc)

# Disable optimizing some rarely-used types for significantly faster builds and significantly smaller wheel size.
Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

If experimenting with the workflow I recommend disabling all the types. That brings the GraphBLAS build times into the 15 minute range on GitHub Actions.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I have a non-documented feature that you can use for this. Turn on the CMAKE_CUDA_DEV flag in the GraphBLAS CmakeLists.txt. That disables all compilation of the kernels in Source/Generated2.

I don't recommend that for production so I don't document it. The resulting GraphBLAS will be pretty slow. But it's nice for development.

Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@DrTimothyAldenDavis I thought I was going crazy because -DCMAKE_CUDA_DEV=1 didn't seem to make a difference, but I just noticed CMakeLIsts.txt has set ( CMAKE_CUDA_DEV off ) on lines 56 and 61, so CMAKE_CUDA_DEV is only usable if one is able to edit CMakeLists.txt directly. I supoose it's possible to do with sed, but sed inplace support varies between OSes and such edits are a PITA.

Copy link
Collaborator Author

@alugowski alugowski Feb 11, 2023

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@DrTimothyAldenDavis Just FYI, compiling that combinatorial explosion of types really does make working with GraphBLAS very difficult. This PR would not happen if GitHub's accounting wasn't broken, as the CI time should have cost me several hundred dollars. With the types disabled as seen here, building just the macOS wheels one time costs about $3.50 according to GitHub's published price list ($5 for all OSes). I'm estimating well over $20/run for all types enabled. I don't mind because GitHub doesn't have my credit card and they aren't counting anything against my free quota, but that's just luck on our part.

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

That's intentional. It is not meant for anyone else to use but me. It is supposed be very hard to turn on GBCUDA_DEV, because turning it on slows down GraphBLAS immensely (run time, not compile time).

If it's easy to turn on GBCUDA_DEV, then I worry that it would be compiled that way for production and distribution, as in a linux distro or conda package. That would be awful.

If you like, you could instead compile with

-DGxB_NO_BOOL=1 -DGxB_NO_INT8=1 -DGxB_NO_INT16=1 ... (etc).

I do not recommend that for a production build.

See GraphBLAS/Source/GB_control.h:
Note that I already disable quite a few semirings via that file.

GraphBLAS requires a surprisingly large amount of semirings and operators when used in production.

I'm working on a JIT where I can compile these at run-time, however. That will keep the performance up (and even improve it) while at the same time reducing the initial compile time. Distros will have to package my source code for that to work, however.

@eriknw
Copy link
Member

eriknw commented Feb 8, 2023

✨ This is amazing! Thank you so much @alugowski. ✨

I will try to take a closer look this weekend (sooner if I'm able), and I'll try to figure out the version issue and whether this now supports complex on Windows.

I would love to learn how you iterated and debugged this. I also think it's interesting you're not using pypa/cibuildwheel github action. At a glance, this looks really, really good.

Is everything compiled with OpenMP enabled?

@alugowski
Copy link
Collaborator Author

I will try to take a closer look this weekend (sooner if I'm able), and I'll try to figure out the version issue and whether this now supports complex on Windows.

Sounds good!

If a solution is elusive then a fallback fix might be to just use ${{ github.ref }}, possibly coupled with an env var.

@alugowski
Copy link
Collaborator Author

Is everything compiled with OpenMP enabled?

Yes. That was an adventure in itself.

Linux (GCC) and Windows (MinGW) come with OpenMP and GraphBLAS discovers it automatically. So that's easy.

macOS is tricky because for reasons I don't know their clang is famously missing OpenMP. Generally the two options are:

  • Install regular llvm and use that.
  • Install libomp which is the llvm OpenMP specially built to target the bundled clang.

I generally take the first approach because it's simpler for C++ projects, but it didn't work here. First there was some library mismatch when assembling the wheel that I don't remember the details of. Second is a comment from the cibuildhweel folks who push you to use the default compiler on each platform. Apart from lots of possible issues that you're on your own for, they point out that you'll face difficulty with some channels, like conda, that will only use the default compiler.

So I had to figure out how to get that libomp method to work. CMake's FindOpenMP docs make it seem like it should just work, but it doesn't on my macbook and it also doesn't in the Actions runner. So I found a few flags that can guide it to find the library, and you have to set a few more flags to make everything work. All that stuff is in the CIBW_ENVIRONMENT_MACOS env var. As you can imagine, every single option on that line took several iterations to get work and verify that it's actually necessary. A big frustration was that GitHub's homebrew uses different directories than the default homebrew (and hence my laptop) so you'd try things that are obviously correct for you at home but mysteriously don't work. Of course they start working when you provide the correct path via $(brew --prefix libomp) :) Luckily GraphBLAS reports OpenMP status at the beginning of the build so each iteration isn't too long.

@eriknw
Copy link
Member

eriknw commented Feb 9, 2023

Excellent.

As you can imagine, every single option on that line took several iterations to get work and verify that it's actually necessary.

Oh, I can imagine all to well. You have no idea how happy this PR makes me :) . I really, really, really appreciate the effort and pain you put into this. How would you like to be recognized for this effort?

To see if Windows now supports complex, we can simply delete these lines (I don't think we need the ARM-specific file anymore either):

if is_win:
header = "suitesparse_graphblas_no_complex.h"
if is_arm64 or is_ppc64le:
header = "suitesparse_graphblas_arm64.h"

We'll know whether complex works if it compiles and passes tests (which I see also get run in the GitHub Action--hooray!).

@alugowski
Copy link
Collaborator Author

I would love to learn how you iterated and debugged this. I also think it's interesting you're not using pypa/cibuildwheel github action. At a glance, this looks really, really good.

What made it easier is that I had a working workflow from another project I was working on recently (a threaded Matrix Market loader. About 20x faster than SciPy's, maybe I'll make a PR for that). So I started with that. Then you get Linux working (since it should already work), then macos, then Windows. Well, once you're in the steps that require building GraphBLAS you push a change to each OS each time as you have to wait 15 minutes to see what silly thing kills it that time. To speed things up add a push trigger to the workflow so you don't have to click in the UI, and disable the GraphBLAS optimizations to speed up that build.

Then just tackle build issues as you go.

The general flow came from the already working examples, "just" need to modify for this project. This is actually why I took this on, since it's fresh in my mind and you guys need it :)

All the options are very well written up in the cibuildwheel docs: https://cibuildwheel.readthedocs.io/en/stable/options/
Basically the steps are BEFORE_ALL, then the architectures/interpreters selected, wheel is built, repaired, tested. Each step is controlled with environment variables. Each has platform specific ones with a *_WINDOWS, *_MACOS, *_LINUX suffix. scikit-learn's workflow showed how to get a GitHub Actions runner for each platform. Luckily the Windows runner has a Linux-like environment so most things just work.

cibuildwheel has a CIBW_BEFORE_ALL option that is intended to build libraries, so suitesparse.sh went there. I had to add debug prints to figure out what it needed. You can see its output on the GitHub workflow page, expand the "Build Wheels" step then the "Running before_all" header.

Tests are very important in this process because they confirm things work, especially with a language like Python that has no static checks. Tests with good coverage can expose many platform-specific issues. This repo's tests took a while to work (test.yml too, for unrelated reasons). A big spanner in the works was that tests/__init__.py file that is deleted in this PR. The issue was that setup.py pytest (that tests.yml does) and pytest suitesparse_graphblas/tests (that cibuildwheel does) invokes the module loader differently, and the latter method would introduce a cycle and croak. Once things worked when the tests were moved elsewhere I realized the init.py was the issue (and isn't necessary).

Similarly the necessity of the entire delvewheel repair step became apparent only when the tests failed to find graphblas.dll on Windows. Turns out the "repair" step identifies shared libraries from the host and copies them into the wheel. cibuildwheel bundles tools to do that for you on Linux and macOS, but not yet on Windows.

As for why not use the cibuildwheel action, my thoughts are thus.

  • First and foremost, the clear working examples that I found (especially the scikit-learn one linked in the comments) didn't use the action. A working example, especially a production one, is far more valuable than some docs that are never complete.
  • Second, all the official detailed cibuildwheel documentation, examples, and other online resources use this setup with the env vars. It seemed like the action would be just an extra complication to figure out and inevitably debug. I just see a struggle to find workarounds for features supported by the base cibuildwheel but missing in the action.
  • Third, the update story seems more clear to me. I can clearly see that each time the workflow is run I get the most up-to-date cibuildwheel from pip. That means any platform fixes, any new Python versions, etc come for free just by cutting a new release. Some of that may be true of the action, but it's not clear to me when they'll cut a new major version and I get abandoned without knowing it. Maybe dependabot will catch it, maybe it won't.
  • Finally, I don't see what value the action brings. I'm sure it makes simple projects simpler, but for a complex one?

@alugowski
Copy link
Collaborator Author

Excellent.

As you can imagine, every single option on that line took several iterations to get work and verify that it's actually necessary.

Oh, I can imagine all to well. You have no idea how happy this PR makes me :) . I really, really, really appreciate the effort and pain you put into this. How would you like to be recognized for this effort?

To see if Windows now supports complex, we can simply delete these lines (I don't think we need the ARM-specific file anymore either):

if is_win:
header = "suitesparse_graphblas_no_complex.h"
if is_arm64 or is_ppc64le:
header = "suitesparse_graphblas_arm64.h"

We'll know whether complex works if it compiles and passes tests (which I see also get run in the GitHub Action--hooray!).

Ah. I can try that.

@eriknw
Copy link
Member

eriknw commented Feb 9, 2023

Quick question: what version of numpy is being used when building the wheel?

When building, we should probably install oldest-supported-numpy instead of numpy directly. Let's add this to [build-system.requires] in pyproject.toml.

@alugowski
Copy link
Collaborator Author

Quick question: what version of numpy is being used when building the wheel?

When building, we should probably install oldest-supported-numpy instead of numpy directly. Let's add this to [build-system.requires] in pyproject.toml.

Looks like it's using 1.24.2:

     Collecting numpy>=1.19
      Downloading numpy-1.24.2-cp38-cp38-manylinux_2_17_x86_64.manylinux2014_x86_64.whl (17.3 MB)

Yeah pyproject.toml is a good place for that.

@alugowski
Copy link
Collaborator Author

Rebased on updated main. It'll have to be rebased again when #68 is merged, but it'll be easy.

@alugowski
Copy link
Collaborator Author

The new version stuff works, see the results in Test PyPI: https://test.pypi.org/project/suitesparse-graphblas/#files

Try it:

pip install cffi numpy
pip install --index-url https://test.pypi.org/simple/ suitesparse-graphblas

Test PyPI is often missing packages, so install dependencies from regular PyPI

The version in Test PyPI is currently 0.0.1 because that's what got assigned when I was working in a branch. I'm now running a build from my main branch which should result in a proper version number. Build takes a few hours, this is the job: https://github.com/alugowski/python-suitesparse-graphblas/actions/runs/4153374973

@alugowski
Copy link
Collaborator Author

cffi version bump to v1.11 because that's when they started supporting C99 complex types.
https://cffi.readthedocs.io/en/latest/whatsnew.html

pyproject.toml Outdated
@@ -45,7 +45,7 @@ classifiers = [
]
dependencies = [
# These are super-old; can/should we update them?
"cffi>=1.0.0",
"cffi>=1.11",
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thanks for updating this. Can we also set the min version for cffi in [build-system.requires] at the top of the file?

@eriknw
Copy link
Member

eriknw commented Feb 12, 2023

I just tested these wheels in python-graphblas/python-graphblas#385

Linux and Windows work! OS X is complaining about OpenMP, which may not be surprising since everything else is installed from conda-forge. I suppose I should try installing everything with pip.

@jim22k, can you take a couple of minutes to try installing and using these OS X wheels on your new Mac? Thanks!

pip install --no-deps --index-url https://test.pypi.org/simple/ suitesparse-graphblas

@alugowski I'll keep reviewing this PR tomorrow, and I'll research OpenMP with wheels on OS X.

@eriknw
Copy link
Member

eriknw commented Feb 12, 2023

Alright, in python-graphblas CI, I was able to get around the duplicate OpenMP issue on macos by setting an environment variable:

KMP_DUPLICATE_LIB_OK=True

I don't know if this is advisable, but CI is now passing!

Copy link
Member

@eriknw eriknw left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Overall this looks incredible. I have a couple more things I want to look at more closely.

Is artifact size going to be a problem with actions/upload-artifact?

name: Build SDist
runs-on: ubuntu-latest
steps:
- uses: actions/checkout@v3
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Let's add

        with:
          fetch-depth: 0

to make sure it can always find a tag to use as the version.

Comment on lines +21 to +25
- name: Build SDist
run: pipx run build --sdist

- name: Check metadata
run: pipx run twine check dist/*
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

What are the versions of pip, build, and twine used here? Should we always update them such as with

          python -m pip install --upgrade pip
          python -m pip install build twine

or the pipx equivalent (note: I don't use or really understand pipx).

Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

AFAIK you always get the latest.

run: |
pip install twine
twine upload dist/*-manylinux*.whl
- uses: actions/checkout@v3
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

And here:

        with:
          fetch-depth: 0

- uses: actions/upload-artifact@v3
with:
path: wheelhouse/*.whl

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

For sanity check, maybe add

          if-no-files-found: error

Comment on lines 23 to 28
# suitesparse.sh installs GraphBLAS.h here
include_dirs.append(os.path.join("C:\\", "Program Files (x86)", "include"))
# graphblas.lib and graphblas.dll.a
library_dirs.append(os.path.join("C:\\", "Program Files (x86)", "lib"))
# graphblas.dll
library_dirs.append(os.path.join("C:\\", "Program Files (x86)", "bin"))
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Can we not set these by default? Can they instead be set by changing environment variables or flag or optional argument? Not every redistribution will install to these locations when building, so I don't want to hard code these.

Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Good point. CMakeLists installs to ${SUITESPARSE_LIBDIR} (and inc/bin) but I can't find where those values are set. Maybe they aren't and those are the default paths.

I know SuiteSparse has options for setting the install location, do you know the preferred way for GraphBLAS?

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I know SuiteSparse has options for setting the install location, do you know the preferred way for GraphBLAS?

I'm not sure I understand the question. What's the difference between SuiteSparse and GraphBLAS here?

When running cmake to build SuiteSparse:GraphBLAS, I think you can set the following variables (I'm not an expert on this):

CMAKE_INSTALL_PREFIX
CMAKE_INSTALL_LIBDIR
CMAKE_PREFIX_PATH
CMAKE_LIBRARY_OUTPUT_DIRECTORY
CMAKE_RUNTIME_OUTPUT_DIRECTORY
CMAKE_ARCHIVE_OUTPUT_DIRECTORY

This is what the conda-forge recipe for Windows does:
https://github.com/conda-forge/graphblas-feedstock/blob/main/recipe/bld.bat

When building python-suitesparse-graphblas, I think the usual environment variables for searching for libraries may work?

Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Ah, the SuiteSparse config step has an option to set the install prefix that applies to all the libraries. You don't have to set those 5 variables individually.

The usual variables seem to work fine on Linux and macOS, but the ones you see in build.py are the defaults on Windows. I don't think Windows has system-wide "lib" and "include" like Unix. If you have a preference let me know.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Gotcha. I'm happy to follow your lead here.

Perhaps we can keep what we have and address it later if there are any issues.

We'll do an alpha release first to give us an opportunity to try everything out.

Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I'm testing a rebase that installs to "C:\GraphBLAS". Feels cleaner and doesn't make path assumptions.

I think Alpha is a good idea. Would you prefer if this PR didn't override the existing wheel.yml and instead made an independent one? That way you don't have to commit to this one, and you can switch from Test PyPI when you're comfortable.

We can also coordinate about how to delete the Test PyPI. I think it auto deletes after 30 days, but I can delete earlier so you can claim the package ID. You have to make another account and API token, they don't share with regular PyPI. I uploaded mostly as a proof that this workflow works.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I'm testing a rebase that installs to "C:\GraphBLAS". Feels cleaner and doesn't make path assumptions.

Cool, sounds great.

I say override wheel.yml. I'm also happy to let you "own" Test PyPI ;) . Can you try adding me (eriknw) to the project?

I suppose we should probably do what scikit-learn does in their publish_pypi.yml which is to have a workflow variable to control to which repo to upload to enable easier testing. Up to you. I think our normal workflow will be for wheels to automatically build and upload to PyPI whenever we to a tag/release, and we'll probably do alpha releases when anything with the build system changes.

@alugowski
Copy link
Collaborator Author

Overall this looks incredible. I have a couple more things I want to look at more closely.

Is artifact size going to be a problem with actions/upload-artifact?

I think the current sizes are fine, but this is the second time the artifact upload step has failed (with two different errors). So maybe it's worth trying an auto-retry mechanism.

If this is an ongoing issue then we could have each build step upload directly to PyPI. This multi-step approach does have its own benefits though, it keeps you from publishing broken wheels if only one platform has issues.

@eriknw
Copy link
Member

eriknw commented Feb 13, 2023

I think the current sizes are fine

Cool.

When we're about ready to merge, I would like to re-enable the currently disabled dtypes if possible. I think integer dtypes are important. If this causes timeouts for some platforms, maybe we can disable some types on some platforms such as ARM. @alugowski what do you think?

@alugowski
Copy link
Collaborator Author

I think the current sizes are fine

Cool.

When we're about ready to merge, I would like to re-enable the currently disabled dtypes if possible. I think integer dtypes are important. If this causes timeouts for some platforms, maybe we can disable some types on some platforms such as ARM. @alugowski what do you think?

IIRC the macOS wheel job would stall with all types enabled. By "current" I mean the 22MB of the config in this PR. I believe the full types are over 70MB. I don't know what the GitHub limits are.

Up to you. I don't see the value in the types I left disabled. Python does not have unsigned integers, so a Python user is highly unlikely to ever touch those. It's possible to create an unsigned integer via NumPy ctypes, but someone would have to really go out of their way. And if they do then everything will still work. Similarly any integer created in Python will be an int64, so the smaller ones are also unlikely. In my opinion the smaller size makes the library more accessible and easier to work with, and that's more valuable than optimizing those particular types. Hopefully this will become moot with the JIT.

@eriknw
Copy link
Member

eriknw commented Feb 13, 2023

I'm glad to hear you'll be around to help get wheels working with the upcoming JIT ;)

Data has integers of different sizes and signed/unsigned, so data in Python will too. If you're performing an algorithm on a very large matrix, the dtype size can be the difference between whether it fits in memory or not.

In my opinion the smaller size makes the library more accessible and easier to work with, and that's more valuable than optimizing those particular types.

Very possible!

I'm happy to go with your preference here. If a particular type becomes important for a user or benchmark, we can add it then.

@eriknw
Copy link
Member

eriknw commented Feb 13, 2023

One more thing: @alugowski do you want to add yourself as an author in pyproject.toml? You've made major contributions, so any recognition we can give is well-deserved.

@eriknw
Copy link
Member

eriknw commented Feb 13, 2023

For choosing which types to not compile, how about UINT16, UINT32, INT16, INT32, FC32, and maybe FC64. I like having INT8, UINT8, and UINT64.

@jim22k, if you were to exclude some dtypes from compilation to improve build time and reduce wheel size, which ones would you choose? Note that the dtypes will still work, but they will be slower.

fyi, the size of the (compressed) conda packages we build on conda-forge range between 12-21MB, and these use the default settings: https://anaconda.org/conda-forge/graphblas/files

@DrTimothyAldenDavis
Copy link
Member

You can also be more fine-grained and use a revised GB_control.h file, or add a longer set of -D options.

For example, you could disable all of FC32, but keep FC64. Then just enable the plus and times operators on FC64, nothing else (and maybe the ones the extract real / imag parts, and the binary op that constructs a complex z from real x and real y, as z = x + i*y. That would give you most of the operations people might need on FC64.

@alugowski
Copy link
Collaborator Author

I'm running basic tests on the rebase, will hopefully have it pushed later tonight.

There is one more macOS issue. It looks like the ARM wheel is missing both libomp.dylib and _graphblas.dylib (notice the size difference between the arm and x86 wheel files). I think that's because the x86 runner only has the x86 versions of the library. We'll have to figure out how to have it include both. GitHub Actions does not have arm macOS runners yet, so we can't just use one and have it done for us (like on Linux). Homebrew has ARM builds for macOS, and I'm sure there's a simple switch for clang to build both architectures.

@jim22k
Copy link
Member

jim22k commented Mar 8, 2023

@eriknw @alugowski What is the status of this? Does it need more testing or is macOS ARM blocking the ability to move forward?

@eriknw
Copy link
Member

eriknw commented Mar 16, 2023

@alugowski, we--and eager users--would love to get this in. If we skipped MacOS on ARM for now, is everything else good to go? I'd be happy to do an alpha release or two if/when you think it's ready.

Is openmp the problem on macos-arm, or are there issues with not cross-compiling correctly? I don't yet have time to dive deeply on this to help, but maybe @jim22k could if you pointed him in the right direction. Thanks!

@jim22k
Copy link
Member

jim22k commented Mar 30, 2023

Superseded by #73 which incorporates these changes. Thanks @alugowski for helping us push cibuildwheel forward.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

4 participants