enforce wheel size limits, README formatting in CI #6136
Conversation
```toml
# detect when package size grows significantly
max_allowed_size_compressed = '1.5G'
```
Is this value coming from the existing wheel size, or coming from somewhere else?
This seems too big. cuML wheels appear to be closer to 550MB. Source: https://anaconda.org/rapidsai-wheels-nightly/cuml-cu12/files
Maybe set the threshold at 600MB.
This is the existing wheel size + a buffer. It varies by CPU architecture, Python version (because of Cython stuff), and CUDA version (because, for example, we don't use CUDA math lib wheels for CUDA 11).
The largest one I've seen clicking through logs on this PR was CUDA 11.8.0, Python 3.10, amd64:

```text
checking 'final_dist/cuml_cu11-24.12.0a38-cp310-cp310-manylinux_2_28_x86_64.whl'
----- package inspection summary -----
file size
  * compressed size: 1.3G
  * uncompressed size: 2.2G
  * compression space saving: 42.2%
contents
  * directories: 72
  * files: 432 (86 compiled)
size by extension
  * .so - 2.2G (99.9%)
  * .py - 1.4M (0.1%)
  * .pyx - 1.2M (0.1%)
  * .0 - 0.2M (0.0%)
  * .ipynb - 0.1M (0.0%)
  * no-extension - 57.2K (0.0%)
  * .png - 51.3K (0.0%)
  * .pxd - 34.0K (0.0%)
  * .txt - 25.7K (0.0%)
  * .md - 10.3K (0.0%)
  * .h - 2.1K (0.0%)
  * .ini - 0.8K (0.0%)
largest files
  * (2.2G) cuml/libcuml++.so
  * (3.0M) cuml/experimental/fil/fil.cpython-310-x86_64-linux-gnu.so
  * (2.9M) cuml/fil/fil.cpython-310-x86_64-linux-gnu.so
  * (1.5M) cuml/cluster/hdbscan/hdbscan.cpython-310-x86_64-linux-gnu.so
  * (1.5M) cuml/svm/linear.cpython-310-x86_64-linux-gnu.so
------------ check results -----------
errors found while checking: 0
```
So I'm proposing setting this to around 200MB above that size, so we'd be notified if the binary size grows beyond that level.
There's nothing special about 1.5GB... it's already way, way too big to be on PyPI. But I'm proposing putting some limit in place so that we get automated feedback from CI about binary size growth and can make informed decisions about whether to do something about it... similar to setting a coverage threshold for tests.
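For context, the summary above looks like the output of pydistcheck's `--inspect` mode. A minimal sketch of how CI might produce that output and enforce the limit (the wheel path is taken from the log above; the exact CI wiring is an assumption):

```shell
# install the checker and run it against the built wheel
pip install pydistcheck

# fail the job if the compressed wheel exceeds the threshold;
# --inspect also prints the size breakdown shown above
pydistcheck \
    --inspect \
    --max-allowed-size-compressed '1.5G' \
    final_dist/*.whl
```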
Aaahhh, but CUDA 11 is huge. We only did the CUDA math wheels work for CUDA 12. https://anaconda.org/rapidsai-wheels-nightly/cuml-cu11/files
Yeah, exactly:

`cuml/python/cuml/CMakeLists.txt`, lines 96 to 104 in 56e5e62:

```cmake
# Link to the CUDA wheels with shared libraries for CUDA 12+
if(CUDAToolkit_VERSION VERSION_GREATER_EQUAL 12.0)
  set(CUDA_STATIC_MATH_LIBRARIES OFF)
else()
  if(USE_CUDA_MATH_WHEELS)
    message(FATAL_ERROR "Cannot use CUDA math wheels with CUDA < 12.0")
  endif()
  set(CUDA_STATIC_MATH_LIBRARIES ON)
endif()
```
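As a rough illustration of how that branch gets exercised: a CUDA 12 wheel build could opt into the math-library wheels by passing a CMake define through the build front end. This is a sketch only; the scikit-build-core `--config-settings` route and the `ON` value are assumptions, and only the `USE_CUDA_MATH_WHEELS` name comes from the snippet above:

```shell
# sketch: build a CUDA 12 wheel that dynamically links the CUDA math library wheels
python -m pip wheel python/cuml -w final_dist \
    --config-settings="cmake.define.USE_CUDA_MATH_WHEELS=ON"
```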
I’m indifferent as well. Let’s stick to the single definition for now.
Alright, sounds good, thanks for considering it. I'm glad these changes are helping to expose these differences and leading to these conversations 😊
I'd be a fan of having two different limits, mostly because for "not CUDA 11" the limit of 1.5GB might as well be "infinity". As in, if we ever reach it, it will be way too late to course-correct.
Should I make a PR that uses James' suggestion for two limits?
@betatim sure! Go for it.
@betatim I've put up PRs in other repos following this suggestion, if you'd like something to copy from here in cuml:
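A minimal sketch of what the two-limit approach could look like in a CI script (the `RAPIDS_CUDA_VERSION` variable and the non-CUDA-11 threshold are assumptions for illustration; only the 1.5G value appears in this PR):

```shell
# sketch: choose a wheel size threshold based on the CUDA major version,
# since CUDA 11 wheels must statically link the CUDA math libraries
case "${RAPIDS_CUDA_VERSION}" in
  11.*)
    PYDISTCHECK_ARGS="--max-allowed-size-compressed 1.5G"
    ;;
  *)
    PYDISTCHECK_ARGS="--max-allowed-size-compressed 0.75G"
    ;;
esac

pydistcheck --inspect ${PYDISTCHECK_ARGS} final_dist/*.whl
```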
Accepting the large threshold for now. Maybe we can make it depend on CUDA version? Or just wait until we drop CUDA 11.
/merge
`cuvs-cu11` wheels are significantly larger than `cuvs-cu12` wheels, because (among other reasons) they are not able to dynamically link to CUDA math library wheels. In #464, I proposed a size limit for CI checks of "max CUDA 11 wheel size + a buffer". This PR proposes using different thresholds based on CUDA major version, following these discussions:

* rapidsai/cugraph#4754 (comment)
* rapidsai/cuml#6136 (comment)

Authors:
- James Lamb (https://github.com/jameslamb)

Approvers:
- Mike Sarahan (https://github.com/msarahan)

URL: #469
Description
Contributes to rapidsai/build-planning#110
Proposes adding two types of validation on wheels in CI, to ensure we continue to produce wheels that are suitable for publishing to PyPI.
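Concretely, a hedged sketch of the two kinds of checks (the use of twine for the README check is an assumption based on the PR title; pydistcheck and its size option appear in the diff above):

```shell
pip install pydistcheck twine

# 1. enforce a wheel size limit
pydistcheck --inspect --max-allowed-size-compressed '1.5G' final_dist/*.whl

# 2. verify the README / long_description will render on PyPI
twine check --strict final_dist/*.whl
```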