-
Notifications
You must be signed in to change notification settings - Fork 906
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
[ci] update matrix filters for dask-cudf builds #15174
[ci] update matrix filters for dask-cudf builds #15174
Conversation
@@ -92,7 +92,7 @@ jobs: | |||
secrets: inherit | |||
uses: rapidsai/shared-workflows/.github/workflows/wheels-build.yaml@branch-24.04 | |||
with: | |||
matrix_filter: map(select(.ARCH == "amd64" and .PY_VER == "3.10" and (.CUDA_VER == "11.8.0" or .CUDA_VER == "12.2.2"))) | |||
matrix_filter: map(select(.ARCH == "amd64" and .PY_VER == "3.11" and (.CUDA_VER == "11.8.0" or .CUDA_VER == "12.2.2"))) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Is there a way for us to use max_by
filters rather than hardcode these values?
I'm not a jq expert. I don't know how to write the query CUDA_VER == max(CUDA_11_VERSIONS) or CUDA_VER == max(CUDA_12_VERSIONS)
but that's what we need for the second part.
We've used matrix_filter: map(select(.ARCH == "amd64")) | max_by(.CUDA_VER) | [.]
elsewhere before when we only care about one CUDA version.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
cc: @ajschmidt8 if you know how to do this
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
There is, @ajschmidt8 shared some ideas about that at rapidsai/dask-cuda#1294 (comment)
That thread produced this still-open issue: https://github.com/rapidsai/dask-cuda/issues/1296
I didn't pursue those ideas in this PR because I didn't want to add another thing to the mix of what was being tested, in the interest of unblocking CI as soon as possible.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Just realized that issue is in dask-cuda
, maybe https://github.com/rapidsai/build-planning would be a better place, to look into this RAPIDS-wide?
But either way, I think we should defer that to another PR.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
OOoooooOOOoooo. I studied a bit of jq
.
Let's try:
map(select(.ARCH == "amd64")) | group_by(.CUDA_VER|split(".")|map(tonumber)|.[0]) | map(max_by(.PY_VER|split(".")|map(tonumber)))
We don't really care what the exact CUDA/Python version is, we just want one build for each CUDA major version in the matrix.
matrix_filter: map(select(.ARCH == "amd64" and .PY_VER == "3.11" and (.CUDA_VER == "11.8.0" or .CUDA_VER == "12.2.2"))) | |
matrix_filter: map(select(.ARCH == "amd64")) | group_by(.CUDA_VER|split(".")|map(tonumber)|.[0]) | map(max_by(.PY_VER|split(".")|map(tonumber))) |
(We can do this in a later PR, too, to avoid blocking this work.)
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Okay, I found something even better. This handles the final max_by
step more clearly, with tie-breaking over both Python and CUDA versions. I can file a separate PR for this, if this one is nearly ready to merge.
map(select(.ARCH == "amd64")) | group_by(.CUDA_VER|split(".")|map(tonumber)|.[0]) | map(max_by([(.PY_VER|split(".")|map(tonumber)), (.CUDA_VER|split(".")|map(tonumber))]))
- Take only amd64 jobs
- Group them by CUDA major version (11 in one list, 12 in another)
- Map over each CUDA major version, taking the maximum over Python version first and CUDA version second.
Finally, you have a list containing two matrix entries: the highest (Python, CUDA) entry for each of the two CUDA major versions.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Just realized that issue is in
dask-cuda
, maybe https://github.com/rapidsai/build-planning would be a better place, to look into this RAPIDS-wide?But either way, I think we should defer that to another PR.
Yeah IIRC that issue preceded the build-planning
repo. In any event agree this is a good idea (especially if it is a wider-spread issue)
Transferred the Dask-CUDA issue to rapidsai/build-planning#25
In the interest of unblocking CI I've approved, but I assume we want to go back and use the more general |
Thanks all! 🙏 |
To eliminate hard-coding, generalize the GHA workflow logic to select one build for testing. This should simplify future updates. This is a follow-up to #15174. xref: rapidsai/build-planning#25 Authors: - Bradley Dice (https://github.com/bdice) Approvers: - Jake Awe (https://github.com/AyodeAwe) - https://github.com/jakirkham URL: #15191
Description
After merging rapidsai/shared-workflows#176, we observed that CI was failing in this repo despite attempts in rapidsai/shared-workflows#176 to use GitHub Actions'
continue-on-error
mechanism to prevent new Python 3.11 jobs from blocking merges.By "failing", I specifically mean:
pr-builder
job was stuck indefinitely in pending statusI believe at least 1 root cause was that as of the changes in rapidsai/shared-workflows#176, this matrix filter no longer matches any existing
wheels-test
CI jobs:cudf/.github/workflows/test.yaml
Line 102 in 8526e6d
This tries to fix that by updating the filters in this project's CI to some that matches those defined over in https://github.com/rapidsai/shared-workflows/blob/63aad51bc77d018fc867d1057651fd0afa619707/.github/workflows/wheels-test.yaml#L75
Notes for Reviewers
This PR is just trying to unblock CI, and so chooses a specific known-present matrix combination.
Such issues could be avoided in the future by following @ajschmidt8's suggestion in rapidsai/dask-cuda#1294 (review), to change these filters from "this one specific combination" to "any one of these conditions".
Checklist