Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

[ci] update matrix filters for dask-cudf builds #15174

Merged
merged 9 commits into from
Feb 29, 2024

Conversation

jameslamb
Copy link
Member

@jameslamb jameslamb commented Feb 28, 2024

Description

After merging rapidsai/shared-workflows#176, we observed that CI was failing in this repo despite attempts in rapidsai/shared-workflows#176 to use GitHub Actions' continue-on-error mechanism to prevent new Python 3.11 jobs from blocking merges.

By "failing", I specifically mean:

  • the pr-builder job was stuck indefinitely in pending status
  • all other jobs (except 3 Python 3.11 jobs) appeared successful

I believe at least 1 root cause was that as of the changes in rapidsai/shared-workflows#176, this matrix filter no longer matches any existing wheels-test CI jobs:

matrix_filter: map(select(.ARCH == "amd64" and .PY_VER == "3.10" and (.CUDA_VER == "11.8.0" or .CUDA_VER == "12.2.2")))

image

This tries to fix that by updating the filters in this project's CI to some that matches those defined over in https://github.com/rapidsai/shared-workflows/blob/63aad51bc77d018fc867d1057651fd0afa619707/.github/workflows/wheels-test.yaml#L75

Notes for Reviewers

This PR is just trying to unblock CI, and so chooses a specific known-present matrix combination.

Such issues could be avoided in the future by following @ajschmidt8's suggestion in rapidsai/dask-cuda#1294 (review), to change these filters from "this one specific combination" to "any one of these conditions".

Checklist

  • I am familiar with the Contributing Guidelines.
  • New or existing tests cover these changes.
  • The documentation is up to date with these changes.

@jameslamb jameslamb added improvement Improvement / enhancement to an existing function non-breaking Non-breaking change labels Feb 28, 2024
@jameslamb jameslamb changed the title (DO NOT MERGE) test different matrix filter (DO NOT MERGE) update matrix filter for wheel tests Feb 28, 2024
@jameslamb jameslamb changed the title (DO NOT MERGE) update matrix filter for wheel tests update matrix filter for wheel tests Feb 28, 2024
@jameslamb jameslamb marked this pull request as ready for review February 28, 2024 22:14
@jameslamb jameslamb requested a review from a team as a code owner February 28, 2024 22:14
@@ -92,7 +92,7 @@ jobs:
secrets: inherit
uses: rapidsai/shared-workflows/.github/workflows/wheels-build.yaml@branch-24.04
with:
matrix_filter: map(select(.ARCH == "amd64" and .PY_VER == "3.10" and (.CUDA_VER == "11.8.0" or .CUDA_VER == "12.2.2")))
matrix_filter: map(select(.ARCH == "amd64" and .PY_VER == "3.11" and (.CUDA_VER == "11.8.0" or .CUDA_VER == "12.2.2")))
Copy link
Contributor

@bdice bdice Feb 28, 2024

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Is there a way for us to use max_by filters rather than hardcode these values?

I'm not a jq expert. I don't know how to write the query CUDA_VER == max(CUDA_11_VERSIONS) or CUDA_VER == max(CUDA_12_VERSIONS) but that's what we need for the second part.

We've used matrix_filter: map(select(.ARCH == "amd64")) | max_by(.CUDA_VER) | [.] elsewhere before when we only care about one CUDA version.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

cc: @ajschmidt8 if you know how to do this

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

There is, @ajschmidt8 shared some ideas about that at rapidsai/dask-cuda#1294 (comment)

That thread produced this still-open issue: https://github.com/rapidsai/dask-cuda/issues/1296

I didn't pursue those ideas in this PR because I didn't want to add another thing to the mix of what was being tested, in the interest of unblocking CI as soon as possible.

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Just realized that issue is in dask-cuda, maybe https://github.com/rapidsai/build-planning would be a better place, to look into this RAPIDS-wide?

But either way, I think we should defer that to another PR.

Copy link
Contributor

@bdice bdice Feb 28, 2024

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

OOoooooOOOoooo. I studied a bit of jq.

Let's try:

map(select(.ARCH == "amd64")) | group_by(.CUDA_VER|split(".")|map(tonumber)|.[0]) | map(max_by(.PY_VER|split(".")|map(tonumber)))

We don't really care what the exact CUDA/Python version is, we just want one build for each CUDA major version in the matrix.

Suggested change
matrix_filter: map(select(.ARCH == "amd64" and .PY_VER == "3.11" and (.CUDA_VER == "11.8.0" or .CUDA_VER == "12.2.2")))
matrix_filter: map(select(.ARCH == "amd64")) | group_by(.CUDA_VER|split(".")|map(tonumber)|.[0]) | map(max_by(.PY_VER|split(".")|map(tonumber)))

(We can do this in a later PR, too, to avoid blocking this work.)

Copy link
Contributor

@bdice bdice Feb 28, 2024

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Okay, I found something even better. This handles the final max_by step more clearly, with tie-breaking over both Python and CUDA versions. I can file a separate PR for this, if this one is nearly ready to merge.

map(select(.ARCH == "amd64")) | group_by(.CUDA_VER|split(".")|map(tonumber)|.[0]) | map(max_by([(.PY_VER|split(".")|map(tonumber)), (.CUDA_VER|split(".")|map(tonumber))]))
  1. Take only amd64 jobs
  2. Group them by CUDA major version (11 in one list, 12 in another)
  3. Map over each CUDA major version, taking the maximum over Python version first and CUDA version second.

Finally, you have a list containing two matrix entries: the highest (Python, CUDA) entry for each of the two CUDA major versions.

Copy link
Member

@jakirkham jakirkham Feb 28, 2024

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Just realized that issue is in dask-cuda, maybe https://github.com/rapidsai/build-planning would be a better place, to look into this RAPIDS-wide?

But either way, I think we should defer that to another PR.

Yeah IIRC that issue preceded the build-planning repo. In any event agree this is a good idea (especially if it is a wider-spread issue)

Transferred the Dask-CUDA issue to rapidsai/build-planning#25

@jameslamb jameslamb changed the title update matrix filter for wheel tests update matrix filters for dask-cudf builds Feb 28, 2024
@jameslamb jameslamb changed the title update matrix filters for dask-cudf builds [ci] update matrix filters for dask-cudf builds Feb 28, 2024
@vyasr
Copy link
Contributor

vyasr commented Feb 29, 2024

In the interest of unblocking CI I've approved, but I assume we want to go back and use the more general jq formula above to avoid hardcoding the filtered jobs where possible. That can be done in a follow-up, though.

@ajschmidt8 ajschmidt8 merged commit 8507b3d into rapidsai:branch-24.04 Feb 29, 2024
76 checks passed
@jameslamb jameslamb deleted the test-matrix-filter branch February 29, 2024 00:49
@jakirkham
Copy link
Member

Thanks all! 🙏

rapids-bot bot pushed a commit that referenced this pull request Mar 5, 2024
To eliminate hard-coding, generalize the GHA workflow logic to select one build for testing. This should simplify future updates.

This is a follow-up to #15174.

xref: rapidsai/build-planning#25

Authors:
  - Bradley Dice (https://github.com/bdice)

Approvers:
  - Jake Awe (https://github.com/AyodeAwe)
  - https://github.com/jakirkham

URL: #15191
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
improvement Improvement / enhancement to an existing function non-breaking Non-breaking change
Projects
None yet
Development

Successfully merging this pull request may close these issues.

5 participants