Download build artifacts from the backport branch for testing in the main branch #357
Conversation
I think there is a way to set up a retention policy in the repo settings or in the workflow, so this might be an option to help manage size. I should add that without such a setting, artifacts stick around forever, and removing them in the UI is one at a time. So, something to be aware of.
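(To make the retention idea concrete, a sketch of setting a per-artifact limit directly on the upload step; the step name, artifact name, and path below are placeholders, not this repo's actual workflow:)

```yaml
# Hypothetical upload step -- only `retention-days` is the point here.
- name: Upload wheel artifact
  uses: actions/upload-artifact@v4
  with:
    name: example-wheel        # placeholder artifact name
    path: dist/*.whl           # placeholder path
    retention-days: 14         # auto-expire instead of keeping the artifact forever
```

There is also a repo-wide default for artifact and log retention under the repository's Actions settings, if we prefer to manage it outside the workflow.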
Thanks, @jakirkham! None of the GHA Cache-related comments are relevant anymore, since I am moving away from it (#357 (comment)). I'll update the PR title/description once the CI is green.
OK, commit 75e37bd is basically a rewrite of the whole PR. The result is indeed a lot cleaner, as expected. Since we now have the capability of fetching artifacts generated from the backport branch for testing in the main branch, the new logic is much simpler.
@ksimpson-work @vzhurba01 this is ready for review.
if [[ ${{ matrix.python-version }} == "3.13" ]]; then
  # TODO: remove this hack once cuda-python has a cp313 build
  if [[ $SKIP_CUDA_BINDINGS_TEST == 1 ]]; then
Note: this hack can be removed now that we generate Python 3.13 wheels for cuda.bindings 11.8 and can retrieve them in the CI; we do not need them published on PyPI in order to use them!
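(For concreteness, a sketch of what "retrieve them in the CI" could look like in a test job; the artifact name and directory are placeholders, not the exact setup in this PR:)

```yaml
# Hypothetical test-job steps: install cuda.bindings from a CI-built wheel
# rather than from a PyPI release. Artifact name and directory are placeholders.
- name: Download the cuda.bindings wheel built earlier in the workflow
  uses: actions/download-artifact@v4
  with:
    name: cuda-bindings-wheel
    path: ./wheelhouse
- name: Install the wheel locally (no PyPI release needed)
  run: pip install ./wheelhouse/*.whl
```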
Correct me if I'm wrong; I want to understand this well. I see two cases. The first is where you are backporting something to 11.8.x, in which case you would want to test cuda.core against the active 11.8.x CI build: target the latest 11.8.x if it was successfully built, or bail out if there were build errors. The second case is making a change in main, specifically to cuda.core, in which case you would want to test not against the latest successful CI on 11.8.x, but against the top of the 11.8.x tree. This is because if someone was simultaneously testing an 11.8.x change, you might test against an 11.8.x version that is different from what a user would be installing. From my understanding of this change, there's a race condition between (11.8.x CI workflows + merges) and main workflows. WDYT?
The race condition is a legit concern, but it is still better than the status quo (no integration test against the head of the backport branch). Moreover, we will set up a nightly CI to reduce the risk (#294), and we already have pre-release QA as the final line of defense, so I think it is not very risky and can be improved once our DevOps team takes over and iterates toward a more robust implementation. In the first case, if a backport is relevant for cuda.core to work, the cuda.core tests would fail unless the backport is merged and rebuilt, so we will know what's going on without a silent green light. The second case is where the race condition could happen, IIUC ("which 11.8 build am I testing against?").
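(Roughly, the "latest successful run on the backport branch" lookup being discussed could look like the step below, using the gh CLI; the branch, artifact name, and step layout are illustrative assumptions rather than the exact implementation in this PR:)

```yaml
# Illustrative only: locate the most recent successful run on the 11.8.x
# backport branch and download its wheel artifact; bail out if none exists.
- name: Fetch artifacts from the latest successful backport build
  env:
    GH_TOKEN: ${{ github.token }}
  run: |
    run_id=$(gh run list --repo ${{ github.repository }} --branch 11.8.x \
               --status success --limit 1 --json databaseId --jq '.[0].databaseId')
    if [[ -z "$run_id" || "$run_id" == "null" ]]; then
      echo "No successful build found on the backport branch; bailing out." >&2
      exit 1
    fi
    gh run download "$run_id" --repo ${{ github.repository }} \
      --name cuda-bindings-wheel --dir ./backport-wheels
```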
OK, I understand that it is a catch-22, and I agree that testing against the latest success is far more robust than not testing at all. I just wanted to verbalize that to ensure I correctly understood it, and to make sure we understand that there is a possible improvement there for the DevOps team to address in the future. LGTM
Yes, all great questions here! You made me think twice (and long enough to seek an alternative solution). This is why we need code review 😄 Thanks, Keenan!
Close #329.
Update: Please see #357 (comment).
Refresher: There are two kinds of caches that we can use in GHA, Cache and Artifacts. We've been using Artifacts to store build artifacts, which has worked fine so far, but the main issue is that artifacts are scoped on a per-PR basis, meaning they cannot be reused across CI workflow runs triggered by different PRs.

This PR adds the capability of uploading artifacts to the Cache space when a PR is merged into the main branch, so that they can serve as a fallback when a workflow needs certain artifacts for whatever reason. Note that while the Cache space is limited to 10 GB per repo, for our purpose (we have small wheels) it is still OK as a stop-gap solution, until our DevOps team finds a more sustainable one.

I also cleaned up the shell choice a bit so that all job steps use the same setting.
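(For reference, the Cache-based fallback originally described above, which was later superseded by the artifact-based approach per the update, would have looked roughly like this; keys and paths are placeholders:)

```yaml
# Sketch of the originally described Cache fallback (not the final approach).
# Keys and paths are placeholders; caches saved on the default branch are
# visible to PR runs, which is what makes the fallback work.
- name: Restore or seed the shared build cache
  uses: actions/cache@v4
  with:
    path: ./dist                                   # placeholder build-artifact directory
    key: build-${{ runner.os }}-${{ github.sha }}  # exact key for this commit
    restore-keys: |
      build-${{ runner.os }}-                      # fall back to the newest prior entry
```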