Migrate to NVKS for amd64 CI runners #6280
Is there something still needed for CUDA 12.8 here? Seeing the following error on CI:

Traceback (most recent call last):
  File "/opt/conda/bin/rapids-dependency-file-generator", line 10, in <module>
    sys.exit(main())
             ^^^^^^
  File "/opt/conda/lib/python3.12/site-packages/rapids_dependency_file_generator/_cli.py", line 125, in main
    make_dependency_files(
  File "/opt/conda/lib/python3.12/site-packages/rapids_dependency_file_generator/_rapids_dependency_file_generator.py", line 474, in make_dependency_files
    raise ValueError(f"No matching matrix found in '{include}' for: {matrix_combo}")
ValueError: No matching matrix found in 'cuda_version' for: {'cuda': '12.8', 'arch': 'x86_64'}

Maybe we need to resolve the forward merger: #6272
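For readers unfamiliar with this kind of failure, here is a minimal sketch of how a matrix lookup can end in the ValueError above. It is a simplified, hypothetical model and not the actual rapids-dependency-file-generator code: the `select_packages` helper, the entry layout, and the package strings are all assumptions made for illustration.

```python
def select_packages(matrices, combo, include="cuda_version"):
    """Return the packages of the first entry whose matrix matches `combo`.

    Hypothetical helper: an entry matches when every key/value pair in its
    matrix also appears in `combo`. An entry with an empty matrix acts as a
    fallback that is used only when nothing else matches.
    """
    fallback = None
    for entry in matrices:
        matrix = entry.get("matrix") or {}
        if not matrix:
            fallback = entry  # remember the catch-all entry
            continue
        if all(combo.get(key) == value for key, value in matrix.items()):
            return entry["packages"]
    if fallback is not None:
        return fallback["packages"]
    raise ValueError(f"No matching matrix found in '{include}' for: {combo}")


# Illustrative entries only; the real dependency list is not shown in this thread.
matrices = [
    {"matrix": {"cuda": "11.8"}, "packages": ["cuda-version=11.8"]},
    {"matrix": {"cuda": "12.5"}, "packages": ["cuda-version=12.5"]},
]
combo = {"cuda": "12.8", "arch": "x86_64"}

try:
    select_packages(matrices, combo)
    error = None
except ValueError as exc:
    error = str(exc)
print(error)

# Adding an entry for the new CUDA version lets the same lookup succeed.
matrices.append({"matrix": {"cuda": "12.8"}, "packages": ["cuda-version=12.8"]})
resolved = select_packages(matrices, combo)
print(resolved)
```

Under these assumptions, the lookup fails only because no entry (and no fallback) covers CUDA 12.8, which is consistent with the suggestion that a pending merge carrying the 12.8 entries may resolve it.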
@bdice the current failures are because the jobs are picking a build of cudf nightly that doesn't have the fixes that 25.02a358 has
The most recent cuDF branch build finished a couple minutes ago (https://github.com/rapidsai/cudf/actions/runs/13078947512). I've merged
There are a couple CI tests failing like this:
kmeans3 = KMeans(n_clusters=3, random_state=24).fit(X)
# With different random_state, results might differ
assert not np.allclose(kmeans1.cluster_centers_, kmeans3.cluster_centers_)
I think it makes sense to remove this part of the test. It is a bit unclear what it is trying to check because the cluster centers might or might not be the same if we use a different seed. This means that all we can assert here is maybe_allclose, which doesn't seem like a useful assertion.
So 👍 to removing it
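To illustrate the point, here is a minimal pure-Python sketch of a toy 1-D k-means. It is not cuML's KMeans; the `kmeans_1d` and `allclose` helpers, the data, and the seed values are assumptions chosen for the example. On well-separated data, two different seeds converge to the same centers, so an `assert not allclose(...)` check would fail even though nothing is wrong.

```python
import random


def kmeans_1d(points, k, seed, iters=50):
    """Toy 1-D Lloyd's algorithm; returns the sorted cluster centers."""
    rng = random.Random(seed)
    centers = rng.sample(points, k)  # seed-dependent initialization
    for _ in range(iters):
        # Assign every point to its nearest center.
        clusters = [[] for _ in range(k)]
        for p in points:
            nearest = min(range(k), key=lambda i: abs(p - centers[i]))
            clusters[nearest].append(p)
        # Move each center to the mean of its cluster (keep it if empty).
        new_centers = [
            sum(c) / len(c) if c else centers[i]
            for i, c in enumerate(clusters)
        ]
        if new_centers == centers:
            break
        centers = new_centers
    return sorted(centers)


def allclose(a, b, tol=1e-8):
    """Element-wise closeness check for two equal-length sequences."""
    return all(abs(x - y) <= tol for x, y in zip(a, b))


# Two well-separated groups: any initialization ends at the same solution.
points = [0.0, 0.1, 0.2, 10.0, 10.1, 10.2]
centers_a = kmeans_1d(points, k=2, seed=42)
centers_b = kmeans_1d(points, k=2, seed=24)
same = allclose(centers_a, centers_b)
print(centers_a, centers_b, same)
```

With less cleanly separated data, different seeds can also land in different local optima, so neither "equal" nor "not equal" is a safe assertion across seeds.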
Code changes look good to me.
Workflow changes look reasonable, branch names are consistently changed, but hopefully someone who knows has already looked at this :D
/merge
CI failed in an optional job:
This migrates amd64 CI jobs (PRs and nightlies) to use L4 GPUs from the NVKS cluster.
xref: https://github.com/rapidsai/build-infra/issues/184