Build and test with CUDA 13.0.0 #1273

Conversation
Auto-sync is disabled for draft pull requests in this repository. Workflows must be run manually.

/ok to test

The first runs all failed with some form of "need RAFT packages", so nothing else has been learned from this yet. Build link: https://github.com/rapidsai/cuvs/actions/runs/17145663792/job/48641578548 Will update this once rapidsai/raft#2787 is merged and we have RAFT packages.

/ok to test

Happy to say that the wheels are looking good, but some benchmarks are failing to compile, like this:

Updated to latest. Let's see how that goes. Should get faster feedback now that we've mostly filled up the

/ok to test
int clockRate = 0;
int memoryClockRate = 0;
err_code = cudaDeviceGetAttribute(&clockRate, cudaDevAttrClockRate, dev);
err_code = cudaDeviceGetAttribute(&memoryClockRate, cudaDevAttrMemoryClockRate, dev);
The Java builds are failing like this:
[INFO] -------------------------------------------------------------
Error: /__w/cuvs/cuvs/java/cuvs-java/src/main/java22/com/nvidia/cuvs/internal/common/Util.java:[192,23] cannot find symbol
symbol: method cudaGetDeviceProperties_v2(java.lang.foreign.MemorySegment,int)
location: class com.nvidia.cuvs.internal.common.Util
[INFO] 1 error
[INFO] -------------------------------------------------------------
[INFO] ------------------------------------------------------------------------
[INFO] BUILD FAILURE
[INFO] ------------------------------------------------------------------------
[INFO] Total time: 13.340 s
[INFO] Finished at: 2025-08-22T21:20:16Z
[INFO] ------------------------------------------------------------------------
Error: Failed to execute goal org.apache.maven.plugins:maven-compiler-plugin:3.13.0:compile (compile-java-22) on project cuvs-java: Compilation failure
Error: /__w/cuvs/cuvs/java/cuvs-java/src/main/java22/com/nvidia/cuvs/internal/common/Util.java:[192,23] cannot find symbol
Error: symbol: method cudaGetDeviceProperties_v2(java.lang.foreign.MemorySegment,int)
Error: location: class com.nvidia.cuvs.internal.common.Util
@mythrocks do the Java bindings need to be updated? If so could you do that and push them to this PR?
I don't have Java set up in my development environment.
Hmm. This is exactly the sort of error I'm running into in my work on the portable fat-jar PR, except I'm building on 12.9.
I think this is an artifact of some strange jextract behaviour.
Thanks for the confirmation; so I'm not imagining it.
I'll dig into this more.
As a first step I would try to see if cudaGetDeviceProperties_v2(java.lang.foreign.MemorySegment,int) is defined and what signature it has.
If this passes on 12.9 but fails consistently in 13.0, I suspect this is a header issue.
We generate function bindings for Java using jextract, but that may or may not be the issue. I suspect the CUDA headers #define cudaGetDeviceProperties as "something", and that "something" has a different signature in 12 vs 13. Just a guess; let me see if I can validate it.
(Java, like other non-C languages, can bind to C functions but of course not via macros; that's why we need to call cudaGetDeviceProperties_v2 directly. Again, this assumes my guesswork is correct and cudaGetDeviceProperties is indeed a macro.)
Yep, I see #define cudaGetDeviceProperties cudaGetDeviceProperties_v2 in my cuda_runtime_api.h. Now let me see how it looks in 13.0.
Looks like this was broken by #1267
Error: Failed to execute goal org.apache.maven.plugins:maven-compiler-plugin:3.13.0:compile (compile-java-22) on project cuvs-java: Compilation failure
Error: /__w/cuvs/cuvs/java/cuvs-java/src/main/java22/com/nvidia/cuvs/internal/GPUInfoProviderImpl.java:[55,25] cannot find symbol
Error: symbol: method cudaGetDeviceProperties_v2(java.lang.foreign.MemorySegment,int)
Error: location: class com.nvidia.cuvs.internal.GPUInfoProviderImpl.AvailableGpuInitializer
Error:
Error: -> [Help 1]
I'll try to push a fix.
I tried something here: 104508b
Would appreciate your input on it @ldematte @mythrocks. If you look and it seems wrong, please feel free to push to my branch here. I'd really like to get this PR in as soon as possible, so we can move on to other projects in RAPIDS that depend on cuvs.
Ok, one more commit (8530e22) because I missed that that was a static method (sorry, I haven't written anything for the JVM in a while 😅)
And it looks like that works!
I see this as a minor extension of logic you all were already ok with putting in temporarily, so I'm planning to merge this once CI passes. Thanks again for all the help!
Thanks!
I think it was. We found in this thread that cudaGetDeviceProperties_v2() could not be used unconditionally, and #1267 introduced code that uses it unconditionally:

cuvs/java/cuvs-java/src/main/java22/com/nvidia/cuvs/internal/GPUInfoProviderImpl.java, lines 55 to 56 in e02b7d1:

returnValue = cudaGetDeviceProperties_v2(deviceProp, i);
checkCudaError(returnValue, "cudaGetDeviceProperties_v2");
/ok to test

/ok to test
msarahan
left a comment
Approving for code-owners.
Merging in recent changes, I now see build failures for I strongly suspect that's a result of #1211. @jinsolp could you please take a look? There are CUDA 13 devcontainers on this branch that you might find helpful for investigating.
mythrocks
left a comment
Approving for Java-owners. @jameslamb: Good work, mate.
divyegala
left a comment
Just a single comment, pre-approving. Thanks!
Yes, absolutely caused by #1211, since it removed the changes I made in #1219 to allow for CUDA 13 compilation.

Thanks for looking and confirming @robertmaynard! If you know how to fix this, could you push a patch to my branch here?

CI seems to have borked on an intermittent test failure. I'm going to re-kick it.

Thanks very much @mythrocks for re-running CI, and to you and everyone else for your help on this!!!! I'm going to merge this,

/merge
Merged 55985fe into rapidsai:branch-25.10
Contributes to rapidsai/build-planning#208

* uses CUDA 13.0.0 to build and test (using the same patterns from the `cuvs-java` tests, in rapidsai/cuvs#1273)

## Notes for Reviewers

This switches GitHub Actions workflows to the `cuda13.0` branch from here: rapidsai/shared-workflows#413

A future round of PRs will revert that back to `branch-25.10`, once all of RAPIDS supports CUDA 13.

Authors:
- James Lamb (https://github.com/jameslamb)

Approvers:
- Jake Awe (https://github.com/AyodeAwe)
- Ben Frederickson (https://github.com/benfred)
- rhdong (https://github.com/rhdong)

URL: #20
…based on symbol presence (#1323)

In #1273 we addressed a signature change in CUDA 13 by binding different symbols based on an environment variable, `RAPIDS_CUDA_MAJOR`. That works, but it forces users of cuvs-java with CUDA 12 to define this environment variable. This PR improves on it by making the symbol lookup dynamic: it looks for the CUDA 13 symbol name, and falls back to the CUDA 12 exported name if it fails to locate the first.

Authors:
- Lorenzo Dematté (https://github.com/ldematte)

Approvers:
- MithunR (https://github.com/mythrocks)

URL: #1323
Contributes to rapidsai/build-planning#208

* `cuda-python`: >=12.9.2 (CUDA 12), >=13.0.1 (CUDA 13)
* `cupy`: >=13.6.0

Contributes to rapidsai/build-planning#68

* `dependencies.yaml` matrices (i.e., the ones that get written to `pyproject.toml` in source control)

## Notes for Reviewers

This switches GitHub Actions workflows to the `cuda13.0` branch from here: rapidsai/shared-workflows#413

A future round of PRs will revert that back to `branch-25.10`, once all of RAPIDS supports CUDA 13.

What about Go, Java, and Rust?

This PR expands building / testing for those bindings to cover CUDA 13, but more changes probably need to be made to support distributing packages for those.

Proposing deferring that to follow-ups: