[Java][C] Expose GPUInfo #1267
…ted with a resource
…ion in GPUInfoProviderImpl.
|
@mythrocks let me know if it's OK to keep changes to C and Java together, or if you want me to raise 2 separate PRs |
java/cuvs-java/src/main/java/com/nvidia/cuvs/GPUInfoProvider.java
java/cuvs-java/src/main/java22/com/nvidia/cuvs/internal/common/Util.java
mythrocks
left a comment
A couple of nitpicks. But this is a good change, otherwise.
java/cuvs-java/src/main/java/com/nvidia/cuvs/SynchronizedCuVSResources.java
|
/ok to test d6ac665 |
|
/ok to test ad2dc03 |
|
Sorry for the late suggestion of the following, to address @cjnolet's concerns regarding frequent calls to
@ldematte: Would you be averse to changing

    // Lazy initialization for list of available GPUs.
    private static class AvailableGpuInitializer {
      // Available GPUs are initialized only once when first accessed.
      // This is assumed to be invariant for the lifetime of the program.
      static final List<GPUInfo> AVAILABLE_GPUS = availableGPUs();

      private static List<GPUInfo> availableGPUs() {
        try (var localArena = Arena.ofConfined()) {
          MemorySegment numGpus = localArena.allocate(C_INT);
          int returnValue = cudaGetDeviceCount(numGpus);
          checkCudaError(returnValue, "cudaGetDeviceCount");
          int numGpuCount = numGpus.get(C_INT, 0);
          List<GPUInfo> gpuInfoArr = new ArrayList<GPUInfo>();
          // Fill up with GPUInfos.
          // ...
          return gpuInfoArr;
        }
      }
    }

    /**
     * Gets all the available GPUs
     *
     * @return a list of {@link GPUInfo} objects with GPU details
     */
    private static List<GPUInfo> availableGPUs() {
      return AvailableGpuInitializer.AVAILABLE_GPUS;
    }
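For readers who want to see the lazy behaviour in isolation, here is a minimal, self-contained sketch of the same holder-class pattern. It does not call CUDA: `LazyGpuListDemo`, `Holder`, `QUERY_COUNT`, and the device names are all hypothetical stand-ins, and the counter exists only to show that the simulated query runs once, on first access.

```java
import java.util.List;
import java.util.concurrent.atomic.AtomicInteger;

final class LazyGpuListDemo {
  // Counts how often the simulated (hypothetical) device query runs.
  static final AtomicInteger QUERY_COUNT = new AtomicInteger();

  // The JVM initializes this nested class on the first access to DEVICES,
  // not when LazyGpuListDemo itself is loaded. Class initialization is
  // thread-safe, so no explicit locking is needed.
  private static final class Holder {
    static final List<String> DEVICES = queryDevices();

    private static List<String> queryDevices() {
      QUERY_COUNT.incrementAndGet();
      // Stand-in for a native device enumeration; the names are made up.
      return List.of("simulated-gpu-0", "simulated-gpu-1");
    }
  }

  // Every call returns the same cached, immutable list.
  static List<String> availableGpus() {
    return Holder.DEVICES;
  }

  public static void main(String[] args) {
    System.out.println("queries before first call: " + QUERY_COUNT.get());
    availableGpus();
    availableGpus();
    System.out.println("queries after two calls: " + QUERY_COUNT.get());
  }
}
```

Touching `QUERY_COUNT` on the outer class does not initialize `Holder`, which is why the simulated query is deferred until `availableGpus()` is first called.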
|
Note: The caching makes the assumption that the application only has access to the GPUs that were available at application start. I can think of cases where GPUs are made available at runtime. For instance, a GPU could be attached to the box via PCIe-over-Thunderbolt or something. (My home dev setup is this way.) @cjnolet, @benfred: Permission to treat that sort of thing as unlikely/unsupported? |
Ideally the caching would not be done by default at application start, but would be done lazily on the first call to get a property.
Very much unlikely / unsupported. This is not something we consider in RAPIDS at all, and not something we need to consider downstream. |
Agreed. I'm wary of doing this in a static block, for fear of races between CUDA context init and the application's first CUDA call. The suggestion above will initialize lazily. |
A small note on device properties and caching
Raft does provide a helper function to get the device properties: raft::resource::get_device_properties(const resources&). Device properties struct is cached within I see you query the device properties in a context where |
Sounds like a good idea to me!
Agreed.
We can make this deterministic by carefully laying these out, but it would be fragile (e.g. moving a class or reordering the fields in a class would influence the result). I'm not sure I want to go down that route, even if it would better express that these are immutable. Better to be lazy.
|
|
I think you'd need to initialize the resources object for each GPU at some point anyway and so you could in theory create a list of resources in advance and iterate over them (and only ever access the GPUs via the corresponding resources objects). Some cuVS algorithms use these helper functions occasionally, so you'd spare some latency if you always use the same approach (to not get the same struct cached in two different places). |
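A rough sketch of that idea in Java, assuming one resources object per GPU that caches its own device properties. None of these types exist in cuvs-java or raft: `DeviceResources`, `DeviceProps`, `createForAllDevices`, and the 8 GiB figure are hypothetical, and the native query is simulated.

```java
import java.util.ArrayList;
import java.util.List;

final class PerDeviceResourcesDemo {
  /** Hypothetical immutable snapshot of device properties. */
  static final class DeviceProps {
    final int deviceId;
    final long totalMemoryBytes;

    DeviceProps(int deviceId, long totalMemoryBytes) {
      this.deviceId = deviceId;
      this.totalMemoryBytes = totalMemoryBytes;
    }
  }

  /** Hypothetical stand-in for a per-GPU resources handle. */
  static final class DeviceResources {
    private final int deviceId;
    private DeviceProps cachedProps; // filled on the first request only
    private int nativeQueries;       // how often the simulated query ran

    DeviceResources(int deviceId) {
      this.deviceId = deviceId;
    }

    // The properties live in exactly one place: the resources object.
    synchronized DeviceProps properties() {
      if (cachedProps == null) {
        nativeQueries++;
        // Simulated native call; a made-up 8 GiB device.
        cachedProps = new DeviceProps(deviceId, 8L * 1024 * 1024 * 1024);
      }
      return cachedProps;
    }

    synchronized int nativeQueries() {
      return nativeQueries;
    }
  }

  // Create one resources object per device up front and only ever reach
  // a GPU through its corresponding entry.
  static List<DeviceResources> createForAllDevices(int deviceCount) {
    List<DeviceResources> all = new ArrayList<>(deviceCount);
    for (int id = 0; id < deviceCount; id++) {
      all.add(new DeviceResources(id));
    }
    return List.copyOf(all);
  }

  public static void main(String[] args) {
    for (DeviceResources r : createForAllDevices(2)) {
      DeviceProps p = r.properties();
      r.properties(); // second call is served from the cache
      System.out.println("device " + p.deviceId + ": native queries = " + r.nativeQueries());
    }
  }
}
```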
That's an interesting idea and I think it's worth keeping this in mind. I have a small change to the C API that exposes raft::resource::get_device_properties, but if you are OK with that I'm going to stash it for now and keep it for a follow-up, when it's time to tackle the multi-GPU support. |
|
/ok to test 10d38a1 |
|
/merge |
cuvs-java already contains a public GPUInfo class, but the methods to retrieve the information and fill it are internal.
This PR exposes them through an interface, GPUInfoProvider. It also separates immutable data related to a GPU (which is kept in GPUInfo) from transient resources-related data and counters (at the moment, only the amount of free memory, which is kept in the new CuVSResourcesInfo).
The change lets you query transient data at a later moment; to do this, we need to find the device ID associated with a CuVSResource object. The change to the C API exposes the raft function that does it.
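To illustrate the split described above, here is a minimal sketch with simulated data. It is not the cuvs-java API: `StaticGpuInfo`, `TransientResourcesInfo`, `ResourcesHandle`, `InfoProvider`, `SimulatedProvider`, and every method and field name are hypothetical, chosen only to show immutable per-GPU data being kept apart from values that are re-read through a resources handle that knows its device ID.

```java
import java.util.List;

final class GpuInfoSplitDemo {
  /** Immutable per-GPU data; safe to cache for the program's lifetime. */
  static final class StaticGpuInfo {
    final int deviceId;
    final String name;
    final long totalMemoryBytes;

    StaticGpuInfo(int deviceId, String name, long totalMemoryBytes) {
      this.deviceId = deviceId;
      this.name = name;
      this.totalMemoryBytes = totalMemoryBytes;
    }
  }

  /** Transient data; only valid at the moment it was read. */
  static final class TransientResourcesInfo {
    final long freeMemoryBytes;

    TransientResourcesInfo(long freeMemoryBytes) {
      this.freeMemoryBytes = freeMemoryBytes;
    }
  }

  /** Hypothetical resources handle that knows which device it belongs to. */
  static final class ResourcesHandle {
    final int deviceId;

    ResourcesHandle(int deviceId) {
      this.deviceId = deviceId;
    }
  }

  /** Hypothetical provider interface covering both kinds of query. */
  interface InfoProvider {
    List<StaticGpuInfo> availableGpus();

    TransientResourcesInfo currentInfo(ResourcesHandle resources);
  }

  /** Simulated provider: one made-up 1000-byte device. */
  static final class SimulatedProvider implements InfoProvider {
    private final List<StaticGpuInfo> gpus =
        List.of(new StaticGpuInfo(0, "simulated-gpu-0", 1000L));
    private long allocatedBytes;

    void simulateAllocation(long bytes) {
      allocatedBytes += bytes;
    }

    @Override
    public List<StaticGpuInfo> availableGpus() {
      return gpus;
    }

    @Override
    public TransientResourcesInfo currentInfo(ResourcesHandle resources) {
      // The device ID taken from the handle selects the GPU to query.
      StaticGpuInfo gpu = gpus.get(resources.deviceId);
      return new TransientResourcesInfo(gpu.totalMemoryBytes - allocatedBytes);
    }
  }

  public static void main(String[] args) {
    SimulatedProvider provider = new SimulatedProvider();
    ResourcesHandle handle = new ResourcesHandle(0);
    System.out.println("free before: " + provider.currentInfo(handle).freeMemoryBytes);
    provider.simulateAllocation(250L);
    System.out.println("free after: " + provider.currentInfo(handle).freeMemoryBytes);
  }
}
```

The static list can be cached once, while the transient value has to be fetched again each time it is needed.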