Use nvmlDeviceGetCount_v2() first for CUDA check #9170
CI failure looks unrelated...
src/hmem_cuda.c
break;
/* Verify NVIDIA devices are present on the host. */
nvml_ret = ofi_nvmlDeviceGetCount_v2(&nvml_device_count);
if (NVML_SUCCESS == nvml_ret) {
Just return an error here, rather than indenting the entire function within the if statement. Also, use forward logic for comparisons.
OK, will update
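For illustration, the requested shape might look like the sketch below. It is not the patch itself: `stub_device_get_count()`, the stub state variables, and `verify_devices()` are hypothetical names, the enum values and the `FI_ENOSYS` definition are stand-ins so the example builds without the NVML or libfabric headers, and returning `-FI_ENOSYS` on an NVML failure is an assumption. "Forward logic" is read here as putting the variable on the left of the comparison.

```c
#include <assert.h>
#include <errno.h>
#include <stddef.h>

/* Hypothetical stand-ins so the sketch builds without NVML or libfabric. */
typedef enum { NVML_SUCCESS = 0, NVML_ERROR_UNKNOWN = 999 } nvmlReturn_t;
#define FI_ENOSYS ENOSYS

/* Knobs standing in for the real NVML state. */
static nvmlReturn_t stub_nvml_ret = NVML_SUCCESS;
static unsigned int stub_nvml_count = 0;

/* Stub with the same shape as nvmlDeviceGetCount_v2(unsigned int *). */
static nvmlReturn_t stub_device_get_count(unsigned int *count)
{
	assert(count != NULL);
	*count = stub_nvml_count;
	return stub_nvml_ret;
}

/* Early-return form: failures exit at once, the main path stays unindented. */
static int verify_devices(void)
{
	/* Declarations grouped at the top of the function. */
	nvmlReturn_t nvml_ret;
	unsigned int nvml_device_count = 0;

	nvml_ret = stub_device_get_count(&nvml_device_count);
	if (nvml_ret != NVML_SUCCESS)
		return -FI_ENOSYS;	/* assumed error code */

	if (nvml_device_count == 0)
		return -FI_ENOSYS;

	/* The rest of the function would continue here, not nested in an if. */
	return 0;
}
```

Compared with wrapping the body in `if (NVML_SUCCESS == nvml_ret) { ... }`, each failure case is handled where it is detected and the success path needs no extra indentation.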
src/hmem_cuda.c
case cudaErrorNoDevice:
return -FI_ENOSYS;
if (nvml_device_count > 0) {
cudaError_t cuda_ret;
Declare variables at the top of the function. It makes them easier to find.
OK, will update
Please look at the AWS CI failure.
bot:aws:retest
I'll close this PR and address all the formatting and commit-related issues by submitting another PR with a single commit covering the combined set of changes.
Just squash locally and force push it. No need for a new PR.
Please don't close the PR. That loses the comments. Just update the original patch and force push.
Checking with the lightweight nvmlDeviceGetCount_v2() call first allows us to avoid the more expensive call to cudaGetDeviceCount() when there are no NVIDIA devices on the node.
Signed-off-by: Quincey Koziol <qkoziol@amazon.com>
Squashed, updated the commit message, and force-pushed.
We are evaluating the performance of the updated version before merging it.
Performance is still good, merging now.
@shefty - Should this be backported to any release branches?
I'll let AWS decide that. IMO, it doesn't seem critical enough to backport. I doubt many apps would notice this outside of some benchmarks.
Check for CUDA devices with nvmlDeviceGetCount_v2() first, to avoid the more expensive call to cudaGetDeviceCount() when possible.
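The ordering described above can be sketched as follows. This is a minimal illustration under stated assumptions, not the code in src/hmem_cuda.c: `fake_nvml_count()` and `fake_cuda_count()` are hypothetical stubs with the same argument shapes as `nvmlDeviceGetCount_v2()` and `cudaGetDeviceCount()`, the enum values and `FI_ENOSYS` definition are stand-ins so the example builds without the NVML, CUDA, or libfabric headers, and treating an NVML failure as "no usable devices" is an assumption. A call counter shows that the expensive probe is skipped when NVML reports zero devices.

```c
#include <assert.h>
#include <errno.h>
#include <stddef.h>

/* Hypothetical stand-ins for the real NVML, CUDA, and libfabric definitions. */
typedef enum { NVML_SUCCESS = 0, NVML_ERROR_UNKNOWN = 999 } nvmlReturn_t;
typedef enum { cudaSuccess = 0, cudaErrorNoDevice = 100 } cudaError_t;
#define FI_ENOSYS ENOSYS

/* Simulated host state plus a counter for the expensive call. */
struct probe_env {
	nvmlReturn_t nvml_ret;
	unsigned int nvml_count;
	cudaError_t cuda_ret;
	int cuda_count;
	int cuda_calls;
};
static struct probe_env env;

/* Cheap probe: same shape as nvmlDeviceGetCount_v2(unsigned int *). */
static nvmlReturn_t fake_nvml_count(unsigned int *count)
{
	assert(count != NULL);
	*count = env.nvml_count;
	return env.nvml_ret;
}

/* Expensive probe: same shape as cudaGetDeviceCount(int *). */
static cudaError_t fake_cuda_count(int *count)
{
	assert(count != NULL);
	env.cuda_calls++;
	*count = env.cuda_count;
	return env.cuda_ret;
}

/* Returns 0 if devices are usable, -FI_ENOSYS otherwise. */
static int cuda_devices_present(void)
{
	nvmlReturn_t nvml_ret;
	cudaError_t cuda_ret;
	unsigned int nvml_device_count = 0;
	int cuda_device_count = 0;

	/* Step 1: lightweight check; bail out before touching CUDA. */
	nvml_ret = fake_nvml_count(&nvml_device_count);
	if (nvml_ret != NVML_SUCCESS)
		return -FI_ENOSYS;	/* assumed handling of NVML failure */
	if (nvml_device_count == 0)
		return -FI_ENOSYS;

	/* Step 2: only now pay for the CUDA runtime query. */
	cuda_ret = fake_cuda_count(&cuda_device_count);
	if (cuda_ret != cudaSuccess)
		return -FI_ENOSYS;
	if (cuda_device_count == 0)
		return -FI_ENOSYS;

	return 0;
}
```

On a node without NVIDIA devices the function returns after step 1, so `env.cuda_calls` stays at zero; on a node with devices both probes run once.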