Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Use nvmlDeviceGetCount_v2() first for CUDA check #9170

Merged
merged 1 commit into from
Jul 27, 2023
Merged

Use nvmlDeviceGetCount_v2() first for CUDA check #9170

merged 1 commit into from
Jul 27, 2023

Conversation

qkoziol
Copy link
Contributor

@qkoziol qkoziol commented Jul 25, 2023

Check for CUDA devices with nvmlDeviceGetCount_v2() first, to avoid more expensive call to cudaGetDeviceCount() when possible.

@wenduwan wenduwan requested review from a team and shefty July 25, 2023 14:53
@wenduwan
Copy link
Contributor

CI failure looks unrelated...

C:\projects\libfabric\include\ofi_atom.h(420,1): warning C4047: 'function': 'long' differs in levels of indirection from 'int32_t *' (compiling source file prov\hook\src\hook.c) [C:\projects\libfabric\libfabric.vcxproj]
C:\projects\libfabric\include\ofi_atom.h(420,1): warning C4024: '_InterlockedCompareExchange': different types for formal and actual parameter 3 (compiling source file prov\hook\src\hook.c) [C:\projects\libfabric\libfabric.vcxproj]
C:\projects\libfabric\include\ofi_atom.h(420,1): warning C4047: 'function': 'long' differs in levels of indirection from 'int32_t *' (compiling source file prov\hook\perf\src\hook_perf.c) [C:\projects\libfabric\libfabric.vcxproj]
C:\projects\libfabric\include\ofi_atom.h(421,1): warning C4133: 'function': incompatible types - from 'volatile ofi_atomic_int_64_t *' to 'volatile long *' (compiling source file prov\hook\src\hook.c) [C:\projects\libfabric\libfabric.vcxproj]
  Category: Warning
  Code: C4244
  File: C:\projects\libfabric\include\ofi_signal.h
  Line: 107
  Column: 64
  Project name: libfabric
  Project file name: C:\projects\libfabric\libfabric.vcxproj

src/hmem_cuda.c Outdated
break;
/* Verify NVIDIA devices are present on the host. */
nvml_ret = ofi_nvmlDeviceGetCount_v2(&nvml_device_count);
if (NVML_SUCCESS == nvml_ret) {
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Just return an error here, rather than indenting the entire function within the if statement. Also, use forward logic for comparisons.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

OK, will update

src/hmem_cuda.c Outdated
case cudaErrorNoDevice:
return -FI_ENOSYS;
if (nvml_device_count > 0) {
cudaError_t cuda_ret;
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Declare variables at the top of the function. It makes them easier to find.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

OK, will update

configure.ac Show resolved Hide resolved
@qkoziol qkoziol requested review from shefty and shijin-aws July 25, 2023 18:46
@shijin-aws
Copy link
Contributor

Please look at the AWS CI failure

@qkoziol
Copy link
Contributor Author

qkoziol commented Jul 26, 2023

bot:aws:retest

@qkoziol
Copy link
Contributor Author

qkoziol commented Jul 27, 2023

I'll close this PR and address all the formatting and commit-related issues by submitting another PR with a single commit covering the combined set of changes.

@shijin-aws
Copy link
Contributor

I'll close this PR and address all the formatting and commit-related issues by submitting another PR with a single commit covering the combined set of changes.

Just squash locally and force push it. No need for a new PR

@shefty
Copy link
Member

shefty commented Jul 27, 2023

Please don't close the PR. That loses the comments. Just update the original patch and force push.

Checking w/lightweight nvmlDeviceGetCount_v2() call first allows us to avoid
the more expensive call to cudaGetDeviceCount() when there's no NVIDIA devices
on the node.

Signed-off-by: Quincey Koziol <qkoziol@amazon.com>
@qkoziol
Copy link
Contributor Author

qkoziol commented Jul 27, 2023

Squashed, updated the commit message, and force-pushed

@qkoziol qkoziol requested a review from shefty July 27, 2023 15:44
@shijin-aws
Copy link
Contributor

shijin-aws commented Jul 27, 2023

We are evaluating the performance of the updated version before merging it.

@qkoziol
Copy link
Contributor Author

qkoziol commented Jul 27, 2023

Performance is still good, merging now.

@qkoziol qkoziol merged commit 086b741 into ofiwg:main Jul 27, 2023
@qkoziol
Copy link
Contributor Author

qkoziol commented Jul 27, 2023

@shefty - Should this be backported to any release branches?

@shefty
Copy link
Member

shefty commented Jul 27, 2023

I'll let AWS decide that. IMO, it doesn't seem critical enough to backport. I doubt many apps would notice this outside of some benchmarks.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

4 participants