Switch to use CUDA driver APIs in Device constructor #460

Draft
wants to merge 6 commits into
base: main
from

Conversation

leofang
Member

@leofang leofang commented Feb 21, 2025

Blocked by #459 & #439 (comment).

Before this PR:

In [7]: %timeit Device()
622 ns ± 1.17 ns per loop (mean ± std. dev. of 7 runs, 1,000,000 loops each)

With this PR:

In [20]: %timeit Device()
391 ns ± 1.86 ns per loop (mean ± std. dev. of 7 runs, 1,000,000 loops each)

(Bindings are built from the main branch.)
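Taken at face value, the two timings above amount to roughly a 1.6x speedup for a default `Device()` construction. A quick arithmetic check, using only the mean values quoted in the `%timeit` output (the variable names are illustrative):

```python
# Mean per-call timings copied from the %timeit output above.
before_ns = 622.0  # main branch
after_ns = 391.0   # with this PR

speedup = before_ns / after_ns
saved_ns = before_ns - after_ns
reduction = saved_ns / before_ns

print(f"speedup: {speedup:.2f}x, saved per call: {saved_ns:.0f} ns ({reduction:.1%})")
```

These are single-machine microbenchmark numbers, so the ratio is indicative rather than exact.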

Contributor

copy-pr-bot bot commented Feb 21, 2025

This pull request requires additional validation before any workflows can run on NVIDIA's runners.

Pull request vetters can view their responsibilities here.

Contributors can view more details about this message here.

@leofang leofang self-assigned this Feb 22, 2025
@leofang leofang added the blocked This task is currently blocked by other tasks label Feb 22, 2025
@leofang leofang added enhancement Any code-related improvements P1 Medium priority - Should do cuda.core Everything related to the cuda.core module and removed blocked This task is currently blocked by other tasks labels Apr 5, 2025
@leofang leofang added this to the cuda.core beta 4 milestone Apr 5, 2025
@leofang leofang changed the title WIP: Switch to use CUDA driver APIs in Device constructor Switch to use CUDA driver APIs in Device constructor Apr 6, 2025
Member Author

leofang commented Apr 6, 2025

/ok to test

github-actions bot commented Apr 6, 2025

@leofang leofang requested review from rwgk and ksimpson-work April 7, 2025 17:39
@leofang leofang marked this pull request as ready for review April 7, 2025 17:39
            if err == 0:
                device_id = int(dev)
            else:
                ctx = handle_return(driver.cuCtxGetCurrent())
Contributor

What's going on here? Does cudart require cuCtxGetCurrent() to be called before the device can be queried?

Collaborator

Could it be helpful to raise a more specific error here?

It might be helpful to add a comment right after the else:

            else:
                # Emulate cudart behavior
                err, ctx = driver.cuCtxGetCurrent()
                if err != 0:
                    # What we really want is the current device (not primarily the current context)
                    raise RuntimeError(f"Cannot determine the current device: cuCtxGetCurrent() returned {err}")

Member Author

This handles the case where no context is current. The logic is as follows:

  • if a context is current, we can get the device ID associated with it directly
  • if no context is current (which can happen right after cuInit(0) and before anything else is called), we confirm this by checking that the ctx pointer is zero (the cuCtxGetCurrent() call itself always succeeds), and then pick device 0
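The two cases above can be sketched as a standalone function. This is a minimal illustration, not the code in this PR: `resolve_default_device_id` and `FakeDriver` are hypothetical names, the fake stands in for the real driver bindings so the snippet runs without a GPU, and it assumes the driver calls return `(error_code, value)` tuples as in the diff context.

```python
CUDA_SUCCESS = 0
CUDA_ERROR_INVALID_CONTEXT = 201  # numeric value of the driver error code


def resolve_default_device_id(driver):
    """Pick the device ID a default-constructed Device would use (sketch)."""
    err, dev = driver.cuCtxGetDevice()
    if err == CUDA_SUCCESS:
        # A context is current: use the device it belongs to.
        return int(dev)
    # No device could be queried; confirm that no context is current.
    err, ctx = driver.cuCtxGetCurrent()
    if err != CUDA_SUCCESS:
        raise RuntimeError(f"cuCtxGetCurrent() failed with error code {int(err)}")
    if int(ctx) != 0:
        raise RuntimeError("a context is current but its device could not be queried")
    return 0  # cudart behavior: fall back to device 0


class FakeDriver:
    """Hypothetical stand-in for the driver bindings, for illustration only."""

    def __init__(self, current_device=None):
        self._current_device = current_device

    def cuCtxGetDevice(self):
        if self._current_device is None:
            return (CUDA_ERROR_INVALID_CONTEXT, 0)
        return (CUDA_SUCCESS, self._current_device)

    def cuCtxGetCurrent(self):
        # The handle is 0 (null) when no context is current.
        handle = 0 if self._current_device is None else 0xDEADBEEF
        return (CUDA_SUCCESS, handle)


print(resolve_default_device_id(FakeDriver()))   # no context is current
print(resolve_default_device_id(FakeDriver(3)))  # a context on device 3 is current
```

Raising explicit errors instead of asserting is one possible way to address the request for a more specific error; whether the PR adopts that is up to the author.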

                ctx = handle_return(driver.cuCtxGetCurrent())
                assert int(ctx) == 0
                device_id = 0  # cudart behavior
assert isinstance(device_id, int), f"{device_id=}"
Collaborator

Is this intentionally not using the

             assert_type(device_id, int)

helper?
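For readers unfamiliar with the helper, here is a minimal sketch of what a check like `assert_type` might look like. The body is an assumption for illustration (including the choice of `TypeError` and the message wording); the actual helper in cuda.core may differ.

```python
def assert_type(obj, expected_type):
    """Raise a TypeError with a readable message if obj has the wrong type (sketch)."""
    if not isinstance(obj, expected_type):
        raise TypeError(
            f"Expected type {expected_type.__name__}, but got {type(obj).__name__}"
        )


assert_type(0, int)  # passes silently

try:
    assert_type("0", int)
except TypeError as exc:
    print(exc)
```

Unlike a bare `assert isinstance(...)`, a helper along these lines keeps working when Python runs with `-O`, which strips assert statements.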

Member Author

I think this PR predates the helper; I'll add it.

@leofang leofang marked this pull request as draft April 7, 2025 22:19
Labels
cuda.core Everything related to the cuda.core module enhancement Any code-related improvements P1 Medium priority - Should do
3 participants