Update RMMNumbaManager to handle NUMBA_CUDA_USE_NVIDIA_BINDING=1 #1004
@brandon-b-miller we are already in burndown for 22.04, so unless this is urgent for 22.04 we should push it to the next release. From the bug description, this doesn't sound like something we would hotfix, so I think we can push it.
Have you run the Numba test suite with this branch? e.g.:
This revealed more changes that were needed, which are now pushed.
rerun tests
I'm seeing the following failures when running
NUMBA_CUDA_USE_NVIDIA_BINDING=1 NUMBA_CUDA_MEMORY_MANAGER=rmm python -m numba.runtests numba.cuda.tests -v -m
with this PR and Numba main:
======================================================================
FAIL: test_ipc_array (numba.cuda.tests.cudapy.test_ipc.TestIpcStaged)
----------------------------------------------------------------------
Traceback (most recent call last):
  File "/home/gmarkall/numbadev/numba/numba/cuda/tests/cudapy/test_ipc.py", line 293, in test_ipc_array
    self.fail(out)
AssertionError: Traceback (most recent call last):
  File "/home/gmarkall/numbadev/numba/numba/cuda/tests/cudapy/test_ipc.py", line 215, in staged_ipc_array_test
    with cuda.gpus[device_num]:
  File "/home/gmarkall/numbadev/numba/numba/cuda/cudadrv/devices.py", line 84, in __exit__
    self._device.get_primary_context().pop()
  File "/home/gmarkall/numbadev/numba/numba/cuda/cudadrv/driver.py", line 1355, in pop
    assert int(popped) == int(self.handle)
AssertionError
======================================================================
FAIL: test_staged (numba.cuda.tests.cudapy.test_ipc.TestIpcStaged)
----------------------------------------------------------------------
Traceback (most recent call last):
  File "/home/gmarkall/numbadev/numba/numba/cuda/tests/cudapy/test_ipc.py", line 273, in test_staged
    self.fail(out)
AssertionError: Traceback (most recent call last):
  File "/home/gmarkall/numbadev/numba/numba/cuda/tests/cudapy/test_ipc.py", line 18, in core_ipc_handle_test
    arr = the_work()
  File "/home/gmarkall/numbadev/numba/numba/cuda/tests/cudapy/test_ipc.py", line 199, in the_work
    with cuda.gpus[device_num]:
  File "/home/gmarkall/numbadev/numba/numba/cuda/cudadrv/devices.py", line 84, in __exit__
    self._device.get_primary_context().pop()
  File "/home/gmarkall/numbadev/numba/numba/cuda/cudadrv/driver.py", line 1355, in pop
    assert int(popped) == int(self.handle)
AssertionError
----------------------------------------------------------------------
Ran 1278 tests in 111.982s
FAILED (failures=2, skipped=20, expected failures=8)
This is with multiple devices:
$ python -c "from numba import cuda; cuda.detect()"
Found 3 CUDA devices
id 0    b'NVIDIA RTX A6000' [SUPPORTED]
    Compute Capability: 8.6
    PCI Device ID: 0
    PCI Bus ID: 21
    UUID: GPU-842b25ad-db82-ba9d-0380-e65fe57189eb
    Watchdog: Enabled
    FP32/FP64 Performance Ratio: 32
id 1    b'NVIDIA RTX A6000' [SUPPORTED]
    Compute Capability: 8.6
    PCI Device ID: 0
    PCI Bus ID: 45
    UUID: GPU-af183771-f998-7235-c638-b407c81bf3f7
    Watchdog: Enabled
    FP32/FP64 Performance Ratio: 32
id 2    b'Quadro P2200' [SUPPORTED]
    Compute Capability: 6.1
    PCI Device ID: 0
    PCI Bus ID: 11
    UUID: GPU-321c7ee1-375f-7c11-a413-b0aab3ec4756
    Watchdog: Enabled
    FP32/FP64 Performance Ratio: 32
Summary:
    3/3 devices are supported
(I suspect it does not occur with a single GPU)
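For anyone who wants to repeat the run from a script rather than a shell, here is a minimal sketch. It only assembles the command and environment variables quoted above. The names `RMM_BINDING_ENV`, `build_invocation`, and `run_suite` are hypothetical helpers made up for illustration and are not part of Numba or RMM.

```python
import os
import subprocess
import sys

# Environment variables copied from the command quoted in this comment.
RMM_BINDING_ENV = {
    "NUMBA_CUDA_USE_NVIDIA_BINDING": "1",
    "NUMBA_CUDA_MEMORY_MANAGER": "rmm",
}


def build_invocation(base_env=None):
    """Return (command, environment) for the Numba CUDA test run.

    Hypothetical helper: it starts from a copy of the given environment
    (or os.environ) and overlays the two variables above.
    """
    env = dict(os.environ if base_env is None else base_env)
    env.update(RMM_BINDING_ENV)
    cmd = [sys.executable, "-m", "numba.runtests", "numba.cuda.tests", "-v", "-m"]
    return cmd, env


def run_suite():
    """Run the suite in a child process and return its exit code.

    Assumes Numba and RMM are installed and a CUDA device is available.
    """
    cmd, env = build_invocation()
    return subprocess.run(cmd, env=env).returncode
```

Calling `run_suite()` on a machine with Numba, RMM, and at least one GPU should be equivalent to the shell command above. Setting the variables on a child process avoids any question of whether Numba has already read its configuration in the current interpreter.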
It turns out this started happening between 21.12 and 22.02 and is unrelated to this PR. I'll look to see if there's a fix we can roll into this PR so it passes tests again.
This will be hard to track down and is unrelated to this PR, so let's not attempt to address it here.
Since the issue I previously identified is unrelated to this PR and was introduced earlier, I now think this looks good.
@gpucibot merge
Fixes #1003