[BUG] FailureCallbackResourceAdaptor pytests fail on GPUs with more than 100GiB of device memory #1733

harrism · 2024-11-19T22:26:32Z

Describe the bug
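Two FailureCallbackResourceAdaptor tests expect an exception when allocating 1e11 bytes (100 GB) of device memory. On a GPU with more device memory than that, the allocation can succeed, so no exception is raised and the tests fail.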
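To make the size mismatch concrete, here is a small arithmetic sketch. The helper `request_exceeds_device` and the two device capacities are hypothetical, chosen only for illustration; the only value taken from this issue is the 1e11-byte request.

```python
GIB = 2**30  # bytes per gibibyte

# The hard-coded request size mentioned in this issue.
FIXED_REQUEST = int(1e11)


def request_exceeds_device(request_bytes: int, total_device_bytes: int) -> bool:
    """Return True if the request cannot fit even on an otherwise empty device.

    Hypothetical helper for illustration; it is not part of RMM.
    """
    return request_bytes > total_device_bytes


# 1e11 bytes is 100 GB, which is a little over 93 GiB.
print(f"{FIXED_REQUEST / GIB:.1f} GiB")  # prints "93.1 GiB"

# Illustrative capacities only, not measurements of any specific GPU.
print(request_exceeds_device(FIXED_REQUEST, 80 * GIB))   # True: allocation must fail
print(request_exceeds_device(FIXED_REQUEST, 120 * GIB))  # False: allocation may succeed
```

In the second case a test that relies on the fixed request to trigger an out-of-memory error no longer sees one.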

Steps/Code to reproduce bug
Run RMM pytests on a >100GiB GPU, e.g. GH200.

Expected behavior
Tests should pass.

Fixes #1733 by querying total device memory and using twice as much in tests that are expected to fail allocation.

Authors:
- Mark Harris (https://github.com/harrism)

Approvers:
- Bradley Dice (https://github.com/bdice)

URL: #1734
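A minimal sketch of that approach, not the actual test code from #1734. It assumes `rmm.mr.available_device_memory()` returns a `(free, total)` tuple in bytes and that a failed allocation surfaces in Python as `MemoryError`. The names `oversized_request` and `check_failure_callback_runs` are hypothetical, and the second function needs a CUDA GPU with `rmm` installed.

```python
def oversized_request(total_device_bytes: int, factor: int = 2) -> int:
    """Return a size that exceeds device capacity (hypothetical helper)."""
    return total_device_bytes * factor


def check_failure_callback_runs() -> bool:
    """Return True if an oversized allocation fails and the callback ran.

    Requires a CUDA GPU and the rmm package, so rmm is imported lazily.
    Note: this replaces the current device resource and does not restore it.
    """
    import rmm
    import rmm.mr

    called = [False]

    def callback(nbytes: int) -> bool:
        # Record that the adaptor invoked us, then decline to retry.
        called[0] = True
        return False

    mr = rmm.mr.FailureCallbackResourceAdaptor(
        rmm.mr.CudaMemoryResource(), callback
    )
    rmm.mr.set_current_device_resource(mr)

    # Assumed to return (free, total) in bytes.
    _free, total = rmm.mr.available_device_memory()

    try:
        rmm.DeviceBuffer(size=oversized_request(total))
    except MemoryError:
        return called[0]
    return False  # the allocation unexpectedly succeeded
```

Scaling the request from the queried total keeps the expected failure independent of how much memory a given GPU has.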

harrism added the bug (Something isn't working), Python (Related to RMM Python API), and tests (Related to unit tests) labels Nov 19, 2024

harrism self-assigned this Nov 19, 2024

harrism added this to RMM Project Board Nov 19, 2024

github-project-automation bot moved this to Todo in RMM Project Board Nov 19, 2024

harrism mentioned this issue Nov 19, 2024

Query total memory in failure_callback_resource_adaptor tests #1734 (Merged)

rapids-bot bot closed this as completed in #1734 Nov 20, 2024

github-project-automation bot moved this from Todo to Done in RMM Project Board Nov 20, 2024
