CI: Run some tests with compute-sanitizer #566


Open: 23 commits to merge into base: main
843130f
CI: Add compute-sanitizer paths to linux test environment
carterbox Apr 22, 2025
c8df0dc
CI: Run pytest in the context of compute-sanitizer
carterbox Apr 22, 2025
4ac79d0
CI: Only use compute-sanitizer for tests of one python version
carterbox Apr 22, 2025
be912c9
CI: Add non-zero exitcode to compute-sanitizer
carterbox Apr 22, 2025
865eeb5
CI: Only run compute-sanitzer when testing against local ctk
carterbox Apr 22, 2025
ddd0714
CI: Delay CUDA_HOME variable expansion
carterbox Apr 23, 2025
5079605
CI: Move compute-sanitzer setup into own step after CTK setup
carterbox Apr 23, 2025
4a03465
CI: Optionally skip tests that raise CUDA API errors
carterbox Apr 24, 2025
bd88039
CI: Add sanitizer skip environment variable to CI
carterbox Apr 24, 2025
0430930
DOC: Fix spelling of CI step name
carterbox Apr 24, 2025
95c3914
CI: Skip state failure test when running sanitizer
carterbox Apr 25, 2025
b091340
CI: Skip linker error log test when sanitizer is running
carterbox Apr 25, 2025
10d4c7a
CI: Add note explaining test skip
carterbox Apr 25, 2025
bdfb547
DOC: Document CUDA_PYTHON_SANITIZER_RUNNING
carterbox Apr 25, 2025
9c25910
CI: Skip compute-sanitizer on CTK11
carterbox Apr 25, 2025
7fb013e
BUG: Correctly spell "version"
carterbox Apr 25, 2025
2632bbc
DOC: Fix spelling of sanitizer
carterbox Apr 28, 2025
2bfd402
TST: Define new test skip in conftest instead of copy-paste
carterbox Apr 28, 2025
3f7a79b
Merge remote-tracking branch 'upstream/main' into dching/add-compute-…
carterbox Apr 28, 2025
02559a3
TST: Cleanup post-merge
carterbox Apr 28, 2025
675c41f
CI: Remove COMPUTE_SANITIZER_VERSION variable
carterbox Apr 28, 2025
5abd6e3
TST: Skip another post-merge error
carterbox Apr 28, 2025
53a01cb
TST: Use consistent name for environment variable and test skip function
carterbox Apr 28, 2025
30 changes: 24 additions & 6 deletions .github/workflows/test-wheel-linux.yml
@@ -180,6 +180,24 @@ jobs:
host-platform: ${{ inputs.host-platform }}
cuda-version: ${{ inputs.cuda-version }}

- name: Set up compute-sanitizer
run: |
# We don't test compute-sanitizer on CTK<12 because backporting fixes is too much effort
# We only test compute-sanitizer on python 3.12 arbitrarily; we don't need to use sanitizer on the entire matrix
# Only local ctk installs have compute-sanitizer; there is no wheel for it
if [[ "${{ inputs.python-version }}" == "3.12" && "${{ inputs.cuda-version }}" != "11.8.0" && "${{ inputs.local-ctk }}" == 1 ]]; then
COMPUTE_SANITIZER="${CUDA_HOME}/bin/compute-sanitizer"
COMPUTE_SANITIZER_VERSION=$(${COMPUTE_SANITIZER} --version | grep -Eo "[0-9]{4}\.[0-9]\.[0-9]" | sed -e 's/\.//g')
SANITIZER_CMD="${COMPUTE_SANITIZER} --target-processes=all --launch-timeout=0 --tool=memcheck --error-exitcode=1"
if [[ "$COMPUTE_SANITIZER_VERSION" -ge 202111 ]]; then
SANITIZER_CMD="${SANITIZER_CMD} --padding=32"
fi
echo "CUDA_PYTHON_TESTING_WITH_COMPUTE_SANITIZER=1" >> $GITHUB_ENV
else
SANITIZER_CMD=""
fi
echo "SANITIZER_CMD=${SANITIZER_CMD}" >> $GITHUB_ENV

- name: Run cuda.bindings tests
if: ${{ env.SKIP_CUDA_BINDINGS_TEST == '0' }}
run: |
@@ -194,18 +212,18 @@ jobs:

pushd ./cuda_bindings
pip install -r requirements.txt
pytest -rxXs -v tests/
${SANITIZER_CMD} pytest -rxXs -v tests/

# It is a bit convoluted to run the Cython tests against CTK wheels,
# so let's just skip them.
if [[ "${{ inputs.local-ctk }}" == 1 ]]; then
if [[ "${{ inputs.host-platform }}" == linux* ]]; then
bash tests/cython/build_tests.sh
elif [[ "${{ inputs.host-platform }}" == win* ]]; then
# TODO: enable this once win-64 runners are up
# TODO: enable this once win-64 runners are up
exit 1
fi
pytest -rxXs -v tests/cython
fi
${SANITIZER_CMD} pytest -rxXs -v tests/cython
fi
popd

@@ -229,7 +247,7 @@ jobs:

pushd ./cuda_core
pip install -r "tests/requirements-cu${TEST_CUDA_MAJOR}.txt"
pytest -rxXs -v tests/
${SANITIZER_CMD} pytest -rxXs -v tests/

# It is a bit convoluted to run the Cython tests against CTK wheels,
# so let's just skip them. Also, currently our CI always installs the
@@ -243,7 +261,7 @@ jobs:
# TODO: enable this once win-64 runners are up
exit 1
fi
pytest -rxXs -v tests/cython
${SANITIZER_CMD} pytest -rxXs -v tests/cython
fi
popd
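
The `Set up compute-sanitizer` step in the workflow diff above assembles `SANITIZER_CMD` in bash. As a rough illustration only, the same decision logic can be sketched in Python. `build_sanitizer_cmd` and the sample version string are hypothetical and not part of the PR; the flags are the ones that appear in the workflow.

```python
import re


# Hypothetical helper, not part of the PR: mirrors the bash logic of the
# "Set up compute-sanitizer" workflow step.
def build_sanitizer_cmd(version_output, cuda_home, enabled=True):
    """Return the command prefix as a list, or [] when the sanitizer is off."""
    if not enabled:
        # Matches SANITIZER_CMD="" in the workflow: pytest then runs unwrapped.
        return []
    cmd = [
        f"{cuda_home}/bin/compute-sanitizer",
        "--target-processes=all",
        "--launch-timeout=0",
        "--tool=memcheck",
        "--error-exitcode=1",
    ]
    # Same pattern as the grep in the workflow; dots are dropped so that
    # "2023.1.0" becomes the integer 202310.
    match = re.search(r"[0-9]{4}\.[0-9]\.[0-9]", version_output)
    if match and int(match.group().replace(".", "")) >= 202111:
        cmd.append("--padding=32")
    return cmd


# Assumed sample text; the real `compute-sanitizer --version` output may differ.
prefix = build_sanitizer_cmd("Version 2023.1.0", "/usr/local/cuda")
argv = prefix + ["pytest", "-rxXs", "-v", "tests/"]
```

With `enabled=False` the prefix is empty, which corresponds to `${SANITIZER_CMD} pytest ...` expanding to a plain `pytest` call in the workflow.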

5 changes: 5 additions & 0 deletions cuda_bindings/docs/source/environment_variables.md
@@ -11,3 +11,8 @@
## Runtime Environment Variables

- `CUDA_PYTHON_CUDA_PER_THREAD_DEFAULT_STREAM` : When set to 1, the default stream is the per-thread default stream. When set to 0, the default stream is the legacy default stream. This defaults to 0, for the legacy default stream. See [Stream Synchronization Behavior](https://docs.nvidia.com/cuda/cuda-runtime-api/stream-sync-behavior.html) for an explanation of the legacy and per-thread default streams.


## Test-Time Environment Variables

- `CUDA_PYTHON_TESTING_WITH_COMPUTE_SANITIZER` : When set to 1, tests that would cause [compute-sanitizer](https://docs.nvidia.com/compute-sanitizer/ComputeSanitizer/index.html) to raise an error are skipped.
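
For a local run that mimics CI, this variable would be set together with the sanitizer wrapper. The sketch below only assembles the command and environment and does not launch anything. `sanitized_pytest_invocation` is a hypothetical name, and the path to `compute-sanitizer` is an assumption about the local install.

```python
import os

ENV_VAR = "CUDA_PYTHON_TESTING_WITH_COMPUTE_SANITIZER"


# Hypothetical helper, not part of the PR.
def sanitized_pytest_invocation(sanitizer_prefix, test_path="tests/", base_env=None):
    """Return (argv, env) for running pytest, optionally under the sanitizer."""
    env = dict(os.environ if base_env is None else base_env)
    if sanitizer_prefix:
        # Only announce the sanitizer to the tests when it actually wraps pytest.
        env[ENV_VAR] = "1"
    argv = [*sanitizer_prefix, "pytest", "-rxXs", "-v", test_path]
    return argv, env


argv, env = sanitized_pytest_invocation(
    ["/usr/local/cuda/bin/compute-sanitizer", "--tool=memcheck", "--error-exitcode=1"]
)
# To actually launch it: subprocess.run(argv, env=env, check=True)
```

Keeping the variable and the wrapper in one place avoids the case where the sanitizer runs but the error-raising tests are not skipped.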
8 changes: 8 additions & 0 deletions cuda_bindings/tests/conftest.py
@@ -0,0 +1,8 @@
import os

import pytest

skipif_testing_with_compute_sanitizer = pytest.mark.skipif(
os.environ.get("CUDA_PYTHON_TESTING_WITH_COMPUTE_SANITIZER", "0") == "1",
reason="The compute-sanitizer is running, and this test causes an API error.",
)
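
The marker's condition is an exact string comparison, so only the value `1` enables the skip. Below is a small standalone sketch of that condition. `sanitizer_skip_enabled` is a hypothetical helper, not part of the PR, and it takes the environment as a parameter so it can be checked without pytest.

```python
import os

ENV_VAR = "CUDA_PYTHON_TESTING_WITH_COMPUTE_SANITIZER"


# Hypothetical helper, not part of the PR: same condition as the skipif marker.
def sanitizer_skip_enabled(environ=None):
    environ = os.environ if environ is None else environ
    return environ.get(ENV_VAR, "0") == "1"


print(sanitizer_skip_enabled({}))                 # False (unset defaults to "0")
print(sanitizer_skip_enabled({ENV_VAR: "1"}))     # True
print(sanitizer_skip_enabled({ENV_VAR: "true"}))  # False (only "1" counts)
```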
9 changes: 9 additions & 0 deletions cuda_bindings/tests/test_cuda.py
@@ -11,6 +11,7 @@

import numpy as np
import pytest
from conftest import skipif_testing_with_compute_sanitizer

import cuda.cuda as cuda
import cuda.cudart as cudart
@@ -83,6 +84,7 @@ def test_cuda_memcpy():
assert err == cuda.CUresult.CUDA_SUCCESS


@skipif_testing_with_compute_sanitizer
def test_cuda_array():
(err,) = cuda.cuInit(0)
assert err == cuda.CUresult.CUDA_SUCCESS
@@ -236,6 +238,7 @@ def test_cuda_uuid_list_access():
assert err == cuda.CUresult.CUDA_SUCCESS


@skipif_testing_with_compute_sanitizer
def test_cuda_cuModuleLoadDataEx():
(err,) = cuda.cuInit(0)
assert err == cuda.CUresult.CUDA_SUCCESS
@@ -251,6 +254,7 @@ def test_cuda_cuModuleLoadDataEx():
cuda.CUjit_option.CU_JIT_ERROR_LOG_BUFFER_SIZE_BYTES,
cuda.CUjit_option.CU_JIT_LOG_VERBOSE,
]
# FIXME: This function call raises CUDA_ERROR_INVALID_VALUE
err, mod = cuda.cuModuleLoadDataEx(0, 0, option_keys, [])

(err,) = cuda.cuCtxDestroy(ctx)
@@ -622,6 +626,7 @@ def test_cuda_coredump_attr():
assert err == cuda.CUresult.CUDA_SUCCESS


@skipif_testing_with_compute_sanitizer
def test_get_error_name_and_string():
(err,) = cuda.cuInit(0)
assert err == cuda.CUresult.CUDA_SUCCESS
@@ -951,6 +956,7 @@ def test_CUmemDecompressParams_st():
assert int(desc.dstActBytes) == 0


@skipif_testing_with_compute_sanitizer
def test_all_CUresult_codes():
max_code = int(max(cuda.CUresult))
# Smoke test. CUDA_ERROR_UNKNOWN = 999, but intentionally using literal value.
@@ -983,18 +989,21 @@ def test_all_CUresult_codes():
assert num_good >= 76 # CTK 11.0.3_450.51.06


@skipif_testing_with_compute_sanitizer
def test_cuKernelGetName_failure():
err, name = cuda.cuKernelGetName(0)
assert err == cuda.CUresult.CUDA_ERROR_INVALID_VALUE
assert name is None


@skipif_testing_with_compute_sanitizer
def test_cuFuncGetName_failure():
err, name = cuda.cuFuncGetName(0)
assert err == cuda.CUresult.CUDA_ERROR_INVALID_VALUE
assert name is None


@skipif_testing_with_compute_sanitizer
@pytest.mark.skipif(
driverVersionLessThan(12080) or not supportsCudaAPI("cuCheckpointProcessGetState"),
reason="When API was introduced",
2 changes: 2 additions & 0 deletions cuda_bindings/tests/test_cudart.py
@@ -10,6 +10,7 @@

import numpy as np
import pytest
from conftest import skipif_testing_with_compute_sanitizer

import cuda.cuda as cuda
import cuda.cudart as cudart
@@ -70,6 +71,7 @@ def test_cudart_memcpy():
assertSuccess(err)


@skipif_testing_with_compute_sanitizer
def test_cudart_hostRegister():
# Use hostRegister API to check for correct enum return values
page_size = 80
6 changes: 6 additions & 0 deletions cuda_core/tests/conftest.py
@@ -64,3 +64,9 @@ def clean_up_cffi_files():
os.remove(f)
except FileNotFoundError:
pass # noqa: SIM105


skipif_testing_with_compute_sanitizer = pytest.mark.skipif(
os.environ.get("CUDA_PYTHON_TESTING_WITH_COMPUTE_SANITIZER", "0") == "1",
reason="The compute-sanitizer is running, and this test causes an API error.",
)
3 changes: 3 additions & 0 deletions cuda_core/tests/test_cuda_utils.py
@@ -3,6 +3,7 @@
# SPDX-License-Identifier: LicenseRef-NVIDIA-SOFTWARE-LICENSE

import pytest
from conftest import skipif_testing_with_compute_sanitizer

from cuda.bindings import driver, runtime
from cuda.core.experimental._utils import cuda_utils
@@ -40,6 +41,8 @@ def test_runtime_cuda_error_explanations_health():
assert not extra_expl


# this test causes an API error when the driver is too old to know about all of the error codes
@skipif_testing_with_compute_sanitizer
def test_check_driver_error():
num_unexpected = 0
for error in driver.CUresult:
4 changes: 4 additions & 0 deletions cuda_core/tests/test_event.py
@@ -12,6 +12,7 @@

import numpy as np
import pytest
from conftest import skipif_testing_with_compute_sanitizer

import cuda.core.experimental
from cuda.core.experimental import Device, EventOptions, LaunchConfig, Program, ProgramOptions, launch
@@ -75,6 +76,7 @@ def test_is_done(init_cuda):
assert event.is_done in (True, False)


@skipif_testing_with_compute_sanitizer
def test_error_timing_disabled():
device = Device()
device.set_current()
@@ -97,6 +99,7 @@ def test_error_timing_disabled():
event2 - event1


@skipif_testing_with_compute_sanitizer
def test_error_timing_recorded():
device = Device()
device.set_current()
@@ -117,6 +120,7 @@ def test_error_timing_recorded():


# TODO: improve this once path finder can find headers
@skipif_testing_with_compute_sanitizer
@pytest.mark.skipif(os.environ.get("CUDA_PATH") is None, reason="need libcu++ header")
@pytest.mark.skipif(tuple(int(i) for i in np.__version__.split(".")[:2]) < (2, 1), reason="need numpy 2.1.0+")
def test_error_timing_incomplete():
3 changes: 3 additions & 0 deletions cuda_core/tests/test_linker.py
@@ -3,6 +3,7 @@
# SPDX-License-Identifier: LicenseRef-NVIDIA-SOFTWARE-LICENSE

import pytest
from conftest import skipif_testing_with_compute_sanitizer

from cuda.core.experimental import Device, Linker, LinkerOptions, Program, ProgramOptions, _linker
from cuda.core.experimental._module import ObjectCode
@@ -140,6 +141,8 @@ def test_linker_link_invalid_target_type(compile_ptx_functions):
linker.link("invalid_target")


# this test causes an API error when using the culink API
@skipif_testing_with_compute_sanitizer
def test_linker_get_error_log(compile_ptx_functions):
options = LinkerOptions(arch=ARCH)
