From 147b242a00fa2da4844872050c12958170534b7f Mon Sep 17 00:00:00 2001
From: "Ralf W. Grosse-Kunstleve"
Date: Wed, 9 Apr 2025 11:33:32 -0700
Subject: [PATCH 01/52] First version of `cuda.bindings.path_finder` (#447)
MIME-Version: 1.0
Content-Type: text/plain; charset=UTF-8
Content-Transfer-Encoding: 8bit

* Unmodified copies of:
  * https://github.com/NVIDIA/numba-cuda/blob/bf487d78a40eea87f009d636882a5000a7524c95/numba_cuda/numba/cuda/cuda_paths.py
  * https://github.com/numba/numba/blob/f0d24824fcd6a454827e3c108882395d00befc04/numba/misc/findlib.py
* Add Forked from URLs.
* Strip down cuda_paths.py to minimum required for `_get_nvvm_path()`

Tested interactively with:
```
import cuda_paths
nvvm_path = cuda_paths._get_nvvm_path()
print(f"{nvvm_path=}")
```

* ruff auto-fixes (NO manual changes)
* Make `get_nvvm_path()` a public API (i.e. remove leading underscore).
* Fetch numba-cuda/numba_cuda/numba/cuda/cuda_paths.py from https://github.com/NVIDIA/numba-cuda/pull/155 AS-IS
* ruff format NO MANUAL CHANGES
* Minimal changes to adapt numba-cuda/numba_cuda/numba/cuda/cuda_paths.py from https://github.com/NVIDIA/numba-cuda/pull/155
* Rename ecosystem/cuda_paths.py -> path_finder.py
* Plug cuda.bindings.path_finder into cuda/bindings/_internal/nvvm_linux.pyx
* Plug cuda.bindings.path_finder into cuda/bindings/_internal/nvjitlink_linux.pyx
* Fix `os.path.exists(None)` issue:
```
______________________ ERROR collecting test_nvjitlink.py ______________________
tests/test_nvjitlink.py:62: in not check_nvjitlink_usable(), reason="nvJitLink not usable, maybe not installed or too old (<12.3)"
tests/test_nvjitlink.py:58: in check_nvjitlink_usable
    return inner_nvjitlink._inspect_function_pointer("__nvJitLinkVersion") != 0
cuda/bindings/_internal/nvjitlink.pyx:257: in cuda.bindings._internal.nvjitlink._inspect_function_pointer
    ???
cuda/bindings/_internal/nvjitlink.pyx:260: in cuda.bindings._internal.nvjitlink._inspect_function_pointer
    ???
cuda/bindings/_internal/nvjitlink.pyx:208: in cuda.bindings._internal.nvjitlink._inspect_function_pointers
    ???
cuda/bindings/_internal/nvjitlink.pyx:102: in cuda.bindings._internal.nvjitlink._check_or_init_nvjitlink
    ???
cuda/bindings/_internal/nvjitlink.pyx:59: in cuda.bindings._internal.nvjitlink.load_library
    ???
/opt/hostedtoolcache/Python/3.13.2/x64/lib/python3.13/site-packages/cuda/bindings/path_finder.py:312: in get_cuda_paths
    "nvvm": _get_nvvm_path(),
/opt/hostedtoolcache/Python/3.13.2/x64/lib/python3.13/site-packages/cuda/bindings/path_finder.py:285: in _get_nvvm_path
    by, path = _get_nvvm_path_decision()
/opt/hostedtoolcache/Python/3.13.2/x64/lib/python3.13/site-packages/cuda/bindings/path_finder.py:96: in _get_nvvm_path_decision
    if os.path.exists(nvvm_ctk_dir):
:19: in exists
    ???
E   TypeError: stat: path should be string, bytes, os.PathLike or integer, not NoneType
```
* Fix another `os.path.exists(None)` issue:
```
______________________ ERROR collecting test_nvjitlink.py ______________________
tests/test_nvjitlink.py:62: in not check_nvjitlink_usable(), reason="nvJitLink not usable, maybe not installed or too old (<12.3)"
tests/test_nvjitlink.py:58: in check_nvjitlink_usable
    return inner_nvjitlink._inspect_function_pointer("__nvJitLinkVersion") != 0
cuda/bindings/_internal/nvjitlink.pyx:257: in cuda.bindings._internal.nvjitlink._inspect_function_pointer
    ???
cuda/bindings/_internal/nvjitlink.pyx:260: in cuda.bindings._internal.nvjitlink._inspect_function_pointer
    ???
cuda/bindings/_internal/nvjitlink.pyx:208: in cuda.bindings._internal.nvjitlink._inspect_function_pointers
    ???
cuda/bindings/_internal/nvjitlink.pyx:102: in cuda.bindings._internal.nvjitlink._check_or_init_nvjitlink
    ???
cuda/bindings/_internal/nvjitlink.pyx:59: in cuda.bindings._internal.nvjitlink.load_library
    ???
/opt/hostedtoolcache/Python/3.13.2/x64/lib/python3.13/site-packages/cuda/bindings/path_finder.py:313: in get_cuda_paths
    "libdevice": _get_libdevice_paths(),
/opt/hostedtoolcache/Python/3.13.2/x64/lib/python3.13/site-packages/cuda/bindings/path_finder.py:126: in _get_libdevice_paths
    by, libdir = _get_libdevice_path_decision()
/opt/hostedtoolcache/Python/3.13.2/x64/lib/python3.13/site-packages/cuda/bindings/path_finder.py:73: in _get_libdevice_path_decision
    if os.path.exists(libdevice_ctk_dir):
:19: in exists
    ???
E   TypeError: stat: path should be string, bytes, os.PathLike or integer, not NoneType
```
* Change "/lib64/" → "/lib/" in nvjitlink_linux.pyx
* nvjitlink_linux.pyx load_library() enhancements, mainly to avoid os.path.join(None, "libnvJitLink.so")
* Add missing f-string f
* Add back get_nvjitlink_dso_version_suffix() call.
* pytest -ra -s -v
* Rewrite nvjitlink_linux.pyx load_library() to produce detailed error messages.
* Attach listdir output to "Unable to load" exception message.
* Guard os.listdir() call with os.path.isdir()
* Fix logic error in nvjitlink_linux.pyx load_library()
* Move path_finder.py to _path_finder_utils/cuda_paths.py, import only public functions from new path_finder.py
* Add find_nvidia_dynamic_library() and use from nvjitlink_linux.pyx, nvvm_linux.pyx
* Fix oversight in _find_using_lib_dir()
* Also look for versioned library in _find_using_nvidia_lib_dirs()
* glob.glob() Python 3.9 compatibility
* Reduce build-and-test.yml to Windows-only, Python 3.12 only.
* Comment out `if: ${{ github.repository_owner == nvidia }}`
* Revert "Comment out `if: ${{ github.repository_owner == nvidia }}`"

This reverts commit b0db24f9cfa3847e6a3e11c00f0225c7c7ef431e.

* Add back `linux-64` `host-platform`
* Rewrite load_library() in nvjitlink_windows.pyx to use path_finder.find_nvidia_dynamic_library()
* Revert "Rewrite load_library() in nvjitlink_windows.pyx to use path_finder.find_nvidia_dynamic_library()"

This reverts commit 1bb71513fea05054779312caac054a09b212b8a7.

* Add _inspect_environment() in find_nvidia_dynamic_library.py, call from nvjitlink_windows.pyx, nvvm_windows.pyx
* Add & use _find_dll_using_nvidia_bin_dirs(), _find_dll_using_cudalib_dir()
* Fix silly oversight: forgot to undo experimental change.
* Also reduce test test-linux matrix.
* Reimplement load_library() functions in nvjitlink_windows.pyx, nvvm_windows.pyx to actively use path_finder.find_nvidia_dynamic_library()
* Factor out load_nvidia_dynamic_library() from _internal/nvjitlink_linux.pyx, nvvm_linux.pyx
* Generalize load_nvidia_dynamic_library.py to also work under Windows.
* Add `void*` return type to load_library() implementations in _internal/nvjitlink_windows.pyx, nvvm_windows.pyx
* Resolve cython error: object handle vs `void*` handle
```
Error compiling Cython file:
------------------------------------------------------------
...
        err = (__cuDriverGetVersion)(&driver_ver)
        if err != 0:
            raise RuntimeError('something went wrong')
        # Load library
        handle = load_library(driver_ver)
                             ^
------------------------------------------------------------
cuda\bindings\_internal\nvjitlink.pyx:72:29: Cannot convert 'void *' to Python object
```
* Resolve another cython error: `void*` handle vs `intptr_t` handle
```
Error compiling Cython file:
------------------------------------------------------------
...
        handle = load_library(driver_ver)
        # Load function
        global __nvJitLinkCreate
        try:
            __nvJitLinkCreate = win32api.GetProcAddress(handle, 'nvJitLinkCreate')
                                                       ^
------------------------------------------------------------
cuda\bindings\_internal\nvjitlink.pyx:78:73: Cannot convert 'void *' to Python object
```
* Resolve signed/unsigned runtime error. Use uintptr_t consistently.

https://github.com/NVIDIA/cuda-python/actions/runs/14224673173/job/39861750852?pr=447#logs
```
=================================== ERRORS ====================================
_____________________ ERROR collecting test_nvjitlink.py ______________________
tests\test_nvjitlink.py:62: in not check_nvjitlink_usable(), reason="nvJitLink not usable, maybe not installed or too old (<12.3)"
tests\test_nvjitlink.py:58: in check_nvjitlink_usable
    return inner_nvjitlink._inspect_function_pointer("__nvJitLinkVersion") != 0
cuda\\bindings\\_internal\\nvjitlink.pyx:221: in cuda.bindings._internal.nvjitlink._inspect_function_pointer
    ???
cuda\\bindings\\_internal\\nvjitlink.pyx:224: in cuda.bindings._internal.nvjitlink._inspect_function_pointer
    ???
cuda\\bindings\\_internal\\nvjitlink.pyx:172: in cuda.bindings._internal.nvjitlink._inspect_function_pointers
    ???
cuda\\bindings\\_internal\\nvjitlink.pyx:73: in cuda.bindings._internal.nvjitlink._check_or_init_nvjitlink
    ???
cuda\\bindings\\_internal\\nvjitlink.pyx:46: in cuda.bindings._internal.nvjitlink.load_library
    ???
E   OverflowError: can't convert negative value to size_t
```
* Change `win32api.GetProcAddress` back to `intptr_t`. Changing load_nvidia_dynamic_library() to also use to-`intptr_t` conversion, for compatibility with win32api.GetProcAddress. Document that CDLL behaves differently (it uses to-`uintptr_t`).
* Use win32api.LoadLibrary() instead of ctypes.windll.kernel32.LoadLibraryW(), to be more similar to original (and working) cython code. Hoping to resolve this kind of error:
```
_ ERROR at setup of test_c_or_v_program_fail_bad_option[txt-compile_program] __

request = >

    @pytest.fixture(params=MINIMAL_NVVMIR_FIXTURE_PARAMS)
    def minimal_nvvmir(request):
        for pass_counter in range(2):
            nvvmir = MINIMAL_NVVMIR_CACHE.get(request.param, -1)
            if nvvmir != -1:
                if nvvmir is None:
                    pytest.skip(f"UNAVAILABLE: {request.param}")
                return nvvmir
            if pass_counter:
                raise AssertionError("This code path is meant to be unreachable.")
            # Build cache entries, then try again (above).
>           major, minor, debug_major, debug_minor = nvvm.ir_version()

tests\test_nvvm.py:148:
_ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _
cuda\bindings\nvvm.pyx:95: in cuda.bindings.nvvm.ir_version
    cpdef tuple ir_version():
cuda\bindings\nvvm.pyx:113: in cuda.bindings.nvvm.ir_version
    status = nvvmIRVersion(&major_ir, &minor_ir, &major_dbg, &minor_dbg)
cuda\bindings\cynvvm.pyx:19: in cuda.bindings.cynvvm.nvvmIRVersion
    return _nvvm._nvvmIRVersion(majorIR, minorIR, majorDbg, minorDbg)
_ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _

>   ???
E   cuda.bindings._internal.utils.FunctionNotFoundError: function nvvmIRVersion is not found
```
* Remove debug print statements.
* Remove some cruft.
* Trivial renaming of variables. No functional changes.
* Revert debug changes under .github/workflows
* Rename _path_finder_utils → _path_finder
* Remove LD_LIBRARY_PATH in fetch_ctk/action.yml
* Linux: First try using the platform-specific dynamic loader search mechanisms
* Add _windows_load_with_dll_basename()
* Revert "Revert debug changes under .github/workflows"

This reverts commit cc6113cce20c5c6124d0676daeccb7db2fffd798.

* Add debug prints in load_nvidia_dynamic_library()
* Report dlopen error for libnvrtc.so.12
* print("\nLOOOK dlfcn.dlopen('libnvrtc.so.12', dlfcn.RTLD_NOW)", flush=True)
* Revert "Remove LD_LIBRARY_PATH in fetch_ctk/action.yml"

This reverts commit 1b1139cda8b56f2fa37c5c0102ee7fe6b5963cab.

* Only remove ${CUDA_PATH}/nvvm/lib64 from LD_LIBRARY_PATH
* Use path_finder.load_nvidia_dynamic_library("nvrtc") from cuda/bindings/_bindings/cynvrtc.pyx.in
* Somewhat ad hoc heuristics for nvidia_cuda_nvrtc wheels.
* Remove LD_LIBRARY_PATH entirely from .github/actions/fetch_ctk/action.yml
* Remove CUDA_PATH\nvvm\bin in .github/workflows/test-wheel-windows.yml
* Revert "Remove LD_LIBRARY_PATH entirely from .github/actions/fetch_ctk/action.yml"

This reverts commit bff8cf023c82c7456af79ef004ba1c30d16b974a.

* Revert "Somewhat ad hoc heuristics for nvidia_cuda_nvrtc wheels."

This reverts commit 43abec8666a920e56ddc90cdb880ead248d0e45b.

* Restore cuda/bindings/_bindings/cynvrtc.pyx.in as-is on main
* Remove debug print from load_nvidia_dynamic_library.py
* Reapply "Revert debug changes under .github/workflows"

This reverts commit aaa6aff637f6bd076d0b124a39d56eeab5875351.
---
 .github/actions/fetch_ctk/action.yml | 2 +-
 .github/workflows/test-wheel-windows.yml | 7 -
 .../bindings/_internal/nvjitlink_linux.pyx | 20 +-
 .../bindings/_internal/nvjitlink_windows.pyx | 63 +--
 .../cuda/bindings/_internal/nvvm_linux.pyx | 18 +-
 .../cuda/bindings/_internal/nvvm_windows.pyx | 63 +--
 .../cuda/bindings/_internal/utils.pxd | 3 -
 .../cuda/bindings/_internal/utils.pyx | 14 -
 .../cuda/bindings/_path_finder/cuda_paths.py | 403 ++++++++++++++++++
 .../find_nvidia_dynamic_library.py | 139 ++++++
 .../cuda/bindings/_path_finder/findlib.py | 69 +++
 .../load_nvidia_dynamic_library.py | 92 ++++
 .../_path_finder/sys_path_find_sub_dirs.py | 40 ++
 cuda_bindings/cuda/bindings/path_finder.py | 37 ++
 cuda_bindings/tests/path_finder.py | 9 +
 .../tests/test_sys_path_find_sub_dirs.py | 72 ++++
 16 files changed, 889 insertions(+), 162 deletions(-)
 create mode 100644 cuda_bindings/cuda/bindings/_path_finder/cuda_paths.py
 create mode 100644 cuda_bindings/cuda/bindings/_path_finder/find_nvidia_dynamic_library.py
 create mode 100644 cuda_bindings/cuda/bindings/_path_finder/findlib.py
 create mode 100644
cuda_bindings/cuda/bindings/_path_finder/load_nvidia_dynamic_library.py create mode 100644 cuda_bindings/cuda/bindings/_path_finder/sys_path_find_sub_dirs.py create mode 100644 cuda_bindings/cuda/bindings/path_finder.py create mode 100644 cuda_bindings/tests/path_finder.py create mode 100644 cuda_bindings/tests/test_sys_path_find_sub_dirs.py diff --git a/.github/actions/fetch_ctk/action.yml b/.github/actions/fetch_ctk/action.yml index 669943296..5850b4c78 100644 --- a/.github/actions/fetch_ctk/action.yml +++ b/.github/actions/fetch_ctk/action.yml @@ -123,4 +123,4 @@ runs: echo "CUDA_PATH=${CUDA_PATH}" >> $GITHUB_ENV echo "CUDA_HOME=${CUDA_PATH}" >> $GITHUB_ENV echo "${CUDA_PATH}/bin" >> $GITHUB_PATH - echo "LD_LIBRARY_PATH=${LD_LIBRARY_PATH:-}:${CUDA_PATH}/lib:${CUDA_PATH}/nvvm/lib64" >> $GITHUB_ENV + echo "LD_LIBRARY_PATH=${LD_LIBRARY_PATH:-}:${CUDA_PATH}/lib" >> $GITHUB_ENV diff --git a/.github/workflows/test-wheel-windows.yml b/.github/workflows/test-wheel-windows.yml index 4e48590a3..948d2fae6 100644 --- a/.github/workflows/test-wheel-windows.yml +++ b/.github/workflows/test-wheel-windows.yml @@ -164,13 +164,6 @@ jobs: method: 'network' sub-packages: ${{ env.MINI_CTK_DEPS }} - - name: Update PATH - if: ${{ inputs.local-ctk == '1' }} - run: | - # mimics actual CTK installation - echo $PATH - echo "$env:CUDA_PATH\nvvm\bin" >> $env:GITHUB_PATH - - name: Run cuda.bindings tests if: ${{ env.SKIP_CUDA_BINDINGS_TEST == '0' }} run: | diff --git a/cuda_bindings/cuda/bindings/_internal/nvjitlink_linux.pyx b/cuda_bindings/cuda/bindings/_internal/nvjitlink_linux.pyx index 9961a2105..9d21a3e10 100644 --- a/cuda_bindings/cuda/bindings/_internal/nvjitlink_linux.pyx +++ b/cuda_bindings/cuda/bindings/_internal/nvjitlink_linux.pyx @@ -4,12 +4,12 @@ # # This code was automatically generated across versions from 12.0.1 to 12.8.0. Do not modify it directly. 
-from libc.stdint cimport intptr_t - -from .utils cimport get_nvjitlink_dso_version_suffix +from libc.stdint cimport intptr_t, uintptr_t from .utils import FunctionNotFoundError, NotSupportedError +from cuda.bindings import path_finder + ############################################################################### # Extern ############################################################################### @@ -52,17 +52,9 @@ cdef void* __nvJitLinkGetInfoLog = NULL cdef void* __nvJitLinkVersion = NULL -cdef void* load_library(const int driver_ver) except* with gil: - cdef void* handle - for suffix in get_nvjitlink_dso_version_suffix(driver_ver): - so_name = "libnvJitLink.so" + (f".{suffix}" if suffix else suffix) - handle = dlopen(so_name.encode(), RTLD_NOW | RTLD_GLOBAL) - if handle != NULL: - break - else: - err_msg = dlerror() - raise RuntimeError(f'Failed to dlopen libnvJitLink ({err_msg.decode()})') - return handle +cdef void* load_library(int driver_ver) except* with gil: + cdef uintptr_t handle = path_finder.load_nvidia_dynamic_library("nvJitLink") + return handle cdef int _check_or_init_nvjitlink() except -1 nogil: diff --git a/cuda_bindings/cuda/bindings/_internal/nvjitlink_windows.pyx b/cuda_bindings/cuda/bindings/_internal/nvjitlink_windows.pyx index c8c7e6b29..f86972216 100644 --- a/cuda_bindings/cuda/bindings/_internal/nvjitlink_windows.pyx +++ b/cuda_bindings/cuda/bindings/_internal/nvjitlink_windows.pyx @@ -6,12 +6,9 @@ from libc.stdint cimport intptr_t -from .utils cimport get_nvjitlink_dso_version_suffix - from .utils import FunctionNotFoundError, NotSupportedError -import os -import site +from cuda.bindings import path_finder import win32api @@ -42,54 +39,9 @@ cdef void* __nvJitLinkGetInfoLog = NULL cdef void* __nvJitLinkVersion = NULL -cdef inline list get_site_packages(): - return [site.getusersitepackages()] + site.getsitepackages() - - -cdef load_library(const int driver_ver): - handle = 0 - - for suffix in 
get_nvjitlink_dso_version_suffix(driver_ver): - if len(suffix) == 0: - continue - dll_name = f"nvJitLink_{suffix}0_0.dll" - - # First check if the DLL has been loaded by 3rd parties - try: - handle = win32api.GetModuleHandle(dll_name) - except: - pass - else: - break - - # Next, check if DLLs are installed via pip - for sp in get_site_packages(): - mod_path = os.path.join(sp, "nvidia", "nvJitLink", "bin") - if not os.path.isdir(mod_path): - continue - os.add_dll_directory(mod_path) - try: - handle = win32api.LoadLibraryEx( - # Note: LOAD_LIBRARY_SEARCH_DLL_LOAD_DIR needs an abs path... - os.path.join(mod_path, dll_name), - 0, LOAD_LIBRARY_SEARCH_DEFAULT_DIRS | LOAD_LIBRARY_SEARCH_DLL_LOAD_DIR) - except: - pass - else: - break - - # Finally, try default search - try: - handle = win32api.LoadLibrary(dll_name) - except: - pass - else: - break - else: - raise RuntimeError('Failed to load nvJitLink') - - assert handle != 0 - return handle +cdef void* load_library(int driver_ver) except* with gil: + cdef intptr_t handle = path_finder.load_nvidia_dynamic_library("nvJitLink") + return handle cdef int _check_or_init_nvjitlink() except -1 nogil: @@ -98,15 +50,16 @@ cdef int _check_or_init_nvjitlink() except -1 nogil: return 0 cdef int err, driver_ver + cdef intptr_t handle with gil: # Load driver to check version try: - handle = win32api.LoadLibraryEx("nvcuda.dll", 0, LOAD_LIBRARY_SEARCH_SYSTEM32) + nvcuda_handle = win32api.LoadLibraryEx("nvcuda.dll", 0, LOAD_LIBRARY_SEARCH_SYSTEM32) except Exception as e: raise NotSupportedError(f'CUDA driver is not found ({e})') global __cuDriverGetVersion if __cuDriverGetVersion == NULL: - __cuDriverGetVersion = win32api.GetProcAddress(handle, 'cuDriverGetVersion') + __cuDriverGetVersion = win32api.GetProcAddress(nvcuda_handle, 'cuDriverGetVersion') if __cuDriverGetVersion == NULL: raise RuntimeError('something went wrong') err = (__cuDriverGetVersion)(&driver_ver) @@ -114,7 +67,7 @@ cdef int _check_or_init_nvjitlink() except -1 nogil: 
raise RuntimeError('something went wrong') # Load library - handle = load_library(driver_ver) + handle = load_library(driver_ver) # Load function global __nvJitLinkCreate diff --git a/cuda_bindings/cuda/bindings/_internal/nvvm_linux.pyx b/cuda_bindings/cuda/bindings/_internal/nvvm_linux.pyx index 64e78e75a..33ba8e610 100644 --- a/cuda_bindings/cuda/bindings/_internal/nvvm_linux.pyx +++ b/cuda_bindings/cuda/bindings/_internal/nvvm_linux.pyx @@ -4,12 +4,12 @@ # # This code was automatically generated across versions from 11.0.3 to 12.8.0. Do not modify it directly. -from libc.stdint cimport intptr_t - -from .utils cimport get_nvvm_dso_version_suffix +from libc.stdint cimport intptr_t, uintptr_t from .utils import FunctionNotFoundError, NotSupportedError +from cuda.bindings import path_finder + ############################################################################### # Extern ############################################################################### @@ -51,16 +51,8 @@ cdef void* __nvvmGetProgramLog = NULL cdef void* load_library(const int driver_ver) except* with gil: - cdef void* handle - for suffix in get_nvvm_dso_version_suffix(driver_ver): - so_name = "libnvvm.so" + (f".{suffix}" if suffix else suffix) - handle = dlopen(so_name.encode(), RTLD_NOW | RTLD_GLOBAL) - if handle != NULL: - break - else: - err_msg = dlerror() - raise RuntimeError(f'Failed to dlopen libnvvm ({err_msg.decode()})') - return handle + cdef uintptr_t handle = path_finder.load_nvidia_dynamic_library("nvvm") + return handle cdef int _check_or_init_nvvm() except -1 nogil: diff --git a/cuda_bindings/cuda/bindings/_internal/nvvm_windows.pyx b/cuda_bindings/cuda/bindings/_internal/nvvm_windows.pyx index 76ce23254..6349fa5a1 100644 --- a/cuda_bindings/cuda/bindings/_internal/nvvm_windows.pyx +++ b/cuda_bindings/cuda/bindings/_internal/nvvm_windows.pyx @@ -6,12 +6,9 @@ from libc.stdint cimport intptr_t -from .utils cimport get_nvvm_dso_version_suffix - from .utils import 
FunctionNotFoundError, NotSupportedError -import os -import site +from cuda.bindings import path_finder import win32api @@ -40,54 +37,9 @@ cdef void* __nvvmGetProgramLogSize = NULL cdef void* __nvvmGetProgramLog = NULL -cdef inline list get_site_packages(): - return [site.getusersitepackages()] + site.getsitepackages() - - -cdef load_library(const int driver_ver): - handle = 0 - - for suffix in get_nvvm_dso_version_suffix(driver_ver): - if len(suffix) == 0: - continue - dll_name = "nvvm64_40_0" - - # First check if the DLL has been loaded by 3rd parties - try: - handle = win32api.GetModuleHandle(dll_name) - except: - pass - else: - break - - # Next, check if DLLs are installed via pip - for sp in get_site_packages(): - mod_path = os.path.join(sp, "nvidia", "cuda_nvcc", "nvvm", "bin") - if not os.path.isdir(mod_path): - continue - os.add_dll_directory(mod_path) - try: - handle = win32api.LoadLibraryEx( - # Note: LOAD_LIBRARY_SEARCH_DLL_LOAD_DIR needs an abs path... - os.path.join(mod_path, dll_name), - 0, LOAD_LIBRARY_SEARCH_DEFAULT_DIRS | LOAD_LIBRARY_SEARCH_DLL_LOAD_DIR) - except: - pass - else: - break - - # Finally, try default search - try: - handle = win32api.LoadLibrary(dll_name) - except: - pass - else: - break - else: - raise RuntimeError('Failed to load nvvm') - - assert handle != 0 - return handle +cdef void* load_library(int driver_ver) except* with gil: + cdef intptr_t handle = path_finder.load_nvidia_dynamic_library("nvvm") + return handle cdef int _check_or_init_nvvm() except -1 nogil: @@ -96,15 +48,16 @@ cdef int _check_or_init_nvvm() except -1 nogil: return 0 cdef int err, driver_ver + cdef intptr_t handle with gil: # Load driver to check version try: - handle = win32api.LoadLibraryEx("nvcuda.dll", 0, LOAD_LIBRARY_SEARCH_SYSTEM32) + nvcuda_handle = win32api.LoadLibraryEx("nvcuda.dll", 0, LOAD_LIBRARY_SEARCH_SYSTEM32) except Exception as e: raise NotSupportedError(f'CUDA driver is not found ({e})') global __cuDriverGetVersion if __cuDriverGetVersion 
== NULL: - __cuDriverGetVersion = win32api.GetProcAddress(handle, 'cuDriverGetVersion') + __cuDriverGetVersion = win32api.GetProcAddress(nvcuda_handle, 'cuDriverGetVersion') if __cuDriverGetVersion == NULL: raise RuntimeError('something went wrong') err = (__cuDriverGetVersion)(&driver_ver) @@ -112,7 +65,7 @@ cdef int _check_or_init_nvvm() except -1 nogil: raise RuntimeError('something went wrong') # Load library - handle = load_library(driver_ver) + handle = load_library(driver_ver) # Load function global __nvvmVersion diff --git a/cuda_bindings/cuda/bindings/_internal/utils.pxd b/cuda_bindings/cuda/bindings/_internal/utils.pxd index cac7846ff..a4b71c531 100644 --- a/cuda_bindings/cuda/bindings/_internal/utils.pxd +++ b/cuda_bindings/cuda/bindings/_internal/utils.pxd @@ -165,6 +165,3 @@ cdef int get_nested_resource_ptr(nested_resource[ResT] &in_out_ptr, object obj, cdef bint is_nested_sequence(data) cdef void* get_buffer_pointer(buf, Py_ssize_t size, readonly=*) except* - -cdef tuple get_nvjitlink_dso_version_suffix(int driver_ver) -cdef tuple get_nvvm_dso_version_suffix(int driver_ver) diff --git a/cuda_bindings/cuda/bindings/_internal/utils.pyx b/cuda_bindings/cuda/bindings/_internal/utils.pyx index 0a693c052..7fc77b22c 100644 --- a/cuda_bindings/cuda/bindings/_internal/utils.pyx +++ b/cuda_bindings/cuda/bindings/_internal/utils.pyx @@ -127,17 +127,3 @@ cdef int get_nested_resource_ptr(nested_resource[ResT] &in_out_ptr, object obj, class FunctionNotFoundError(RuntimeError): pass class NotSupportedError(RuntimeError): pass - - -cdef tuple get_nvjitlink_dso_version_suffix(int driver_ver): - if 12000 <= driver_ver < 13000: - return ('12', '') - raise NotSupportedError(f'CUDA driver version {driver_ver} is not supported') - - -cdef tuple get_nvvm_dso_version_suffix(int driver_ver): - if 11000 <= driver_ver < 11020: - return ('3', '') - if 11020 <= driver_ver < 13000: - return ('4', '') - raise NotSupportedError(f'CUDA driver version {driver_ver} is not supported') 
diff --git a/cuda_bindings/cuda/bindings/_path_finder/cuda_paths.py b/cuda_bindings/cuda/bindings/_path_finder/cuda_paths.py new file mode 100644 index 000000000..e27e6f54b --- /dev/null +++ b/cuda_bindings/cuda/bindings/_path_finder/cuda_paths.py @@ -0,0 +1,403 @@ +import os +import platform +import re +import site +import sys +import traceback +import warnings +from collections import namedtuple +from pathlib import Path + +from .findlib import find_file, find_lib + +IS_WIN32 = sys.platform.startswith("win32") + +_env_path_tuple = namedtuple("_env_path_tuple", ["by", "info"]) + + +def _get_numba_CUDA_INCLUDE_PATH(): + # From numba/numba/core/config.py + + def _readenv(name, ctor, default): + value = os.environ.get(name) + if value is None: + return default() if callable(default) else default + try: + return ctor(value) + except Exception: + warnings.warn( # noqa: B028 + f"Environment variable '{name}' is defined but " + f"its associated value '{value}' could not be " + "parsed.\nThe parse failed with exception:\n" + f"{traceback.format_exc()}", + RuntimeWarning, + ) + return default + + if IS_WIN32: + cuda_path = os.environ.get("CUDA_PATH") + if cuda_path: # noqa: SIM108 + default_cuda_include_path = os.path.join(cuda_path, "include") + else: + default_cuda_include_path = "cuda_include_not_found" + else: + default_cuda_include_path = os.path.join(os.sep, "usr", "local", "cuda", "include") + CUDA_INCLUDE_PATH = _readenv("NUMBA_CUDA_INCLUDE_PATH", str, default_cuda_include_path) + return CUDA_INCLUDE_PATH + + +config_CUDA_INCLUDE_PATH = _get_numba_CUDA_INCLUDE_PATH() + + +def _find_valid_path(options): + """Find valid path from *options*, which is a list of 2-tuple of + (name, path). Return first pair where *path* is not None. 
+ If no valid path is found, return ('', None) + """ + for by, data in options: + if data is not None: + return by, data + else: + return "", None + + +def _get_libdevice_path_decision(): + options = [ + ("Conda environment", get_conda_ctk()), + ("Conda environment (NVIDIA package)", get_nvidia_libdevice_ctk()), + ("CUDA_HOME", get_cuda_home("nvvm", "libdevice")), + ("Debian package", get_debian_pkg_libdevice()), + ("NVIDIA NVCC Wheel", get_libdevice_wheel()), + ] + libdevice_ctk_dir = get_system_ctk("nvvm", "libdevice") + if libdevice_ctk_dir and os.path.exists(libdevice_ctk_dir): + options.append(("System", libdevice_ctk_dir)) + + by, libdir = _find_valid_path(options) + return by, libdir + + +def _nvvm_lib_dir(): + if IS_WIN32: + return "nvvm", "bin" + else: + return "nvvm", "lib64" + + +def _get_nvvm_path_decision(): + options = [ + ("Conda environment", get_conda_ctk()), + ("Conda environment (NVIDIA package)", get_nvidia_nvvm_ctk()), + ("CUDA_HOME", get_cuda_home(*_nvvm_lib_dir())), + ("NVIDIA NVCC Wheel", _get_nvvm_wheel()), + ] + # need to ensure nvvm dir actually exists + nvvm_ctk_dir = get_system_ctk(*_nvvm_lib_dir()) + if nvvm_ctk_dir and os.path.exists(nvvm_ctk_dir): + options.append(("System", nvvm_ctk_dir)) + + by, path = _find_valid_path(options) + return by, path + + +def _get_nvvm_wheel(): + site_paths = [site.getusersitepackages()] + site.getsitepackages() + ["conda", None] + for sp in site_paths: + # The SONAME is taken based on public CTK 12.x releases + if sys.platform.startswith("linux"): + dso_dir = "lib64" + # Hack: libnvvm from Linux wheel + # does not have any soname (CUDAINST-3183) + dso_path = "libnvvm.so" + elif sys.platform.startswith("win32"): + dso_dir = "bin" + dso_path = "nvvm64_40_0.dll" + else: + raise AssertionError() + + if sp is not None: + dso_dir = os.path.join(sp, "nvidia", "cuda_nvcc", "nvvm", dso_dir) + dso_path = os.path.join(dso_dir, dso_path) + if os.path.exists(dso_path): + return str(Path(dso_path).parent) + + +def 
_get_libdevice_paths(): + by, libdir = _get_libdevice_path_decision() + if by == "NVIDIA NVCC Wheel": + # The NVVM path is a directory, not a file + out = os.path.join(libdir, "libdevice.10.bc") + else: + # Search for pattern + pat = r"libdevice(\.\d+)*\.bc$" + candidates = find_file(re.compile(pat), libdir) + # Keep only the max (most recent version) of the bitcode files. + out = max(candidates, default=None) + return _env_path_tuple(by, out) + + +def _cudalib_path(): + if IS_WIN32: + return "bin" + else: + return "lib64" + + +def _cuda_home_static_cudalib_path(): + if IS_WIN32: + return ("lib", "x64") + else: + return ("lib64",) + + +def _get_cudalib_dir_path_decision(): + options = [ + ("Conda environment", get_conda_ctk()), + ("Conda environment (NVIDIA package)", get_nvidia_cudalib_ctk()), + ("CUDA_HOME", get_cuda_home(_cudalib_path())), + ("System", get_system_ctk(_cudalib_path())), + ] + by, libdir = _find_valid_path(options) + return by, libdir + + +def _get_static_cudalib_dir_path_decision(): + options = [ + ("Conda environment", get_conda_ctk()), + ("Conda environment (NVIDIA package)", get_nvidia_static_cudalib_ctk()), + ("CUDA_HOME", get_cuda_home(*_cuda_home_static_cudalib_path())), + ("System", get_system_ctk(_cudalib_path())), + ] + by, libdir = _find_valid_path(options) + return by, libdir + + +def _get_cudalib_dir(): + by, libdir = _get_cudalib_dir_path_decision() + return _env_path_tuple(by, libdir) + + +def _get_static_cudalib_dir(): + by, libdir = _get_static_cudalib_dir_path_decision() + return _env_path_tuple(by, libdir) + + +def get_system_ctk(*subdirs): + """Return path to system-wide cudatoolkit; or, None if it doesn't exist.""" + # Linux? + if sys.platform.startswith("linux"): + # Is cuda alias to /usr/local/cuda? + # We are intentionally not getting versioned cuda installation. 
+ base = "/usr/local/cuda" + if os.path.exists(base): + return os.path.join(base, *subdirs) + + +def get_conda_ctk(): + """Return path to directory containing the shared libraries of cudatoolkit.""" + is_conda_env = os.path.exists(os.path.join(sys.prefix, "conda-meta")) + if not is_conda_env: + return + # Assume the existence of NVVM to imply cudatoolkit installed + paths = find_lib("nvvm") + if not paths: + return + # Use the directory name of the max path + return os.path.dirname(max(paths)) + + +def get_nvidia_nvvm_ctk(): + """Return path to directory containing the NVVM shared library.""" + is_conda_env = os.path.exists(os.path.join(sys.prefix, "conda-meta")) + if not is_conda_env: + return + + # Assume the existence of NVVM in the conda env implies that a CUDA toolkit + # conda package is installed. + + # First, try the location used on Linux and the Windows 11.x packages + libdir = os.path.join(sys.prefix, "nvvm", _cudalib_path()) + if not os.path.exists(libdir) or not os.path.isdir(libdir): + # If that fails, try the location used for Windows 12.x packages + libdir = os.path.join(sys.prefix, "Library", "nvvm", _cudalib_path()) + if not os.path.exists(libdir) or not os.path.isdir(libdir): + # If that doesn't exist either, assume we don't have the NVIDIA + # conda package + return + + paths = find_lib("nvvm", libdir=libdir) + if not paths: + return + # Use the directory name of the max path + return os.path.dirname(max(paths)) + + +def get_nvidia_libdevice_ctk(): + """Return path to directory containing the libdevice library.""" + nvvm_ctk = get_nvidia_nvvm_ctk() + if not nvvm_ctk: + return + nvvm_dir = os.path.dirname(nvvm_ctk) + return os.path.join(nvvm_dir, "libdevice") + + +def get_nvidia_cudalib_ctk(): + """Return path to directory containing the shared libraries of cudatoolkit.""" + nvvm_ctk = get_nvidia_nvvm_ctk() + if not nvvm_ctk: + return + env_dir = os.path.dirname(os.path.dirname(nvvm_ctk)) + subdir = "bin" if IS_WIN32 else "lib" + return 
os.path.join(env_dir, subdir) + + +def get_nvidia_static_cudalib_ctk(): + """Return path to directory containing the static libraries of cudatoolkit.""" + nvvm_ctk = get_nvidia_nvvm_ctk() + if not nvvm_ctk: + return + + if IS_WIN32 and ("Library" not in nvvm_ctk): # noqa: SIM108 + # Location specific to CUDA 11.x packages on Windows + dirs = ("Lib", "x64") + else: + # Linux, or Windows with CUDA 12.x packages + dirs = ("lib",) + + env_dir = os.path.dirname(os.path.dirname(nvvm_ctk)) + return os.path.join(env_dir, *dirs) + + +def get_cuda_home(*subdirs): + """Get paths of CUDA_HOME. + *subdirs* are the subdirectory names to be appended to the resulting + path. + """ + cuda_home = os.environ.get("CUDA_HOME") + if cuda_home is None: + # Try Windows CUDA installation without Anaconda + cuda_home = os.environ.get("CUDA_PATH") + if cuda_home is not None: + return os.path.join(cuda_home, *subdirs) + + +def _get_nvvm_path(): + by, path = _get_nvvm_path_decision() + if by == "NVIDIA NVCC Wheel": + # The NVVM path is a directory, not a file + path = os.path.join(path, "libnvvm.so") + else: + candidates = find_lib("nvvm", path) + path = max(candidates) if candidates else None + return _env_path_tuple(by, path) + + +def get_cuda_paths(): + """Returns a dictionary mapping component names to a 2-tuple + of (source_variable, info). + + The returned dictionary will have the following keys and infos: + - "nvvm": file_path + - "libdevice": List[Tuple[arch, file_path]] + - "cudalib_dir": directory_path + + Note: The result of the function is cached.
+ """ + # Check cache + if hasattr(get_cuda_paths, "_cached_result"): + return get_cuda_paths._cached_result + else: + # Not in cache + d = { + "nvvm": _get_nvvm_path(), + "libdevice": _get_libdevice_paths(), + "cudalib_dir": _get_cudalib_dir(), + "static_cudalib_dir": _get_static_cudalib_dir(), + "include_dir": _get_include_dir(), + } + # Cache result + get_cuda_paths._cached_result = d + return d + + +def get_debian_pkg_libdevice(): + """ + Return the Debian NVIDIA Maintainers-packaged libdevice location, if it + exists. + """ + pkg_libdevice_location = "/usr/lib/nvidia-cuda-toolkit/libdevice" + if not os.path.exists(pkg_libdevice_location): + return None + return pkg_libdevice_location + + +def get_libdevice_wheel(): + nvvm_path = _get_nvvm_wheel() + if nvvm_path is None: + return None + nvvm_path = Path(nvvm_path) + libdevice_path = nvvm_path.parent / "libdevice" + + return str(libdevice_path) + + +def get_current_cuda_target_name(): + """Determine conda's CTK target folder based on system and machine arch. + + CTK's conda package delivers headers based on its architecture type. For example, + `x86_64` machine places header under `$CONDA_PREFIX/targets/x86_64-linux`, and + `aarch64` places under `$CONDA_PREFIX/targets/sbsa-linux`. Read more about the + nuances at cudart's conda feedstock: + https://github.com/conda-forge/cuda-cudart-feedstock/blob/main/recipe/meta.yaml#L8-L11 # noqa: E501 + """ + system = platform.system() + machine = platform.machine() + + if system == "Linux": + arch_to_targets = {"x86_64": "x86_64-linux", "aarch64": "sbsa-linux"} + elif system == "Windows": + arch_to_targets = { + "AMD64": "x64", + } + else: + arch_to_targets = {} + + return arch_to_targets.get(machine, None) + + +def get_conda_include_dir(): + """ + Return the include directory in the current conda environment, if one + is active and it exists. 
+ """ + is_conda_env = os.path.exists(os.path.join(sys.prefix, "conda-meta")) + if not is_conda_env: + return + + if platform.system() == "Windows": + include_dir = os.path.join(sys.prefix, "Library", "include") + elif target_name := get_current_cuda_target_name(): + include_dir = os.path.join(sys.prefix, "targets", target_name, "include") + else: + # Fallback for when the target cannot be determined, + # though this should rarely happen. + include_dir = os.path.join(sys.prefix, "include") + + if ( + os.path.exists(include_dir) + and os.path.isdir(include_dir) + and os.path.exists(os.path.join(include_dir, "cuda_device_runtime_api.h")) + ): + return include_dir + return + + +def _get_include_dir(): + """Find the root include directory.""" + options = [ + ("Conda environment (NVIDIA package)", get_conda_include_dir()), + ("CUDA_INCLUDE_PATH Config Entry", config_CUDA_INCLUDE_PATH), + # TODO: add others + ] + by, include_dir = _find_valid_path(options) + return _env_path_tuple(by, include_dir) diff --git a/cuda_bindings/cuda/bindings/_path_finder/find_nvidia_dynamic_library.py b/cuda_bindings/cuda/bindings/_path_finder/find_nvidia_dynamic_library.py new file mode 100644 index 000000000..30a9b68f4 --- /dev/null +++ b/cuda_bindings/cuda/bindings/_path_finder/find_nvidia_dynamic_library.py @@ -0,0 +1,139 @@ +# Copyright 2024-2025 NVIDIA Corporation. All rights reserved.
+# +# SPDX-License-Identifier: LicenseRef-NVIDIA-SOFTWARE-LICENSE + +import functools +import glob +import os + +from .cuda_paths import IS_WIN32, get_cuda_paths +from .sys_path_find_sub_dirs import sys_path_find_sub_dirs + + +def _no_such_file_in_sub_dirs(sub_dirs, file_wild, error_messages, attachments): + error_messages.append(f"No such file: {file_wild}") + for sub_dir in sys_path_find_sub_dirs(sub_dirs): + attachments.append(f' listdir("{sub_dir}"):') + for node in sorted(os.listdir(sub_dir)): + attachments.append(f" {node}") + + +def _find_so_using_nvidia_lib_dirs(libname, so_basename, error_messages, attachments): + if libname == "nvvm": # noqa: SIM108 + nvidia_sub_dirs = ("nvidia", "*", "nvvm", "lib64") + else: + nvidia_sub_dirs = ("nvidia", "*", "lib") + file_wild = so_basename + "*" + for lib_dir in sys_path_find_sub_dirs(nvidia_sub_dirs): + # First look for an exact match + so_name = os.path.join(lib_dir, so_basename) + if os.path.isfile(so_name): + return so_name + # Look for a versioned library + # Using sort here mainly to make the result deterministic. 
+ for node in sorted(glob.glob(os.path.join(lib_dir, file_wild))): + so_name = os.path.join(lib_dir, node) + if os.path.isfile(so_name): + return so_name + _no_such_file_in_sub_dirs(nvidia_sub_dirs, file_wild, error_messages, attachments) + return None + + +def _find_dll_using_nvidia_bin_dirs(libname, error_messages, attachments): + if libname == "nvvm": # noqa: SIM108 + nvidia_sub_dirs = ("nvidia", "*", "nvvm", "bin") + else: + nvidia_sub_dirs = ("nvidia", "*", "bin") + file_wild = libname + "*.dll" + for bin_dir in sys_path_find_sub_dirs(nvidia_sub_dirs): + for node in sorted(glob.glob(os.path.join(bin_dir, file_wild))): + dll_name = os.path.join(bin_dir, node) + if os.path.isfile(dll_name): + return dll_name + _no_such_file_in_sub_dirs(nvidia_sub_dirs, file_wild, error_messages, attachments) + return None + + +def _get_cuda_paths_info(key, error_messages): + env_path_tuple = get_cuda_paths()[key] + if not env_path_tuple: + error_messages.append(f'Failure obtaining get_cuda_paths()["{key}"]') + return None + if not env_path_tuple.info: + error_messages.append(f'Failure obtaining get_cuda_paths()["{key}"].info') + return None + return env_path_tuple.info + + +def _find_so_using_cudalib_dir(so_basename, error_messages, attachments): + cudalib_dir = _get_cuda_paths_info("cudalib_dir", error_messages) + if cudalib_dir is None: + return None + primary_so_dir = cudalib_dir + "/" + candidate_so_dirs = [primary_so_dir] + libs = ["/lib/", "/lib64/"] + for _ in range(2): + alt_dir = libs[0].join(primary_so_dir.rsplit(libs[1], 1)) + if alt_dir not in candidate_so_dirs: + candidate_so_dirs.append(alt_dir) + libs.reverse() + candidate_so_names = [so_dirname + so_basename for so_dirname in candidate_so_dirs] + error_messages = [] + for so_name in candidate_so_names: + if os.path.isfile(so_name): + return so_name + error_messages.append(f"No such file: {so_name}") + for so_dirname in candidate_so_dirs: + attachments.append(f' listdir("{so_dirname}"):') + if not 
os.path.isdir(so_dirname): + attachments.append(" DIRECTORY DOES NOT EXIST") + else: + for node in sorted(os.listdir(so_dirname)): + attachments.append(f" {node}") + return None + + +def _find_dll_using_cudalib_dir(libname, error_messages, attachments): + cudalib_dir = _get_cuda_paths_info("cudalib_dir", error_messages) + if cudalib_dir is None: + return None + file_wild = libname + "*.dll" + for node in sorted(glob.glob(os.path.join(cudalib_dir, file_wild))): + dll_name = os.path.join(cudalib_dir, node) + if os.path.isfile(dll_name): + return dll_name + error_messages.append(f"No such file: {file_wild}") + attachments.append(f' listdir("{cudalib_dir}"):') + for node in sorted(os.listdir(cudalib_dir)): + attachments.append(f" {node}") + return None + + +@functools.cache +def find_nvidia_dynamic_library(name: str) -> str: + error_messages = [] + attachments = [] + + if IS_WIN32: + dll_name = _find_dll_using_nvidia_bin_dirs(name, error_messages, attachments) + if dll_name is None: + if name == "nvvm": + dll_name = _get_cuda_paths_info("nvvm", error_messages) + else: + dll_name = _find_dll_using_cudalib_dir(name, error_messages, attachments) + if dll_name is None: + attachments = "\n".join(attachments) + raise RuntimeError(f"Failure finding {name}*.dll: {', '.join(error_messages)}\n{attachments}") + return dll_name + + so_basename = f"lib{name}.so" + so_name = _find_so_using_nvidia_lib_dirs(name, so_basename, error_messages, attachments) + if so_name is None: + if name == "nvvm": + so_name = _get_cuda_paths_info("nvvm", error_messages) + else: + so_name = _find_so_using_cudalib_dir(so_basename, error_messages, attachments) + if so_name is None: + attachments = "\n".join(attachments) + raise RuntimeError(f"Failure finding {so_basename}: {', '.join(error_messages)}\n{attachments}") + return so_name diff --git a/cuda_bindings/cuda/bindings/_path_finder/findlib.py b/cuda_bindings/cuda/bindings/_path_finder/findlib.py new file mode 100644 index 000000000..4de57c905 --- 
/dev/null +++ b/cuda_bindings/cuda/bindings/_path_finder/findlib.py @@ -0,0 +1,69 @@ +# Forked from: +# https://github.com/numba/numba/blob/f0d24824fcd6a454827e3c108882395d00befc04/numba/misc/findlib.py + +import os +import re +import sys + + +def get_lib_dirs(): + """ + Anaconda specific + """ + if sys.platform == "win32": + # on windows, historically `DLLs` has been used for CUDA libraries, + # since approximately CUDA 9.2, `Library\bin` has been used. + dirnames = ["DLLs", os.path.join("Library", "bin")] + else: + dirnames = [ + "lib", + ] + libdirs = [os.path.join(sys.prefix, x) for x in dirnames] + return libdirs + + +DLLNAMEMAP = { + "linux": r"lib%(name)s\.so\.%(ver)s$", + "linux2": r"lib%(name)s\.so\.%(ver)s$", + "linux-static": r"lib%(name)s\.a$", + "darwin": r"lib%(name)s\.%(ver)s\.dylib$", + "win32": r"%(name)s%(ver)s\.dll$", + "win32-static": r"%(name)s\.lib$", + "bsd": r"lib%(name)s\.so\.%(ver)s$", +} + +RE_VER = r"[0-9]*([_\.][0-9]+)*" + + +def find_lib(libname, libdir=None, platform=None, static=False): + platform = platform or sys.platform + platform = "bsd" if "bsd" in platform else platform + if static: + platform = f"{platform}-static" + if platform not in DLLNAMEMAP: + # Return empty list if platform name is undefined. + # Not all platforms define their static library paths. 
+ return [] + pat = DLLNAMEMAP[platform] % {"name": libname, "ver": RE_VER} + regex = re.compile(pat) + return find_file(regex, libdir) + + +def find_file(pat, libdir=None): + if libdir is None: + libdirs = get_lib_dirs() + elif isinstance(libdir, str): + libdirs = [ + libdir, + ] + else: + libdirs = list(libdir) + files = [] + for ldir in libdirs: + try: + entries = os.listdir(ldir) + except FileNotFoundError: + continue + candidates = [os.path.join(ldir, ent) for ent in entries if pat.match(ent)] + files.extend([c for c in candidates if os.path.isfile(c)]) + return files diff --git a/cuda_bindings/cuda/bindings/_path_finder/load_nvidia_dynamic_library.py b/cuda_bindings/cuda/bindings/_path_finder/load_nvidia_dynamic_library.py new file mode 100644 index 000000000..692e8e0bc --- /dev/null +++ b/cuda_bindings/cuda/bindings/_path_finder/load_nvidia_dynamic_library.py @@ -0,0 +1,92 @@ +import functools +import sys + +if sys.platform == "win32": + import ctypes.wintypes + + import pywintypes + import win32api + + # Mirrors WinBase.h (unfortunately not defined already elsewhere) + _WINBASE_LOAD_LIBRARY_SEARCH_SYSTEM32 = 0x00000800 + +else: + import ctypes + import os + + _LINUX_CDLL_MODE = os.RTLD_NOW | os.RTLD_GLOBAL + +from .find_nvidia_dynamic_library import find_nvidia_dynamic_library + + +@functools.cache +def _windows_cuDriverGetVersion() -> int: + handle = win32api.LoadLibrary("nvcuda.dll") + + kernel32 = ctypes.WinDLL("kernel32", use_last_error=True) + GetProcAddress = kernel32.GetProcAddress + GetProcAddress.argtypes = [ctypes.wintypes.HMODULE, ctypes.wintypes.LPCSTR] + GetProcAddress.restype = ctypes.c_void_p + cuDriverGetVersion = GetProcAddress(handle, b"cuDriverGetVersion") + assert cuDriverGetVersion + + FUNC_TYPE = ctypes.CFUNCTYPE(ctypes.c_int, ctypes.POINTER(ctypes.c_int)) + cuDriverGetVersion_fn = FUNC_TYPE(cuDriverGetVersion) + driver_ver = ctypes.c_int() + err = cuDriverGetVersion_fn(ctypes.byref(driver_ver)) + assert err == 0 + return 
driver_ver.value + + +@functools.cache +def _windows_load_with_dll_basename(name: str) -> int: + driver_ver = _windows_cuDriverGetVersion() + del driver_ver # Keeping this here because it will probably be needed in the future. + + if name == "nvJitLink": + dll_name = "nvJitLink_120_0.dll" + elif name == "nvrtc": + dll_name = "nvrtc64_120_0.dll" + elif name == "nvvm": + dll_name = "nvvm64_40_0.dll" + + try: + return win32api.LoadLibrary(dll_name) + except pywintypes.error: + pass + + return None + + +@functools.cache +def load_nvidia_dynamic_library(name: str) -> int: + # First try using the platform-specific dynamic loader search mechanisms + if sys.platform == "win32": + handle = _windows_load_with_dll_basename(name) + if handle: + return handle + else: + dl_path = f"lib{name}.so" # Version intentionally not specified. + try: + handle = ctypes.CDLL(dl_path, _LINUX_CDLL_MODE) + except OSError: + pass + else: + # Use `cdef void* ptr = ` in cython to convert back to void* + return handle._handle # C unsigned int + + dl_path = find_nvidia_dynamic_library(name) + if sys.platform == "win32": + try: + handle = win32api.LoadLibrary(dl_path) + except pywintypes.error as e: + raise RuntimeError(f"Failed to load DLL at {dl_path}: {e}") from e + # Use `cdef void* ptr = ` in cython to convert back to void* + return handle # C signed int, matches win32api.GetProcAddress + else: + try: + handle = ctypes.CDLL(dl_path, _LINUX_CDLL_MODE) + except OSError as e: + raise RuntimeError(f"Failed to dlopen {dl_path}: {e}") from e + # Use `cdef void* ptr = ` in cython to convert back to void* + return handle._handle # C unsigned int diff --git a/cuda_bindings/cuda/bindings/_path_finder/sys_path_find_sub_dirs.py b/cuda_bindings/cuda/bindings/_path_finder/sys_path_find_sub_dirs.py new file mode 100644 index 000000000..d2da726c9 --- /dev/null +++ b/cuda_bindings/cuda/bindings/_path_finder/sys_path_find_sub_dirs.py @@ -0,0 +1,40 @@ +# Copyright 2024-2025 NVIDIA Corporation. All rights reserved.
+# +# SPDX-License-Identifier: LicenseRef-NVIDIA-SOFTWARE-LICENSE + +import functools +import os +import sys + + +@functools.cache +def _impl(sys_path, sub_dirs): + results = [] + for base in sys_path: + stack = [(base, 0)] # (current_path, index into sub_dirs) + while stack: + current_path, idx = stack.pop() + if idx == len(sub_dirs): + if os.path.isdir(current_path): + results.append(current_path) + continue + + sub = sub_dirs[idx] + if sub == "*": + try: + entries = sorted(os.listdir(current_path)) + except OSError: + continue + for entry in entries: + entry_path = os.path.join(current_path, entry) + if os.path.isdir(entry_path): + stack.append((entry_path, idx + 1)) + else: + next_path = os.path.join(current_path, sub) + if os.path.isdir(next_path): + stack.append((next_path, idx + 1)) + return results + + +def sys_path_find_sub_dirs(sub_dirs): + return _impl(tuple(sys.path), tuple(sub_dirs)) diff --git a/cuda_bindings/cuda/bindings/path_finder.py b/cuda_bindings/cuda/bindings/path_finder.py new file mode 100644 index 000000000..21aeb4b36 --- /dev/null +++ b/cuda_bindings/cuda/bindings/path_finder.py @@ -0,0 +1,37 @@ +# Copyright 2024-2025 NVIDIA Corporation. All rights reserved. 
+# +# SPDX-License-Identifier: LicenseRef-NVIDIA-SOFTWARE-LICENSE + +from cuda.bindings._path_finder.cuda_paths import ( + get_conda_ctk, + get_conda_include_dir, + get_cuda_home, + get_cuda_paths, + get_current_cuda_target_name, + get_debian_pkg_libdevice, + get_libdevice_wheel, + get_nvidia_cudalib_ctk, + get_nvidia_libdevice_ctk, + get_nvidia_nvvm_ctk, + get_nvidia_static_cudalib_ctk, + get_system_ctk, +) +from cuda.bindings._path_finder.find_nvidia_dynamic_library import find_nvidia_dynamic_library +from cuda.bindings._path_finder.load_nvidia_dynamic_library import load_nvidia_dynamic_library + +__all__ = [ + "find_nvidia_dynamic_library", + "load_nvidia_dynamic_library", + "get_conda_ctk", + "get_conda_include_dir", + "get_cuda_home", + "get_cuda_paths", + "get_current_cuda_target_name", + "get_debian_pkg_libdevice", + "get_libdevice_wheel", + "get_nvidia_cudalib_ctk", + "get_nvidia_libdevice_ctk", + "get_nvidia_nvvm_ctk", + "get_nvidia_static_cudalib_ctk", + "get_system_ctk", +] diff --git a/cuda_bindings/tests/path_finder.py b/cuda_bindings/tests/path_finder.py new file mode 100644 index 000000000..e9245a5be --- /dev/null +++ b/cuda_bindings/tests/path_finder.py @@ -0,0 +1,9 @@ +from cuda.bindings import path_finder + +paths = path_finder.get_cuda_paths() + +for k, v in paths.items(): + print(f"{k}: {v}", flush=True) + +print(path_finder.find_nvidia_dynamic_library("nvvm")) +print(path_finder.find_nvidia_dynamic_library("nvJitLink")) diff --git a/cuda_bindings/tests/test_sys_path_find_sub_dirs.py b/cuda_bindings/tests/test_sys_path_find_sub_dirs.py new file mode 100644 index 000000000..3297ce39e --- /dev/null +++ b/cuda_bindings/tests/test_sys_path_find_sub_dirs.py @@ -0,0 +1,72 @@ +import os + +import pytest + +from cuda.bindings._path_finder.sys_path_find_sub_dirs import _impl + + +@pytest.fixture +def test_tree(tmp_path): + # Build: + # tmp_path/ + # sys1/nvidia/foo/lib + # sys1/nvidia/bar/lib + # sys2/nvidia/baz/nvvm/lib64 + base = tmp_path + (base / 
"sys1" / "nvidia" / "foo" / "lib").mkdir(parents=True) + (base / "sys1" / "nvidia" / "bar" / "lib").mkdir(parents=True) + (base / "sys2" / "nvidia" / "baz" / "nvvm" / "lib64").mkdir(parents=True) + + return { + "sys_path": ( + str(base / "sys1"), + str(base / "sys2"), + str(base / "nonexistent"), # should be ignored + ), + "base": base, + } + + +def test_exact_match(test_tree): + sys_path = test_tree["sys_path"] + base = test_tree["base"] + result = _impl(sys_path, ("nvidia", "foo", "lib")) + expected = [str(base / "sys1" / "nvidia" / "foo" / "lib")] + assert result == expected + + +def test_single_wildcard(test_tree): + sys_path = test_tree["sys_path"] + base = test_tree["base"] + result = _impl(sys_path, ("nvidia", "*", "lib")) + expected = [ + str(base / "sys1" / "nvidia" / "bar" / "lib"), + str(base / "sys1" / "nvidia" / "foo" / "lib"), + ] + assert sorted(result) == sorted(expected) + + +def test_double_wildcard(test_tree): + sys_path = test_tree["sys_path"] + base = test_tree["base"] + result = _impl(sys_path, ("nvidia", "*", "nvvm", "lib64")) + expected = [str(base / "sys2" / "nvidia" / "baz" / "nvvm" / "lib64")] + assert result == expected + + +def test_no_match(test_tree): + sys_path = test_tree["sys_path"] + result = _impl(sys_path, ("nvidia", "nonexistent", "lib")) + assert result == [] + + +def test_empty_sys_path(): + result = _impl((), ("nvidia", "*", "lib")) + assert result == [] + + +def test_empty_sub_dirs(test_tree): + sys_path = test_tree["sys_path"] + result = _impl(sys_path, ()) + expected = [p for p in sys_path if os.path.isdir(p)] + assert sorted(result) == sorted(expected) From 7a0c06870b6260af92f90691f28279cbd40e43eb Mon Sep 17 00:00:00 2001 From: "Ralf W. Grosse-Kunstleve" Date: Thu, 10 Apr 2025 16:15:58 -0700 Subject: [PATCH 02/52] Make `path_finder` work for `"nvrtc"` (#553) * Revert "Restore cuda/bindings/_bindings/cynvrtc.pyx.in as-is on main" This reverts commit ba093f5700a99153b5c26b224a21aaceb69ae72b. 
* Revert "Reapply "Revert debug changes under .github/workflows"" This reverts commit 8f69f832af51c393601b09c2fe29d874e9abb057. * Also load nvrtc from cuda_bindings/tests/path_finder.py * Add heuristics for nvidia_cuda_nvrtc Windows wheels. Also fix a couple of bugs discovered by ChatGPT: * `glob.glob()` in this code returns absolute paths. * stray `error_messages = []` * Add debug prints, mostly for `os.add_dll_directory(bin_dir)` * Fix unfortunate silly oversight (import os missing under Windows) * Use `win32api.LoadLibraryEx()` with suitable `flags`; also update `os.environ["PATH"]` * Hard-wire WinBase.h constants (they are not exposed by win32con) * Remove debug prints * Reapply "Reapply "Revert debug changes under .github/workflows"" This reverts commit b002ff676c681c18f82fb9ebda875ddfec668fc9. --- .../cuda/bindings/_bindings/cynvrtc.pyx.in | 63 +++---------------- .../find_nvidia_dynamic_library.py | 45 +++++++++---- .../load_nvidia_dynamic_library.py | 6 +- cuda_bindings/tests/path_finder.py | 13 +++- 4 files changed, 58 insertions(+), 69 deletions(-) diff --git a/cuda_bindings/cuda/bindings/_bindings/cynvrtc.pyx.in b/cuda_bindings/cuda/bindings/_bindings/cynvrtc.pyx.in index a0f8a27a0..2b0f3dc23 100644 --- a/cuda_bindings/cuda/bindings/_bindings/cynvrtc.pyx.in +++ b/cuda_bindings/cuda/bindings/_bindings/cynvrtc.pyx.in @@ -9,13 +9,12 @@ # This code was automatically generated with version 12.8.0. Do not modify it directly.
{{if 'Windows' == platform.system()}} import os -import site -import struct import win32api -from pywintypes import error {{else}} cimport cuda.bindings._lib.dlfcn as dlfcn +from libc.stdint cimport uintptr_t {{endif}} +from cuda.bindings import path_finder cdef bint __cuPythonInit = False {{if 'nvrtcGetErrorString' in found_functions}}cdef void *__nvrtcGetErrorString = NULL{{endif}} @@ -46,64 +45,18 @@ cdef bint __cuPythonInit = False {{if 'nvrtcSetFlowCallback' in found_functions}}cdef void *__nvrtcSetFlowCallback = NULL{{endif}} cdef int cuPythonInit() except -1 nogil: + {{if 'Windows' != platform.system()}} + cdef void* handle = NULL + {{endif}} + global __cuPythonInit if __cuPythonInit: return 0 __cuPythonInit = True - # Load library - {{if 'Windows' == platform.system()}} - with gil: - # First check if the DLL has been loaded by 3rd parties - try: - handle = win32api.GetModuleHandle("nvrtc64_120_0.dll") - except: - handle = None - - # Else try default search - if not handle: - LOAD_LIBRARY_SAFE_CURRENT_DIRS = 0x00002000 - try: - handle = win32api.LoadLibraryEx("nvrtc64_120_0.dll", 0, LOAD_LIBRARY_SAFE_CURRENT_DIRS) - except: - pass - - # Final check if DLLs can be found within pip installations - if not handle: - site_packages = [site.getusersitepackages()] + site.getsitepackages() - for sp in site_packages: - mod_path = os.path.join(sp, "nvidia", "cuda_nvrtc", "bin") - if not os.path.isdir(mod_path): - continue - os.add_dll_directory(mod_path) - LOAD_LIBRARY_SEARCH_DEFAULT_DIRS = 0x00001000 - LOAD_LIBRARY_SEARCH_DLL_LOAD_DIR = 0x00000100 - try: - handle = win32api.LoadLibraryEx( - # Note: LOAD_LIBRARY_SEARCH_DLL_LOAD_DIR needs an abs path... - os.path.join(mod_path, "nvrtc64_120_0.dll"), - 0, LOAD_LIBRARY_SEARCH_DEFAULT_DIRS | LOAD_LIBRARY_SEARCH_DLL_LOAD_DIR) - - # Note: nvrtc64_120_0.dll calls into nvrtc-builtins64_*.dll which is - # located in the same mod_path. 
- # Update PATH environ so that the two dlls can find each other - os.environ["PATH"] = os.pathsep.join((os.environ.get("PATH", ""), mod_path)) - except: - pass - - if not handle: - raise RuntimeError('Failed to LoadLibraryEx nvrtc64_120_0.dll') - {{else}} - handle = dlfcn.dlopen('libnvrtc.so.12', dlfcn.RTLD_NOW) - if handle == NULL: - with gil: - raise RuntimeError('Failed to dlopen libnvrtc.so.12') - {{endif}} - - - # Load function {{if 'Windows' == platform.system()}} with gil: + handle = path_finder.load_nvidia_dynamic_library("nvrtc") {{if 'nvrtcGetErrorString' in found_functions}} try: global __nvrtcGetErrorString @@ -288,6 +241,8 @@ cdef int cuPythonInit() except -1 nogil: {{endif}} {{else}} + with gil: + handle = path_finder.load_nvidia_dynamic_library("nvrtc") {{if 'nvrtcGetErrorString' in found_functions}} global __nvrtcGetErrorString __nvrtcGetErrorString = dlfcn.dlsym(handle, 'nvrtcGetErrorString') diff --git a/cuda_bindings/cuda/bindings/_path_finder/find_nvidia_dynamic_library.py b/cuda_bindings/cuda/bindings/_path_finder/find_nvidia_dynamic_library.py index 30a9b68f4..3d6604f08 100644 --- a/cuda_bindings/cuda/bindings/_path_finder/find_nvidia_dynamic_library.py +++ b/cuda_bindings/cuda/bindings/_path_finder/find_nvidia_dynamic_library.py @@ -31,14 +31,18 @@ def _find_so_using_nvidia_lib_dirs(libname, so_basename, error_messages, attachm return so_name # Look for a versioned library # Using sort here mainly to make the result deterministic. 
- for node in sorted(glob.glob(os.path.join(lib_dir, file_wild))): - so_name = os.path.join(lib_dir, node) + for so_name in sorted(glob.glob(os.path.join(lib_dir, file_wild))): if os.path.isfile(so_name): return so_name _no_such_file_in_sub_dirs(nvidia_sub_dirs, file_wild, error_messages, attachments) return None +def _append_to_os_environ_path(dirpath): + curr_path = os.environ.get("PATH") + os.environ["PATH"] = dirpath if curr_path is None else os.pathsep.join((curr_path, dirpath)) + + def _find_dll_using_nvidia_bin_dirs(libname, error_messages, attachments): if libname == "nvvm": # noqa: SIM108 nvidia_sub_dirs = ("nvidia", "*", "nvvm", "bin") @@ -46,10 +50,31 @@ def _find_dll_using_nvidia_bin_dirs(libname, error_messages, attachments): nvidia_sub_dirs = ("nvidia", "*", "bin") file_wild = libname + "*.dll" for bin_dir in sys_path_find_sub_dirs(nvidia_sub_dirs): - for node in sorted(glob.glob(os.path.join(bin_dir, file_wild))): - dll_name = os.path.join(bin_dir, node) - if os.path.isfile(dll_name): - return dll_name + dll_name = None + have_builtins = False + for path in sorted(glob.glob(os.path.join(bin_dir, file_wild))): + # nvidia_cuda_nvrtc_cu12-12.8.93-py3-none-win_amd64.whl: + # nvidia\cuda_nvrtc\bin\ + # nvrtc-builtins64_128.dll + # nvrtc64_120_0.alt.dll + # nvrtc64_120_0.dll + node = os.path.basename(path) + if node.endswith(".alt.dll"): + continue + if "-builtins" in node: + have_builtins = True + continue + if dll_name is not None: + continue + if os.path.isfile(path): + dll_name = path + if dll_name is not None: + if have_builtins: + # Add the DLL directory to the search path + os.add_dll_directory(bin_dir) + # Update PATH as a fallback for dependent DLL resolution + _append_to_os_environ_path(bin_dir) + return dll_name _no_such_file_in_sub_dirs(nvidia_sub_dirs, file_wild, error_messages, attachments) return None @@ -78,7 +103,6 @@ def _find_so_using_cudalib_dir(so_basename, error_messages, attachments): candidate_so_dirs.append(alt_dir) libs.reverse() 
candidate_so_names = [so_dirname + so_basename for so_dirname in candidate_so_dirs] - error_messages = [] for so_name in candidate_so_names: if os.path.isfile(so_name): return so_name @@ -98,8 +122,7 @@ def _find_dll_using_cudalib_dir(libname, error_messages, attachments): if cudalib_dir is None: return None file_wild = libname + "*.dll" - for node in sorted(glob.glob(os.path.join(cudalib_dir, file_wild))): - dll_name = os.path.join(cudalib_dir, node) + for dll_name in sorted(glob.glob(os.path.join(cudalib_dir, file_wild))): if os.path.isfile(dll_name): return dll_name error_messages.append(f"No such file: {file_wild}") @@ -123,7 +146,7 @@ def find_nvidia_dynamic_library(name: str) -> str: dll_name = _find_dll_using_cudalib_dir(name, error_messages, attachments) if dll_name is None: attachments = "\n".join(attachments) - raise RuntimeError(f"Failure finding {name}*.dll: {', '.join(error_messages)}\n{attachments}") + raise RuntimeError(f'Failure finding "{name}*.dll": {", ".join(error_messages)}\n{attachments}') return dll_name so_basename = f"lib{name}.so" @@ -135,5 +158,5 @@ def find_nvidia_dynamic_library(name: str) -> str: so_name = _find_so_using_cudalib_dir(so_basename, error_messages, attachments) if so_name is None: attachments = "\n".join(attachments) - raise RuntimeError(f"Failure finding {so_basename}: {', '.join(error_messages)}\n{attachments}") + raise RuntimeError(f'Failure finding "{so_basename}": {", ".join(error_messages)}\n{attachments}') return so_name diff --git a/cuda_bindings/cuda/bindings/_path_finder/load_nvidia_dynamic_library.py b/cuda_bindings/cuda/bindings/_path_finder/load_nvidia_dynamic_library.py index 692e8e0bc..1a52bf0dd 100644 --- a/cuda_bindings/cuda/bindings/_path_finder/load_nvidia_dynamic_library.py +++ b/cuda_bindings/cuda/bindings/_path_finder/load_nvidia_dynamic_library.py @@ -8,7 +8,8 @@ import win32api # Mirrors WinBase.h (unfortunately not defined already elsewhere) - _WINBASE_LOAD_LIBRARY_SEARCH_SYSTEM32 = 0x00000800 + 
_WINBASE_LOAD_LIBRARY_SEARCH_DLL_LOAD_DIR = 0x00000100 + _WINBASE_LOAD_LIBRARY_SEARCH_DEFAULT_DIRS = 0x00001000 else: import ctypes @@ -77,8 +78,9 @@ def load_nvidia_dynamic_library(name: str) -> int: dl_path = find_nvidia_dynamic_library(name) if sys.platform == "win32": + flags = _WINBASE_LOAD_LIBRARY_SEARCH_DEFAULT_DIRS | _WINBASE_LOAD_LIBRARY_SEARCH_DLL_LOAD_DIR try: - handle = win32api.LoadLibrary(dl_path) + handle = win32api.LoadLibraryEx(dl_path, 0, flags) except pywintypes.error as e: raise RuntimeError(f"Failed to load DLL at {dl_path}: {e}") from e # Use `cdef void* ptr = ` in cython to convert back to void* diff --git a/cuda_bindings/tests/path_finder.py b/cuda_bindings/tests/path_finder.py index e9245a5be..9b7dd23a3 100644 --- a/cuda_bindings/tests/path_finder.py +++ b/cuda_bindings/tests/path_finder.py @@ -4,6 +4,15 @@ for k, v in paths.items(): print(f"{k}: {v}", flush=True) +print() -print(path_finder.find_nvidia_dynamic_library("nvvm")) -print(path_finder.find_nvidia_dynamic_library("nvJitLink")) +libnames = ("nvJitLink", "nvrtc", "nvvm") + +for libname in libnames: + print(path_finder.find_nvidia_dynamic_library(libname)) + print() + +for libname in libnames: + print(libname) + print(path_finder.load_nvidia_dynamic_library(libname)) + print() From 74c975009c0ed8d11bd9ab6bc900164d60a4f0a4 Mon Sep 17 00:00:00 2001 From: "Ralf W. Grosse-Kunstleve" Date: Fri, 25 Apr 2025 08:10:02 -0700 Subject: [PATCH 03/52] Add `path_finder.SUPPORTED_LIBNAMES` (#558) * Revert "Reapply "Revert debug changes under .github/workflows"" This reverts commit 8f69f832af51c393601b09c2fe29d874e9abb057. 
* Add names of all CTK 12.8.1 x86_64-linux libraries (.so) as `path_finder.SUPPORTED_LIBNAMES` https://chatgpt.com/share/67f98d0b-148c-8008-9951-9995cf5d860c * Add `SUPPORTED_WINDOWS_DLLS` * Add copyright notice * Move SUPPORTED_LIBNAMES, SUPPORTED_WINDOWS_DLLS to _path_finder/supported_libs.py * Use SUPPORTED_WINDOWS_DLLS in _windows_load_with_dll_basename() * Change "Set up mini CTK" to use `method: local`, remove `sub-packages` line. * Use Jimver/cuda-toolkit@v0.2.21 also under Linux, `method: local`, no `sub-packages`. * Add more `nvidia-*-cu12` wheels to get as many of the supported shared libraries as possible. * Revert "Use Jimver/cuda-toolkit@v0.2.21 also under Linux, `method: local`, no `sub-packages`." This reverts commit d49980665ac484626cd0ad9e7f727d5761f34da5. Problem observed: ``` /usr/bin/docker exec 1b42cd4ea3149ac3f2448eae830190ee62289b7304a73f8001e90cead5005102 sh -c "cat /etc/*release | grep ^ID" Warning: Failed to restore: Cache service responded with 422 /usr/bin/tar --posix -cf cache.tgz --exclude cache.tgz -P -C /__w/cuda-python/cuda-python --files-from manifest.txt -z Failed to save: Unable to reserve cache with key cuda_installer-linux-5.15.0-135-generic-x64-12.8.0, another job may be creating this cache. More details: This legacy service is shutting down, effective April 15, 2025. Migrate to the new service ASAP. For more information: https://gh.io/gha-cache-sunset Warning: Error during installation: Error: Unable to locate executable file: sudo. Please verify either the file path exists or the file can be found within a directory specified by the PATH environment variable. Also check the file mode to verify the file is executable. Error: Error: Unable to locate executable file: sudo. Please verify either the file path exists or the file can be found within a directory specified by the PATH environment variable. Also check the file mode to verify the file is executable. 
``` * Change test_path_finder::test_find_and_load() to skip cufile on Windows, and report exceptions as failures, except for cudart * Add nvidia-cuda-runtime-cu12 to pyproject.toml (for libname cudart) * test_path_finder.py: before loading cusolver, load nvJitLink, cusparse, cublas (experiment to see if that resolves the only Windows failure) Test (win-64, Python 3.12, CUDA 12.8.0, Runner default, CTK wheels) / test ``` ================================== FAILURES =================================== ________________________ test_find_and_load[cusolver] _________________________ libname = 'cusolver' @pytest.mark.parametrize("libname", path_finder.SUPPORTED_LIBNAMES) def test_find_and_load(libname): if sys.platform == "win32" and libname == "cufile": pytest.skip(f'test_find_and_load("{libname}") not supported on this platform') print(f'\ntest_find_and_load("{libname}")') failures = [] for algo, func in ( ("find", path_finder.find_nvidia_dynamic_library), ("load", path_finder.load_nvidia_dynamic_library), ): try: out = func(libname) except Exception as e: out = f"EXCEPTION: {type(e)} {str(e)}" failures.append(algo) print(out) print() > assert not failures E AssertionError: assert not ['load'] tests\test_path_finder.py:29: AssertionError ``` * test_path_finder.py: load *only* nvJitLink before loading cusolver * Run each test_find_or_load_nvidia_dynamic_library() subtest in a subprocess * Add cublasLt to supported_libs.py and load deps for cusolver, cusolverMg, cusparse in test_path_finder.py. Also restrict test_path_finder.py to test load only for now. * Add supported_libs.DIRECT_DEPENDENCIES * Remove cufile_rdma from supported libs (comment out). https://chatgpt.com/share/68033a33-385c-8008-a293-4c8cc3ea23ae * Split out `PARTIALLY_SUPPORTED_LIBNAMES`. Fix up test code. * Reduce public API to only load_nvidia_dynamic_library, SUPPORTED_LIBNAMES * Set CUDA_BINDINGS_PATH_FINDER_TEST_ALL_LIBNAMES=1 to match expected availability of nvidia shared libraries. 
* Refactor as `class _find_nvidia_dynamic_library` * Strict wheel, conda, system rule: try using the platform-specific dynamic loader search mechanisms only last * Introduce _load_and_report_path_linux(), add supported_libs.EXPECTED_LIB_SYMBOLS * Plug in ctypes.windll.kernel32.GetModuleFileNameW() * Keep track of nvrtc-related GitHub comment * Factor out `_find_dll_under_dir(dirpath, file_wild)` and reuse from `_find_dll_using_nvidia_bin_dirs()`, `_find_dll_using_cudalib_dir()` (to fix loading nvrtc64_120_0.dll from local CTK) * Minimal "is already loaded" code. * Add THIS FILE NEEDS TO BE REVIEWED/UPDATED FOR EACH CTK RELEASE comment in _path_finder/supported_libs.py * Add SUPPORTED_LINUX_SONAMES in _path_finder/supported_libs.py * Update SUPPORTED_WINDOWS_DLLS in _path_finder/supported_libs.py based on DLLs found in cuda_*win*.exe files. * Remove `os.add_dll_directory()` and `os.environ["PATH"]` manipulations from find_nvidia_dynamic_library.py. Add `supported_libs.LIBNAMES_REQUIRING_OS_ADD_DLL_DIRECTORY` and use from `load_nvidia_dynamic_library()`. * Move nvrtc-specific code from find_nvidia_dynamic_library.py to `supported_libs.is_suppressed_dll_file()` * Introduce dataclass LoadedDL as return type for load_nvidia_dynamic_library() * Factor out _abs_path_for_dynamic_library_* and use on handle obtained through "is already loaded" checks * Factor out _load_nvidia_dynamic_library_no_cache() and use for exercising LoadedDL.was_already_loaded_from_elsewhere * _check_nvjitlink_usable() in test_path_finder.py * Undo changes in .github/workflows/ and cuda_bindings/pyproject.toml * Move cuda_bindings/tests/path_finder.py -> toolshed/run_cuda_bindings_path_finder.py * Add bandit suppressions in test_path_finder.py * Add pytest info_summary_append fixture and use from test_path_finder.py to report the absolute paths of the loaded libraries. 
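Note on the cached/uncached split: the following is a simplified, self-contained sketch of how `load_nvidia_dynamic_library()` (cached) and `_load_nvidia_dynamic_library_no_cache()` relate, and why calling the uncached function a second time exercises `LoadedDL.was_already_loaded_from_elsewhere`. The `_FAKE_PROCESS_TABLE` dict, the integer handles, and the `/fake/...` paths are stand-ins invented for illustration only; the actual code asks the OS loader instead (`win32api.GetModuleHandle` on Windows, `ctypes.CDLL(..., mode=os.RTLD_NOLOAD)` on Linux).

```python
import functools
from dataclasses import dataclass
from typing import Dict, Optional, Tuple

# Hypothetical stand-in for "libraries already mapped into this process".
_FAKE_PROCESS_TABLE: Dict[str, Tuple[int, str]] = {}


@dataclass
class LoadedDL:
    handle: int
    abs_path: Optional[str]
    was_already_loaded_from_elsewhere: bool


def _load_no_cache(libname: str) -> LoadedDL:
    # Step 1: "is already loaded" check, done before any searching.
    if libname in _FAKE_PROCESS_TABLE:
        handle, abs_path = _FAKE_PROCESS_TABLE[libname]
        return LoadedDL(handle, abs_path, True)
    # Step 2: pretend to find and load the library (illustrative values only).
    handle = len(_FAKE_PROCESS_TABLE) + 1
    abs_path = f"/fake/lib{libname}.so"
    _FAKE_PROCESS_TABLE[libname] = (handle, abs_path)
    return LoadedDL(handle, abs_path, False)


@functools.cache
def load(libname: str) -> LoadedDL:
    # The cache returns the same LoadedDL object on repeated calls.
    return _load_no_cache(libname)


fresh = load("nvrtc")
from_cache = load("nvrtc")
no_cache = _load_no_cache("nvrtc")
```

Under these assumptions, `fresh.was_already_loaded_from_elsewhere` is false, `from_cache is fresh` holds, and `no_cache` reports the library as already loaded with the same `abs_path`. These are the same three checks that the subprocess in test_path_finder.py performs against the real loader.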
--- .../cuda/bindings/_bindings/cynvrtc.pyx.in | 4 +- .../bindings/_internal/nvjitlink_linux.pyx | 2 +- .../bindings/_internal/nvjitlink_windows.pyx | 2 +- .../cuda/bindings/_internal/nvvm_linux.pyx | 2 +- .../cuda/bindings/_internal/nvvm_windows.pyx | 2 +- .../find_nvidia_dynamic_library.py | 106 +++-- .../load_nvidia_dynamic_library.py | 173 +++++++-- .../bindings/_path_finder/supported_libs.py | 364 ++++++++++++++++++ cuda_bindings/cuda/bindings/path_finder.py | 30 +- cuda_bindings/pyproject.toml | 1 - cuda_bindings/tests/conftest.py | 20 + cuda_bindings/tests/path_finder.py | 18 - cuda_bindings/tests/test_path_finder.py | 92 +++++ toolshed/build_path_finder_dlls.py | 84 ++++ toolshed/build_path_finder_sonames.py | 74 ++++ toolshed/find_sonames.sh | 6 + toolshed/run_cuda_bindings_path_finder.py | 34 ++ 17 files changed, 868 insertions(+), 146 deletions(-) create mode 100644 cuda_bindings/cuda/bindings/_path_finder/supported_libs.py create mode 100644 cuda_bindings/tests/conftest.py delete mode 100644 cuda_bindings/tests/path_finder.py create mode 100644 cuda_bindings/tests/test_path_finder.py create mode 100755 toolshed/build_path_finder_dlls.py create mode 100755 toolshed/build_path_finder_sonames.py create mode 100755 toolshed/find_sonames.sh create mode 100644 toolshed/run_cuda_bindings_path_finder.py diff --git a/cuda_bindings/cuda/bindings/_bindings/cynvrtc.pyx.in b/cuda_bindings/cuda/bindings/_bindings/cynvrtc.pyx.in index 2b0f3dc23..d2bb0b63b 100644 --- a/cuda_bindings/cuda/bindings/_bindings/cynvrtc.pyx.in +++ b/cuda_bindings/cuda/bindings/_bindings/cynvrtc.pyx.in @@ -56,7 +56,7 @@ cdef int cuPythonInit() except -1 nogil: {{if 'Windows' == platform.system()}} with gil: - handle = path_finder.load_nvidia_dynamic_library("nvrtc") + handle = path_finder.load_nvidia_dynamic_library("nvrtc").handle {{if 'nvrtcGetErrorString' in found_functions}} try: global __nvrtcGetErrorString @@ -242,7 +242,7 @@ cdef int cuPythonInit() except -1 nogil: {{else}} with gil: - 
handle = path_finder.load_nvidia_dynamic_library("nvrtc") + handle = path_finder.load_nvidia_dynamic_library("nvrtc").handle {{if 'nvrtcGetErrorString' in found_functions}} global __nvrtcGetErrorString __nvrtcGetErrorString = dlfcn.dlsym(handle, 'nvrtcGetErrorString') diff --git a/cuda_bindings/cuda/bindings/_internal/nvjitlink_linux.pyx b/cuda_bindings/cuda/bindings/_internal/nvjitlink_linux.pyx index 9d21a3e10..78b4d802b 100644 --- a/cuda_bindings/cuda/bindings/_internal/nvjitlink_linux.pyx +++ b/cuda_bindings/cuda/bindings/_internal/nvjitlink_linux.pyx @@ -53,7 +53,7 @@ cdef void* __nvJitLinkVersion = NULL cdef void* load_library(int driver_ver) except* with gil: - cdef uintptr_t handle = path_finder.load_nvidia_dynamic_library("nvJitLink") + cdef uintptr_t handle = path_finder.load_nvidia_dynamic_library("nvJitLink").handle return handle diff --git a/cuda_bindings/cuda/bindings/_internal/nvjitlink_windows.pyx b/cuda_bindings/cuda/bindings/_internal/nvjitlink_windows.pyx index f86972216..b306a3001 100644 --- a/cuda_bindings/cuda/bindings/_internal/nvjitlink_windows.pyx +++ b/cuda_bindings/cuda/bindings/_internal/nvjitlink_windows.pyx @@ -40,7 +40,7 @@ cdef void* __nvJitLinkVersion = NULL cdef void* load_library(int driver_ver) except* with gil: - cdef intptr_t handle = path_finder.load_nvidia_dynamic_library("nvJitLink") + cdef intptr_t handle = path_finder.load_nvidia_dynamic_library("nvJitLink").handle return handle diff --git a/cuda_bindings/cuda/bindings/_internal/nvvm_linux.pyx b/cuda_bindings/cuda/bindings/_internal/nvvm_linux.pyx index 33ba8e610..82335508b 100644 --- a/cuda_bindings/cuda/bindings/_internal/nvvm_linux.pyx +++ b/cuda_bindings/cuda/bindings/_internal/nvvm_linux.pyx @@ -51,7 +51,7 @@ cdef void* __nvvmGetProgramLog = NULL cdef void* load_library(const int driver_ver) except* with gil: - cdef uintptr_t handle = path_finder.load_nvidia_dynamic_library("nvvm") + cdef uintptr_t handle = path_finder.load_nvidia_dynamic_library("nvvm").handle return 
handle diff --git a/cuda_bindings/cuda/bindings/_internal/nvvm_windows.pyx b/cuda_bindings/cuda/bindings/_internal/nvvm_windows.pyx index 6349fa5a1..21b4d9418 100644 --- a/cuda_bindings/cuda/bindings/_internal/nvvm_windows.pyx +++ b/cuda_bindings/cuda/bindings/_internal/nvvm_windows.pyx @@ -38,7 +38,7 @@ cdef void* __nvvmGetProgramLog = NULL cdef void* load_library(int driver_ver) except* with gil: - cdef intptr_t handle = path_finder.load_nvidia_dynamic_library("nvvm") + cdef intptr_t handle = path_finder.load_nvidia_dynamic_library("nvvm").handle return handle diff --git a/cuda_bindings/cuda/bindings/_path_finder/find_nvidia_dynamic_library.py b/cuda_bindings/cuda/bindings/_path_finder/find_nvidia_dynamic_library.py index 3d6604f08..e60154aa5 100644 --- a/cuda_bindings/cuda/bindings/_path_finder/find_nvidia_dynamic_library.py +++ b/cuda_bindings/cuda/bindings/_path_finder/find_nvidia_dynamic_library.py @@ -7,6 +7,7 @@ import os from .cuda_paths import IS_WIN32, get_cuda_paths +from .supported_libs import is_suppressed_dll_file from .sys_path_find_sub_dirs import sys_path_find_sub_dirs @@ -38,9 +39,13 @@ def _find_so_using_nvidia_lib_dirs(libname, so_basename, error_messages, attachm return None -def _append_to_os_environ_path(dirpath): - curr_path = os.environ.get("PATH") - os.environ["PATH"] = dirpath if curr_path is None else os.pathsep.join((curr_path, dirpath)) +def _find_dll_under_dir(dirpath, file_wild): + for path in sorted(glob.glob(os.path.join(dirpath, file_wild))): + if not os.path.isfile(path): + continue + if not is_suppressed_dll_file(os.path.basename(path)): + return path + return None def _find_dll_using_nvidia_bin_dirs(libname, error_messages, attachments): @@ -50,30 +55,8 @@ def _find_dll_using_nvidia_bin_dirs(libname, error_messages, attachments): nvidia_sub_dirs = ("nvidia", "*", "bin") file_wild = libname + "*.dll" for bin_dir in sys_path_find_sub_dirs(nvidia_sub_dirs): - dll_name = None - have_builtins = False - for path in 
sorted(glob.glob(os.path.join(bin_dir, file_wild))): - # nvidia_cuda_nvrtc_cu12-12.8.93-py3-none-win_amd64.whl: - # nvidia\cuda_nvrtc\bin\ - # nvrtc-builtins64_128.dll - # nvrtc64_120_0.alt.dll - # nvrtc64_120_0.dll - node = os.path.basename(path) - if node.endswith(".alt.dll"): - continue - if "-builtins" in node: - have_builtins = True - continue - if dll_name is not None: - continue - if os.path.isfile(path): - dll_name = path + dll_name = _find_dll_under_dir(bin_dir, file_wild) if dll_name is not None: - if have_builtins: - # Add the DLL directory to the search path - os.add_dll_directory(bin_dir) - # Update PATH as a fallback for dependent DLL resolution - _append_to_os_environ_path(bin_dir) return dll_name _no_such_file_in_sub_dirs(nvidia_sub_dirs, file_wild, error_messages, attachments) return None @@ -122,9 +105,9 @@ def _find_dll_using_cudalib_dir(libname, error_messages, attachments): if cudalib_dir is None: return None file_wild = libname + "*.dll" - for dll_name in sorted(glob.glob(os.path.join(cudalib_dir, file_wild))): - if os.path.isfile(dll_name): - return dll_name + dll_name = _find_dll_under_dir(cudalib_dir, file_wild) + if dll_name is not None: + return dll_name error_messages.append(f"No such file: {file_wild}") attachments.append(f' listdir("{cudalib_dir}"):') for node in sorted(os.listdir(cudalib_dir)): @@ -132,31 +115,42 @@ def _find_dll_using_cudalib_dir(libname, error_messages, attachments): return None -@functools.cache -def find_nvidia_dynamic_library(name: str) -> str: - error_messages = [] - attachments = [] - - if IS_WIN32: - dll_name = _find_dll_using_nvidia_bin_dirs(name, error_messages, attachments) - if dll_name is None: - if name == "nvvm": - dll_name = _get_cuda_paths_info("nvvm", error_messages) - else: - dll_name = _find_dll_using_cudalib_dir(name, error_messages, attachments) - if dll_name is None: - attachments = "\n".join(attachments) - raise RuntimeError(f'Failure finding "{name}*.dll": {", 
".join(error_messages)}\n{attachments}') - return dll_name - - so_basename = f"lib{name}.so" - so_name = _find_so_using_nvidia_lib_dirs(name, so_basename, error_messages, attachments) - if so_name is None: - if name == "nvvm": - so_name = _get_cuda_paths_info("nvvm", error_messages) +class _find_nvidia_dynamic_library: + def __init__(self, libname: str): + self.libname = libname + self.error_messages = [] + self.attachments = [] + self.abs_path = None + + if IS_WIN32: + self.abs_path = _find_dll_using_nvidia_bin_dirs(libname, self.error_messages, self.attachments) + if self.abs_path is None: + if libname == "nvvm": + self.abs_path = _get_cuda_paths_info("nvvm", self.error_messages) + else: + self.abs_path = _find_dll_using_cudalib_dir(libname, self.error_messages, self.attachments) + self.lib_searched_for = f"{libname}*.dll" else: - so_name = _find_so_using_cudalib_dir(so_basename, error_messages, attachments) - if so_name is None: - attachments = "\n".join(attachments) - raise RuntimeError(f'Failure finding "{so_basename}": {", ".join(error_messages)}\n{attachments}') - return so_name + self.lib_searched_for = f"lib{libname}.so" + self.abs_path = _find_so_using_nvidia_lib_dirs( + libname, self.lib_searched_for, self.error_messages, self.attachments + ) + if self.abs_path is None: + if libname == "nvvm": + self.abs_path = _get_cuda_paths_info("nvvm", self.error_messages) + else: + self.abs_path = _find_so_using_cudalib_dir( + self.lib_searched_for, self.error_messages, self.attachments + ) + + def raise_if_abs_path_is_None(self): + if self.abs_path: + return self.abs_path + err = ", ".join(self.error_messages) + att = "\n".join(self.attachments) + raise RuntimeError(f'Failure finding "{self.lib_searched_for}": {err}\n{att}') + + +@functools.cache +def find_nvidia_dynamic_library(libname: str) -> str: + return _find_nvidia_dynamic_library(libname).raise_if_abs_path_is_None() diff --git a/cuda_bindings/cuda/bindings/_path_finder/load_nvidia_dynamic_library.py 
b/cuda_bindings/cuda/bindings/_path_finder/load_nvidia_dynamic_library.py index 1a52bf0dd..c770de67d 100644 --- a/cuda_bindings/cuda/bindings/_path_finder/load_nvidia_dynamic_library.py +++ b/cuda_bindings/cuda/bindings/_path_finder/load_nvidia_dynamic_library.py @@ -1,5 +1,13 @@ +# Copyright 2025 NVIDIA Corporation. All rights reserved. +# +# SPDX-License-Identifier: LicenseRef-NVIDIA-SOFTWARE-LICENSE + +import ctypes import functools +import os import sys +from dataclasses import dataclass +from typing import Optional, Tuple if sys.platform == "win32": import ctypes.wintypes @@ -12,12 +20,42 @@ _WINBASE_LOAD_LIBRARY_SEARCH_DEFAULT_DIRS = 0x00001000 else: - import ctypes - import os + import ctypes.util _LINUX_CDLL_MODE = os.RTLD_NOW | os.RTLD_GLOBAL -from .find_nvidia_dynamic_library import find_nvidia_dynamic_library + _LIBDL_PATH = ctypes.util.find_library("dl") or "libdl.so.2" + _LIBDL = ctypes.CDLL(_LIBDL_PATH) + _LIBDL.dladdr.argtypes = [ctypes.c_void_p, ctypes.c_void_p] + _LIBDL.dladdr.restype = ctypes.c_int + + class Dl_info(ctypes.Structure): + _fields_ = [ + ("dli_fname", ctypes.c_char_p), # path to .so + ("dli_fbase", ctypes.c_void_p), + ("dli_sname", ctypes.c_char_p), + ("dli_saddr", ctypes.c_void_p), + ] + + +from .find_nvidia_dynamic_library import _find_nvidia_dynamic_library +from .supported_libs import ( + DIRECT_DEPENDENCIES, + EXPECTED_LIB_SYMBOLS, + LIBNAMES_REQUIRING_OS_ADD_DLL_DIRECTORY, + SUPPORTED_LINUX_SONAMES, + SUPPORTED_WINDOWS_DLLS, +) + + +def _add_dll_directory(dll_abs_path): + dirpath = os.path.dirname(dll_abs_path) + assert os.path.isdir(dirpath), dll_abs_path + # Add the DLL directory to the search path + os.add_dll_directory(dirpath) + # Update PATH as a fallback for dependent DLL resolution + curr_path = os.environ.get("PATH") + os.environ["PATH"] = dirpath if curr_path is None else os.pathsep.join((curr_path, dirpath)) @functools.cache @@ -39,56 +77,117 @@ def _windows_cuDriverGetVersion() -> int: return driver_ver.value +def 
_abs_path_for_dynamic_library_windows(handle: int) -> str: + buf = ctypes.create_unicode_buffer(260) + n_chars = ctypes.windll.kernel32.GetModuleFileNameW(ctypes.wintypes.HMODULE(handle), buf, len(buf)) + if n_chars == 0: + raise OSError("GetModuleFileNameW failed") + return buf.value + + @functools.cache -def _windows_load_with_dll_basename(name: str) -> int: +def _windows_load_with_dll_basename(name: str) -> Tuple[Optional[int], Optional[str]]: driver_ver = _windows_cuDriverGetVersion() del driver_ver # Keeping this here because it will probably be needed in the future. - if name == "nvJitLink": - dll_name = "nvJitLink_120_0.dll" - elif name == "nvrtc": - dll_name = "nvrtc64_120_0.dll" - elif name == "nvvm": - dll_name = "nvvm64_40_0.dll" + dll_names = SUPPORTED_WINDOWS_DLLS.get(name) + if dll_names is None: + return None, None - try: - return win32api.LoadLibrary(dll_name) - except pywintypes.error: - pass + for dll_name in dll_names: + handle = ctypes.windll.kernel32.LoadLibraryW(ctypes.c_wchar_p(dll_name)) + if handle: + return handle, _abs_path_for_dynamic_library_windows(handle) - return None + return None, None -@functools.cache -def load_nvidia_dynamic_library(name: str) -> int: - # First try using the platform-specific dynamic loader search mechanisms +def _abs_path_for_dynamic_library_linux(libname: str, handle: int) -> Optional[str]: + for symbol_name in EXPECTED_LIB_SYMBOLS[libname]: + symbol = getattr(handle, symbol_name, None) + if symbol is not None: + break + else: + return None + addr = ctypes.cast(symbol, ctypes.c_void_p) + info = Dl_info() + if _LIBDL.dladdr(addr, ctypes.byref(info)) == 0: + raise OSError(f"dladdr failed for {libname=!r}") + return info.dli_fname.decode() + + +def _load_and_report_path_linux(libname: str, soname: str) -> Tuple[int, str]: + handle = ctypes.CDLL(soname, _LINUX_CDLL_MODE) + abs_path = _abs_path_for_dynamic_library_linux(libname, handle) + if abs_path is None: + raise RuntimeError(f"No expected symbol for {libname=!r}") + return
handle, abs_path + + +@dataclass +class LoadedDL: + # ATTENTION: To convert `handle` back to `void*` in cython: + # Linux: `cdef void* ptr = ` + # Windows: `cdef void* ptr = ` + handle: int + abs_path: Optional[str] + was_already_loaded_from_elsewhere: bool + + +def _load_nvidia_dynamic_library_no_cache(libname: str) -> LoadedDL: + # Detect if the library was loaded already in some other way (i.e. not via this function). if sys.platform == "win32": - handle = _windows_load_with_dll_basename(name) - if handle: - return handle + for dll_name in SUPPORTED_WINDOWS_DLLS.get(libname, ()): + try: + handle = win32api.GetModuleHandle(dll_name) + except pywintypes.error: + pass + else: + return LoadedDL(handle, _abs_path_for_dynamic_library_windows(handle), True) else: - dl_path = f"lib{name}.so" # Version intentionally no specified. - try: - handle = ctypes.CDLL(dl_path, _LINUX_CDLL_MODE) - except OSError: - pass + for soname in SUPPORTED_LINUX_SONAMES.get(libname, ()): + try: + handle = ctypes.CDLL(soname, mode=os.RTLD_NOLOAD) + except OSError: + pass + else: + return LoadedDL(handle, _abs_path_for_dynamic_library_linux(libname, handle), True) + + for dep in DIRECT_DEPENDENCIES.get(libname, ()): + load_nvidia_dynamic_library(dep) + + found = _find_nvidia_dynamic_library(libname) + if found.abs_path is None: + if sys.platform == "win32": + handle, abs_path = _windows_load_with_dll_basename(libname) + if handle: + return LoadedDL(handle, abs_path, False) else: - # Use `cdef void* ptr = ` in cython to convert back to void* - return handle._handle # C unsigned int + try: + handle, abs_path = _load_and_report_path_linux(libname, found.lib_searched_for) + except OSError: + pass + else: + return LoadedDL(handle._handle, abs_path, False) + found.raise_if_abs_path_is_None() - dl_path = find_nvidia_dynamic_library(name) if sys.platform == "win32": + if libname in LIBNAMES_REQUIRING_OS_ADD_DLL_DIRECTORY: + _add_dll_directory(found.abs_path) flags = 
_WINBASE_LOAD_LIBRARY_SEARCH_DEFAULT_DIRS | _WINBASE_LOAD_LIBRARY_SEARCH_DLL_LOAD_DIR try: - handle = win32api.LoadLibraryEx(dl_path, 0, flags) + handle = win32api.LoadLibraryEx(found.abs_path, 0, flags) except pywintypes.error as e: - raise RuntimeError(f"Failed to load DLL at {dl_path}: {e}") from e - # Use `cdef void* ptr = ` in cython to convert back to void* - return handle # C signed int, matches win32api.GetProcAddress + raise RuntimeError(f"Failed to load DLL at {found.abs_path}: {e}") from e + return LoadedDL(handle, found.abs_path, False) else: try: - handle = ctypes.CDLL(dl_path, _LINUX_CDLL_MODE) + handle = ctypes.CDLL(found.abs_path, _LINUX_CDLL_MODE) except OSError as e: - raise RuntimeError(f"Failed to dlopen {dl_path}: {e}") from e - # Use `cdef void* ptr = ` in cython to convert back to void* - return handle._handle # C unsigned int + raise RuntimeError(f"Failed to dlopen {found.abs_path}: {e}") from e + return LoadedDL(handle._handle, found.abs_path, False) + + +@functools.cache +def load_nvidia_dynamic_library(libname: str) -> LoadedDL: + return _load_nvidia_dynamic_library_no_cache(libname) diff --git a/cuda_bindings/cuda/bindings/_path_finder/supported_libs.py b/cuda_bindings/cuda/bindings/_path_finder/supported_libs.py new file mode 100644 index 000000000..ee62b92b8 --- /dev/null +++ b/cuda_bindings/cuda/bindings/_path_finder/supported_libs.py @@ -0,0 +1,364 @@ +# Copyright 2025 NVIDIA Corporation. All rights reserved. 
+# +# SPDX-License-Identifier: LicenseRef-NVIDIA-SOFTWARE-LICENSE + +# THIS FILE NEEDS TO BE REVIEWED/UPDATED FOR EACH CTK RELEASE + +SUPPORTED_LIBNAMES = ( + # Core CUDA Runtime and Compiler + "nvJitLink", + "nvrtc", + "nvvm", +) + +PARTIALLY_SUPPORTED_LIBNAMES = ( + # Core CUDA Runtime and Compiler + "cudart", + "nvfatbin", + # Math Libraries + "cublas", + "cublasLt", + "cufft", + "cufftw", + "curand", + "cusolver", + "cusolverMg", + "cusparse", + "nppc", + "nppial", + "nppicc", + "nppidei", + "nppif", + "nppig", + "nppim", + "nppist", + "nppisu", + "nppitc", + "npps", + "nvblas", + # Other + "cufile", + # "cufile_rdma", # Requires libmlx5.so + "nvjpeg", +) + +# Based on ldd output for Linux x86_64 nvidia-*-cu12 wheels (12.8.1) +DIRECT_DEPENDENCIES = { + "cublas": ("cublasLt",), + "cufftw": ("cufft",), + # "cufile_rdma": ("cufile",), + "cusolver": ("nvJitLink", "cusparse", "cublasLt", "cublas"), + "cusolverMg": ("nvJitLink", "cublasLt", "cublas"), + "cusparse": ("nvJitLink",), + "nppial": ("nppc",), + "nppicc": ("nppc",), + "nppidei": ("nppc",), + "nppif": ("nppc",), + "nppig": ("nppc",), + "nppim": ("nppc",), + "nppist": ("nppc",), + "nppisu": ("nppc",), + "nppitc": ("nppc",), + "npps": ("nppc",), + "nvblas": ("cublas", "cublasLt"), +} + +# Based on these released files: +# cuda_11.0.3_450.51.06_linux.run +# cuda_11.1.1_455.32.00_linux.run +# cuda_11.2.2_460.32.03_linux.run +# cuda_11.3.1_465.19.01_linux.run +# cuda_11.4.4_470.82.01_linux.run +# cuda_11.5.1_495.29.05_linux.run +# cuda_11.6.2_510.47.03_linux.run +# cuda_11.7.1_515.65.01_linux.run +# cuda_11.8.0_520.61.05_linux.run +# cuda_12.0.1_525.85.12_linux.run +# cuda_12.1.1_530.30.02_linux.run +# cuda_12.2.2_535.104.05_linux.run +# cuda_12.3.2_545.23.08_linux.run +# cuda_12.4.1_550.54.15_linux.run +# cuda_12.5.1_555.42.06_linux.run +# cuda_12.6.2_560.35.03_linux.run +# cuda_12.8.0_570.86.10_linux.run +# Generated with toolshed/build_path_finder_sonames.py +SUPPORTED_LINUX_SONAMES = { + "cublas": ( + 
"libcublas.so.11", + "libcublas.so.12", + ), + "cublasLt": ( + "libcublasLt.so.11", + "libcublasLt.so.12", + ), + "cudart": ( + "libcudart.so.11.0", + "libcudart.so.12", + ), + "cufft": ( + "libcufft.so.10", + "libcufft.so.11", + ), + "cufftw": ( + "libcufftw.so.10", + "libcufftw.so.11", + ), + "cufile": ("libcufile.so.0",), + # "cufile_rdma": ("libcufile_rdma.so.1",), + "curand": ("libcurand.so.10",), + "cusolver": ( + "libcusolver.so.10", + "libcusolver.so.11", + ), + "cusolverMg": ( + "libcusolverMg.so.10", + "libcusolverMg.so.11", + ), + "cusparse": ( + "libcusparse.so.11", + "libcusparse.so.12", + ), + "nppc": ( + "libnppc.so.11", + "libnppc.so.12", + ), + "nppial": ( + "libnppial.so.11", + "libnppial.so.12", + ), + "nppicc": ( + "libnppicc.so.11", + "libnppicc.so.12", + ), + "nppidei": ( + "libnppidei.so.11", + "libnppidei.so.12", + ), + "nppif": ( + "libnppif.so.11", + "libnppif.so.12", + ), + "nppig": ( + "libnppig.so.11", + "libnppig.so.12", + ), + "nppim": ( + "libnppim.so.11", + "libnppim.so.12", + ), + "nppist": ( + "libnppist.so.11", + "libnppist.so.12", + ), + "nppisu": ( + "libnppisu.so.11", + "libnppisu.so.12", + ), + "nppitc": ( + "libnppitc.so.11", + "libnppitc.so.12", + ), + "npps": ( + "libnpps.so.11", + "libnpps.so.12", + ), + "nvJitLink": ("libnvJitLink.so.12",), + "nvblas": ( + "libnvblas.so.11", + "libnvblas.so.12", + ), + "nvfatbin": ("libnvfatbin.so.12",), + "nvjpeg": ( + "libnvjpeg.so.11", + "libnvjpeg.so.12", + ), + "nvrtc": ( + "libnvrtc.so.11.0", + "libnvrtc.so.11.1", + "libnvrtc.so.11.2", + "libnvrtc.so.12", + ), + "nvvm": ( + "libnvvm.so.3", + "libnvvm.so.4", + ), +} + +# Based on these released files: +# cuda_11.0.3_451.82_win10.exe +# cuda_11.1.1_456.81_win10.exe +# cuda_11.2.2_461.33_win10.exe +# cuda_11.3.1_465.89_win10.exe +# cuda_11.4.4_472.50_windows.exe +# cuda_11.5.1_496.13_windows.exe +# cuda_11.6.2_511.65_windows.exe +# cuda_11.7.1_516.94_windows.exe +# cuda_11.8.0_522.06_windows.exe +# cuda_12.0.1_528.33_windows.exe +# 
cuda_12.1.1_531.14_windows.exe +# cuda_12.2.2_537.13_windows.exe +# cuda_12.3.2_546.12_windows.exe +# cuda_12.4.1_551.78_windows.exe +# cuda_12.5.1_555.85_windows.exe +# cuda_12.6.2_560.94_windows.exe +# cuda_12.8.1_572.61_windows.exe +# Generated with toolshed/build_path_finder_dlls.py (WITH MANUAL EDITS) +SUPPORTED_WINDOWS_DLLS = { + "cublas": ( + "cublas64_11.dll", + "cublas64_12.dll", + ), + "cublasLt": ( + "cublasLt64_11.dll", + "cublasLt64_12.dll", + ), + "cudart": ( + "cudart32_110.dll", + "cudart32_65.dll", + "cudart32_90.dll", + "cudart64_101.dll", + "cudart64_110.dll", + "cudart64_12.dll", + "cudart64_65.dll", + ), + "cufft": ( + "cufft64_10.dll", + "cufft64_11.dll", + "cufftw64_10.dll", + "cufftw64_11.dll", + ), + "cufftw": ( + "cufftw64_10.dll", + "cufftw64_11.dll", + ), + "cufile": (), + # "cufile_rdma": (), + "curand": ("curand64_10.dll",), + "cusolver": ( + "cusolver64_10.dll", + "cusolver64_11.dll", + ), + "cusolverMg": ( + "cusolverMg64_10.dll", + "cusolverMg64_11.dll", + ), + "cusparse": ( + "cusparse64_11.dll", + "cusparse64_12.dll", + ), + "nppc": ( + "nppc64_11.dll", + "nppc64_12.dll", + ), + "nppial": ( + "nppial64_11.dll", + "nppial64_12.dll", + ), + "nppicc": ( + "nppicc64_11.dll", + "nppicc64_12.dll", + ), + "nppidei": ( + "nppidei64_11.dll", + "nppidei64_12.dll", + ), + "nppif": ( + "nppif64_11.dll", + "nppif64_12.dll", + ), + "nppig": ( + "nppig64_11.dll", + "nppig64_12.dll", + ), + "nppim": ( + "nppim64_11.dll", + "nppim64_12.dll", + ), + "nppist": ( + "nppist64_11.dll", + "nppist64_12.dll", + ), + "nppisu": ( + "nppisu64_11.dll", + "nppisu64_12.dll", + ), + "nppitc": ( + "nppitc64_11.dll", + "nppitc64_12.dll", + ), + "npps": ( + "npps64_11.dll", + "npps64_12.dll", + ), + "nvJitLink": ("nvJitLink_120_0.dll",), + "nvblas": ( + "nvblas64_11.dll", + "nvblas64_12.dll", + ), + "nvfatbin": ("nvfatbin_120_0.dll",), + "nvjpeg": ( + "nvjpeg64_11.dll", + "nvjpeg64_12.dll", + ), + "nvrtc": ( + "nvrtc64_110_0.dll", + "nvrtc64_111_0.dll", + 
"nvrtc64_112_0.dll", + "nvrtc64_120_0.dll", + ), + "nvvm": ( + "nvvm32.dll", + "nvvm64.dll", + "nvvm64_33_0.dll", + "nvvm64_40_0.dll", + ), +} + +LIBNAMES_REQUIRING_OS_ADD_DLL_DIRECTORY = ( + "cufft", + "nvrtc", +) + + +def is_suppressed_dll_file(path_basename: str) -> bool: + if path_basename.startswith("nvrtc"): + # nvidia_cuda_nvrtc_cu12-12.8.93-py3-none-win_amd64.whl: + # nvidia\cuda_nvrtc\bin\ + # nvrtc-builtins64_128.dll + # nvrtc64_120_0.alt.dll + # nvrtc64_120_0.dll + return path_basename.endswith(".alt.dll") or "-builtins" in path_basename + return False + + +# Based on nm output for Linux x86_64 /usr/local/cuda (12.8.1) +EXPECTED_LIB_SYMBOLS = { + "nvJitLink": ("nvJitLinkVersion",), + "nvrtc": ("nvrtcVersion",), + "nvvm": ("nvvmVersion",), + "cudart": ("cudaRuntimeGetVersion",), + "nvfatbin": ("nvFatbinVersion",), + "cublas": ("cublasGetVersion",), + "cublasLt": ("cublasLtGetVersion",), + "cufft": ("cufftGetVersion",), + "cufftw": ("fftwf_malloc",), + "curand": ("curandGetVersion",), + "cusolver": ("cusolverGetVersion",), + "cusolverMg": ("cusolverMgCreate",), + "cusparse": ("cusparseGetVersion",), + "nppc": ("nppGetLibVersion",), + "nppial": ("nppiAdd_32f_C1R",), + "nppicc": ("nppiColorToGray_8u_C3C1R",), + "nppidei": ("nppiCopy_8u_C1R",), + "nppif": ("nppiFilterSobelHorizBorder_8u_C1R",), + "nppig": ("nppiResize_8u_C1R",), + "nppim": ("nppiErode_8u_C1R",), + "nppist": ("nppiMean_8u_C1R",), + "nppisu": ("nppiFree",), + "nppitc": ("nppiThreshold_8u_C1R",), + "npps": ("nppsAdd_32f",), + "nvblas": ("dgemm",), + "cufile": ("cuFileGetVersion",), + # "cufile_rdma": ("rdma_buffer_reg",), + "nvjpeg": ("nvjpegCreate",), +} diff --git a/cuda_bindings/cuda/bindings/path_finder.py b/cuda_bindings/cuda/bindings/path_finder.py index 21aeb4b36..9c08bdc25 100644 --- a/cuda_bindings/cuda/bindings/path_finder.py +++ b/cuda_bindings/cuda/bindings/path_finder.py @@ -2,36 +2,10 @@ # # SPDX-License-Identifier: LicenseRef-NVIDIA-SOFTWARE-LICENSE -from 
cuda.bindings._path_finder.cuda_paths import ( - get_conda_ctk, - get_conda_include_dir, - get_cuda_home, - get_cuda_paths, - get_current_cuda_target_name, - get_debian_pkg_libdevice, - get_libdevice_wheel, - get_nvidia_cudalib_ctk, - get_nvidia_libdevice_ctk, - get_nvidia_nvvm_ctk, - get_nvidia_static_cudalib_ctk, - get_system_ctk, -) -from cuda.bindings._path_finder.find_nvidia_dynamic_library import find_nvidia_dynamic_library from cuda.bindings._path_finder.load_nvidia_dynamic_library import load_nvidia_dynamic_library +from cuda.bindings._path_finder.supported_libs import SUPPORTED_LIBNAMES __all__ = [ - "find_nvidia_dynamic_library", "load_nvidia_dynamic_library", - "get_conda_ctk", - "get_conda_include_dir", - "get_cuda_home", - "get_cuda_paths", - "get_current_cuda_target_name", - "get_debian_pkg_libdevice", - "get_libdevice_wheel", - "get_nvidia_cudalib_ctk", - "get_nvidia_libdevice_ctk", - "get_nvidia_nvvm_ctk", - "get_nvidia_static_cudalib_ctk", - "get_system_ctk", + "SUPPORTED_LIBNAMES", ] diff --git a/cuda_bindings/pyproject.toml b/cuda_bindings/pyproject.toml index e6a9492f5..8921cc5a2 100644 --- a/cuda_bindings/pyproject.toml +++ b/cuda_bindings/pyproject.toml @@ -25,7 +25,6 @@ classifiers = [ "Programming Language :: Python :: 3.10", "Programming Language :: Python :: 3.11", "Programming Language :: Python :: 3.12", - "Programming Language :: Python :: 3.13", "Environment :: GPU :: NVIDIA CUDA", ] dynamic = [ diff --git a/cuda_bindings/tests/conftest.py b/cuda_bindings/tests/conftest.py new file mode 100644 index 000000000..bcdc37db4 --- /dev/null +++ b/cuda_bindings/tests/conftest.py @@ -0,0 +1,20 @@ +import pytest + + +def pytest_configure(config): + config.custom_info = [] + + +def pytest_terminal_summary(terminalreporter, exitstatus, config): + if config.custom_info: + terminalreporter.write_sep("=", "INFO summary") + for msg in config.custom_info: + terminalreporter.line(f"INFO {msg}") + + +@pytest.fixture +def info_summary_append(request): + 
def _append(message): + request.config.custom_info.append(f"{request.node.name}: {message}") + + return _append diff --git a/cuda_bindings/tests/path_finder.py b/cuda_bindings/tests/path_finder.py deleted file mode 100644 index 9b7dd23a3..000000000 --- a/cuda_bindings/tests/path_finder.py +++ /dev/null @@ -1,18 +0,0 @@ -from cuda.bindings import path_finder - -paths = path_finder.get_cuda_paths() - -for k, v in paths.items(): - print(f"{k}: {v}", flush=True) -print() - -libnames = ("nvJitLink", "nvrtc", "nvvm") - -for libname in libnames: - print(path_finder.find_nvidia_dynamic_library(libname)) - print() - -for libname in libnames: - print(libname) - print(path_finder.load_nvidia_dynamic_library(libname)) - print() diff --git a/cuda_bindings/tests/test_path_finder.py b/cuda_bindings/tests/test_path_finder.py new file mode 100644 index 000000000..cb659026f --- /dev/null +++ b/cuda_bindings/tests/test_path_finder.py @@ -0,0 +1,92 @@ +import os +import subprocess # nosec B404 +import sys + +import pytest + +from cuda.bindings import path_finder +from cuda.bindings._path_finder import supported_libs + +ALL_LIBNAMES = path_finder.SUPPORTED_LIBNAMES + supported_libs.PARTIALLY_SUPPORTED_LIBNAMES +if os.environ.get("CUDA_BINDINGS_PATH_FINDER_TEST_ALL_LIBNAMES", False): + TEST_LIBNAMES = ALL_LIBNAMES +else: + TEST_LIBNAMES = path_finder.SUPPORTED_LIBNAMES + + +def test_all_libnames_linux_sonames_consistency(): + assert tuple(sorted(ALL_LIBNAMES)) == tuple(sorted(supported_libs.SUPPORTED_LINUX_SONAMES.keys())) + + +def test_all_libnames_windows_dlls_consistency(): + assert tuple(sorted(ALL_LIBNAMES)) == tuple(sorted(supported_libs.SUPPORTED_WINDOWS_DLLS.keys())) + + +def test_all_libnames_libnames_requiring_os_add_dll_directory_consistency(): + assert not (set(supported_libs.LIBNAMES_REQUIRING_OS_ADD_DLL_DIRECTORY) - set(ALL_LIBNAMES)) + + +def test_all_libnames_expected_lib_symbols_consistency(): + assert tuple(sorted(ALL_LIBNAMES)) == 
tuple(sorted(supported_libs.EXPECTED_LIB_SYMBOLS.keys())) + + +def _check_nvjitlink_usable(): + from cuda.bindings._internal import nvjitlink as inner_nvjitlink + + return inner_nvjitlink._inspect_function_pointer("__nvJitLinkVersion") != 0 + + +def _build_subprocess_failed_for_libname_message(libname, result): + return ( + f"Subprocess failed for {libname=!r} with exit code {result.returncode}\n" + f"--- stdout-from-subprocess ---\n{result.stdout}\n" + f"--- stderr-from-subprocess ---\n{result.stderr}\n" + ) + + +@pytest.mark.parametrize("api", ("find", "load")) +@pytest.mark.parametrize("libname", TEST_LIBNAMES) +def test_find_or_load_nvidia_dynamic_library(info_summary_append, api, libname): + if sys.platform == "win32" and not supported_libs.SUPPORTED_WINDOWS_DLLS[libname]: + pytest.skip(f"{libname=!r} not supported on {sys.platform=}") + + if libname == "nvJitLink" and not _check_nvjitlink_usable(): + pytest.skip(f"{libname=!r} not usable") + + if api == "find": + code = f"""\ +from cuda.bindings._path_finder.find_nvidia_dynamic_library import find_nvidia_dynamic_library +abs_path = find_nvidia_dynamic_library({libname!r}) +print(f"{{abs_path!r}}") +""" + else: + code = f"""\ +from cuda.bindings.path_finder import load_nvidia_dynamic_library +from cuda.bindings._path_finder.load_nvidia_dynamic_library import _load_nvidia_dynamic_library_no_cache + +loaded_dl_fresh = load_nvidia_dynamic_library({libname!r}) +if loaded_dl_fresh.was_already_loaded_from_elsewhere: + raise RuntimeError("loaded_dl_fresh.was_already_loaded_from_elsewhere") + +loaded_dl_from_cache = load_nvidia_dynamic_library({libname!r}) +if loaded_dl_from_cache is not loaded_dl_fresh: + raise RuntimeError("loaded_dl_from_cache is not loaded_dl_fresh") + +loaded_dl_no_cache = _load_nvidia_dynamic_library_no_cache({libname!r}) +if not loaded_dl_no_cache.was_already_loaded_from_elsewhere: + raise RuntimeError("loaded_dl_no_cache.was_already_loaded_from_elsewhere") +if loaded_dl_no_cache.abs_path != 
loaded_dl_fresh.abs_path: + raise RuntimeError(f"{{loaded_dl_no_cache.abs_path=!r}} != {{loaded_dl_fresh.abs_path=!r}}") + +print(f"{{loaded_dl_fresh.abs_path!r}}") +""" + result = subprocess.run( # nosec B603 + [sys.executable, "-c", code], + stdout=subprocess.PIPE, + stderr=subprocess.PIPE, + encoding="utf-8", + ) + if result.returncode == 0: + info_summary_append(f"abs_path={result.stdout.rstrip()}") + else: + raise RuntimeError(_build_subprocess_failed_for_libname_message(libname, result)) diff --git a/toolshed/build_path_finder_dlls.py b/toolshed/build_path_finder_dlls.py new file mode 100755 index 000000000..c82dcd866 --- /dev/null +++ b/toolshed/build_path_finder_dlls.py @@ -0,0 +1,84 @@ +#!/usr/bin/env python3 + +# Input for this script: .txt files generated with: +# for exe in *.exe; do 7z l $exe > "${exe%.exe}.txt"; done + +# The output of this script +# requires obvious manual edits to remove duplicates and unwanted dlls. + +import sys + +LIBNAMES_IN_SCOPE_OF_CUDA_BINDINGS_PATH_FINDER = ( + "nvJitLink", + "nvrtc", + "nvvm", + "cudart", + "nvfatbin", + "cublas", + "cublasLt", + "cufft", + "cufftw", + "curand", + "cusolver", + "cusolverMg", + "cusparse", + "nppc", + "nppial", + "nppicc", + "nppidei", + "nppif", + "nppig", + "nppim", + "nppist", + "nppisu", + "nppitc", + "npps", + "nvblas", + "cufile", + "cufile_rdma", + "nvjpeg", +) + + +def run(args): + dlls_from_files = set() + for filename in args: + lines_iter = iter(open(filename).read().splitlines()) + for line in lines_iter: + if line.startswith("-------------------"): + break + else: + raise RuntimeError("------------------- NOT FOUND") + for line in lines_iter: + if line.startswith("-------------------"): + break + assert line[52] == " ", line + assert line[53] != " ", line + path = line[53:] + if path.endswith(".dll"): + dll = path.rsplit("/", 1)[1] + dlls_from_files.add(dll) + else: + raise RuntimeError("------------------- NOT FOUND") + + print("DLLs in scope of cuda.bindings.path_finder") + 
print("==========================================") + dlls_in_scope = set() + for libname in sorted(LIBNAMES_IN_SCOPE_OF_CUDA_BINDINGS_PATH_FINDER): + print(f'"{libname}": (') + for dll in sorted(dlls_from_files): + if dll.startswith(libname): + dlls_in_scope.add(dll) + print(f' "{dll}",') + print("),") + print() + + print("DLLs out of scope") + print("=================") + for dll in sorted(dlls_from_files - dlls_in_scope): + print(dll) + print() + + +if __name__ == "__main__": + run(args=sys.argv[1:]) diff --git a/toolshed/build_path_finder_sonames.py b/toolshed/build_path_finder_sonames.py new file mode 100755 index 000000000..20e8ec6c7 --- /dev/null +++ b/toolshed/build_path_finder_sonames.py @@ -0,0 +1,74 @@ +#!/usr/bin/env python3 + +# Input for this script: +# output of toolshed/find_sonames.sh + +# The output of this script +# is expected to be usable as-is. + +import sys + +LIBNAMES_IN_SCOPE_OF_CUDA_BINDINGS_PATH_FINDER = ( + "nvJitLink", + "nvrtc", + "nvvm", + "cudart", + "nvfatbin", + "cublas", + "cublasLt", + "cufft", + "cufftw", + "curand", + "cusolver", + "cusolverMg", + "cusparse", + "nppc", + "nppial", + "nppicc", + "nppidei", + "nppif", + "nppig", + "nppim", + "nppist", + "nppisu", + "nppitc", + "npps", + "nvblas", + "cufile", + "cufile_rdma", + "nvjpeg", +) + + +def run(args): + assert len(args) == 1, "output-of-find_sonames.sh" + + sonames_from_file = set() + for line in open(args[0]).read().splitlines(): + flds = line.split() + assert len(flds) == 3, flds + if flds[-1] != "SONAME_NOT_SET": + sonames_from_file.add(flds[-1]) + + print("SONAMEs in scope of cuda.bindings.path_finder") + print("=============================================") + sonames_in_scope = set() + for libname in sorted(LIBNAMES_IN_SCOPE_OF_CUDA_BINDINGS_PATH_FINDER): + print(f'"{libname}": (') + lib_so = "lib" + libname + ".so" + for soname in sorted(sonames_from_file): + if soname.startswith(lib_so): + sonames_in_scope.add(soname) + print(f' "{soname}",') + print("),") + 
print() + + print("SONAMEs out of scope") + print("====================") + for soname in sorted(sonames_from_file - sonames_in_scope): + print(soname) + print() + + +if __name__ == "__main__": + run(args=sys.argv[1:]) diff --git a/toolshed/find_sonames.sh b/toolshed/find_sonames.sh new file mode 100755 index 000000000..79c2e89d5 --- /dev/null +++ b/toolshed/find_sonames.sh @@ -0,0 +1,6 @@ +#!/bin/bash +find "$@" -type f -name '*.so*' -print0 | while IFS= read -r -d '' f; do + type=$(test -L "$f" && echo SYMLINK || echo FILE) + soname=$(readelf -d "$f" 2>/dev/null | awk '/SONAME/ {gsub(/[][]/, "", $5); print $5; exit}') + echo "$f $type ${soname:-SONAME_NOT_SET}" +done diff --git a/toolshed/run_cuda_bindings_path_finder.py b/toolshed/run_cuda_bindings_path_finder.py new file mode 100644 index 000000000..5f47b3990 --- /dev/null +++ b/toolshed/run_cuda_bindings_path_finder.py @@ -0,0 +1,34 @@ +import sys +import traceback + +from cuda.bindings import path_finder +from cuda.bindings._path_finder import cuda_paths, supported_libs + +ALL_LIBNAMES = ( + path_finder.SUPPORTED_LIBNAMES + supported_libs.PARTIALLY_SUPPORTED_LIBNAMES +) + + +def run(args): + assert len(args) == 0 + + paths = cuda_paths.get_cuda_paths() + for k, v in paths.items(): + print(f"{k}: {v}", flush=True) + print() + + for libname in ALL_LIBNAMES: + print(f"{libname=}") + try: + loaded_dl = path_finder.load_nvidia_dynamic_library(libname) + except Exception: + print(f"EXCEPTION for {libname=}:") + traceback.print_exc(file=sys.stdout) + else: + print(f" {loaded_dl.abs_path=!r}") + print(f" {loaded_dl.was_already_loaded_from_elsewhere=!r}") + print() + + +if __name__ == "__main__": + run(args=sys.argv[1:]) From 00f8e4d9ae16d6ad5afdaac0550f8a687a5f2f42 Mon Sep 17 00:00:00 2001 From: "Ralf W. Grosse-Kunstleve" Date: Fri, 25 Apr 2025 08:20:40 -0700 Subject: [PATCH 04/52] Fix tiny accident: a line in pyproject.toml got lost somehow. 
--- cuda_bindings/pyproject.toml | 1 + 1 file changed, 1 insertion(+) diff --git a/cuda_bindings/pyproject.toml b/cuda_bindings/pyproject.toml index 8921cc5a2..e6a9492f5 100644 --- a/cuda_bindings/pyproject.toml +++ b/cuda_bindings/pyproject.toml @@ -25,6 +25,7 @@ classifiers = [ "Programming Language :: Python :: 3.10", "Programming Language :: Python :: 3.11", "Programming Language :: Python :: 3.12", + "Programming Language :: Python :: 3.13", "Environment :: GPU :: NVIDIA CUDA", ] dynamic = [ From 17478da1e9677c44f117c817feeac9c236cfab4f Mon Sep 17 00:00:00 2001 From: "Ralf W. Grosse-Kunstleve" Date: Fri, 25 Apr 2025 09:01:31 -0700 Subject: [PATCH 05/52] Undo changes to the nvJitLink, nvrtc, nvvm bindings --- .../cuda/bindings/_bindings/cynvrtc.pyx.in | 64 ++++++++++++++++--- .../bindings/_internal/nvjitlink_linux.pyx | 20 ++++-- .../bindings/_internal/nvjitlink_windows.pyx | 53 ++++++++++++--- .../cuda/bindings/_internal/nvvm_linux.pyx | 18 ++++-- .../cuda/bindings/_internal/nvvm_windows.pyx | 61 +++++++++++++++--- .../cuda/bindings/_internal/utils.pxd | 3 + .../cuda/bindings/_internal/utils.pyx | 14 ++++ 7 files changed, 197 insertions(+), 36 deletions(-) diff --git a/cuda_bindings/cuda/bindings/_bindings/cynvrtc.pyx.in b/cuda_bindings/cuda/bindings/_bindings/cynvrtc.pyx.in index d2bb0b63b..caf36d40e 100644 --- a/cuda_bindings/cuda/bindings/_bindings/cynvrtc.pyx.in +++ b/cuda_bindings/cuda/bindings/_bindings/cynvrtc.pyx.in @@ -9,12 +9,13 @@ # This code was automatically generated with version 12.8.0. Do not modify it directly. 
{{if 'Windows' == platform.system()}} import os +import site +import struct import win32api +from pywintypes import error {{else}} cimport cuda.bindings._lib.dlfcn as dlfcn -from libc.stdint cimport uintptr_t {{endif}} -from cuda.bindings import path_finder cdef bint __cuPythonInit = False {{if 'nvrtcGetErrorString' in found_functions}}cdef void *__nvrtcGetErrorString = NULL{{endif}} @@ -45,18 +46,65 @@ cdef bint __cuPythonInit = False {{if 'nvrtcSetFlowCallback' in found_functions}}cdef void *__nvrtcSetFlowCallback = NULL{{endif}} cdef int cuPythonInit() except -1 nogil: - {{if 'Windows' != platform.system()}} - cdef void* handle = NULL - {{endif}} - global __cuPythonInit if __cuPythonInit: return 0 __cuPythonInit = True + # Load library + {{if 'Windows' == platform.system()}} + with gil: + # First check if the DLL has been loaded by 3rd parties + try: + handle = win32api.GetModuleHandle("nvrtc64_120_0.dll") + except: + handle = None + + # Check if DLLs can be found within pip installations + if not handle: + LOAD_LIBRARY_SEARCH_DEFAULT_DIRS = 0x00001000 + LOAD_LIBRARY_SEARCH_DLL_LOAD_DIR = 0x00000100 + site_packages = [site.getusersitepackages()] + site.getsitepackages() + for sp in site_packages: + mod_path = os.path.join(sp, "nvidia", "cuda_nvrtc", "bin") + if os.path.isdir(mod_path): + os.add_dll_directory(mod_path) + try: + handle = win32api.LoadLibraryEx( + # Note: LOAD_LIBRARY_SEARCH_DLL_LOAD_DIR needs an abs path... + os.path.join(mod_path, "nvrtc64_120_0.dll"), + 0, LOAD_LIBRARY_SEARCH_DEFAULT_DIRS | LOAD_LIBRARY_SEARCH_DLL_LOAD_DIR) + + # Note: nvrtc64_120_0.dll calls into nvrtc-builtins64_*.dll which is + # located in the same mod_path. 
+ # Update PATH environ so that the two dlls can find each other + os.environ["PATH"] = os.pathsep.join((os.environ.get("PATH", ""), mod_path)) + except: + pass + else: + break + else: + # Else try default search + # Only reached if DLL wasn't found in any site-package path + LOAD_LIBRARY_SAFE_CURRENT_DIRS = 0x00002000 + try: + handle = win32api.LoadLibraryEx("nvrtc64_120_0.dll", 0, LOAD_LIBRARY_SAFE_CURRENT_DIRS) + except: + pass + + if not handle: + raise RuntimeError('Failed to LoadLibraryEx nvrtc64_120_0.dll') + {{else}} + handle = dlfcn.dlopen('libnvrtc.so.12', dlfcn.RTLD_NOW) + if handle == NULL: + with gil: + raise RuntimeError('Failed to dlopen libnvrtc.so.12') + {{endif}} + + + # Load function {{if 'Windows' == platform.system()}} with gil: - handle = path_finder.load_nvidia_dynamic_library("nvrtc").handle {{if 'nvrtcGetErrorString' in found_functions}} try: global __nvrtcGetErrorString @@ -241,8 +289,6 @@ cdef int cuPythonInit() except -1 nogil: {{endif}} {{else}} - with gil: - handle = path_finder.load_nvidia_dynamic_library("nvrtc").handle {{if 'nvrtcGetErrorString' in found_functions}} global __nvrtcGetErrorString __nvrtcGetErrorString = dlfcn.dlsym(handle, 'nvrtcGetErrorString') diff --git a/cuda_bindings/cuda/bindings/_internal/nvjitlink_linux.pyx b/cuda_bindings/cuda/bindings/_internal/nvjitlink_linux.pyx index 78b4d802b..9961a2105 100644 --- a/cuda_bindings/cuda/bindings/_internal/nvjitlink_linux.pyx +++ b/cuda_bindings/cuda/bindings/_internal/nvjitlink_linux.pyx @@ -4,11 +4,11 @@ # # This code was automatically generated across versions from 12.0.1 to 12.8.0. Do not modify it directly. 
-from libc.stdint cimport intptr_t, uintptr_t +from libc.stdint cimport intptr_t -from .utils import FunctionNotFoundError, NotSupportedError +from .utils cimport get_nvjitlink_dso_version_suffix -from cuda.bindings import path_finder +from .utils import FunctionNotFoundError, NotSupportedError ############################################################################### # Extern @@ -52,9 +52,17 @@ cdef void* __nvJitLinkGetInfoLog = NULL cdef void* __nvJitLinkVersion = NULL -cdef void* load_library(int driver_ver) except* with gil: - cdef uintptr_t handle = path_finder.load_nvidia_dynamic_library("nvJitLink").handle - return handle +cdef void* load_library(const int driver_ver) except* with gil: + cdef void* handle + for suffix in get_nvjitlink_dso_version_suffix(driver_ver): + so_name = "libnvJitLink.so" + (f".{suffix}" if suffix else suffix) + handle = dlopen(so_name.encode(), RTLD_NOW | RTLD_GLOBAL) + if handle != NULL: + break + else: + err_msg = dlerror() + raise RuntimeError(f'Failed to dlopen libnvJitLink ({err_msg.decode()})') + return handle cdef int _check_or_init_nvjitlink() except -1 nogil: diff --git a/cuda_bindings/cuda/bindings/_internal/nvjitlink_windows.pyx b/cuda_bindings/cuda/bindings/_internal/nvjitlink_windows.pyx index b306a3001..979820442 100644 --- a/cuda_bindings/cuda/bindings/_internal/nvjitlink_windows.pyx +++ b/cuda_bindings/cuda/bindings/_internal/nvjitlink_windows.pyx @@ -6,9 +6,12 @@ from libc.stdint cimport intptr_t +from .utils cimport get_nvjitlink_dso_version_suffix + from .utils import FunctionNotFoundError, NotSupportedError -from cuda.bindings import path_finder +import os +import site import win32api @@ -39,9 +42,44 @@ cdef void* __nvJitLinkGetInfoLog = NULL cdef void* __nvJitLinkVersion = NULL -cdef void* load_library(int driver_ver) except* with gil: - cdef intptr_t handle = path_finder.load_nvidia_dynamic_library("nvJitLink").handle - return handle +cdef inline list get_site_packages(): + return 
[site.getusersitepackages()] + site.getsitepackages() + + +cdef load_library(const int driver_ver): + handle = 0 + + for suffix in get_nvjitlink_dso_version_suffix(driver_ver): + if len(suffix) == 0: + continue + dll_name = f"nvJitLink_{suffix}0_0.dll" + + # First check if the DLL has been loaded by 3rd parties + try: + return win32api.GetModuleHandle(dll_name) + except: + pass + + # Next, check if DLLs are installed via pip + for sp in get_site_packages(): + mod_path = os.path.join(sp, "nvidia", "nvJitLink", "bin") + if os.path.isdir(mod_path): + os.add_dll_directory(mod_path) + try: + return win32api.LoadLibraryEx( + # Note: LOAD_LIBRARY_SEARCH_DLL_LOAD_DIR needs an abs path... + os.path.join(mod_path, dll_name), + 0, LOAD_LIBRARY_SEARCH_DEFAULT_DIRS | LOAD_LIBRARY_SEARCH_DLL_LOAD_DIR) + except: + pass + # Finally, try default search + # Only reached if DLL wasn't found in any site-package path + try: + return win32api.LoadLibrary(dll_name) + except: + pass + + raise RuntimeError('Failed to load nvJitLink') cdef int _check_or_init_nvjitlink() except -1 nogil: @@ -50,16 +88,15 @@ cdef int _check_or_init_nvjitlink() except -1 nogil: return 0 cdef int err, driver_ver - cdef intptr_t handle with gil: # Load driver to check version try: - nvcuda_handle = win32api.LoadLibraryEx("nvcuda.dll", 0, LOAD_LIBRARY_SEARCH_SYSTEM32) + handle = win32api.LoadLibraryEx("nvcuda.dll", 0, LOAD_LIBRARY_SEARCH_SYSTEM32) except Exception as e: raise NotSupportedError(f'CUDA driver is not found ({e})') global __cuDriverGetVersion if __cuDriverGetVersion == NULL: - __cuDriverGetVersion = win32api.GetProcAddress(nvcuda_handle, 'cuDriverGetVersion') + __cuDriverGetVersion = win32api.GetProcAddress(handle, 'cuDriverGetVersion') if __cuDriverGetVersion == NULL: raise RuntimeError('something went wrong') err = (__cuDriverGetVersion)(&driver_ver) @@ -67,7 +104,7 @@ cdef int _check_or_init_nvjitlink() except -1 nogil: raise RuntimeError('something went wrong') # Load library - handle = 
load_library(driver_ver) + handle = load_library(driver_ver) # Load function global __nvJitLinkCreate diff --git a/cuda_bindings/cuda/bindings/_internal/nvvm_linux.pyx b/cuda_bindings/cuda/bindings/_internal/nvvm_linux.pyx index 82335508b..64e78e75a 100644 --- a/cuda_bindings/cuda/bindings/_internal/nvvm_linux.pyx +++ b/cuda_bindings/cuda/bindings/_internal/nvvm_linux.pyx @@ -4,11 +4,11 @@ # # This code was automatically generated across versions from 11.0.3 to 12.8.0. Do not modify it directly. -from libc.stdint cimport intptr_t, uintptr_t +from libc.stdint cimport intptr_t -from .utils import FunctionNotFoundError, NotSupportedError +from .utils cimport get_nvvm_dso_version_suffix -from cuda.bindings import path_finder +from .utils import FunctionNotFoundError, NotSupportedError ############################################################################### # Extern @@ -51,8 +51,16 @@ cdef void* __nvvmGetProgramLog = NULL cdef void* load_library(const int driver_ver) except* with gil: - cdef uintptr_t handle = path_finder.load_nvidia_dynamic_library("nvvm").handle - return handle + cdef void* handle + for suffix in get_nvvm_dso_version_suffix(driver_ver): + so_name = "libnvvm.so" + (f".{suffix}" if suffix else suffix) + handle = dlopen(so_name.encode(), RTLD_NOW | RTLD_GLOBAL) + if handle != NULL: + break + else: + err_msg = dlerror() + raise RuntimeError(f'Failed to dlopen libnvvm ({err_msg.decode()})') + return handle cdef int _check_or_init_nvvm() except -1 nogil: diff --git a/cuda_bindings/cuda/bindings/_internal/nvvm_windows.pyx b/cuda_bindings/cuda/bindings/_internal/nvvm_windows.pyx index 21b4d9418..9f507e8e1 100644 --- a/cuda_bindings/cuda/bindings/_internal/nvvm_windows.pyx +++ b/cuda_bindings/cuda/bindings/_internal/nvvm_windows.pyx @@ -6,9 +6,12 @@ from libc.stdint cimport intptr_t +from .utils cimport get_nvvm_dso_version_suffix + from .utils import FunctionNotFoundError, NotSupportedError -from cuda.bindings import path_finder +import os +import site 
import win32api @@ -37,9 +40,52 @@ cdef void* __nvvmGetProgramLogSize = NULL cdef void* __nvvmGetProgramLog = NULL -cdef void* load_library(int driver_ver) except* with gil: - cdef intptr_t handle = path_finder.load_nvidia_dynamic_library("nvvm").handle - return handle +cdef inline list get_site_packages(): + return [site.getusersitepackages()] + site.getsitepackages() + ["conda"] + + +cdef load_library(const int driver_ver): + handle = 0 + + for suffix in get_nvvm_dso_version_suffix(driver_ver): + if len(suffix) == 0: + continue + dll_name = "nvvm64_40_0.dll" + + # First check if the DLL has been loaded by 3rd parties + try: + return win32api.GetModuleHandle(dll_name) + except: + pass + + # Next, check if DLLs are installed via pip or conda + for sp in get_site_packages(): + if sp == "conda": + # nvvm is not under $CONDA_PREFIX/lib, so it's not in the default search path + conda_prefix = os.environ.get("CONDA_PREFIX") + if conda_prefix is None: + continue + mod_path = os.path.join(conda_prefix, "Library", "nvvm", "bin") + else: + mod_path = os.path.join(sp, "nvidia", "cuda_nvcc", "nvvm", "bin") + if os.path.isdir(mod_path): + os.add_dll_directory(mod_path) + try: + return win32api.LoadLibraryEx( + # Note: LOAD_LIBRARY_SEARCH_DLL_LOAD_DIR needs an abs path... 
+ os.path.join(mod_path, dll_name), + 0, LOAD_LIBRARY_SEARCH_DEFAULT_DIRS | LOAD_LIBRARY_SEARCH_DLL_LOAD_DIR) + except: + pass + + # Finally, try default search + # Only reached if DLL wasn't found in any site-package path + try: + return win32api.LoadLibrary(dll_name) + except: + pass + + raise RuntimeError('Failed to load nvvm') cdef int _check_or_init_nvvm() except -1 nogil: @@ -48,16 +94,15 @@ cdef int _check_or_init_nvvm() except -1 nogil: return 0 cdef int err, driver_ver - cdef intptr_t handle with gil: # Load driver to check version try: - nvcuda_handle = win32api.LoadLibraryEx("nvcuda.dll", 0, LOAD_LIBRARY_SEARCH_SYSTEM32) + handle = win32api.LoadLibraryEx("nvcuda.dll", 0, LOAD_LIBRARY_SEARCH_SYSTEM32) except Exception as e: raise NotSupportedError(f'CUDA driver is not found ({e})') global __cuDriverGetVersion if __cuDriverGetVersion == NULL: - __cuDriverGetVersion = win32api.GetProcAddress(nvcuda_handle, 'cuDriverGetVersion') + __cuDriverGetVersion = win32api.GetProcAddress(handle, 'cuDriverGetVersion') if __cuDriverGetVersion == NULL: raise RuntimeError('something went wrong') err = (__cuDriverGetVersion)(&driver_ver) @@ -65,7 +110,7 @@ cdef int _check_or_init_nvvm() except -1 nogil: raise RuntimeError('something went wrong') # Load library - handle = load_library(driver_ver) + handle = load_library(driver_ver) # Load function global __nvvmVersion diff --git a/cuda_bindings/cuda/bindings/_internal/utils.pxd b/cuda_bindings/cuda/bindings/_internal/utils.pxd index a4b71c531..cac7846ff 100644 --- a/cuda_bindings/cuda/bindings/_internal/utils.pxd +++ b/cuda_bindings/cuda/bindings/_internal/utils.pxd @@ -165,3 +165,6 @@ cdef int get_nested_resource_ptr(nested_resource[ResT] &in_out_ptr, object obj, cdef bint is_nested_sequence(data) cdef void* get_buffer_pointer(buf, Py_ssize_t size, readonly=*) except* + +cdef tuple get_nvjitlink_dso_version_suffix(int driver_ver) +cdef tuple get_nvvm_dso_version_suffix(int driver_ver) diff --git 
a/cuda_bindings/cuda/bindings/_internal/utils.pyx b/cuda_bindings/cuda/bindings/_internal/utils.pyx index 7fc77b22c..0a693c052 100644 --- a/cuda_bindings/cuda/bindings/_internal/utils.pyx +++ b/cuda_bindings/cuda/bindings/_internal/utils.pyx @@ -127,3 +127,17 @@ cdef int get_nested_resource_ptr(nested_resource[ResT] &in_out_ptr, object obj, class FunctionNotFoundError(RuntimeError): pass class NotSupportedError(RuntimeError): pass + + +cdef tuple get_nvjitlink_dso_version_suffix(int driver_ver): + if 12000 <= driver_ver < 13000: + return ('12', '') + raise NotSupportedError(f'CUDA driver version {driver_ver} is not supported') + + +cdef tuple get_nvvm_dso_version_suffix(int driver_ver): + if 11000 <= driver_ver < 11020: + return ('3', '') + if 11020 <= driver_ver < 13000: + return ('4', '') + raise NotSupportedError(f'CUDA driver version {driver_ver} is not supported') From 7da74bdcb7bd0f1a885133afad9d9218434f1577 Mon Sep 17 00:00:00 2001 From: "Ralf W. Grosse-Kunstleve" Date: Fri, 25 Apr 2025 09:48:48 -0700 Subject: [PATCH 06/52] Undo changes under .github, specific to nvvm, manipulating LD_LIBRARY_PATH or PATH --- .github/actions/fetch_ctk/action.yml | 2 +- .github/workflows/test-wheel-windows.yml | 7 +++++++ 2 files changed, 8 insertions(+), 1 deletion(-) diff --git a/.github/actions/fetch_ctk/action.yml b/.github/actions/fetch_ctk/action.yml index 3e8e48c4d..9fdb0a1f8 100644 --- a/.github/actions/fetch_ctk/action.yml +++ b/.github/actions/fetch_ctk/action.yml @@ -128,4 +128,4 @@ runs: echo "CUDA_PATH=${CUDA_PATH}" >> $GITHUB_ENV echo "CUDA_HOME=${CUDA_PATH}" >> $GITHUB_ENV echo "${CUDA_PATH}/bin" >> $GITHUB_PATH - echo "LD_LIBRARY_PATH=${LD_LIBRARY_PATH:-}:${CUDA_PATH}/lib" >> $GITHUB_ENV + echo "LD_LIBRARY_PATH=${LD_LIBRARY_PATH:-}:${CUDA_PATH}/lib:${CUDA_PATH}/nvvm/lib64" >> $GITHUB_ENV diff --git a/.github/workflows/test-wheel-windows.yml b/.github/workflows/test-wheel-windows.yml index 948d2fae6..4e48590a3 100644 --- 
a/.github/workflows/test-wheel-windows.yml +++ b/.github/workflows/test-wheel-windows.yml @@ -164,6 +164,13 @@ jobs: method: 'network' sub-packages: ${{ env.MINI_CTK_DEPS }} + - name: Update PATH + if: ${{ inputs.local-ctk == '1' }} + run: | + # mimics actual CTK installation + echo $PATH + echo "$env:CUDA_PATH\nvvm\bin" >> $env:GITHUB_PATH + - name: Run cuda.bindings tests if: ${{ env.SKIP_CUDA_BINDINGS_TEST == '0' }} run: | From 211164d0d95fea4f037afc28e6972e63554723e6 Mon Sep 17 00:00:00 2001 From: "Ralf W. Grosse-Kunstleve" Date: Fri, 25 Apr 2025 12:28:10 -0700 Subject: [PATCH 07/52] PARTIALLY_SUPPORTED_LIBNAMES_LINUX, PARTIALLY_SUPPORTED_LIBNAMES_WINDOWS --- .../bindings/_path_finder/supported_libs.py | 28 ++++++++++++++++--- cuda_bindings/tests/test_path_finder.py | 22 ++++++++------- 2 files changed, 36 insertions(+), 14 deletions(-) diff --git a/cuda_bindings/cuda/bindings/_path_finder/supported_libs.py b/cuda_bindings/cuda/bindings/_path_finder/supported_libs.py index ee62b92b8..d7d3a56c9 100644 --- a/cuda_bindings/cuda/bindings/_path_finder/supported_libs.py +++ b/cuda_bindings/cuda/bindings/_path_finder/supported_libs.py @@ -4,6 +4,8 @@ # THIS FILE NEEDS TO BE REVIEWED/UPDATED FOR EACH CTK RELEASE +import sys + SUPPORTED_LIBNAMES = ( # Core CUDA Runtime and Compiler "nvJitLink", @@ -11,7 +13,7 @@ "nvvm", ) -PARTIALLY_SUPPORTED_LIBNAMES = ( +PARTIALLY_SUPPORTED_LIBNAMES_COMMON = ( # Core CUDA Runtime and Compiler "cudart", "nvfatbin", @@ -37,11 +39,31 @@ "npps", "nvblas", # Other + "nvjpeg", +) + +PARTIALLY_SUPPORTED_LIBNAMES_LINUX_ONLY = ( "cufile", # "cufile_rdma", # Requires libmlx5.so - "nvjpeg", ) +PARTIALLY_SUPPORTED_LIBNAMES_LINUX = PARTIALLY_SUPPORTED_LIBNAMES_COMMON + PARTIALLY_SUPPORTED_LIBNAMES_LINUX_ONLY + +PARTIALLY_SUPPORTED_LIBNAMES_WINDOWS_ONLY = () + +PARTIALLY_SUPPORTED_LIBNAMES_WINDOWS = PARTIALLY_SUPPORTED_LIBNAMES_COMMON + PARTIALLY_SUPPORTED_LIBNAMES_WINDOWS_ONLY + +PARTIALLY_SUPPORTED_LIBNAMES_ALL = ( + 
PARTIALLY_SUPPORTED_LIBNAMES_COMMON + + PARTIALLY_SUPPORTED_LIBNAMES_LINUX_ONLY + + PARTIALLY_SUPPORTED_LIBNAMES_WINDOWS_ONLY +) + +if sys.platform == "win32": + PARTIALLY_SUPPORTED_LIBNAMES = PARTIALLY_SUPPORTED_LIBNAMES_WINDOWS +else: + PARTIALLY_SUPPORTED_LIBNAMES = PARTIALLY_SUPPORTED_LIBNAMES_LINUX + # Based on ldd output for Linux x86_64 nvidia-*-cu12 wheels (12.8.1) DIRECT_DEPENDENCIES = { "cublas": ("cublasLt",), @@ -231,8 +253,6 @@ "cufftw64_10.dll", "cufftw64_11.dll", ), - "cufile": (), - # "cufile_rdma": (), "curand": ("curand64_10.dll",), "cusolver": ( "cusolver64_10.dll", diff --git a/cuda_bindings/tests/test_path_finder.py b/cuda_bindings/tests/test_path_finder.py index cb659026f..abeeb73d2 100644 --- a/cuda_bindings/tests/test_path_finder.py +++ b/cuda_bindings/tests/test_path_finder.py @@ -7,23 +7,28 @@ from cuda.bindings import path_finder from cuda.bindings._path_finder import supported_libs -ALL_LIBNAMES = path_finder.SUPPORTED_LIBNAMES + supported_libs.PARTIALLY_SUPPORTED_LIBNAMES +ALL_LIBNAMES = path_finder.SUPPORTED_LIBNAMES + supported_libs.PARTIALLY_SUPPORTED_LIBNAMES_ALL +ALL_LIBNAMES_LINUX = path_finder.SUPPORTED_LIBNAMES + supported_libs.PARTIALLY_SUPPORTED_LIBNAMES_LINUX +ALL_LIBNAMES_WINDOWS = path_finder.SUPPORTED_LIBNAMES + supported_libs.PARTIALLY_SUPPORTED_LIBNAMES_WINDOWS if os.environ.get("CUDA_BINDINGS_PATH_FINDER_TEST_ALL_LIBNAMES", False): - TEST_LIBNAMES = ALL_LIBNAMES + if sys.platform == "win32": + TEST_FIND_OR_LOAD_LIBNAMES = ALL_LIBNAMES_WINDOWS + else: + TEST_FIND_OR_LOAD_LIBNAMES = ALL_LIBNAMES_LINUX else: - TEST_LIBNAMES = path_finder.SUPPORTED_LIBNAMES + TEST_FIND_OR_LOAD_LIBNAMES = path_finder.SUPPORTED_LIBNAMES def test_all_libnames_linux_sonames_consistency(): - assert tuple(sorted(ALL_LIBNAMES)) == tuple(sorted(supported_libs.SUPPORTED_LINUX_SONAMES.keys())) + assert tuple(sorted(ALL_LIBNAMES_LINUX)) == tuple(sorted(supported_libs.SUPPORTED_LINUX_SONAMES.keys())) def test_all_libnames_windows_dlls_consistency(): - 
assert tuple(sorted(ALL_LIBNAMES)) == tuple(sorted(supported_libs.SUPPORTED_WINDOWS_DLLS.keys())) + assert tuple(sorted(ALL_LIBNAMES_WINDOWS)) == tuple(sorted(supported_libs.SUPPORTED_WINDOWS_DLLS.keys())) def test_all_libnames_libnames_requiring_os_add_dll_directory_consistency(): - assert not (set(supported_libs.LIBNAMES_REQUIRING_OS_ADD_DLL_DIRECTORY) - set(ALL_LIBNAMES)) + assert not (set(supported_libs.LIBNAMES_REQUIRING_OS_ADD_DLL_DIRECTORY) - set(ALL_LIBNAMES_WINDOWS)) def test_all_libnames_expected_lib_symbols_consistency(): @@ -45,11 +50,8 @@ def _build_subprocess_failed_for_libname_message(libname, result): @pytest.mark.parametrize("api", ("find", "load")) -@pytest.mark.parametrize("libname", TEST_LIBNAMES) +@pytest.mark.parametrize("libname", TEST_FIND_OR_LOAD_LIBNAMES) def test_find_or_load_nvidia_dynamic_library(info_summary_append, api, libname): - if sys.platform == "win32" and not supported_libs.SUPPORTED_WINDOWS_DLLS[libname]: - pytest.skip(f"{libname=!r} not supported on {sys.platform=}") - if libname == "nvJitLink" and not _check_nvjitlink_usable(): pytest.skip(f"{libname=!r} not usable") From a649e7d43333ba3e07e866c8bb6c20de150932b4 Mon Sep 17 00:00:00 2001 From: "Ralf W. 
Grosse-Kunstleve" Date: Fri, 25 Apr 2025 12:50:54 -0700 Subject: [PATCH 08/52] Update EXPECTED_LIB_SYMBOLS for nvJitLink to cleanly support CTK versions 12.0, 12.1, 12.2 --- .../cuda/bindings/_path_finder/supported_libs.py | 5 ++++- cuda_bindings/tests/test_path_finder.py | 9 --------- 2 files changed, 4 insertions(+), 10 deletions(-) diff --git a/cuda_bindings/cuda/bindings/_path_finder/supported_libs.py b/cuda_bindings/cuda/bindings/_path_finder/supported_libs.py index d7d3a56c9..b0dfcb9e7 100644 --- a/cuda_bindings/cuda/bindings/_path_finder/supported_libs.py +++ b/cuda_bindings/cuda/bindings/_path_finder/supported_libs.py @@ -353,7 +353,10 @@ def is_suppressed_dll_file(path_basename: str) -> bool: # Based on nm output for Linux x86_64 /usr/local/cuda (12.8.1) EXPECTED_LIB_SYMBOLS = { - "nvJitLink": ("nvJitLinkVersion",), + "nvJitLink": ( + "__nvJitLinkCreate_12_0", # 12.0 through 12.8 (at least) + "nvJitLinkVersion", # 12.3 and up + ), "nvrtc": ("nvrtcVersion",), "nvvm": ("nvvmVersion",), "cudart": ("cudaRuntimeGetVersion",), diff --git a/cuda_bindings/tests/test_path_finder.py b/cuda_bindings/tests/test_path_finder.py index abeeb73d2..ea8c1b5d2 100644 --- a/cuda_bindings/tests/test_path_finder.py +++ b/cuda_bindings/tests/test_path_finder.py @@ -35,12 +35,6 @@ def test_all_libnames_expected_lib_symbols_consistency(): assert tuple(sorted(ALL_LIBNAMES)) == tuple(sorted(supported_libs.EXPECTED_LIB_SYMBOLS.keys())) -def _check_nvjitlink_usable(): - from cuda.bindings._internal import nvjitlink as inner_nvjitlink - - return inner_nvjitlink._inspect_function_pointer("__nvJitLinkVersion") != 0 - - def _build_subprocess_failed_for_libname_message(libname, result): return ( f"Subprocess failed for {libname=!r} with exit code {result.returncode}\n" @@ -52,9 +46,6 @@ def _build_subprocess_failed_for_libname_message(libname, result): @pytest.mark.parametrize("api", ("find", "load")) @pytest.mark.parametrize("libname", TEST_FIND_OR_LOAD_LIBNAMES) def 
test_find_or_load_nvidia_dynamic_library(info_summary_append, api, libname): - if libname == "nvJitLink" and not _check_nvjitlink_usable(): - pytest.skip(f"{libname=!r} not usable") - if api == "find": code = f"""\ from cuda.bindings._path_finder.find_nvidia_dynamic_library import find_nvidia_dynamic_library From b5cef1bc75d1e3955686ec43976459934ea9501e Mon Sep 17 00:00:00 2001 From: "Ralf W. Grosse-Kunstleve" Date: Fri, 25 Apr 2025 16:31:35 -0700 Subject: [PATCH 09/52] Save result of factoring out load_dl_common.py, load_dl_linux.py, load_dl_windows.py with the help of Cursor. --- .../bindings/_path_finder/load_dl_common.py | 60 ++++++ .../bindings/_path_finder/load_dl_linux.py | 125 +++++++++++ .../bindings/_path_finder/load_dl_windows.py | 141 +++++++++++++ .../load_nvidia_dynamic_library.py | 199 +++--------------- 4 files changed, 354 insertions(+), 171 deletions(-) create mode 100644 cuda_bindings/cuda/bindings/_path_finder/load_dl_common.py create mode 100644 cuda_bindings/cuda/bindings/_path_finder/load_dl_linux.py create mode 100644 cuda_bindings/cuda/bindings/_path_finder/load_dl_windows.py diff --git a/cuda_bindings/cuda/bindings/_path_finder/load_dl_common.py b/cuda_bindings/cuda/bindings/_path_finder/load_dl_common.py new file mode 100644 index 000000000..66f21ffcf --- /dev/null +++ b/cuda_bindings/cuda/bindings/_path_finder/load_dl_common.py @@ -0,0 +1,60 @@ +# Copyright 2025 NVIDIA Corporation. All rights reserved. +# SPDX-License-Identifier: LicenseRef-NVIDIA-SOFTWARE-LICENSE + +from dataclasses import dataclass +from typing import Callable, Optional + +from .supported_libs import DIRECT_DEPENDENCIES + + +@dataclass +class LoadedDL: + """Represents a loaded dynamic library. 
+ + Attributes: + handle: The library handle (can be converted to void* in Cython) + abs_path: The absolute path to the library file + was_already_loaded_from_elsewhere: Whether the library was already loaded + """ + + # ATTENTION: To convert `handle` back to `void*` in cython: + # Linux: `cdef void* ptr = <void*><uintptr_t>` + # Windows: `cdef void* ptr = <void*><intptr_t>` + handle: int + abs_path: Optional[str] + was_already_loaded_from_elsewhere: bool + + +def add_dll_directory(dll_abs_path: str) -> None: + """Add a DLL directory to the search path and update PATH environment variable. + + Args: + dll_abs_path: Absolute path to the DLL file + + Raises: + AssertionError: If the directory containing the DLL does not exist + """ + import os + + dirpath = os.path.dirname(dll_abs_path) + assert os.path.isdir(dirpath), dll_abs_path + # Add the DLL directory to the search path + os.add_dll_directory(dirpath) + # Update PATH as a fallback for dependent DLL resolution + curr_path = os.environ.get("PATH") + os.environ["PATH"] = dirpath if curr_path is None else os.pathsep.join((curr_path, dirpath)) + + +def load_dependencies(libname: str, load_func: Callable[[str], LoadedDL]) -> None: + """Load all dependencies for a given library. + + Args: + libname: The name of the library whose dependencies should be loaded + load_func: The function to use for loading libraries (e.g. load_nvidia_dynamic_library) + + Example: + >>> load_dependencies("cudart", load_nvidia_dynamic_library) + # This will load all dependencies of cudart using the provided loading function + """ + for dep in DIRECT_DEPENDENCIES.get(libname, ()): + load_func(dep) diff --git a/cuda_bindings/cuda/bindings/_path_finder/load_dl_linux.py b/cuda_bindings/cuda/bindings/_path_finder/load_dl_linux.py new file mode 100644 index 000000000..e7458d4da --- /dev/null +++ b/cuda_bindings/cuda/bindings/_path_finder/load_dl_linux.py @@ -0,0 +1,125 @@ +# Copyright 2025 NVIDIA Corporation. All rights reserved.
+# SPDX-License-Identifier: LicenseRef-NVIDIA-SOFTWARE-LICENSE + +import ctypes +import ctypes.util +import os +from typing import Optional + +from .load_dl_common import LoadedDL + +CDLL_MODE = os.RTLD_NOW | os.RTLD_GLOBAL + +LIBDL_PATH = ctypes.util.find_library("dl") or "libdl.so.2" +LIBDL = ctypes.CDLL(LIBDL_PATH) +LIBDL.dladdr.argtypes = [ctypes.c_void_p, ctypes.c_void_p] +LIBDL.dladdr.restype = ctypes.c_int + + +class Dl_info(ctypes.Structure): + """Structure used by dladdr to return information about a loaded symbol.""" + + _fields_ = [ + ("dli_fname", ctypes.c_char_p), # path to .so + ("dli_fbase", ctypes.c_void_p), + ("dli_sname", ctypes.c_char_p), + ("dli_saddr", ctypes.c_void_p), + ] + + +def abs_path_for_dynamic_library(libname: str, handle: int) -> Optional[str]: + """Get the absolute path of a loaded dynamic library on Linux. + + Args: + libname: The name of the library + handle: The library handle + + Returns: + The absolute path to the library file, or None if no expected symbol is found + + Raises: + OSError: If dladdr fails to get information about the symbol + """ + from .supported_libs import EXPECTED_LIB_SYMBOLS + + for symbol_name in EXPECTED_LIB_SYMBOLS[libname]: + symbol = getattr(handle, symbol_name, None) + if symbol is not None: + break + else: + return None + + addr = ctypes.cast(symbol, ctypes.c_void_p) + info = Dl_info() + if LIBDL.dladdr(addr, ctypes.byref(info)) == 0: + raise OSError(f"dladdr failed for {libname=!r}") + return info.dli_fname.decode() + + +def check_if_already_loaded(libname: str) -> Optional[LoadedDL]: + """Check if the library is already loaded in the process. + + Args: + libname: The name of the library to check + + Returns: + A LoadedDL object if the library is already loaded, None otherwise + + Example: + >>> loaded = check_if_already_loaded("cudart") + >>> if loaded is not None: + ... 
print(f"Library already loaded from {loaded.abs_path}") + """ + from .supported_libs import SUPPORTED_LINUX_SONAMES + + for soname in SUPPORTED_LINUX_SONAMES.get(libname, ()): + try: + handle = ctypes.CDLL(soname, mode=os.RTLD_NOLOAD) + except OSError: + continue + else: + return LoadedDL(handle._handle, abs_path_for_dynamic_library(libname, handle), True) + return None + + +def load_with_system_search(libname: str, soname: str) -> Optional[LoadedDL]: + """Try to load a library using system search paths. + + Args: + libname: The name of the library to load + soname: The soname to search for + + Returns: + A LoadedDL object if successful, None if the library cannot be loaded + + Raises: + RuntimeError: If the library is loaded but no expected symbol is found + """ + try: + handle = ctypes.CDLL(soname, CDLL_MODE) + abs_path = abs_path_for_dynamic_library(libname, handle) + if abs_path is None: + raise RuntimeError(f"No expected symbol for {libname=!r}") + return LoadedDL(handle._handle, abs_path, False) + except OSError: + return None + + +def load_with_abs_path(libname: str, found_path: str) -> LoadedDL: + """Load a dynamic library from the given path. + + Args: + libname: The name of the library to load + found_path: The absolute path to the library file + + Returns: + A LoadedDL object representing the loaded library + + Raises: + RuntimeError: If the library cannot be loaded + """ + try: + handle = ctypes.CDLL(found_path, CDLL_MODE) + except OSError as e: + raise RuntimeError(f"Failed to dlopen {found_path}: {e}") from e + return LoadedDL(handle._handle, found_path, False) diff --git a/cuda_bindings/cuda/bindings/_path_finder/load_dl_windows.py b/cuda_bindings/cuda/bindings/_path_finder/load_dl_windows.py new file mode 100644 index 000000000..2574b5e6f --- /dev/null +++ b/cuda_bindings/cuda/bindings/_path_finder/load_dl_windows.py @@ -0,0 +1,141 @@ +# Copyright 2025 NVIDIA Corporation. All rights reserved. 
+# SPDX-License-Identifier: LicenseRef-NVIDIA-SOFTWARE-LICENSE + +import ctypes +import ctypes.wintypes +import functools +from typing import Optional + +import pywintypes +import win32api + +from .load_dl_common import LoadedDL, add_dll_directory + +# Mirrors WinBase.h (unfortunately not defined already elsewhere) +WINBASE_LOAD_LIBRARY_SEARCH_DLL_LOAD_DIR = 0x00000100 +WINBASE_LOAD_LIBRARY_SEARCH_DEFAULT_DIRS = 0x00001000 + + +def abs_path_for_dynamic_library(handle: int) -> str: + """Get the absolute path of a loaded dynamic library on Windows. + + Args: + handle: The library handle + + Returns: + The absolute path to the DLL file + + Raises: + OSError: If GetModuleFileNameW fails + """ + buf = ctypes.create_unicode_buffer(260) + n_chars = ctypes.windll.kernel32.GetModuleFileNameW(ctypes.wintypes.HMODULE(handle), buf, len(buf)) + if n_chars == 0: + raise OSError("GetModuleFileNameW failed") + return buf.value + + +@functools.cache +def cuDriverGetVersion() -> int: + """Get the CUDA driver version. + + Returns: + The CUDA driver version number + + Raises: + AssertionError: If the driver version cannot be obtained + """ + handle = win32api.LoadLibrary("nvcuda.dll") + + kernel32 = ctypes.WinDLL("kernel32", use_last_error=True) + GetProcAddress = kernel32.GetProcAddress + GetProcAddress.argtypes = [ctypes.wintypes.HMODULE, ctypes.wintypes.LPCSTR] + GetProcAddress.restype = ctypes.c_void_p + cuDriverGetVersion = GetProcAddress(handle, b"cuDriverGetVersion") + assert cuDriverGetVersion + + FUNC_TYPE = ctypes.CFUNCTYPE(ctypes.c_int, ctypes.POINTER(ctypes.c_int)) + cuDriverGetVersion_fn = FUNC_TYPE(cuDriverGetVersion) + driver_ver = ctypes.c_int() + err = cuDriverGetVersion_fn(ctypes.byref(driver_ver)) + assert err == 0 + return driver_ver.value + + +def check_if_already_loaded(libname: str) -> Optional[LoadedDL]: + """Check if the library is already loaded in the process. 
+ + Args: + libname: The name of the library to check + + Returns: + A LoadedDL object if the library is already loaded, None otherwise + + Example: + >>> loaded = check_if_already_loaded("cudart") + >>> if loaded is not None: + ... print(f"Library already loaded from {loaded.abs_path}") + """ + from .supported_libs import SUPPORTED_WINDOWS_DLLS + + for dll_name in SUPPORTED_WINDOWS_DLLS.get(libname, ()): + try: + handle = win32api.GetModuleHandle(dll_name) + except pywintypes.error: + continue + else: + return LoadedDL(handle, abs_path_for_dynamic_library(handle), True) + return None + + +def load_with_system_search(name: str, _unused: str) -> Optional[LoadedDL]: + """Try to load a DLL using system search paths. + + Args: + name: The name of the library to load + _unused: Unused parameter (kept for interface consistency) + + Returns: + A LoadedDL object if successful, None if the library cannot be loaded + """ + from .supported_libs import SUPPORTED_WINDOWS_DLLS + + driver_ver = cuDriverGetVersion() + del driver_ver # Keeping this here because it will probably be needed in the future. + + dll_names = SUPPORTED_WINDOWS_DLLS.get(name) + if dll_names is None: + return None + + for dll_name in dll_names: + handle = ctypes.windll.kernel32.LoadLibraryW(ctypes.c_wchar_p(dll_name)) + if handle: + return LoadedDL(handle, abs_path_for_dynamic_library(handle), False) + + return None + + +def load_with_abs_path(libname: str, found_path: str) -> LoadedDL: + """Load a dynamic library from the given path. 
+ + Args: + libname: The name of the library to load + found_path: The absolute path to the DLL file + + Returns: + A LoadedDL object representing the loaded library + + Raises: + RuntimeError: If the DLL cannot be loaded + """ + from .supported_libs import LIBNAMES_REQUIRING_OS_ADD_DLL_DIRECTORY + + if libname in LIBNAMES_REQUIRING_OS_ADD_DLL_DIRECTORY: + add_dll_directory(found_path) + + flags = WINBASE_LOAD_LIBRARY_SEARCH_DEFAULT_DIRS | WINBASE_LOAD_LIBRARY_SEARCH_DLL_LOAD_DIR + try: + handle = win32api.LoadLibraryEx(found_path, 0, flags) + except pywintypes.error as e: + raise RuntimeError(f"Failed to load DLL at {found_path}: {e}") from e + return LoadedDL(handle, found_path, False) diff --git a/cuda_bindings/cuda/bindings/_path_finder/load_nvidia_dynamic_library.py b/cuda_bindings/cuda/bindings/_path_finder/load_nvidia_dynamic_library.py index c770de67d..48eaee4aa 100644 --- a/cuda_bindings/cuda/bindings/_path_finder/load_nvidia_dynamic_library.py +++ b/cuda_bindings/cuda/bindings/_path_finder/load_nvidia_dynamic_library.py @@ -1,193 +1,50 @@ # Copyright 2025 NVIDIA Corporation. All rights reserved. 
-# # SPDX-License-Identifier: LicenseRef-NVIDIA-SOFTWARE-LICENSE -import ctypes import functools -import os import sys -from dataclasses import dataclass -from typing import Optional, Tuple - -if sys.platform == "win32": - import ctypes.wintypes - - import pywintypes - import win32api - - # Mirrors WinBase.h (unfortunately not defined already elsewhere) - _WINBASE_LOAD_LIBRARY_SEARCH_DLL_LOAD_DIR = 0x00000100 - _WINBASE_LOAD_LIBRARY_SEARCH_DEFAULT_DIRS = 0x00001000 - -else: - import ctypes.util - - _LINUX_CDLL_MODE = os.RTLD_NOW | os.RTLD_GLOBAL - - _LIBDL_PATH = ctypes.util.find_library("dl") or "libdl.so.2" - _LIBDL = ctypes.CDLL(_LIBDL_PATH) - _LIBDL.dladdr.argtypes = [ctypes.c_void_p, ctypes.c_void_p] - _LIBDL.dladdr.restype = ctypes.c_int - - class Dl_info(ctypes.Structure): - _fields_ = [ - ("dli_fname", ctypes.c_char_p), # path to .so - ("dli_fbase", ctypes.c_void_p), - ("dli_sname", ctypes.c_char_p), - ("dli_saddr", ctypes.c_void_p), - ] - from .find_nvidia_dynamic_library import _find_nvidia_dynamic_library -from .supported_libs import ( - DIRECT_DEPENDENCIES, - EXPECTED_LIB_SYMBOLS, - LIBNAMES_REQUIRING_OS_ADD_DLL_DIRECTORY, - SUPPORTED_LINUX_SONAMES, - SUPPORTED_WINDOWS_DLLS, -) - - -def _add_dll_directory(dll_abs_path): - dirpath = os.path.dirname(dll_abs_path) - assert os.path.isdir(dirpath), dll_abs_path - # Add the DLL directory to the search path - os.add_dll_directory(dirpath) - # Update PATH as a fallback for dependent DLL resolution - curr_path = os.environ.get("PATH") - os.environ["PATH"] = dirpath if curr_path is None else os.pathsep.join((curr_path, dirpath)) - - -@functools.cache -def _windows_cuDriverGetVersion() -> int: - handle = win32api.LoadLibrary("nvcuda.dll") - - kernel32 = ctypes.WinDLL("kernel32", use_last_error=True) - GetProcAddress = kernel32.GetProcAddress - GetProcAddress.argtypes = [ctypes.wintypes.HMODULE, ctypes.wintypes.LPCSTR] - GetProcAddress.restype = ctypes.c_void_p - cuDriverGetVersion = GetProcAddress(handle, 
b"cuDriverGetVersion") - assert cuDriverGetVersion - - FUNC_TYPE = ctypes.CFUNCTYPE(ctypes.c_int, ctypes.POINTER(ctypes.c_int)) - cuDriverGetVersion_fn = FUNC_TYPE(cuDriverGetVersion) - driver_ver = ctypes.c_int() - err = cuDriverGetVersion_fn(ctypes.byref(driver_ver)) - assert err == 0 - return driver_ver.value - +from .load_dl_common import LoadedDL, load_dependencies -def _abs_path_for_dynamic_library_windows(handle: int) -> str: - buf = ctypes.create_unicode_buffer(260) - n_chars = ctypes.windll.kernel32.GetModuleFileNameW(ctypes.wintypes.HMODULE(handle), buf, len(buf)) - if n_chars == 0: - raise OSError("GetModuleFileNameW failed") - return buf.value - - -@functools.cache -def _windows_load_with_dll_basename(name: str) -> Tuple[Optional[int], Optional[str]]: - driver_ver = _windows_cuDriverGetVersion() - del driver_ver # Keeping this here because it will probably be needed in the future. - - dll_names = SUPPORTED_WINDOWS_DLLS.get(name) - if dll_names is None: - return None - - for dll_name in dll_names: - handle = ctypes.windll.kernel32.LoadLibraryW(ctypes.c_wchar_p(dll_name)) - if handle: - return handle, _abs_path_for_dynamic_library_windows(handle) - - return None, None - - -def _abs_path_for_dynamic_library_linux(libname: str, handle: int) -> str: - for symbol_name in EXPECTED_LIB_SYMBOLS[libname]: - symbol = getattr(handle, symbol_name, None) - if symbol is not None: - break - else: - return None - addr = ctypes.cast(symbol, ctypes.c_void_p) - info = Dl_info() - if _LIBDL.dladdr(addr, ctypes.byref(info)) == 0: - raise OSError(f"dladdr failed for {libname=!r}") - return info.dli_fname.decode() - - -def _load_and_report_path_linux(libname: str, soname: str) -> Tuple[int, str]: - handle = ctypes.CDLL(soname, _LINUX_CDLL_MODE) - abs_path = _abs_path_for_dynamic_library_linux(libname, handle) - if abs_path is None: - raise RuntimeError(f"No expected symbol for {libname=!r}") - return handle, abs_path - - -@dataclass -class LoadedDL: - # ATTENTION: To convert 
`handle` back to `void*` in cython: - # Linux: `cdef void* ptr = <void*><uintptr_t>` - # Windows: `cdef void* ptr = <void*><intptr_t>` - handle: int - abs_path: Optional[str] - was_already_loaded_from_elsewhere: bool +if sys.platform == "win32": + from .load_dl_windows import check_if_already_loaded, load_with_abs_path, load_with_system_search +else: + from .load_dl_linux import check_if_already_loaded, load_with_abs_path, load_with_system_search def _load_nvidia_dynamic_library_no_cache(libname: str) -> LoadedDL: - # Detect if the library was loaded already in some other way (i.e. not via this function). - if sys.platform == "win32": - for dll_name in SUPPORTED_WINDOWS_DLLS.get(libname, ()): - try: - handle = win32api.GetModuleHandle(dll_name) - except pywintypes.error: - pass - else: - return LoadedDL(handle, _abs_path_for_dynamic_library_windows(handle), True) - else: - for soname in SUPPORTED_LINUX_SONAMES.get(libname, ()): - try: - handle = ctypes.CDLL(soname, mode=os.RTLD_NOLOAD) - except OSError: - pass - else: - return LoadedDL(handle, _abs_path_for_dynamic_library_linux(libname, handle), True) + # Check if library is already loaded + loaded = check_if_already_loaded(libname) + if loaded is not None: + return loaded - for dep in DIRECT_DEPENDENCIES.get(libname, ()): - load_nvidia_dynamic_library(dep) + # Load dependencies first + load_dependencies(libname, load_nvidia_dynamic_library) + # Find the library path found = _find_nvidia_dynamic_library(libname) if found.abs_path is None: - if sys.platform == "win32": - handle, abs_path = _windows_load_with_dll_basename(libname) - if handle: - return LoadedDL(handle, abs_path, False) - else: - try: - handle, abs_path = _load_and_report_path_linux(libname, found.lib_searched_for) - except OSError: - pass - else: - return LoadedDL(handle._handle, abs_path, False) + loaded = load_with_system_search(libname, found.lib_searched_for) + if loaded is not None: + return loaded found.raise_if_abs_path_is_None() - if sys.platform == "win32": - if libname in
LIBNAMES_REQUIRING_OS_ADD_DLL_DIRECTORY: - _add_dll_directory(found.abs_path) - flags = _WINBASE_LOAD_LIBRARY_SEARCH_DEFAULT_DIRS | _WINBASE_LOAD_LIBRARY_SEARCH_DLL_LOAD_DIR - try: - handle = win32api.LoadLibraryEx(found.abs_path, 0, flags) - except pywintypes.error as e: - raise RuntimeError(f"Failed to load DLL at {found.abs_path}: {e}") from e - return LoadedDL(handle, found.abs_path, False) - else: - try: - handle = ctypes.CDLL(found.abs_path, _LINUX_CDLL_MODE) - except OSError as e: - raise RuntimeError(f"Failed to dlopen {found.abs_path}: {e}") from e - return LoadedDL(handle._handle, found.abs_path, False) + # Load the library from the found path + return load_with_abs_path(libname, found.abs_path) @functools.cache def load_nvidia_dynamic_library(libname: str) -> LoadedDL: + """Load a NVIDIA dynamic library by name. + + Args: + libname: The name of the library to load (e.g. "cuda", "cudart", etc.) + + Returns: + A LoadedDL object containing the library handle and path + + Raises: + RuntimeError: If the library cannot be found or loaded + """ return _load_nvidia_dynamic_library_no_cache(libname) From bc0137af4ab8df29e42fff13d0ba2310224e2b56 Mon Sep 17 00:00:00 2001 From: "Ralf W. Grosse-Kunstleve" Date: Fri, 25 Apr 2025 16:51:43 -0700 Subject: [PATCH 10/52] Fix an auto-generated docstring --- .../cuda/bindings/_path_finder/load_nvidia_dynamic_library.py | 2 +- 1 file changed, 1 insertion(+), 1 deletion(-) diff --git a/cuda_bindings/cuda/bindings/_path_finder/load_nvidia_dynamic_library.py b/cuda_bindings/cuda/bindings/_path_finder/load_nvidia_dynamic_library.py index 48eaee4aa..c7624bcb0 100644 --- a/cuda_bindings/cuda/bindings/_path_finder/load_nvidia_dynamic_library.py +++ b/cuda_bindings/cuda/bindings/_path_finder/load_nvidia_dynamic_library.py @@ -39,7 +39,7 @@ def load_nvidia_dynamic_library(libname: str) -> LoadedDL: """Load a NVIDIA dynamic library by name. Args: - libname: The name of the library to load (e.g. "cuda", "cudart", etc.) 
+ libname: The name of the library to load (e.g. "cudart", "nvvm", etc.) Returns: A LoadedDL object containing the library handle and path From 001a6a23bcb6e2e458acdf8d85d134068965d352 Mon Sep 17 00:00:00 2001 From: "Ralf W. Grosse-Kunstleve" Date: Fri, 25 Apr 2025 20:50:20 -0700 Subject: [PATCH 11/52] first round of Cursor refactoring (about 4 iterations until all tests passed), followed by ruff auto-fixes --- .../bindings/_path_finder/find_dl_common.py | 79 +++++++ .../bindings/_path_finder/find_dl_linux.py | 116 ++++++++++ .../bindings/_path_finder/find_dl_windows.py | 90 ++++++++ .../find_nvidia_dynamic_library.py | 213 +++++++----------- .../load_nvidia_dynamic_library.py | 4 +- 5 files changed, 364 insertions(+), 138 deletions(-) create mode 100644 cuda_bindings/cuda/bindings/_path_finder/find_dl_common.py create mode 100644 cuda_bindings/cuda/bindings/_path_finder/find_dl_linux.py create mode 100644 cuda_bindings/cuda/bindings/_path_finder/find_dl_windows.py diff --git a/cuda_bindings/cuda/bindings/_path_finder/find_dl_common.py b/cuda_bindings/cuda/bindings/_path_finder/find_dl_common.py new file mode 100644 index 000000000..abc3ff38e --- /dev/null +++ b/cuda_bindings/cuda/bindings/_path_finder/find_dl_common.py @@ -0,0 +1,79 @@ +# Copyright 2024-2025 NVIDIA Corporation. All rights reserved. +# +# SPDX-License-Identifier: LicenseRef-NVIDIA-SOFTWARE-LICENSE + +import os +from typing import List, Optional + +from .cuda_paths import get_cuda_paths +from .sys_path_find_sub_dirs import sys_path_find_sub_dirs + + +def no_such_file_in_sub_dirs( + sub_dirs: tuple[str, ...], file_wild: str, error_messages: List[str], attachments: List[str] +) -> None: + """Report that a file was not found in the given subdirectories. 
+ + Args: + sub_dirs: Tuple of subdirectory names to search + file_wild: The file pattern to search for + error_messages: List to append error messages to + attachments: List to append directory listings to + """ + error_messages.append(f"No such file: {file_wild}") + for sub_dir in sys_path_find_sub_dirs(sub_dirs): + attachments.append(f' listdir("{sub_dir}"):') + for node in sorted(os.listdir(sub_dir)): + attachments.append(f" {node}") + + +def get_cuda_paths_info(key: str, error_messages: List[str]) -> Optional[str]: + """Get information from cuda_paths for a given key. + + Args: + key: The key to look up in cuda_paths + error_messages: List to append error messages to + + Returns: + The path info if found, None otherwise + """ + env_path_tuple = get_cuda_paths()[key] + if not env_path_tuple: + error_messages.append(f'Failure obtaining get_cuda_paths()["{key}"]') + return None + if not env_path_tuple.info: + error_messages.append(f'Failure obtaining get_cuda_paths()["{key}"].info') + return None + return env_path_tuple.info + + +class FindResult: + """Result of a library search operation. + + Attributes: + abs_path: The absolute path to the found library, or None if not found + error_messages: List of error messages encountered during the search + attachments: List of additional information (e.g. directory listings) + lib_searched_for: The library name that was searched for + """ + + def __init__(self, lib_searched_for: str): + self.abs_path: Optional[str] = None + self.error_messages: List[str] = [] + self.attachments: List[str] = [] + self.lib_searched_for = lib_searched_for + + def raise_if_abs_path_is_None(self) -> str: + """Raise an error if no library was found. 
+ + Returns: + The absolute path to the found library + + Raises: + RuntimeError: If no library was found + """ + if self.abs_path: + return self.abs_path + err = ", ".join(self.error_messages) + att = "\n".join(self.attachments) + raise RuntimeError(f'Failure finding "{self.lib_searched_for}": {err}\n{att}') diff --git a/cuda_bindings/cuda/bindings/_path_finder/find_dl_linux.py b/cuda_bindings/cuda/bindings/_path_finder/find_dl_linux.py new file mode 100644 index 000000000..a44a690fc --- /dev/null +++ b/cuda_bindings/cuda/bindings/_path_finder/find_dl_linux.py @@ -0,0 +1,116 @@ +# Copyright 2024-2025 NVIDIA Corporation. All rights reserved. +# +# SPDX-License-Identifier: LicenseRef-NVIDIA-SOFTWARE-LICENSE + +import os + +from .find_dl_common import FindResult, get_cuda_paths_info, no_such_file_in_sub_dirs +from .sys_path_find_sub_dirs import sys_path_find_sub_dirs + + +def find_so_using_nvidia_lib_dirs(lib_searched_for: str) -> FindResult: + """Find a .so file using NVIDIA library directories. + + Args: + lib_searched_for: The library name to search for + + Returns: + FindResult containing the search results + """ + result = FindResult(lib_searched_for) + file_wild = f"lib{lib_searched_for}.so*" + sub_dirs = ("lib", "lib64") + + for sub_dir in sys_path_find_sub_dirs(sub_dirs): + for node in sorted(os.listdir(sub_dir)): + if node.startswith(f"lib{lib_searched_for}.so"): + result.abs_path = os.path.join(sub_dir, node) + return result + + no_such_file_in_sub_dirs(sub_dirs, file_wild, result.error_messages, result.attachments) + return result + + +def find_so_using_cudalib_dir(lib_searched_for: str) -> FindResult: + """Find a .so file using the CUDA library directory. 
+ + Args: + lib_searched_for: The library name to search for + + Returns: + FindResult containing the search results + """ + result = FindResult(lib_searched_for) + cudalib_dir = get_cuda_paths_info("cudalib_dir", result.error_messages) + if not cudalib_dir: + return result + + file_wild = f"lib{lib_searched_for}.so*" + for node in sorted(os.listdir(cudalib_dir)): + if node.startswith(f"lib{lib_searched_for}.so"): + result.abs_path = os.path.join(cudalib_dir, node) + return result + + result.error_messages.append(f"No such file: {file_wild}") + result.attachments.append(f' listdir("{cudalib_dir}"):') + for node in sorted(os.listdir(cudalib_dir)): + result.attachments.append(f" {node}") + return result + + +def find_so_using_cuda_path(lib_searched_for: str) -> FindResult: + """Find a .so file using the CUDA path. + + Args: + lib_searched_for: The library name to search for + + Returns: + FindResult containing the search results + """ + result = FindResult(lib_searched_for) + cuda_path = get_cuda_paths_info("cuda_path", result.error_messages) + if not cuda_path: + return result + + file_wild = f"lib{lib_searched_for}.so*" + for sub_dir in ("lib", "lib64"): + path = os.path.join(cuda_path, sub_dir) + if not os.path.isdir(path): + continue + for node in sorted(os.listdir(path)): + if node.startswith(f"lib{lib_searched_for}.so"): + result.abs_path = os.path.join(path, node) + return result + + result.error_messages.append(f"No such file: {file_wild}") + for sub_dir in ("lib", "lib64"): + path = os.path.join(cuda_path, sub_dir) + if os.path.isdir(path): + result.attachments.append(f' listdir("{path}"):') + for node in sorted(os.listdir(path)): + result.attachments.append(f" {node}") + return result + + +def find_nvidia_dynamic_library(lib_searched_for: str) -> FindResult: + """Find a NVIDIA dynamic library on Linux. 
+ + Args: + lib_searched_for: The library name to search for + + Returns: + FindResult containing the search results + """ + # Try NVIDIA library directories first + result = find_so_using_nvidia_lib_dirs(lib_searched_for) + if result.abs_path: + return result + + # Then try CUDA library directory + result = find_so_using_cudalib_dir(lib_searched_for) + if result.abs_path: + return result + + # Finally try CUDA path + result = find_so_using_cuda_path(lib_searched_for) + return result diff --git a/cuda_bindings/cuda/bindings/_path_finder/find_dl_windows.py b/cuda_bindings/cuda/bindings/_path_finder/find_dl_windows.py new file mode 100644 index 000000000..840835961 --- /dev/null +++ b/cuda_bindings/cuda/bindings/_path_finder/find_dl_windows.py @@ -0,0 +1,90 @@ +# Copyright 2024-2025 NVIDIA Corporation. All rights reserved. +# +# SPDX-License-Identifier: LicenseRef-NVIDIA-SOFTWARE-LICENSE + +import os + +from .find_dl_common import FindResult, get_cuda_paths_info + + +def find_dll_under_dir(lib_searched_for: str, dir_path: str) -> FindResult: + """Find a .dll file under a specific directory. + + Args: + lib_searched_for: The library name to search for + dir_path: The directory to search in + + Returns: + FindResult containing the search results + """ + result = FindResult(lib_searched_for) + file_wild = f"{lib_searched_for}.dll" + + if not os.path.isdir(dir_path): + result.error_messages.append(f"No such directory: {dir_path}") + return result + + for node in sorted(os.listdir(dir_path)): + if node.lower() == file_wild.lower(): + result.abs_path = os.path.join(dir_path, node) + return result + + result.error_messages.append(f"No such file: {file_wild}") + result.attachments.append(f' listdir("{dir_path}"):') + for node in sorted(os.listdir(dir_path)): + result.attachments.append(f" {node}") + return result + + +def find_dll_using_cudalib_dir(lib_searched_for: str) -> FindResult: + """Find a .dll file using the CUDA library directory. 
+ + Args: + lib_searched_for: The library name to search for + + Returns: + FindResult containing the search results + """ + result = FindResult(lib_searched_for) + cudalib_dir = get_cuda_paths_info("cudalib_dir", result.error_messages) + if not cudalib_dir: + return result + + return find_dll_under_dir(lib_searched_for, cudalib_dir) + + +def find_dll_using_cuda_path(lib_searched_for: str) -> FindResult: + """Find a .dll file using the CUDA path. + + Args: + lib_searched_for: The library name to search for + + Returns: + FindResult containing the search results + """ + result = FindResult(lib_searched_for) + cuda_path = get_cuda_paths_info("cuda_path", result.error_messages) + if not cuda_path: + return result + + bin_path = os.path.join(cuda_path, "bin") + return find_dll_under_dir(lib_searched_for, bin_path) + + +def find_nvidia_dynamic_library(lib_searched_for: str) -> FindResult: + """Find a NVIDIA dynamic library on Windows. + + Args: + lib_searched_for: The library name to search for + + Returns: + FindResult containing the search results + """ + # Try CUDA library directory first + result = find_dll_using_cudalib_dir(lib_searched_for) + if result.abs_path: + return result + + # Then try CUDA path + result = find_dll_using_cuda_path(lib_searched_for) + return result diff --git a/cuda_bindings/cuda/bindings/_path_finder/find_nvidia_dynamic_library.py b/cuda_bindings/cuda/bindings/_path_finder/find_nvidia_dynamic_library.py index e60154aa5..2449a9d10 100644 --- a/cuda_bindings/cuda/bindings/_path_finder/find_nvidia_dynamic_library.py +++ b/cuda_bindings/cuda/bindings/_path_finder/find_nvidia_dynamic_library.py @@ -3,147 +3,60 @@ # SPDX-License-Identifier: LicenseRef-NVIDIA-SOFTWARE-LICENSE import functools -import glob -import os - -from .cuda_paths import IS_WIN32, get_cuda_paths -from .supported_libs import is_suppressed_dll_file -from .sys_path_find_sub_dirs import sys_path_find_sub_dirs - - -def _no_such_file_in_sub_dirs(sub_dirs, file_wild, error_messages, 
attachments): - error_messages.append(f"No such file: {file_wild}") - for sub_dir in sys_path_find_sub_dirs(sub_dirs): - attachments.append(f' listdir("{sub_dir}"):') - for node in sorted(os.listdir(sub_dir)): - attachments.append(f" {node}") - - -def _find_so_using_nvidia_lib_dirs(libname, so_basename, error_messages, attachments): - if libname == "nvvm": # noqa: SIM108 - nvidia_sub_dirs = ("nvidia", "*", "nvvm", "lib64") - else: - nvidia_sub_dirs = ("nvidia", "*", "lib") - file_wild = so_basename + "*" - for lib_dir in sys_path_find_sub_dirs(nvidia_sub_dirs): - # First look for an exact match - so_name = os.path.join(lib_dir, so_basename) - if os.path.isfile(so_name): - return so_name - # Look for a versioned library - # Using sort here mainly to make the result deterministic. - for so_name in sorted(glob.glob(os.path.join(lib_dir, file_wild))): - if os.path.isfile(so_name): - return so_name - _no_such_file_in_sub_dirs(nvidia_sub_dirs, file_wild, error_messages, attachments) - return None - - -def _find_dll_under_dir(dirpath, file_wild): - for path in sorted(glob.glob(os.path.join(dirpath, file_wild))): - if not os.path.isfile(path): - continue - if not is_suppressed_dll_file(os.path.basename(path)): - return path - return None - - -def _find_dll_using_nvidia_bin_dirs(libname, error_messages, attachments): - if libname == "nvvm": # noqa: SIM108 - nvidia_sub_dirs = ("nvidia", "*", "nvvm", "bin") - else: - nvidia_sub_dirs = ("nvidia", "*", "bin") - file_wild = libname + "*.dll" - for bin_dir in sys_path_find_sub_dirs(nvidia_sub_dirs): - dll_name = _find_dll_under_dir(bin_dir, file_wild) - if dll_name is not None: - return dll_name - _no_such_file_in_sub_dirs(nvidia_sub_dirs, file_wild, error_messages, attachments) - return None - - -def _get_cuda_paths_info(key, error_messages): - env_path_tuple = get_cuda_paths()[key] - if not env_path_tuple: - error_messages.append(f'Failure obtaining get_cuda_paths()["{key}"]') - return None - if not env_path_tuple.info: - 
error_messages.append(f'Failure obtaining get_cuda_paths()["{key}"].info') - return None - return env_path_tuple.info - - -def _find_so_using_cudalib_dir(so_basename, error_messages, attachments): - cudalib_dir = _get_cuda_paths_info("cudalib_dir", error_messages) - if cudalib_dir is None: - return None - primary_so_dir = cudalib_dir + "/" - candidate_so_dirs = [primary_so_dir] - libs = ["/lib/", "/lib64/"] - for _ in range(2): - alt_dir = libs[0].join(primary_so_dir.rsplit(libs[1], 1)) - if alt_dir not in candidate_so_dirs: - candidate_so_dirs.append(alt_dir) - libs.reverse() - candidate_so_names = [so_dirname + so_basename for so_dirname in candidate_so_dirs] - for so_name in candidate_so_names: - if os.path.isfile(so_name): - return so_name - error_messages.append(f"No such file: {so_name}") - for so_dirname in candidate_so_dirs: - attachments.append(f' listdir("{so_dirname}"):') - if not os.path.isdir(so_dirname): - attachments.append(" DIRECTORY DOES NOT EXIST") - else: - for node in sorted(os.listdir(so_dirname)): - attachments.append(f" {node}") - return None - - -def _find_dll_using_cudalib_dir(libname, error_messages, attachments): - cudalib_dir = _get_cuda_paths_info("cudalib_dir", error_messages) - if cudalib_dir is None: - return None - file_wild = libname + "*.dll" - dll_name = _find_dll_under_dir(cudalib_dir, file_wild) - if dll_name is not None: - return dll_name - error_messages.append(f"No such file: {file_wild}") - attachments.append(f' listdir("{cudalib_dir}"):') - for node in sorted(os.listdir(cudalib_dir)): - attachments.append(f" {node}") - return None - - -class _find_nvidia_dynamic_library: +import sys +from typing import Dict + +from .cuda_paths import get_cuda_paths +from .find_dl_linux import find_nvidia_dynamic_library as find_nvidia_dynamic_library_linux +from .find_dl_windows import find_nvidia_dynamic_library as find_nvidia_dynamic_library_windows + + +class FindNvidiaDynamicLibrary: + """Class for finding NVIDIA dynamic libraries. 
+ + This class maintains the same interface as the original _find_nvidia_dynamic_library + class for backward compatibility. + """ + def __init__(self, libname: str): + """Initialize the finder with a library name. + + Args: + libname: The name of the library to find + """ self.libname = libname self.error_messages = [] self.attachments = [] self.abs_path = None - - if IS_WIN32: - self.abs_path = _find_dll_using_nvidia_bin_dirs(libname, self.error_messages, self.attachments) - if self.abs_path is None: - if libname == "nvvm": - self.abs_path = _get_cuda_paths_info("nvvm", self.error_messages) - else: - self.abs_path = _find_dll_using_cudalib_dir(libname, self.error_messages, self.attachments) - self.lib_searched_for = f"{libname}*.dll" + self.lib_searched_for = f"lib{libname}.so" if sys.platform != "win32" else f"{libname}.dll" + + # Special case for nvvm + if libname == "nvvm": + nvvm_path = get_cuda_paths()["nvvm"] + if nvvm_path and nvvm_path.info: + self.abs_path = nvvm_path.info + return + + if sys.platform == "linux": + result = find_nvidia_dynamic_library_linux(libname) + elif sys.platform == "win32": + result = find_nvidia_dynamic_library_windows(libname) else: - self.lib_searched_for = f"lib{libname}.so" - self.abs_path = _find_so_using_nvidia_lib_dirs( - libname, self.lib_searched_for, self.error_messages, self.attachments - ) - if self.abs_path is None: - if libname == "nvvm": - self.abs_path = _get_cuda_paths_info("nvvm", self.error_messages) - else: - self.abs_path = _find_so_using_cudalib_dir( - self.lib_searched_for, self.error_messages, self.attachments - ) - - def raise_if_abs_path_is_None(self): + raise NotImplementedError(f"Platform {sys.platform} is not supported") + + self.abs_path = result.abs_path + self.error_messages = result.error_messages + self.attachments = result.attachments + + def raise_if_abs_path_is_None(self) -> str: + """Raise an error if no library was found. 
+ + Returns: + The absolute path to the found library + + Raises: + RuntimeError: If no library was found + """ if self.abs_path: return self.abs_path err = ", ".join(self.error_messages) @@ -151,6 +64,34 @@ def raise_if_abs_path_is_None(self): raise RuntimeError(f'Failure finding "{self.lib_searched_for}": {err}\n{att}') +# Cache for found libraries +_found_libraries: Dict[str, str] = {} + + @functools.cache def find_nvidia_dynamic_library(libname: str) -> str: - return _find_nvidia_dynamic_library(libname).raise_if_abs_path_is_None() + """Find a NVIDIA dynamic library. + + This function will cache the results of successful lookups to avoid repeated searches. + + Args: + libname: The library name to search for (e.g. "cudart", "nvvm") + + Returns: + The absolute path to the found library + + Raises: + RuntimeError: If the library cannot be found + NotImplementedError: If the current platform is not supported + """ + # Check cache first + if libname in _found_libraries: + return _found_libraries[libname] + + # Use the class-based approach for backward compatibility + finder = FindNvidiaDynamicLibrary(libname) + result = finder.raise_if_abs_path_is_None() + + # Cache the result + _found_libraries[libname] = result + return result diff --git a/cuda_bindings/cuda/bindings/_path_finder/load_nvidia_dynamic_library.py b/cuda_bindings/cuda/bindings/_path_finder/load_nvidia_dynamic_library.py index c7624bcb0..c12a11e97 100644 --- a/cuda_bindings/cuda/bindings/_path_finder/load_nvidia_dynamic_library.py +++ b/cuda_bindings/cuda/bindings/_path_finder/load_nvidia_dynamic_library.py @@ -4,7 +4,7 @@ import functools import sys -from .find_nvidia_dynamic_library import _find_nvidia_dynamic_library +from .find_nvidia_dynamic_library import FindNvidiaDynamicLibrary from .load_dl_common import LoadedDL, load_dependencies if sys.platform == "win32": @@ -23,7 +23,7 @@ def _load_nvidia_dynamic_library_no_cache(libname: str) -> LoadedDL: load_dependencies(libname, 
load_nvidia_dynamic_library) # Find the library path - found = _find_nvidia_dynamic_library(libname) + found = FindNvidiaDynamicLibrary(libname) if found.abs_path is None: loaded = load_with_system_search(libname, found.lib_searched_for) if loaded is not None: From 9721079c3a57a201792d3045876f8bf4658f3d98 Mon Sep 17 00:00:00 2001 From: "Ralf W. Grosse-Kunstleve" Date: Fri, 25 Apr 2025 22:13:43 -0700 Subject: [PATCH 12/52] Revert "first round of Cursor refactoring (about 4 iterations until all tests passed), followed by ruff auto-fixes" This reverts commit 001a6a23bcb6e2e458acdf8d85d134068965d352. There were many GitHub Actions jobs that failed (all tests with 12.x): https://github.com/NVIDIA/cuda-python/actions/runs/14677553387 This is not worth spending time debugging. Especially because * Cursor has been unresponsive for at least half an hour: We're having trouble connecting to the model provider. This might be temporary - please try again in a moment. * The refactored code does not seem easier to read. --- .../bindings/_path_finder/find_dl_common.py | 79 ------- .../bindings/_path_finder/find_dl_linux.py | 116 ---------- .../bindings/_path_finder/find_dl_windows.py | 90 -------- .../find_nvidia_dynamic_library.py | 213 +++++++++++------- .../load_nvidia_dynamic_library.py | 4 +- 5 files changed, 138 insertions(+), 364 deletions(-) delete mode 100644 cuda_bindings/cuda/bindings/_path_finder/find_dl_common.py delete mode 100644 cuda_bindings/cuda/bindings/_path_finder/find_dl_linux.py delete mode 100644 cuda_bindings/cuda/bindings/_path_finder/find_dl_windows.py diff --git a/cuda_bindings/cuda/bindings/_path_finder/find_dl_common.py b/cuda_bindings/cuda/bindings/_path_finder/find_dl_common.py deleted file mode 100644 index abc3ff38e..000000000 --- a/cuda_bindings/cuda/bindings/_path_finder/find_dl_common.py +++ /dev/null @@ -1,79 +0,0 @@ -# Copyright 2024-2025 NVIDIA Corporation. All rights reserved. 
-# -# SPDX-License-Identifier: LicenseRef-NVIDIA-SOFTWARE-LICENSE - -import os -from typing import List, Optional - -from .cuda_paths import get_cuda_paths -from .sys_path_find_sub_dirs import sys_path_find_sub_dirs - - -def no_such_file_in_sub_dirs( - sub_dirs: tuple[str, ...], file_wild: str, error_messages: List[str], attachments: List[str] -) -> None: - """Report that a file was not found in the given subdirectories. - - Args: - sub_dirs: Tuple of subdirectory names to search - file_wild: The file pattern to search for - error_messages: List to append error messages to - attachments: List to append directory listings to - """ - error_messages.append(f"No such file: {file_wild}") - for sub_dir in sys_path_find_sub_dirs(sub_dirs): - attachments.append(f' listdir("{sub_dir}"):') - for node in sorted(os.listdir(sub_dir)): - attachments.append(f" {node}") - - -def get_cuda_paths_info(key: str, error_messages: List[str]) -> Optional[str]: - """Get information from cuda_paths for a given key. - - Args: - key: The key to look up in cuda_paths - error_messages: List to append error messages to - - Returns: - The path info if found, None otherwise - """ - env_path_tuple = get_cuda_paths()[key] - if not env_path_tuple: - error_messages.append(f'Failure obtaining get_cuda_paths()["{key}"]') - return None - if not env_path_tuple.info: - error_messages.append(f'Failure obtaining get_cuda_paths()["{key}"].info') - return None - return env_path_tuple.info - - -class FindResult: - """Result of a library search operation. - - Attributes: - abs_path: The absolute path to the found library, or None if not found - error_messages: List of error messages encountered during the search - attachments: List of additional information (e.g. 
directory listings) - lib_searched_for: The library name that was searched for - """ - - def __init__(self, lib_searched_for: str): - self.abs_path: Optional[str] = None - self.error_messages: List[str] = [] - self.attachments: List[str] = [] - self.lib_searched_for = lib_searched_for - - def raise_if_abs_path_is_None(self) -> str: - """Raise an error if no library was found. - - Returns: - The absolute path to the found library - - Raises: - RuntimeError: If no library was found - """ - if self.abs_path: - return self.abs_path - err = ", ".join(self.error_messages) - att = "\n".join(self.attachments) - raise RuntimeError(f'Failure finding "{self.lib_searched_for}": {err}\n{att}') diff --git a/cuda_bindings/cuda/bindings/_path_finder/find_dl_linux.py b/cuda_bindings/cuda/bindings/_path_finder/find_dl_linux.py deleted file mode 100644 index a44a690fc..000000000 --- a/cuda_bindings/cuda/bindings/_path_finder/find_dl_linux.py +++ /dev/null @@ -1,116 +0,0 @@ -# Copyright 2024-2025 NVIDIA Corporation. All rights reserved. -# -# SPDX-License-Identifier: LicenseRef-NVIDIA-SOFTWARE-LICENSE - -import os - -from .find_dl_common import FindResult, get_cuda_paths_info, no_such_file_in_sub_dirs -from .sys_path_find_sub_dirs import sys_path_find_sub_dirs - - -def find_so_using_nvidia_lib_dirs(lib_searched_for: str) -> FindResult: - """Find a .so file using NVIDIA library directories. 
- - Args: - lib_searched_for: The library name to search for - - Returns: - FindResult containing the search results - """ - result = FindResult(lib_searched_for) - file_wild = f"lib{lib_searched_for}.so*" - sub_dirs = ("lib", "lib64") - - for sub_dir in sys_path_find_sub_dirs(sub_dirs): - for node in sorted(os.listdir(sub_dir)): - if node.startswith(f"lib{lib_searched_for}.so"): - result.abs_path = os.path.join(sub_dir, node) - return result - - no_such_file_in_sub_dirs(sub_dirs, file_wild, result.error_messages, result.attachments) - return result - - -def find_so_using_cudalib_dir(lib_searched_for: str) -> FindResult: - """Find a .so file using the CUDA library directory. - - Args: - lib_searched_for: The library name to search for - - Returns: - FindResult containing the search results - """ - result = FindResult(lib_searched_for) - cudalib_dir = get_cuda_paths_info("cudalib_dir", result.error_messages) - if not cudalib_dir: - return result - - file_wild = f"lib{lib_searched_for}.so*" - for node in sorted(os.listdir(cudalib_dir)): - if node.startswith(f"lib{lib_searched_for}.so"): - result.abs_path = os.path.join(cudalib_dir, node) - return result - - result.error_messages.append(f"No such file: {file_wild}") - result.attachments.append(f' listdir("{cudalib_dir}"):') - for node in sorted(os.listdir(cudalib_dir)): - result.attachments.append(f" {node}") - return result - - -def find_so_using_cuda_path(lib_searched_for: str) -> FindResult: - """Find a .so file using the CUDA path. 
- - Args: - lib_searched_for: The library name to search for - - Returns: - FindResult containing the search results - """ - result = FindResult(lib_searched_for) - cuda_path = get_cuda_paths_info("cuda_path", result.error_messages) - if not cuda_path: - return result - - file_wild = f"lib{lib_searched_for}.so*" - for sub_dir in ("lib", "lib64"): - path = os.path.join(cuda_path, sub_dir) - if not os.path.isdir(path): - continue - for node in sorted(os.listdir(path)): - if node.startswith(f"lib{lib_searched_for}.so"): - result.abs_path = os.path.join(path, node) - return result - - result.error_messages.append(f"No such file: {file_wild}") - for sub_dir in ("lib", "lib64"): - path = os.path.join(cuda_path, sub_dir) - if os.path.isdir(path): - result.attachments.append(f' listdir("{path}"):') - for node in sorted(os.listdir(path)): - result.attachments.append(f" {node}") - return result - - -def find_nvidia_dynamic_library(lib_searched_for: str) -> FindResult: - """Find a NVIDIA dynamic library on Linux. - - Args: - lib_searched_for: The library name to search for - - Returns: - FindResult containing the search results - """ - # Try NVIDIA library directories first - result = find_so_using_nvidia_lib_dirs(lib_searched_for) - if result.abs_path: - return result - - # Then try CUDA library directory - result = find_so_using_cudalib_dir(lib_searched_for) - if result.abs_path: - return result - - # Finally try CUDA path - result = find_so_using_cuda_path(lib_searched_for) - return result diff --git a/cuda_bindings/cuda/bindings/_path_finder/find_dl_windows.py b/cuda_bindings/cuda/bindings/_path_finder/find_dl_windows.py deleted file mode 100644 index 840835961..000000000 --- a/cuda_bindings/cuda/bindings/_path_finder/find_dl_windows.py +++ /dev/null @@ -1,90 +0,0 @@ -# Copyright 2024-2025 NVIDIA Corporation. All rights reserved. 
-# -# SPDX-License-Identifier: LicenseRef-NVIDIA-SOFTWARE-LICENSE - -import os - -from .find_dl_common import FindResult, get_cuda_paths_info - - -def find_dll_under_dir(lib_searched_for: str, dir_path: str) -> FindResult: - """Find a .dll file under a specific directory. - - Args: - lib_searched_for: The library name to search for - dir_path: The directory to search in - - Returns: - FindResult containing the search results - """ - result = FindResult(lib_searched_for) - file_wild = f"{lib_searched_for}.dll" - - if not os.path.isdir(dir_path): - result.error_messages.append(f"No such directory: {dir_path}") - return result - - for node in sorted(os.listdir(dir_path)): - if node.lower() == file_wild.lower(): - result.abs_path = os.path.join(dir_path, node) - return result - - result.error_messages.append(f"No such file: {file_wild}") - result.attachments.append(f' listdir("{dir_path}"):') - for node in sorted(os.listdir(dir_path)): - result.attachments.append(f" {node}") - return result - - -def find_dll_using_cudalib_dir(lib_searched_for: str) -> FindResult: - """Find a .dll file using the CUDA library directory. - - Args: - lib_searched_for: The library name to search for - - Returns: - FindResult containing the search results - """ - result = FindResult(lib_searched_for) - cudalib_dir = get_cuda_paths_info("cudalib_dir", result.error_messages) - if not cudalib_dir: - return result - - return find_dll_under_dir(lib_searched_for, cudalib_dir) - - -def find_dll_using_cuda_path(lib_searched_for: str) -> FindResult: - """Find a .dll file using the CUDA path. 
- - Args: - lib_searched_for: The library name to search for - - Returns: - FindResult containing the search results - """ - result = FindResult(lib_searched_for) - cuda_path = get_cuda_paths_info("cuda_path", result.error_messages) - if not cuda_path: - return result - - bin_path = os.path.join(cuda_path, "bin") - return find_dll_under_dir(lib_searched_for, bin_path) - - -def find_nvidia_dynamic_library(lib_searched_for: str) -> FindResult: - """Find a NVIDIA dynamic library on Windows. - - Args: - lib_searched_for: The library name to search for - - Returns: - FindResult containing the search results - """ - # Try CUDA library directory first - result = find_dll_using_cudalib_dir(lib_searched_for) - if result.abs_path: - return result - - # Then try CUDA path - result = find_dll_using_cuda_path(lib_searched_for) - return result diff --git a/cuda_bindings/cuda/bindings/_path_finder/find_nvidia_dynamic_library.py b/cuda_bindings/cuda/bindings/_path_finder/find_nvidia_dynamic_library.py index 2449a9d10..e60154aa5 100644 --- a/cuda_bindings/cuda/bindings/_path_finder/find_nvidia_dynamic_library.py +++ b/cuda_bindings/cuda/bindings/_path_finder/find_nvidia_dynamic_library.py @@ -3,60 +3,147 @@ # SPDX-License-Identifier: LicenseRef-NVIDIA-SOFTWARE-LICENSE import functools -import sys -from typing import Dict - -from .cuda_paths import get_cuda_paths -from .find_dl_linux import find_nvidia_dynamic_library as find_nvidia_dynamic_library_linux -from .find_dl_windows import find_nvidia_dynamic_library as find_nvidia_dynamic_library_windows - - -class FindNvidiaDynamicLibrary: - """Class for finding NVIDIA dynamic libraries. - - This class maintains the same interface as the original _find_nvidia_dynamic_library - class for backward compatibility. 
- """ - +import glob +import os + +from .cuda_paths import IS_WIN32, get_cuda_paths +from .supported_libs import is_suppressed_dll_file +from .sys_path_find_sub_dirs import sys_path_find_sub_dirs + + +def _no_such_file_in_sub_dirs(sub_dirs, file_wild, error_messages, attachments): + error_messages.append(f"No such file: {file_wild}") + for sub_dir in sys_path_find_sub_dirs(sub_dirs): + attachments.append(f' listdir("{sub_dir}"):') + for node in sorted(os.listdir(sub_dir)): + attachments.append(f" {node}") + + +def _find_so_using_nvidia_lib_dirs(libname, so_basename, error_messages, attachments): + if libname == "nvvm": # noqa: SIM108 + nvidia_sub_dirs = ("nvidia", "*", "nvvm", "lib64") + else: + nvidia_sub_dirs = ("nvidia", "*", "lib") + file_wild = so_basename + "*" + for lib_dir in sys_path_find_sub_dirs(nvidia_sub_dirs): + # First look for an exact match + so_name = os.path.join(lib_dir, so_basename) + if os.path.isfile(so_name): + return so_name + # Look for a versioned library + # Using sort here mainly to make the result deterministic. 
+ for so_name in sorted(glob.glob(os.path.join(lib_dir, file_wild))): + if os.path.isfile(so_name): + return so_name + _no_such_file_in_sub_dirs(nvidia_sub_dirs, file_wild, error_messages, attachments) + return None + + +def _find_dll_under_dir(dirpath, file_wild): + for path in sorted(glob.glob(os.path.join(dirpath, file_wild))): + if not os.path.isfile(path): + continue + if not is_suppressed_dll_file(os.path.basename(path)): + return path + return None + + +def _find_dll_using_nvidia_bin_dirs(libname, error_messages, attachments): + if libname == "nvvm": # noqa: SIM108 + nvidia_sub_dirs = ("nvidia", "*", "nvvm", "bin") + else: + nvidia_sub_dirs = ("nvidia", "*", "bin") + file_wild = libname + "*.dll" + for bin_dir in sys_path_find_sub_dirs(nvidia_sub_dirs): + dll_name = _find_dll_under_dir(bin_dir, file_wild) + if dll_name is not None: + return dll_name + _no_such_file_in_sub_dirs(nvidia_sub_dirs, file_wild, error_messages, attachments) + return None + + +def _get_cuda_paths_info(key, error_messages): + env_path_tuple = get_cuda_paths()[key] + if not env_path_tuple: + error_messages.append(f'Failure obtaining get_cuda_paths()["{key}"]') + return None + if not env_path_tuple.info: + error_messages.append(f'Failure obtaining get_cuda_paths()["{key}"].info') + return None + return env_path_tuple.info + + +def _find_so_using_cudalib_dir(so_basename, error_messages, attachments): + cudalib_dir = _get_cuda_paths_info("cudalib_dir", error_messages) + if cudalib_dir is None: + return None + primary_so_dir = cudalib_dir + "/" + candidate_so_dirs = [primary_so_dir] + libs = ["/lib/", "/lib64/"] + for _ in range(2): + alt_dir = libs[0].join(primary_so_dir.rsplit(libs[1], 1)) + if alt_dir not in candidate_so_dirs: + candidate_so_dirs.append(alt_dir) + libs.reverse() + candidate_so_names = [so_dirname + so_basename for so_dirname in candidate_so_dirs] + for so_name in candidate_so_names: + if os.path.isfile(so_name): + return so_name + error_messages.append(f"No such file: 
{so_name}") + for so_dirname in candidate_so_dirs: + attachments.append(f' listdir("{so_dirname}"):') + if not os.path.isdir(so_dirname): + attachments.append(" DIRECTORY DOES NOT EXIST") + else: + for node in sorted(os.listdir(so_dirname)): + attachments.append(f" {node}") + return None + + +def _find_dll_using_cudalib_dir(libname, error_messages, attachments): + cudalib_dir = _get_cuda_paths_info("cudalib_dir", error_messages) + if cudalib_dir is None: + return None + file_wild = libname + "*.dll" + dll_name = _find_dll_under_dir(cudalib_dir, file_wild) + if dll_name is not None: + return dll_name + error_messages.append(f"No such file: {file_wild}") + attachments.append(f' listdir("{cudalib_dir}"):') + for node in sorted(os.listdir(cudalib_dir)): + attachments.append(f" {node}") + return None + + +class _find_nvidia_dynamic_library: def __init__(self, libname: str): - """Initialize the finder with a library name. - - Args: - libname: The name of the library to find - """ self.libname = libname self.error_messages = [] self.attachments = [] self.abs_path = None - self.lib_searched_for = f"lib{libname}.so" if sys.platform != "win32" else f"{libname}.dll" - - # Special case for nvvm - if libname == "nvvm": - nvvm_path = get_cuda_paths()["nvvm"] - if nvvm_path and nvvm_path.info: - self.abs_path = nvvm_path.info - return - - if sys.platform == "linux": - result = find_nvidia_dynamic_library_linux(libname) - elif sys.platform == "win32": - result = find_nvidia_dynamic_library_windows(libname) - else: - raise NotImplementedError(f"Platform {sys.platform} is not supported") - - self.abs_path = result.abs_path - self.error_messages = result.error_messages - self.attachments = result.attachments - - def raise_if_abs_path_is_None(self) -> str: - """Raise an error if no library was found. 
- Returns: - The absolute path to the found library - - Raises: - RuntimeError: If no library was found - """ + if IS_WIN32: + self.abs_path = _find_dll_using_nvidia_bin_dirs(libname, self.error_messages, self.attachments) + if self.abs_path is None: + if libname == "nvvm": + self.abs_path = _get_cuda_paths_info("nvvm", self.error_messages) + else: + self.abs_path = _find_dll_using_cudalib_dir(libname, self.error_messages, self.attachments) + self.lib_searched_for = f"{libname}*.dll" + else: + self.lib_searched_for = f"lib{libname}.so" + self.abs_path = _find_so_using_nvidia_lib_dirs( + libname, self.lib_searched_for, self.error_messages, self.attachments + ) + if self.abs_path is None: + if libname == "nvvm": + self.abs_path = _get_cuda_paths_info("nvvm", self.error_messages) + else: + self.abs_path = _find_so_using_cudalib_dir( + self.lib_searched_for, self.error_messages, self.attachments + ) + + def raise_if_abs_path_is_None(self): if self.abs_path: return self.abs_path err = ", ".join(self.error_messages) @@ -64,34 +151,6 @@ def raise_if_abs_path_is_None(self) -> str: raise RuntimeError(f'Failure finding "{self.lib_searched_for}": {err}\n{att}') -# Cache for found libraries -_found_libraries: Dict[str, str] = {} - - @functools.cache def find_nvidia_dynamic_library(libname: str) -> str: - """Find a NVIDIA dynamic library. - - This function will cache the results of successful lookups to avoid repeated searches. - - Args: - libname: The library name to search for (e.g. 
"cudart", "nvvm") - - Returns: - The absolute path to the found library - - Raises: - RuntimeError: If the library cannot be found - NotImplementedError: If the current platform is not supported - """ - # Check cache first - if libname in _found_libraries: - return _found_libraries[libname] - - # Use the class-based approach for backward compatibility - finder = FindNvidiaDynamicLibrary(libname) - result = finder.raise_if_abs_path_is_None() - - # Cache the result - _found_libraries[libname] = result - return result + return _find_nvidia_dynamic_library(libname).raise_if_abs_path_is_None() diff --git a/cuda_bindings/cuda/bindings/_path_finder/load_nvidia_dynamic_library.py b/cuda_bindings/cuda/bindings/_path_finder/load_nvidia_dynamic_library.py index c12a11e97..c7624bcb0 100644 --- a/cuda_bindings/cuda/bindings/_path_finder/load_nvidia_dynamic_library.py +++ b/cuda_bindings/cuda/bindings/_path_finder/load_nvidia_dynamic_library.py @@ -4,7 +4,7 @@ import functools import sys -from .find_nvidia_dynamic_library import FindNvidiaDynamicLibrary +from .find_nvidia_dynamic_library import _find_nvidia_dynamic_library from .load_dl_common import LoadedDL, load_dependencies if sys.platform == "win32": @@ -23,7 +23,7 @@ def _load_nvidia_dynamic_library_no_cache(libname: str) -> LoadedDL: load_dependencies(libname, load_nvidia_dynamic_library) # Find the library path - found = FindNvidiaDynamicLibrary(libname) + found = _find_nvidia_dynamic_library(libname) if found.abs_path is None: loaded = load_with_system_search(libname, found.lib_searched_for) if loaded is not None: From c409346741328535074073c7c0b53b39f698c601 Mon Sep 17 00:00:00 2001 From: "Ralf W. 
Grosse-Kunstleve" Date: Fri, 25 Apr 2025 22:27:22 -0700 Subject: [PATCH 13/52] A couple trivial tweaks --- .../bindings/_path_finder/find_nvidia_dynamic_library.py | 6 +++--- cuda_bindings/cuda/bindings/_path_finder/supported_libs.py | 1 - .../cuda/bindings/_path_finder/sys_path_find_sub_dirs.py | 1 - 3 files changed, 3 insertions(+), 5 deletions(-) diff --git a/cuda_bindings/cuda/bindings/_path_finder/find_nvidia_dynamic_library.py b/cuda_bindings/cuda/bindings/_path_finder/find_nvidia_dynamic_library.py index e60154aa5..a0131a9cd 100644 --- a/cuda_bindings/cuda/bindings/_path_finder/find_nvidia_dynamic_library.py +++ b/cuda_bindings/cuda/bindings/_path_finder/find_nvidia_dynamic_library.py @@ -1,12 +1,12 @@ # Copyright 2024-2025 NVIDIA Corporation. All rights reserved. -# # SPDX-License-Identifier: LicenseRef-NVIDIA-SOFTWARE-LICENSE import functools import glob import os +import sys -from .cuda_paths import IS_WIN32, get_cuda_paths +from .cuda_paths import get_cuda_paths from .supported_libs import is_suppressed_dll_file from .sys_path_find_sub_dirs import sys_path_find_sub_dirs @@ -122,7 +122,7 @@ def __init__(self, libname: str): self.attachments = [] self.abs_path = None - if IS_WIN32: + if sys.platform == "win32": self.abs_path = _find_dll_using_nvidia_bin_dirs(libname, self.error_messages, self.attachments) if self.abs_path is None: if libname == "nvvm": diff --git a/cuda_bindings/cuda/bindings/_path_finder/supported_libs.py b/cuda_bindings/cuda/bindings/_path_finder/supported_libs.py index b0dfcb9e7..bd35c1ee9 100644 --- a/cuda_bindings/cuda/bindings/_path_finder/supported_libs.py +++ b/cuda_bindings/cuda/bindings/_path_finder/supported_libs.py @@ -1,5 +1,4 @@ # Copyright 2025 NVIDIA Corporation. All rights reserved. 
-# # SPDX-License-Identifier: LicenseRef-NVIDIA-SOFTWARE-LICENSE # THIS FILE NEEDS TO BE REVIEWED/UPDATED FOR EACH CTK RELEASE diff --git a/cuda_bindings/cuda/bindings/_path_finder/sys_path_find_sub_dirs.py b/cuda_bindings/cuda/bindings/_path_finder/sys_path_find_sub_dirs.py index d2da726c9..e632843cc 100644 --- a/cuda_bindings/cuda/bindings/_path_finder/sys_path_find_sub_dirs.py +++ b/cuda_bindings/cuda/bindings/_path_finder/sys_path_find_sub_dirs.py @@ -1,5 +1,4 @@ # Copyright 2024-2025 NVIDIA Corporation. All rights reserved. -# # SPDX-License-Identifier: LicenseRef-NVIDIA-SOFTWARE-LICENSE import functools From b3a3b163b800323f01642fe5c278c48a5d33793d Mon Sep 17 00:00:00 2001 From: "Ralf W. Grosse-Kunstleve" Date: Mon, 28 Apr 2025 11:53:26 -0700 Subject: [PATCH 14/52] Prefix the public API (just two items) with underscores for now. --- cuda_bindings/cuda/bindings/path_finder.py | 10 ++++++---- cuda_bindings/tests/test_path_finder.py | 14 +++++++------- toolshed/run_cuda_bindings_path_finder.py | 4 ++-- 3 files changed, 15 insertions(+), 13 deletions(-) diff --git a/cuda_bindings/cuda/bindings/path_finder.py b/cuda_bindings/cuda/bindings/path_finder.py index 9c08bdc25..28badd025 100644 --- a/cuda_bindings/cuda/bindings/path_finder.py +++ b/cuda_bindings/cuda/bindings/path_finder.py @@ -2,10 +2,12 @@ # # SPDX-License-Identifier: LicenseRef-NVIDIA-SOFTWARE-LICENSE -from cuda.bindings._path_finder.load_nvidia_dynamic_library import load_nvidia_dynamic_library -from cuda.bindings._path_finder.supported_libs import SUPPORTED_LIBNAMES +from cuda.bindings._path_finder.load_nvidia_dynamic_library import ( + load_nvidia_dynamic_library as _load_nvidia_dynamic_library, +) +from cuda.bindings._path_finder.supported_libs import SUPPORTED_LIBNAMES as _SUPPORTED_LIBNAMES __all__ = [ - "load_nvidia_dynamic_library", - "SUPPORTED_LIBNAMES", + "_load_nvidia_dynamic_library", + "_SUPPORTED_LIBNAMES", ] diff --git a/cuda_bindings/tests/test_path_finder.py 
b/cuda_bindings/tests/test_path_finder.py index ea8c1b5d2..d3832bf00 100644 --- a/cuda_bindings/tests/test_path_finder.py +++ b/cuda_bindings/tests/test_path_finder.py @@ -7,16 +7,16 @@ from cuda.bindings import path_finder from cuda.bindings._path_finder import supported_libs -ALL_LIBNAMES = path_finder.SUPPORTED_LIBNAMES + supported_libs.PARTIALLY_SUPPORTED_LIBNAMES_ALL -ALL_LIBNAMES_LINUX = path_finder.SUPPORTED_LIBNAMES + supported_libs.PARTIALLY_SUPPORTED_LIBNAMES_LINUX -ALL_LIBNAMES_WINDOWS = path_finder.SUPPORTED_LIBNAMES + supported_libs.PARTIALLY_SUPPORTED_LIBNAMES_WINDOWS +ALL_LIBNAMES = path_finder._SUPPORTED_LIBNAMES + supported_libs.PARTIALLY_SUPPORTED_LIBNAMES_ALL +ALL_LIBNAMES_LINUX = path_finder._SUPPORTED_LIBNAMES + supported_libs.PARTIALLY_SUPPORTED_LIBNAMES_LINUX +ALL_LIBNAMES_WINDOWS = path_finder._SUPPORTED_LIBNAMES + supported_libs.PARTIALLY_SUPPORTED_LIBNAMES_WINDOWS if os.environ.get("CUDA_BINDINGS_PATH_FINDER_TEST_ALL_LIBNAMES", False): if sys.platform == "win32": TEST_FIND_OR_LOAD_LIBNAMES = ALL_LIBNAMES_WINDOWS else: TEST_FIND_OR_LOAD_LIBNAMES = ALL_LIBNAMES_LINUX else: - TEST_FIND_OR_LOAD_LIBNAMES = path_finder.SUPPORTED_LIBNAMES + TEST_FIND_OR_LOAD_LIBNAMES = path_finder._SUPPORTED_LIBNAMES def test_all_libnames_linux_sonames_consistency(): @@ -54,14 +54,14 @@ def test_find_or_load_nvidia_dynamic_library(info_summary_append, api, libname): """ else: code = f"""\ -from cuda.bindings.path_finder import load_nvidia_dynamic_library +from cuda.bindings.path_finder import _load_nvidia_dynamic_library from cuda.bindings._path_finder.load_nvidia_dynamic_library import _load_nvidia_dynamic_library_no_cache -loaded_dl_fresh = load_nvidia_dynamic_library({libname!r}) +loaded_dl_fresh = _load_nvidia_dynamic_library({libname!r}) if loaded_dl_fresh.was_already_loaded_from_elsewhere: raise RuntimeError("loaded_dl_fresh.was_already_loaded_from_elsewhere") -loaded_dl_from_cache = load_nvidia_dynamic_library({libname!r}) +loaded_dl_from_cache = 
_load_nvidia_dynamic_library({libname!r}) if loaded_dl_from_cache is not loaded_dl_fresh: raise RuntimeError("loaded_dl_from_cache is not loaded_dl_fresh") diff --git a/toolshed/run_cuda_bindings_path_finder.py b/toolshed/run_cuda_bindings_path_finder.py index 5f47b3990..f47432d05 100644 --- a/toolshed/run_cuda_bindings_path_finder.py +++ b/toolshed/run_cuda_bindings_path_finder.py @@ -5,7 +5,7 @@ from cuda.bindings._path_finder import cuda_paths, supported_libs ALL_LIBNAMES = ( - path_finder.SUPPORTED_LIBNAMES + supported_libs.PARTIALLY_SUPPORTED_LIBNAMES + path_finder._SUPPORTED_LIBNAMES + supported_libs.PARTIALLY_SUPPORTED_LIBNAMES ) @@ -20,7 +20,7 @@ def run(args): for libname in ALL_LIBNAMES: print(f"{libname=}") try: - loaded_dl = path_finder.load_nvidia_dynamic_library(libname) + loaded_dl = path_finder._load_nvidia_dynamic_library(libname) except Exception: print(f"EXCEPTION for {libname=}:") traceback.print_exc(file=sys.stdout) From 180eefd527ffc5f84ee437064772b5c97da11d3d Mon Sep 17 00:00:00 2001 From: "Ralf W. Grosse-Kunstleve" Date: Tue, 29 Apr 2025 10:02:33 -0700 Subject: [PATCH 15/52] Add SPDX-License-Identifier to all files under toolshed/ that don't have it already --- toolshed/build_path_finder_dlls.py | 4 ++++ toolshed/build_path_finder_sonames.py | 4 ++++ toolshed/find_sonames.sh | 5 +++++ toolshed/run_cuda_bindings_path_finder.py | 4 ++++ 4 files changed, 17 insertions(+) diff --git a/toolshed/build_path_finder_dlls.py b/toolshed/build_path_finder_dlls.py index c82dcd866..be2db0d1f 100755 --- a/toolshed/build_path_finder_dlls.py +++ b/toolshed/build_path_finder_dlls.py @@ -1,5 +1,9 @@ #!/usr/bin/env python3 +# Copyright (c) 2025, NVIDIA CORPORATION & AFFILIATES. ALL RIGHTS RESERVED. 
+# +# SPDX-License-Identifier: Apache-2.0 + # Input for this script: .txt files generated with: # for exe in *.exe; do 7z l $exe > "${exe%.exe}.txt"; done diff --git a/toolshed/build_path_finder_sonames.py b/toolshed/build_path_finder_sonames.py index 20e8ec6c7..17b7dd7b3 100755 --- a/toolshed/build_path_finder_sonames.py +++ b/toolshed/build_path_finder_sonames.py @@ -1,5 +1,9 @@ #!/usr/bin/env python3 +# Copyright (c) 2025, NVIDIA CORPORATION & AFFILIATES. ALL RIGHTS RESERVED. +# +# SPDX-License-Identifier: Apache-2.0 + # Input for this script: # output of toolshed/find_sonames.sh diff --git a/toolshed/find_sonames.sh b/toolshed/find_sonames.sh index 79c2e89d5..b742becf6 100755 --- a/toolshed/find_sonames.sh +++ b/toolshed/find_sonames.sh @@ -1,4 +1,9 @@ #!/bin/bash + +# Copyright (c) 2025, NVIDIA CORPORATION & AFFILIATES. ALL RIGHTS RESERVED. +# +# SPDX-License-Identifier: Apache-2.0 + find "$@" -type f -name '*.so*' -print0 | while IFS= read -r -d '' f; do type=$(test -L "$f" && echo SYMLINK || echo FILE) soname=$(readelf -d "$f" 2>/dev/null | awk '/SONAME/ {gsub(/[][]/, "", $5); print $5; exit}') diff --git a/toolshed/run_cuda_bindings_path_finder.py b/toolshed/run_cuda_bindings_path_finder.py index f47432d05..19f43c288 100644 --- a/toolshed/run_cuda_bindings_path_finder.py +++ b/toolshed/run_cuda_bindings_path_finder.py @@ -1,3 +1,7 @@ +# Copyright (c) 2025, NVIDIA CORPORATION & AFFILIATES. ALL RIGHTS RESERVED. +# +# SPDX-License-Identifier: Apache-2.0 + import sys import traceback From bfc4b69d5d514c6a07ad43202e220c3bc0e3a251 Mon Sep 17 00:00:00 2001 From: "Ralf W. 
Grosse-Kunstleve" Date: Tue, 29 Apr 2025 10:07:08 -0700 Subject: [PATCH 16/52] Add SPDX-License-Identifier under cuda_bindings/tests/ --- cuda_bindings/tests/conftest.py | 3 +++ cuda_bindings/tests/test_path_finder.py | 3 +++ cuda_bindings/tests/test_sys_path_find_sub_dirs.py | 3 +++ 3 files changed, 9 insertions(+) diff --git a/cuda_bindings/tests/conftest.py b/cuda_bindings/tests/conftest.py index 65a1d2562..fa6293cc7 100644 --- a/cuda_bindings/tests/conftest.py +++ b/cuda_bindings/tests/conftest.py @@ -1,3 +1,6 @@ +# Copyright 2025 NVIDIA Corporation. All rights reserved. +# SPDX-License-Identifier: LicenseRef-NVIDIA-SOFTWARE-LICENSE + import os import pytest diff --git a/cuda_bindings/tests/test_path_finder.py b/cuda_bindings/tests/test_path_finder.py index d3832bf00..172b41942 100644 --- a/cuda_bindings/tests/test_path_finder.py +++ b/cuda_bindings/tests/test_path_finder.py @@ -1,3 +1,6 @@ +# Copyright 2025 NVIDIA Corporation. All rights reserved. +# SPDX-License-Identifier: LicenseRef-NVIDIA-SOFTWARE-LICENSE + import os import subprocess # nosec B404 import sys diff --git a/cuda_bindings/tests/test_sys_path_find_sub_dirs.py b/cuda_bindings/tests/test_sys_path_find_sub_dirs.py index 3297ce39e..cab9e31d7 100644 --- a/cuda_bindings/tests/test_sys_path_find_sub_dirs.py +++ b/cuda_bindings/tests/test_sys_path_find_sub_dirs.py @@ -1,3 +1,6 @@ +# Copyright 2025 NVIDIA Corporation. All rights reserved. +# SPDX-License-Identifier: LicenseRef-NVIDIA-SOFTWARE-LICENSE + import os import pytest From a7001e19e447ba111d6ea56215b6febf8ab8abac Mon Sep 17 00:00:00 2001 From: "Ralf W. Grosse-Kunstleve" Date: Tue, 29 Apr 2025 10:20:24 -0700 Subject: [PATCH 17/52] Respond to "Do these need to be run as subprocesses?" 
review question (https://github.com/NVIDIA/cuda-python/pull/578#discussion_r2064470913) --- cuda_bindings/tests/test_path_finder.py | 7 +++++++ 1 file changed, 7 insertions(+) diff --git a/cuda_bindings/tests/test_path_finder.py b/cuda_bindings/tests/test_path_finder.py index 172b41942..65aa86f4f 100644 --- a/cuda_bindings/tests/test_path_finder.py +++ b/cuda_bindings/tests/test_path_finder.py @@ -49,6 +49,13 @@ def _build_subprocess_failed_for_libname_message(libname, result): @pytest.mark.parametrize("api", ("find", "load")) @pytest.mark.parametrize("libname", TEST_FIND_OR_LOAD_LIBNAMES) def test_find_or_load_nvidia_dynamic_library(info_summary_append, api, libname): + # We intentionally run each dynamic library operation in a subprocess + # to ensure isolation of global dynamic linking state (e.g., dlopen handles). + # Without subprocesses, loading/unloading libraries during testing could + # interfere across test cases and lead to nondeterministic or platform-specific failures. + # + # Defining the subprocess code snippets as strings ensures each subprocess + # runs a minimal, independent script tailored to the specific libname and API being tested. if api == "find": code = f"""\ from cuda.bindings._path_finder.find_nvidia_dynamic_library import find_nvidia_dynamic_library From 4d95eb4dc49e47cc79025fd6dfaedcb5d558a25c Mon Sep 17 00:00:00 2001 From: "Ralf W. Grosse-Kunstleve" Date: Tue, 29 Apr 2025 10:32:51 -0700 Subject: [PATCH 18/52] Respond to "dead code?" review questions (e.g. 
https://github.com/NVIDIA/cuda-python/pull/578#discussion_r2064501694) --- cuda_bindings/cuda/bindings/_path_finder/supported_libs.py | 7 +++++++ 1 file changed, 7 insertions(+) diff --git a/cuda_bindings/cuda/bindings/_path_finder/supported_libs.py b/cuda_bindings/cuda/bindings/_path_finder/supported_libs.py index bd35c1ee9..6852c7fce 100644 --- a/cuda_bindings/cuda/bindings/_path_finder/supported_libs.py +++ b/cuda_bindings/cuda/bindings/_path_finder/supported_libs.py @@ -41,6 +41,13 @@ "nvjpeg", ) +# Note: The `cufile_rdma` information is intentionally retained (commented out) +# despite not being actively used in the current build. It took a nontrivial +# amount of effort to determine the SONAME, dependencies, and expected symbols +# for this special-case library, especially given its RDMA/MLX5 dependencies +# and limited availability. Keeping this as a reference avoids having to +# reconstruct the information from scratch in the future. + PARTIALLY_SUPPORTED_LIBNAMES_LINUX_ONLY = ( "cufile", # "cufile_rdma", # Requires libmlx5.so From 72c339a6ec1e2d37597c7c7f916f8bf61d08f2b2 Mon Sep 17 00:00:00 2001 From: "Ralf W. Grosse-Kunstleve" Date: Tue, 29 Apr 2025 10:54:59 -0700 Subject: [PATCH 19/52] Respond to "Do we need to implement a cache separately ..." 
review question (https://github.com/NVIDIA/cuda-python/pull/578#discussion_r2064567215) --- .../cuda/bindings/_path_finder/load_dl_linux.py | 4 ++-- .../cuda/bindings/_path_finder/load_dl_windows.py | 4 ++-- .../_path_finder/load_nvidia_dynamic_library.py | 10 ++++++---- 3 files changed, 10 insertions(+), 8 deletions(-) diff --git a/cuda_bindings/cuda/bindings/_path_finder/load_dl_linux.py b/cuda_bindings/cuda/bindings/_path_finder/load_dl_linux.py index e7458d4da..27b7f39fb 100644 --- a/cuda_bindings/cuda/bindings/_path_finder/load_dl_linux.py +++ b/cuda_bindings/cuda/bindings/_path_finder/load_dl_linux.py @@ -56,7 +56,7 @@ def abs_path_for_dynamic_library(libname: str, handle: int) -> Optional[str]: return info.dli_fname.decode() -def check_if_already_loaded(libname: str) -> Optional[LoadedDL]: +def check_if_already_loaded_from_elsewhere(libname: str) -> Optional[LoadedDL]: """Check if the library is already loaded in the process. Args: @@ -66,7 +66,7 @@ def check_if_already_loaded(libname: str) -> Optional[LoadedDL]: A LoadedDL object if the library is already loaded, None otherwise Example: - >>> loaded = check_if_already_loaded("cudart") + >>> loaded = check_if_already_loaded_from_elsewhere("cudart") >>> if loaded is not None: ... print(f"Library already loaded from {loaded.abs_path}") """ diff --git a/cuda_bindings/cuda/bindings/_path_finder/load_dl_windows.py b/cuda_bindings/cuda/bindings/_path_finder/load_dl_windows.py index 2574b5e6f..5379b4148 100644 --- a/cuda_bindings/cuda/bindings/_path_finder/load_dl_windows.py +++ b/cuda_bindings/cuda/bindings/_path_finder/load_dl_windows.py @@ -62,7 +62,7 @@ def cuDriverGetVersion() -> int: return driver_ver.value -def check_if_already_loaded(libname: str) -> Optional[LoadedDL]: +def check_if_already_loaded_from_elsewhere(libname: str) -> Optional[LoadedDL]: """Check if the library is already loaded in the process. 
Args: @@ -72,7 +72,7 @@ def check_if_already_loaded(libname: str) -> Optional[LoadedDL]: A LoadedDL object if the library is already loaded, None otherwise Example: - >>> loaded = check_if_already_loaded("cudart") + >>> loaded = check_if_already_loaded_from_elsewhere("cudart") >>> if loaded is not None: ... print(f"Library already loaded from {loaded.abs_path}") """ diff --git a/cuda_bindings/cuda/bindings/_path_finder/load_nvidia_dynamic_library.py b/cuda_bindings/cuda/bindings/_path_finder/load_nvidia_dynamic_library.py index c7624bcb0..c6353cb74 100644 --- a/cuda_bindings/cuda/bindings/_path_finder/load_nvidia_dynamic_library.py +++ b/cuda_bindings/cuda/bindings/_path_finder/load_nvidia_dynamic_library.py @@ -8,14 +8,16 @@ from .load_dl_common import LoadedDL, load_dependencies if sys.platform == "win32": - from .load_dl_windows import check_if_already_loaded, load_with_abs_path, load_with_system_search + from .load_dl_windows import check_if_already_loaded_from_elsewhere, load_with_abs_path, load_with_system_search else: - from .load_dl_linux import check_if_already_loaded, load_with_abs_path, load_with_system_search + from .load_dl_linux import check_if_already_loaded_from_elsewhere, load_with_abs_path, load_with_system_search def _load_nvidia_dynamic_library_no_cache(libname: str) -> LoadedDL: - # Check if library is already loaded - loaded = check_if_already_loaded(libname) + # Check whether the library is already loaded into the current process by + # some other component. This check uses OS-level mechanisms (e.g., + # dlopen on Linux, GetModuleHandle on Windows). + loaded = check_if_already_loaded_from_elsewhere(libname) if loaded is not None: return loaded From 4ce94be563f5dba48ead76eeb992bb7c041890ca Mon Sep 17 00:00:00 2001 From: "Ralf W. Grosse-Kunstleve" Date: Tue, 29 Apr 2025 10:59:00 -0700 Subject: [PATCH 20/52] Remove cuDriverGetVersion() function for now. 
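For reference, if the driver version is needed again later, a helper along the following lines could be reinstated. This is only a sketch, not the code removed by this patch: the name `driver_version_or_none` is hypothetical, the driver library names (`nvcuda.dll`, `libcuda.so.1`) are assumptions about a typical installation, and it returns `None` instead of asserting when the driver is unavailable.

```python
import ctypes
import functools
import sys
from typing import Optional


@functools.cache
def driver_version_or_none() -> Optional[int]:
    """Return the value reported by cuDriverGetVersion, or None if unavailable."""
    # Assumed library names; adjust for the target environment.
    libname = "nvcuda.dll" if sys.platform == "win32" else "libcuda.so.1"
    try:
        lib = ctypes.CDLL(libname)
    except OSError:
        # Driver library not installed or not on the loader search path.
        return None
    try:
        fn = lib.cuDriverGetVersion
    except AttributeError:
        # Library was found but does not export the expected symbol.
        return None
    fn.argtypes = [ctypes.POINTER(ctypes.c_int)]
    fn.restype = ctypes.c_int
    version = ctypes.c_int()
    if fn(ctypes.byref(version)) != 0:  # 0 is CUDA_SUCCESS
        return None
    return version.value


if __name__ == "__main__":
    print(f"{driver_version_or_none()=}")
```

Returning `None` rather than raising keeps the helper usable on machines without a driver, at the cost of hiding the reason for the failure.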
--- .../bindings/_path_finder/load_dl_windows.py | 31 ------------------- 1 file changed, 31 deletions(-) diff --git a/cuda_bindings/cuda/bindings/_path_finder/load_dl_windows.py b/cuda_bindings/cuda/bindings/_path_finder/load_dl_windows.py index 5379b4148..180381550 100644 --- a/cuda_bindings/cuda/bindings/_path_finder/load_dl_windows.py +++ b/cuda_bindings/cuda/bindings/_path_finder/load_dl_windows.py @@ -3,7 +3,6 @@ import ctypes import ctypes.wintypes -import functools from typing import Optional import pywintypes @@ -35,33 +34,6 @@ def abs_path_for_dynamic_library(handle: int) -> str: return buf.value -@functools.cache -def cuDriverGetVersion() -> int: - """Get the CUDA driver version. - - Returns: - The CUDA driver version number - - Raises: - AssertionError: If the driver version cannot be obtained - """ - handle = win32api.LoadLibrary("nvcuda.dll") - - kernel32 = ctypes.WinDLL("kernel32", use_last_error=True) - GetProcAddress = kernel32.GetProcAddress - GetProcAddress.argtypes = [ctypes.wintypes.HMODULE, ctypes.wintypes.LPCSTR] - GetProcAddress.restype = ctypes.c_void_p - cuDriverGetVersion = GetProcAddress(handle, b"cuDriverGetVersion") - assert cuDriverGetVersion - - FUNC_TYPE = ctypes.CFUNCTYPE(ctypes.c_int, ctypes.POINTER(ctypes.c_int)) - cuDriverGetVersion_fn = FUNC_TYPE(cuDriverGetVersion) - driver_ver = ctypes.c_int() - err = cuDriverGetVersion_fn(ctypes.byref(driver_ver)) - assert err == 0 - return driver_ver.value - - def check_if_already_loaded_from_elsewhere(libname: str) -> Optional[LoadedDL]: """Check if the library is already loaded in the process. @@ -100,9 +72,6 @@ def load_with_system_search(name: str, _unused: str) -> Optional[LoadedDL]: """ from .supported_libs import SUPPORTED_WINDOWS_DLLS - driver_ver = cuDriverGetVersion() - del driver_ver # Keeping this here because it will probably be needed in the future. 
- dll_names = SUPPORTED_WINDOWS_DLLS.get(name) if dll_names is None: return None From 26eb4b5cb4c46fa40fba7ff9fad4b92cb905c9cf Mon Sep 17 00:00:00 2001 From: "Ralf W. Grosse-Kunstleve" Date: Tue, 29 Apr 2025 11:04:29 -0700 Subject: [PATCH 21/52] Move add_dll_directory() from load_dl_common.py to load_dl_windows.py (response to review question https://github.com/NVIDIA/cuda-python/pull/578#discussion_r2064624395) --- .../bindings/_path_finder/load_dl_common.py | 20 ----------------- .../bindings/_path_finder/load_dl_windows.py | 22 ++++++++++++++++++- 2 files changed, 21 insertions(+), 21 deletions(-) diff --git a/cuda_bindings/cuda/bindings/_path_finder/load_dl_common.py b/cuda_bindings/cuda/bindings/_path_finder/load_dl_common.py index 66f21ffcf..2b8b1b69f 100644 --- a/cuda_bindings/cuda/bindings/_path_finder/load_dl_common.py +++ b/cuda_bindings/cuda/bindings/_path_finder/load_dl_common.py @@ -25,26 +25,6 @@ class LoadedDL: was_already_loaded_from_elsewhere: bool -def add_dll_directory(dll_abs_path: str) -> None: - """Add a DLL directory to the search path and update PATH environment variable. - - Args: - dll_abs_path: Absolute path to the DLL file - - Raises: - AssertionError: If the directory containing the DLL does not exist - """ - import os - - dirpath = os.path.dirname(dll_abs_path) - assert os.path.isdir(dirpath), dll_abs_path - # Add the DLL directory to the search path - os.add_dll_directory(dirpath) - # Update PATH as a fallback for dependent DLL resolution - curr_path = os.environ.get("PATH") - os.environ["PATH"] = dirpath if curr_path is None else os.pathsep.join((curr_path, dirpath)) - - def load_dependencies(libname: str, load_func: Callable[[str], LoadedDL]) -> None: """Load all dependencies for a given library. 
diff --git a/cuda_bindings/cuda/bindings/_path_finder/load_dl_windows.py b/cuda_bindings/cuda/bindings/_path_finder/load_dl_windows.py index 180381550..59194f97d 100644 --- a/cuda_bindings/cuda/bindings/_path_finder/load_dl_windows.py +++ b/cuda_bindings/cuda/bindings/_path_finder/load_dl_windows.py @@ -8,13 +8,33 @@ import pywintypes import win32api -from .load_dl_common import LoadedDL, add_dll_directory +from .load_dl_common import LoadedDL # Mirrors WinBase.h (unfortunately not defined already elsewhere) WINBASE_LOAD_LIBRARY_SEARCH_DLL_LOAD_DIR = 0x00000100 WINBASE_LOAD_LIBRARY_SEARCH_DEFAULT_DIRS = 0x00001000 +def add_dll_directory(dll_abs_path: str) -> None: + """Add a DLL directory to the search path and update PATH environment variable. + + Args: + dll_abs_path: Absolute path to the DLL file + + Raises: + AssertionError: If the directory containing the DLL does not exist + """ + import os + + dirpath = os.path.dirname(dll_abs_path) + assert os.path.isdir(dirpath), dll_abs_path + # Add the DLL directory to the search path + os.add_dll_directory(dirpath) + # Update PATH as a fallback for dependent DLL resolution + curr_path = os.environ.get("PATH") + os.environ["PATH"] = dirpath if curr_path is None else os.pathsep.join((curr_path, dirpath)) + + def abs_path_for_dynamic_library(handle: int) -> str: """Get the absolute path of a loaded dynamic library on Windows. From 72d25671a2eb571d4f067fb1feea1e0a7851fd4e Mon Sep 17 00:00:00 2001 From: "Ralf W. 
Grosse-Kunstleve" Date: Tue, 29 Apr 2025 13:27:33 -0700 Subject: [PATCH 22/52] Add SPDX-License-Identifier and # Forked from: URL in cuda_paths.py --- cuda_bindings/cuda/bindings/_path_finder/cuda_paths.py | 6 ++++++ 1 file changed, 6 insertions(+) diff --git a/cuda_bindings/cuda/bindings/_path_finder/cuda_paths.py b/cuda_bindings/cuda/bindings/_path_finder/cuda_paths.py index e27e6f54b..d3f266521 100644 --- a/cuda_bindings/cuda/bindings/_path_finder/cuda_paths.py +++ b/cuda_bindings/cuda/bindings/_path_finder/cuda_paths.py @@ -1,3 +1,9 @@ +# Copyright 2025 NVIDIA Corporation. All rights reserved. +# SPDX-License-Identifier: LicenseRef-NVIDIA-SOFTWARE-LICENSE + +# Forked from then pending https://github.com/NVIDIA/numba-cuda/pull/155 on 2025-03-19: +# https://github.com/brandon-b-miller/numba-cuda/blob/d4bf1137e6fcc65f6e2ba2be6f723e69b357b798/numba_cuda/numba/cuda/cuda_paths.py + import os import platform import re From e14391d6cbc7cefadcb8eaf892a74ba4892032b6 Mon Sep 17 00:00:00 2001 From: "Ralf W. Grosse-Kunstleve" Date: Tue, 29 Apr 2025 13:46:43 -0700 Subject: [PATCH 23/52] Add SPDX-License-Identifier and Original LICENSE in findlib.py --- .../cuda/bindings/_path_finder/findlib.py | 28 +++++++++++++++++++ 1 file changed, 28 insertions(+) diff --git a/cuda_bindings/cuda/bindings/_path_finder/findlib.py b/cuda_bindings/cuda/bindings/_path_finder/findlib.py index 4de57c905..992a3940e 100644 --- a/cuda_bindings/cuda/bindings/_path_finder/findlib.py +++ b/cuda_bindings/cuda/bindings/_path_finder/findlib.py @@ -1,5 +1,33 @@ +# SPDX-License-Identifier: BSD-2-Clause +# # Forked from: # https://github.com/numba/numba/blob/f0d24824fcd6a454827e3c108882395d00befc04/numba/misc/findlib.py +# +# Original LICENSE: +# Copyright (c) 2012, Anaconda, Inc. +# All rights reserved.
+# +# Redistribution and use in source and binary forms, with or without +# modification, are permitted provided that the following conditions are +# met: +# +# Redistributions of source code must retain the above copyright notice, +# this list of conditions and the following disclaimer. +# +# Redistributions in binary form must reproduce the above copyright +# notice, this list of conditions and the following disclaimer in the +# documentation and/or other materials provided with the distribution. +# THIS SOFTWARE IS PROVIDED BY THE COPYRIGHT HOLDERS AND CONTRIBUTORS +# "AS IS" AND ANY EXPRESS OR IMPLIED WARRANTIES, INCLUDING, BUT NOT +# LIMITED TO, THE IMPLIED WARRANTIES OF MERCHANTABILITY AND FITNESS FOR +# A PARTICULAR PURPOSE ARE DISCLAIMED. IN NO EVENT SHALL THE COPYRIGHT +# HOLDER OR CONTRIBUTORS BE LIABLE FOR ANY DIRECT, INDIRECT, INCIDENTAL, +# SPECIAL, EXEMPLARY, OR CONSEQUENTIAL DAMAGES (INCLUDING, BUT NOT +# LIMITED TO, PROCUREMENT OF SUBSTITUTE GOODS OR SERVICES; LOSS OF USE, +# DATA, OR PROFITS; OR BUSINESS INTERRUPTION) HOWEVER CAUSED AND ON ANY +# THEORY OF LIABILITY, WHETHER IN CONTRACT, STRICT LIABILITY, OR TORT +# (INCLUDING NEGLIGENCE OR OTHERWISE) ARISING IN ANY WAY OUT OF THE USE +# OF THIS SOFTWARE, EVEN IF ADVISED OF THE POSSIBILITY OF SUCH DAMAGE. import os import re From 9154995dc97105befff47549d9a145b815bf12c9 Mon Sep 17 00:00:00 2001 From: "Ralf W. 
Grosse-Kunstleve" Date: Tue, 29 Apr 2025 15:02:05 -0700 Subject: [PATCH 24/52] Very first draft of README.md --- .../cuda/bindings/_path_finder/README.md | 43 +++++++++++++++++++ 1 file changed, 43 insertions(+) create mode 100644 cuda_bindings/cuda/bindings/_path_finder/README.md diff --git a/cuda_bindings/cuda/bindings/_path_finder/README.md b/cuda_bindings/cuda/bindings/_path_finder/README.md new file mode 100644 index 000000000..0955a76fd --- /dev/null +++ b/cuda_bindings/cuda/bindings/_path_finder/README.md @@ -0,0 +1,43 @@ +`cuda.bindings.path_finder` +=========================== + +Currently, the only two (semi-)public APIs are: + +* `cuda.bindings.path_finder._SUPPORTED_LIBNAMES` (currently `('nvJitLink', 'nvrtc', 'nvvm')`) + +* `cuda.bindings.path_finder._load_nvidia_dynamic_library(libname: str) -> LoadedDL` + +These APIs are prefixed with an underscore because they are STILL A WORK IN PROGRESS, +although already fairly well tested. + +`load_nvidia_dynamic_library()` is meant to become the one, central, go-to API for +loading NVIDIA shared libraries from Python. + + +Search Priority +--------------- + +The _intended_ search priority for locating NVIDIA dynamic libraries is: + +* *site-packages* — Traversal of Python's `sys.path` (in order), which for all practical + purposes amounts to a search for libraries installed from NVIDIA wheels. + +* *Conda* — Currently mplemented via `get_cuda_paths()` as forked from numba/cuda/cuda_paths.py + +* *System* — Also implemented via `get_cuda_paths()` + +* *OS-provided search* — `dlopen()` (Linux) or `LoadLibraryW()` (Windows) mechanisms + + +Currently, our fork of cuda_paths.py is intentionally used as-is. +cuda_paths.py has a long and convoluted development history, but that also means +the product is time-tested. Our strategy for evolving the implementation is: + +* Establish a minimal viable product as a baseline (current stage). 
+ +* Establish a comprehensive testing infrastructure (GitHub Actions / CI) to + cover all sorts of environments that we want to support. + +* Combine, refactor, and clean up find_nvidia_dynamic_library.py & cuda_paths.py + to achieve a more maintainable and robust implementation of the intended + dynamic library search priority. From bdfc6a756d39def2cd05d180e5d636e6bc0937dc Mon Sep 17 00:00:00 2001 From: "Ralf W. Grosse-Kunstleve" Date: Tue, 29 Apr 2025 15:09:19 -0700 Subject: [PATCH 25/52] Update README.md, mostly as revised by perplexity, with various manual edits. --- .../cuda/bindings/_path_finder/README.md | 70 ++++++++++++------- 1 file changed, 44 insertions(+), 26 deletions(-) diff --git a/cuda_bindings/cuda/bindings/_path_finder/README.md b/cuda_bindings/cuda/bindings/_path_finder/README.md index 0955a76fd..e66668fd6 100644 --- a/cuda_bindings/cuda/bindings/_path_finder/README.md +++ b/cuda_bindings/cuda/bindings/_path_finder/README.md @@ -1,43 +1,61 @@ -`cuda.bindings.path_finder` -=========================== +# `cuda.bindings.path_finder` Module -Currently, the only two (semi-)public APIs are: +## Public API (Work in Progress) -* `cuda.bindings.path_finder._SUPPORTED_LIBNAMES` (currently `('nvJitLink', 'nvrtc', 'nvvm')`) +Currently exposes two primary interfaces: -* `cuda.bindings.path_finder._load_nvidia_dynamic_library(libname: str) -> LoadedDL` +``` +cuda.bindings.path_finder._SUPPORTED_LIBNAMES # ('nvJitLink', 'nvrtc', 'nvvm') +cuda.bindings.path_finder._load_nvidia_dynamic_library(libname: str) -> LoadedDL +``` -These APIs are prefixed with an underscore because they are STILL A WORK IN PROGRESS, -although already fairly well tested. +**Note:** +These APIs are prefixed with an underscore because they are considered +experimental while undergoing active development, although already +reasonably well-tested through CI pipelines. 
-`load_nvidia_dynamic_library()` is meant to become the one, central, go-to API for -loading NVIDIA shared libraries from Python. +## Library Loading Search Priority +The `load_nvidia_dynamic_library()` function implements a hierarchical search +strategy for locating NVIDIA shared libraries: -Search Priority ---------------- +1. **Python Package Ecosystem** + - Scans `sys.path` to find libraries installed via NVIDIA Python wheels -The _intended_ search priority for locating NVIDIA dynamic libraries is: +2. **Conda Environments** + - Leverages Conda-specific paths through our fork of `get_cuda_paths()` from Numba -* *site-packages* — Traversal of Python's `sys.path` (in order), which for all practical - purposes amounts to a search for libraries installed from NVIDIA wheels. +3. **System Installations** + - Checks traditional system locations via the same `get_cuda_paths()` implementation -* *Conda* — Currently mplemented via `get_cuda_paths()` as forked from numba/cuda/cuda_paths.py +4. **OS Default Mechanisms** + - Falls back to native loader: + - `dlopen()` on Linux + - `LoadLibraryW()` on Windows -* *System* — Also implemented via `get_cuda_paths()` +## Implementation Philosophy -* *OS-provided search* — `dlopen()` (Linux) or `LoadLibraryW()` (Windows) mechanisms +The current implementation balances stability and evolution: +- **Baseline Foundation:** Uses a fork of Numba's `cuda_paths.py` that has been + battle-tested in production environments -Currently, our fork of cuda_paths.py is intentionally used as-is. -cuda_paths.py has a long and convoluted development history, but that also means -the product is time-tested. Our strategy for evolving the implementation is: +- **Validation Infrastructure:** Comprehensive CI testing matrix being developed to cover: + - Various Linux/Windows environments + - Python packaging formats (wheels, conda) + - CUDA Toolkit versions -* Establish a minimal viable product as a baseline (current stage). 
+- **Roadmap:** Planned refactoring to: + - Unify library discovery logic + - Improve maintainability + - Better enforce search priority + - Expand platform support -* Establish a comprehensive testing infrastructure (GitHub Actions / CI) to - cover all sorts of environments that we want to support. +## Maintenance Requirements -* Combine, refactor, and clean up find_nvidia_dynamic_library.py & cuda_paths.py - to achieve a more maintainable and robust implementation of the intended - dynamic library search priority. +These key components must be updated for new CUDA Toolkit releases: + +- `supported_libs.SUPPORTED_LIBNAMES` +- `supported_libs.SUPPORTED_WINDOWS_DLLS` +- `supported_libs.SUPPORTED_LINUX_SONAMES` +- `supported_libs.EXPECTED_LIB_SYMBOLS` From 2ad4b792a97db77c718d5259a19d83e583fc9ad2 Mon Sep 17 00:00:00 2001 From: "Ralf W. Grosse-Kunstleve" Date: Tue, 29 Apr 2025 22:02:32 -0700 Subject: [PATCH 26/52] Refork cuda_paths.py AS-IS: https://github.com/NVIDIA/numba-cuda/blob/8c9c9d0cb901c06774a9abea6d12b6a4b0287e5e/numba_cuda/numba/cuda/cuda_paths.py --- .../cuda/bindings/_path_finder/cuda_paths.py | 349 ++++++++++++------ 1 file changed, 231 insertions(+), 118 deletions(-) diff --git a/cuda_bindings/cuda/bindings/_path_finder/cuda_paths.py b/cuda_bindings/cuda/bindings/_path_finder/cuda_paths.py index d3f266521..9bcccf9ce 100644 --- a/cuda_bindings/cuda/bindings/_path_finder/cuda_paths.py +++ b/cuda_bindings/cuda/bindings/_path_finder/cuda_paths.py @@ -1,58 +1,49 @@ -# Copyright 2025 NVIDIA Corporation. All rights reserved. 
-# SPDX-License-Identifier: LicenseRef-NVIDIA-SOFTWARE-LICENSE - -# Forked from then pending https://github.com/NVIDIA/numba-cuda/pull/155 on 2025-03-19: -# https://github.com/brandon-b-miller/numba-cuda/blob/d4bf1137e6fcc65f6e2ba2be6f723e69b357b798/numba_cuda/numba/cuda/cuda_paths.py - +import sys +import re import os +from collections import namedtuple import platform -import re import site -import sys -import traceback -import warnings -from collections import namedtuple from pathlib import Path - -from .findlib import find_file, find_lib - -IS_WIN32 = sys.platform.startswith("win32") +from numba.core.config import IS_WIN32 +from numba.misc.findlib import find_lib +from numba import config +import ctypes _env_path_tuple = namedtuple("_env_path_tuple", ["by", "info"]) +SEARCH_PRIORITY = [ + "Conda environment", + "Conda environment (NVIDIA package)", + "NVIDIA NVCC Wheel", + "CUDA_HOME", + "System", + "Debian package", +] -def _get_numba_CUDA_INCLUDE_PATH(): - # From numba/numba/core/config.py - def _readenv(name, ctor, default): - value = os.environ.get(name) - if value is None: - return default() if callable(default) else default - try: - return ctor(value) - except Exception: - warnings.warn( # noqa: B028 - f"Environment variable '{name}' is defined but " - f"its associated value '{value}' could not be " - "parsed.\nThe parse failed with exception:\n" - f"{traceback.format_exc()}", - RuntimeWarning, - ) - return default - - if IS_WIN32: - cuda_path = os.environ.get("CUDA_PATH") - if cuda_path: # noqa: SIM108 - default_cuda_include_path = os.path.join(cuda_path, "include") - else: - default_cuda_include_path = "cuda_include_not_found" +def _priority_index(label): + if label in SEARCH_PRIORITY: + return SEARCH_PRIORITY.index(label) else: - default_cuda_include_path = os.path.join(os.sep, "usr", "local", "cuda", "include") - CUDA_INCLUDE_PATH = _readenv("NUMBA_CUDA_INCLUDE_PATH", str, default_cuda_include_path) - return CUDA_INCLUDE_PATH + raise 
ValueError(f"Can't determine search priority for {label}") + +def _find_first_valid_lazy(options): + sorted_options = sorted(options, key=lambda x: _priority_index(x[0])) + for label, fn in sorted_options: + value = fn() + if value: + return label, value + return "", None -config_CUDA_INCLUDE_PATH = _get_numba_CUDA_INCLUDE_PATH() + +def _build_options(pairs): + """Sorts and returns a list of (label, value) tuples according to SEARCH_PRIORITY.""" + priority_index = {label: i for i, label in enumerate(SEARCH_PRIORITY)} + return sorted( + pairs, key=lambda pair: priority_index.get(pair[0], float("inf")) + ) def _find_valid_path(options): @@ -68,19 +59,17 @@ def _find_valid_path(options): def _get_libdevice_path_decision(): - options = [ - ("Conda environment", get_conda_ctk()), - ("Conda environment (NVIDIA package)", get_nvidia_libdevice_ctk()), - ("CUDA_HOME", get_cuda_home("nvvm", "libdevice")), - ("Debian package", get_debian_pkg_libdevice()), - ("NVIDIA NVCC Wheel", get_libdevice_wheel()), - ] - libdevice_ctk_dir = get_system_ctk("nvvm", "libdevice") - if libdevice_ctk_dir and os.path.exists(libdevice_ctk_dir): - options.append(("System", libdevice_ctk_dir)) - - by, libdir = _find_valid_path(options) - return by, libdir + options = _build_options( + [ + ("Conda environment", get_conda_ctk), + ("Conda environment (NVIDIA package)", get_nvidia_libdevice_ctk), + ("CUDA_HOME", lambda: get_cuda_home("nvvm", "libdevice")), + ("NVIDIA NVCC Wheel", get_libdevice_wheel), + ("System", lambda: get_system_ctk("nvvm", "libdevice")), + ("Debian package", get_debian_pkg_libdevice), + ] + ) + return _find_first_valid_lazy(options) def _nvvm_lib_dir(): @@ -92,53 +81,129 @@ def _nvvm_lib_dir(): def _get_nvvm_path_decision(): options = [ - ("Conda environment", get_conda_ctk()), - ("Conda environment (NVIDIA package)", get_nvidia_nvvm_ctk()), - ("CUDA_HOME", get_cuda_home(*_nvvm_lib_dir())), - ("NVIDIA NVCC Wheel", _get_nvvm_wheel()), + ("Conda environment", get_conda_ctk), + 
("Conda environment (NVIDIA package)", get_nvidia_nvvm_ctk), + ("NVIDIA NVCC Wheel", _get_nvvm_wheel), + ("CUDA_HOME", lambda: get_cuda_home(*_nvvm_lib_dir())), + ("System", lambda: get_system_ctk(*_nvvm_lib_dir())), ] - # need to ensure nvvm dir actually exists - nvvm_ctk_dir = get_system_ctk(*_nvvm_lib_dir()) - if nvvm_ctk_dir and os.path.exists(nvvm_ctk_dir): - options.append(("System", nvvm_ctk_dir)) + return _find_first_valid_lazy(options) + - by, path = _find_valid_path(options) - return by, path +def _get_nvrtc_system_ctk(): + sys_path = get_system_ctk("bin" if IS_WIN32 else "lib64") + candidates = find_lib("nvrtc", sys_path) + if candidates: + return max(candidates) + + +def _get_nvrtc_path_decision(): + options = _build_options( + [ + ("CUDA_HOME", lambda: get_cuda_home("nvrtc")), + ("Conda environment", get_conda_ctk), + ("Conda environment (NVIDIA package)", get_nvidia_cudalib_ctk), + ("NVIDIA NVCC Wheel", _get_nvrtc_wheel), + ("System", _get_nvrtc_system_ctk), + ] + ) + return _find_first_valid_lazy(options) def _get_nvvm_wheel(): - site_paths = [site.getusersitepackages()] + site.getsitepackages() + ["conda", None] + platform_map = { + "linux": ("lib64", "libnvvm.so"), + "win32": ("bin", "nvvm64_40_0.dll"), + } + + for plat, (dso_dir, dso_path) in platform_map.items(): + if sys.platform.startswith(plat): + break + else: + raise NotImplementedError("Unsupported platform") + + site_paths = [site.getusersitepackages()] + site.getsitepackages() + + for sp in filter(None, site_paths): + nvvm_path = Path(sp, "nvidia", "cuda_nvcc", "nvvm", dso_dir, dso_path) + if nvvm_path.exists(): + return str(nvvm_path.parent) + + return None + + +def get_major_cuda_version(): + # TODO: remove once cuda-python is + # a hard dependency + from numba.cuda.cudadrv.runtime import get_version + + return get_version()[0] + + +def get_nvrtc_dso_path(): + site_paths = [site.getusersitepackages()] + site.getsitepackages() for sp in site_paths: - # The SONAME is taken based on public 
CTK 12.x releases - if sys.platform.startswith("linux"): - dso_dir = "lib64" - # Hack: libnvvm from Linux wheel - # does not have any soname (CUDAINST-3183) - dso_path = "libnvvm.so" - elif sys.platform.startswith("win32"): - dso_dir = "bin" - dso_path = "nvvm64_40_0.dll" + lib_dir = os.path.join( + sp, + "nvidia", + "cuda_nvrtc", + ("bin" if IS_WIN32 else "lib") if sp else None, + ) + if lib_dir and os.path.exists(lib_dir): + try: + major = get_major_cuda_version() + if major == 11: + cu_ver = "112" if IS_WIN32 else "11.2" + elif major == 12: + cu_ver = "120" if IS_WIN32 else "12" + else: + raise NotImplementedError(f"CUDA {major} is not supported") + + return os.path.join( + lib_dir, + f"nvrtc64_{cu_ver}_0.dll" + if IS_WIN32 + else f"libnvrtc.so.{cu_ver}", + ) + except RuntimeError: + continue + + +def _get_nvrtc_wheel(): + dso_path = get_nvrtc_dso_path() + if dso_path: + try: + result = ctypes.CDLL(dso_path, mode=ctypes.RTLD_GLOBAL) + except OSError: + pass else: - raise AssertionError() - - if sp is not None: - dso_dir = os.path.join(sp, "nvidia", "cuda_nvcc", "nvvm", dso_dir) - dso_path = os.path.join(dso_dir, dso_path) - if os.path.exists(dso_path): - return str(Path(dso_path).parent) + if IS_WIN32: + import win32api + + # This absolute path will + # always be correct regardless of the package source + nvrtc_path = win32api.GetModuleFileNameW(result._handle) + dso_dir = os.path.dirname(nvrtc_path) + builtins_path = os.path.join( + dso_dir, + [ + f + for f in os.listdir(dso_dir) + if re.match("^nvrtc-builtins.*.dll$", f) + ][0], + ) + if not os.path.exists(builtins_path): + raise RuntimeError( + f'Path does not exist: "{builtins_path}"' + ) + return Path(dso_path) def _get_libdevice_paths(): by, libdir = _get_libdevice_path_decision() - if by == "NVIDIA NVCC Wheel": - # The NVVM path is a directory, not a file - out = os.path.join(libdir, "libdevice.10.bc") - else: - # Search for pattern - pat = r"libdevice(\.\d+)*\.bc$" - candidates = 
find_file(re.compile(pat), libdir) - # Keep only the max (most recent version) of the bitcode files. - out = max(candidates, default=None) + if not libdir: + return _env_path_tuple(by, None) + out = os.path.join(libdir, "libdevice.10.bc") return _env_path_tuple(by, out) @@ -156,26 +221,46 @@ def _cuda_home_static_cudalib_path(): return ("lib64",) +def _get_cudalib_wheel(): + """Get the cudalib path from the NVCC wheel.""" + site_paths = [site.getusersitepackages()] + site.getsitepackages() + libdir = "bin" if IS_WIN32 else "lib" + for sp in filter(None, site_paths): + cudalib_path = Path(sp, "nvidia", "cuda_runtime", libdir) + if cudalib_path.exists(): + return str(cudalib_path) + return None + + def _get_cudalib_dir_path_decision(): - options = [ - ("Conda environment", get_conda_ctk()), - ("Conda environment (NVIDIA package)", get_nvidia_cudalib_ctk()), - ("CUDA_HOME", get_cuda_home(_cudalib_path())), - ("System", get_system_ctk(_cudalib_path())), - ] - by, libdir = _find_valid_path(options) - return by, libdir + options = _build_options( + [ + ("Conda environment", get_conda_ctk), + ("Conda environment (NVIDIA package)", get_nvidia_cudalib_ctk), + ("NVIDIA NVCC Wheel", _get_cudalib_wheel), + ("CUDA_HOME", lambda: get_cuda_home(_cudalib_path())), + ("System", lambda: get_system_ctk(_cudalib_path())), + ] + ) + return _find_first_valid_lazy(options) def _get_static_cudalib_dir_path_decision(): - options = [ - ("Conda environment", get_conda_ctk()), - ("Conda environment (NVIDIA package)", get_nvidia_static_cudalib_ctk()), - ("CUDA_HOME", get_cuda_home(*_cuda_home_static_cudalib_path())), - ("System", get_system_ctk(_cudalib_path())), - ] - by, libdir = _find_valid_path(options) - return by, libdir + options = _build_options( + [ + ("Conda environment", get_conda_ctk), + ( + "Conda environment (NVIDIA package)", + get_nvidia_static_cudalib_ctk, + ), + ( + "CUDA_HOME", + lambda: get_cuda_home(*_cuda_home_static_cudalib_path()), + ), + ("System", lambda: 
get_system_ctk(_cudalib_path())), + ] + ) + return _find_first_valid_lazy(options) def _get_cudalib_dir(): @@ -191,12 +276,12 @@ def _get_static_cudalib_dir(): def get_system_ctk(*subdirs): """Return path to system-wide cudatoolkit; or, None if it doesn't exist.""" # Linux? - if sys.platform.startswith("linux"): + if not IS_WIN32: # Is cuda alias to /usr/local/cuda? # We are intentionally not getting versioned cuda installation. - base = "/usr/local/cuda" - if os.path.exists(base): - return os.path.join(base, *subdirs) + result = os.path.join("/usr/local/cuda", *subdirs) + if os.path.exists(result): + return result def get_conda_ctk(): @@ -263,7 +348,7 @@ def get_nvidia_static_cudalib_ctk(): if not nvvm_ctk: return - if IS_WIN32 and ("Library" not in nvvm_ctk): # noqa: SIM108 + if IS_WIN32 and ("Library" not in nvvm_ctk): # Location specific to CUDA 11.x packages on Windows dirs = ("Lib", "x64") else: @@ -289,15 +374,38 @@ def get_cuda_home(*subdirs): def _get_nvvm_path(): by, path = _get_nvvm_path_decision() + if by == "NVIDIA NVCC Wheel": - # The NVVM path is a directory, not a file - path = os.path.join(path, "libnvvm.so") + platform_map = { + "linux": "libnvvm.so", + "win32": "nvvm64_40_0.dll", + } + + for plat, dso_name in platform_map.items(): + if sys.platform.startswith(plat): + break + else: + raise NotImplementedError("Unsupported platform") + + path = os.path.join(path, dso_name) else: candidates = find_lib("nvvm", path) path = max(candidates) if candidates else None return _env_path_tuple(by, path) +def _get_nvrtc_path(): + by, path = _get_nvrtc_path_decision() + if by == "NVIDIA NVCC Wheel": + path = str(path) + elif by == "System": + return _env_path_tuple(by, path) + else: + candidates = find_lib("nvrtc", path) + path = max(candidates) if candidates else None + return _env_path_tuple(by, path) + + def get_cuda_paths(): """Returns a dictionary mapping component names to a 2-tuple of (source_variable, info). 
@@ -316,6 +424,7 @@ def get_cuda_paths(): # Not in cache d = { "nvvm": _get_nvvm_path(), + "nvrtc": _get_nvrtc_path(), "libdevice": _get_libdevice_paths(), "cudalib_dir": _get_cudalib_dir(), "static_cudalib_dir": _get_static_cudalib_dir(), @@ -383,7 +492,9 @@ def get_conda_include_dir(): if platform.system() == "Windows": include_dir = os.path.join(sys.prefix, "Library", "include") elif target_name := get_current_cuda_target_name(): - include_dir = os.path.join(sys.prefix, "targets", target_name, "include") + include_dir = os.path.join( + sys.prefix, "targets", target_name, "include" + ) else: # A fallback when target cannot determined # though usually it shouldn't. @@ -392,7 +503,9 @@ def get_conda_include_dir(): if ( os.path.exists(include_dir) and os.path.isdir(include_dir) - and os.path.exists(os.path.join(include_dir, "cuda_device_runtime_api.h")) + and os.path.exists( + os.path.join(include_dir, "cuda_device_runtime_api.h") + ) ): return include_dir return @@ -402,7 +515,7 @@ def _get_include_dir(): """Find the root include directory.""" options = [ ("Conda environment (NVIDIA package)", get_conda_include_dir()), - ("CUDA_INCLUDE_PATH Config Entry", config_CUDA_INCLUDE_PATH), + ("CUDA_INCLUDE_PATH Config Entry", config.CUDA_INCLUDE_PATH), # TODO: add others ] by, include_dir = _find_valid_path(options) From 7dcaa504ec800ae17d7c3b6e4e6366e3ef31e5a5 Mon Sep 17 00:00:00 2001 From: "Ralf W. 
Grosse-Kunstleve" Date: Tue, 29 Apr 2025 22:07:14 -0700 Subject: [PATCH 27/52] ruff format cuda_paths.py (NO manual changes) --- .../cuda/bindings/_path_finder/cuda_paths.py | 39 +++++++------------ 1 file changed, 13 insertions(+), 26 deletions(-) diff --git a/cuda_bindings/cuda/bindings/_path_finder/cuda_paths.py b/cuda_bindings/cuda/bindings/_path_finder/cuda_paths.py index 9bcccf9ce..db4d51470 100644 --- a/cuda_bindings/cuda/bindings/_path_finder/cuda_paths.py +++ b/cuda_bindings/cuda/bindings/_path_finder/cuda_paths.py @@ -1,14 +1,15 @@ -import sys -import re +import ctypes import os -from collections import namedtuple import platform +import re import site +import sys +from collections import namedtuple from pathlib import Path + +from numba import config from numba.core.config import IS_WIN32 from numba.misc.findlib import find_lib -from numba import config -import ctypes _env_path_tuple = namedtuple("_env_path_tuple", ["by", "info"]) @@ -41,9 +42,7 @@ def _find_first_valid_lazy(options): def _build_options(pairs): """Sorts and returns a list of (label, value) tuples according to SEARCH_PRIORITY.""" priority_index = {label: i for i, label in enumerate(SEARCH_PRIORITY)} - return sorted( - pairs, key=lambda pair: priority_index.get(pair[0], float("inf")) - ) + return sorted(pairs, key=lambda pair: priority_index.get(pair[0], float("inf"))) def _find_valid_path(options): @@ -161,9 +160,7 @@ def get_nvrtc_dso_path(): return os.path.join( lib_dir, - f"nvrtc64_{cu_ver}_0.dll" - if IS_WIN32 - else f"libnvrtc.so.{cu_ver}", + f"nvrtc64_{cu_ver}_0.dll" if IS_WIN32 else f"libnvrtc.so.{cu_ver}", ) except RuntimeError: continue @@ -186,16 +183,10 @@ def _get_nvrtc_wheel(): dso_dir = os.path.dirname(nvrtc_path) builtins_path = os.path.join( dso_dir, - [ - f - for f in os.listdir(dso_dir) - if re.match("^nvrtc-builtins.*.dll$", f) - ][0], + [f for f in os.listdir(dso_dir) if re.match("^nvrtc-builtins.*.dll$", f)][0], ) if not os.path.exists(builtins_path): - raise 
RuntimeError( - f'Path does not exist: "{builtins_path}"' - ) + raise RuntimeError(f'Path does not exist: "{builtins_path}"') return Path(dso_path) @@ -348,7 +339,7 @@ def get_nvidia_static_cudalib_ctk(): if not nvvm_ctk: return - if IS_WIN32 and ("Library" not in nvvm_ctk): + if IS_WIN32 and ("Library" not in nvvm_ctk): # noqa: SIM108 # Location specific to CUDA 11.x packages on Windows dirs = ("Lib", "x64") else: @@ -492,9 +483,7 @@ def get_conda_include_dir(): if platform.system() == "Windows": include_dir = os.path.join(sys.prefix, "Library", "include") elif target_name := get_current_cuda_target_name(): - include_dir = os.path.join( - sys.prefix, "targets", target_name, "include" - ) + include_dir = os.path.join(sys.prefix, "targets", target_name, "include") else: # A fallback when target cannot determined # though usually it shouldn't. @@ -503,9 +492,7 @@ def get_conda_include_dir(): if ( os.path.exists(include_dir) and os.path.isdir(include_dir) - and os.path.exists( - os.path.join(include_dir, "cuda_device_runtime_api.h") - ) + and os.path.exists(os.path.join(include_dir, "cuda_device_runtime_api.h")) ): return include_dir return From 714b88c6b2492f6e774f8fe40b1c1205b50d64ac Mon Sep 17 00:00:00 2001 From: "Ralf W. Grosse-Kunstleve" Date: Tue, 29 Apr 2025 22:21:29 -0700 Subject: [PATCH 28/52] Add back _get_numba_CUDA_INCLUDE_PATH from 2279bda65640b73a9a5632df878f52aedcbbd642 (i.e. 
cuda_paths.py as it was right before re-forking) --- .../cuda/bindings/_path_finder/cuda_paths.py | 44 +++++++++++++++++-- 1 file changed, 40 insertions(+), 4 deletions(-) diff --git a/cuda_bindings/cuda/bindings/_path_finder/cuda_paths.py b/cuda_bindings/cuda/bindings/_path_finder/cuda_paths.py index db4d51470..936f1cab6 100644 --- a/cuda_bindings/cuda/bindings/_path_finder/cuda_paths.py +++ b/cuda_bindings/cuda/bindings/_path_finder/cuda_paths.py @@ -4,15 +4,51 @@ import re import site import sys +import traceback +import warnings from collections import namedtuple from pathlib import Path -from numba import config -from numba.core.config import IS_WIN32 -from numba.misc.findlib import find_lib +from .findlib import find_lib + +IS_WIN32 = sys.platform.startswith("win32") _env_path_tuple = namedtuple("_env_path_tuple", ["by", "info"]) + +def _get_numba_CUDA_INCLUDE_PATH(): + # From numba/numba/core/config.py + + def _readenv(name, ctor, default): + value = os.environ.get(name) + if value is None: + return default() if callable(default) else default + try: + return ctor(value) + except Exception: + warnings.warn( # noqa: B028 + f"Environment variable '{name}' is defined but " + f"its associated value '{value}' could not be " + "parsed.\nThe parse failed with exception:\n" + f"{traceback.format_exc()}", + RuntimeWarning, + ) + return default + + if IS_WIN32: + cuda_path = os.environ.get("CUDA_PATH") + if cuda_path: # noqa: SIM108 + default_cuda_include_path = os.path.join(cuda_path, "include") + else: + default_cuda_include_path = "cuda_include_not_found" + else: + default_cuda_include_path = os.path.join(os.sep, "usr", "local", "cuda", "include") + CUDA_INCLUDE_PATH = _readenv("NUMBA_CUDA_INCLUDE_PATH", str, default_cuda_include_path) + return CUDA_INCLUDE_PATH + + +config_CUDA_INCLUDE_PATH = _get_numba_CUDA_INCLUDE_PATH() + SEARCH_PRIORITY = [ "Conda environment", "Conda environment (NVIDIA package)", @@ -502,7 +538,7 @@ def _get_include_dir(): """Find the root 
include directory.""" options = [ ("Conda environment (NVIDIA package)", get_conda_include_dir()), - ("CUDA_INCLUDE_PATH Config Entry", config.CUDA_INCLUDE_PATH), + ("CUDA_INCLUDE_PATH Config Entry", config_CUDA_INCLUDE_PATH), # TODO: add others ] by, include_dir = _find_valid_path(options) From 166837de0ae64ce69e210a5f67fcc4ea14fd37fc Mon Sep 17 00:00:00 2001 From: "Ralf W. Grosse-Kunstleve" Date: Tue, 29 Apr 2025 22:26:55 -0700 Subject: [PATCH 29/52] Remove cuda_paths.py dependency on numba.cuda.cudadrv.runtime --- .../cuda/bindings/_path_finder/cuda_paths.py | 18 +++++------------- 1 file changed, 5 insertions(+), 13 deletions(-) diff --git a/cuda_bindings/cuda/bindings/_path_finder/cuda_paths.py b/cuda_bindings/cuda/bindings/_path_finder/cuda_paths.py index 936f1cab6..646184da7 100644 --- a/cuda_bindings/cuda/bindings/_path_finder/cuda_paths.py +++ b/cuda_bindings/cuda/bindings/_path_finder/cuda_paths.py @@ -167,14 +167,6 @@ def _get_nvvm_wheel(): return None -def get_major_cuda_version(): - # TODO: remove once cuda-python is - # a hard dependency - from numba.cuda.cudadrv.runtime import get_version - - return get_version()[0] - - def get_nvrtc_dso_path(): site_paths = [site.getusersitepackages()] + site.getsitepackages() for sp in site_paths: @@ -185,8 +177,7 @@ def get_nvrtc_dso_path(): ("bin" if IS_WIN32 else "lib") if sp else None, ) if lib_dir and os.path.exists(lib_dir): - try: - major = get_major_cuda_version() + for major in (12, 11): if major == 11: cu_ver = "112" if IS_WIN32 else "11.2" elif major == 12: @@ -194,12 +185,13 @@ def get_nvrtc_dso_path(): else: raise NotImplementedError(f"CUDA {major} is not supported") - return os.path.join( + dso_path = os.path.join( lib_dir, f"nvrtc64_{cu_ver}_0.dll" if IS_WIN32 else f"libnvrtc.so.{cu_ver}", ) - except RuntimeError: - continue + if os.path.isfile(dso_path): + return dso_path + return None def _get_nvrtc_wheel(): From ad1e85e73c2c569c852eff83263351036cf9e193 Mon Sep 17 00:00:00 2001 From: "Ralf W. 
Grosse-Kunstleve" Date: Tue, 29 Apr 2025 22:39:37 -0700 Subject: [PATCH 30/52] Add Forked from URLs, two SPDX-License-Identifier, Original Numba LICENSE --- .../cuda/bindings/_path_finder/cuda_paths.py | 36 +++++++++++++++++++ 1 file changed, 36 insertions(+) diff --git a/cuda_bindings/cuda/bindings/_path_finder/cuda_paths.py b/cuda_bindings/cuda/bindings/_path_finder/cuda_paths.py index 646184da7..9d441a4d3 100644 --- a/cuda_bindings/cuda/bindings/_path_finder/cuda_paths.py +++ b/cuda_bindings/cuda/bindings/_path_finder/cuda_paths.py @@ -1,3 +1,39 @@ +# Copyright 2025 NVIDIA Corporation. All rights reserved. +# SPDX-License-Identifier: LicenseRef-NVIDIA-SOFTWARE-LICENSE + +# Forked from: +# https://github.com/NVIDIA/numba-cuda/blob/8c9c9d0cb901c06774a9abea6d12b6a4b0287e5e/numba_cuda/numba/cuda/cuda_paths.py + +# The numba-cuda version in turn was forked from: +# https://github.com/numba/numba/blob/6c8a71ffc3eaa1c68e1bac927b80ee7469002b3f/numba/cuda/cuda_paths.py +# SPDX-License-Identifier: BSD-2-Clause +# +# Original Numba LICENSE: +# Copyright (c) 2012, Anaconda, Inc. +# All rights reserved. +# +# Redistribution and use in source and binary forms, with or without +# modification, are permitted provided that the following conditions are +# met: +# +# Redistributions of source code must retain the above copyright notice, +# this list of conditions and the following disclaimer. +# +# Redistributions in binary form must reproduce the above copyright +# notice, this list of conditions and the following disclaimer in the +# documentation and/or other materials provided with the distribution. +# THIS SOFTWARE IS PROVIDED BY THE COPYRIGHT HOLDERS AND CONTRIBUTORS +# "AS IS" AND ANY EXPRESS OR IMPLIED WARRANTIES, INCLUDING, BUT NOT +# LIMITED TO, THE IMPLIED WARRANTIES OF MERCHANTABILITY AND FITNESS FOR +# A PARTICULAR PURPOSE ARE DISCLAIMED. 
IN NO EVENT SHALL THE COPYRIGHT +# HOLDER OR CONTRIBUTORS BE LIABLE FOR ANY DIRECT, INDIRECT, INCIDENTAL, +# SPECIAL, EXEMPLARY, OR CONSEQUENTIAL DAMAGES (INCLUDING, BUT NOT +# LIMITED TO, PROCUREMENT OF SUBSTITUTE GOODS OR SERVICES; LOSS OF USE, +# DATA, OR PROFITS; OR BUSINESS INTERRUPTION) HOWEVER CAUSED AND ON ANY +# THEORY OF LIABILITY, WHETHER IN CONTRACT, STRICT LIABILITY, OR TORT +# (INCLUDING NEGLIGENCE OR OTHERWISE) ARISING IN ANY WAY OUT OF THE USE +# OF THIS SOFTWARE, EVEN IF ADVISED OF THE POSSIBILITY OF SUCH DAMAGE. + import ctypes import os import platform From 47ad79f317c30423a023bfef28f68163360728b6 Mon Sep 17 00:00:00 2001 From: "Ralf W. Grosse-Kunstleve" Date: Tue, 29 Apr 2025 22:50:12 -0700 Subject: [PATCH 31/52] Temporarily restore debug changes under .github/workflows, for expanded path_finder test coverage --- .github/workflows/build-and-test.yml | 10 ---------- .github/workflows/test-wheel-linux.yml | 12 ++++++++---- .github/workflows/test-wheel-windows.yml | 9 +++++---- cuda_bindings/pyproject.toml | 10 ++++++++++ 4 files changed, 23 insertions(+), 18 deletions(-) diff --git a/.github/workflows/build-and-test.yml b/.github/workflows/build-and-test.yml index a00a94028..6ece1b235 100644 --- a/.github/workflows/build-and-test.yml +++ b/.github/workflows/build-and-test.yml @@ -25,14 +25,9 @@ jobs: matrix: host-platform: - linux-64 - - linux-aarch64 - win-64 python-version: - - "3.13" - "3.12" - - "3.11" - - "3.10" - - "3.9" cuda-version: # Note: this is for build-time only. - "12.8.0" @@ -215,13 +210,8 @@ jobs: matrix: host-platform: - linux-64 - - linux-aarch64 python-version: - - "3.13" - "3.12" - - "3.11" - - "3.10" - - "3.9" cuda-version: # Note: this is for test-time only. 
- "12.8.0" diff --git a/.github/workflows/test-wheel-linux.yml b/.github/workflows/test-wheel-linux.yml index 83dad5cec..1e2a33400 100644 --- a/.github/workflows/test-wheel-linux.yml +++ b/.github/workflows/test-wheel-linux.yml @@ -78,6 +78,10 @@ jobs: fi fi + if [[ "${{ inputs.local-ctk }}" != 1 ]]; then + echo "CUDA_BINDINGS_PATH_FINDER_TEST_ALL_LIBNAMES=1" >> $GITHUB_ENV + fi + # make outputs from the previous job as env vars CUDA_CORE_ARTIFACT_BASENAME="cuda-core-python${PYTHON_VERSION_FORMATTED}-${{ inputs.host-platform }}" echo "PYTHON_VERSION_FORMATTED=${PYTHON_VERSION_FORMATTED}" >> $GITHUB_ENV @@ -216,7 +220,7 @@ jobs: pushd ./cuda_bindings pip install -r requirements.txt - ${SANITIZER_CMD} pytest -rxXs -v tests/ + ${SANITIZER_CMD} pytest -ra -s -v tests/ # It is a bit convoluted to run the Cython tests against CTK wheels, # so let's just skip them. @@ -227,7 +231,7 @@ jobs: # TODO: enable this once win-64 runners are up exit 1 fi - ${SANITIZER_CMD} pytest -rxXs -v tests/cython + ${SANITIZER_CMD} pytest -ra -s -v tests/cython fi popd @@ -251,7 +255,7 @@ jobs: pushd ./cuda_core pip install -r "tests/requirements-cu${TEST_CUDA_MAJOR}.txt" - ${SANITIZER_CMD} pytest -rxXs -v tests/ + ${SANITIZER_CMD} pytest -ra -s -v tests/ # It is a bit convoluted to run the Cython tests against CTK wheels, # so let's just skip them. 
Also, currently our CI always installs the @@ -265,7 +269,7 @@ jobs: # TODO: enable this once win-64 runners are up exit 1 fi - ${SANITIZER_CMD} pytest -rxXs -v tests/cython + ${SANITIZER_CMD} pytest -ra -s -v tests/cython fi popd diff --git a/.github/workflows/test-wheel-windows.yml b/.github/workflows/test-wheel-windows.yml index 95a9cbe6f..c51f9d439 100644 --- a/.github/workflows/test-wheel-windows.yml +++ b/.github/workflows/test-wheel-windows.yml @@ -65,6 +65,8 @@ jobs: } } + "CUDA_BINDINGS_PATH_FINDER_TEST_ALL_LIBNAMES=1" >> $env:GITHUB_ENV + # Make outputs from the previous job as env vars $CUDA_CORE_ARTIFACT_BASENAME = "cuda-core-python${PYTHON_VERSION_FORMATTED}-${{ inputs.host-platform }}" "PYTHON_VERSION_FORMATTED=${PYTHON_VERSION_FORMATTED}" >> $env:GITHUB_ENV @@ -165,8 +167,7 @@ jobs: uses: Jimver/cuda-toolkit@v0.2.21 with: cuda: ${{ inputs.cuda-version }} - method: 'network' - sub-packages: ${{ env.MINI_CTK_DEPS }} + method: 'local' - name: Update PATH if: ${{ inputs.local-ctk == '1' }} @@ -190,7 +191,7 @@ jobs: Push-Location ./cuda_bindings pip install -r requirements.txt - pytest -rxXs -v tests/ + pytest -ra -s -v tests/ # skip Cython tests for now (NVIDIA/cuda-python#466) Pop-Location @@ -214,7 +215,7 @@ jobs: Push-Location ./cuda_core pip install -r "tests/requirements-cu${TEST_CUDA_MAJOR}.txt" - pytest -rxXs -v tests/ + pytest -ra -s -v tests/ Pop-Location - name: Ensure cuda-python installable diff --git a/cuda_bindings/pyproject.toml b/cuda_bindings/pyproject.toml index 875547033..48186137f 100644 --- a/cuda_bindings/pyproject.toml +++ b/cuda_bindings/pyproject.toml @@ -42,6 +42,16 @@ all = [ "nvidia-cuda-nvcc-cu12", "nvidia-cuda-nvrtc-cu12", "nvidia-nvjitlink-cu12>=12.3", + "nvidia-cuda-runtime-cu12", + "nvidia-cublas-cu12", + "nvidia-cufft-cu12", + "nvidia-curand-cu12", + "nvidia-cusolver-cu12", + "nvidia-cusparse-cu12", + "nvidia-npp-cu12", + "nvidia-nvjpeg-cu12", + "nvidia-nvfatbin-cu12", + "nvidia-cufile-cu12; sys_platform != 'win32'", ] 
[project.urls] From 1b88ec27fa0f714643a139670aefabcaf89ff1b6 Mon Sep 17 00:00:00 2001 From: "Ralf W. Grosse-Kunstleve" Date: Wed, 30 Apr 2025 07:57:28 -0700 Subject: [PATCH 32/52] Restore cuda_path.py AS-IT-WAS at commit 2279bda65640b73a9a5632df878f52aedcbbd642 --- .../cuda/bindings/_path_finder/cuda_paths.py | 314 +++++------------- 1 file changed, 75 insertions(+), 239 deletions(-) diff --git a/cuda_bindings/cuda/bindings/_path_finder/cuda_paths.py b/cuda_bindings/cuda/bindings/_path_finder/cuda_paths.py index 9d441a4d3..d3f266521 100644 --- a/cuda_bindings/cuda/bindings/_path_finder/cuda_paths.py +++ b/cuda_bindings/cuda/bindings/_path_finder/cuda_paths.py @@ -1,40 +1,9 @@ # Copyright 2025 NVIDIA Corporation. All rights reserved. # SPDX-License-Identifier: LicenseRef-NVIDIA-SOFTWARE-LICENSE -# Forked from: -# https://github.com/NVIDIA/numba-cuda/blob/8c9c9d0cb901c06774a9abea6d12b6a4b0287e5e/numba_cuda/numba/cuda/cuda_paths.py - -# The numba-cuda version in turn was forked from: -# https://github.com/numba/numba/blob/6c8a71ffc3eaa1c68e1bac927b80ee7469002b3f/numba/cuda/cuda_paths.py -# SPDX-License-Identifier: BSD-2-Clause -# -# Original Numba LICENSE: -# Copyright (c) 2012, Anaconda, Inc. -# All rights reserved. -# -# Redistribution and use in source and binary forms, with or without -# modification, are permitted provided that the following conditions are -# met: -# -# Redistributions of source code must retain the above copyright notice, -# this list of conditions and the following disclaimer. -# -# Redistributions in binary form must reproduce the above copyright -# notice, this list of conditions and the following disclaimer in the -# documentation and/or other materials provided with the distribution. -# THIS SOFTWARE IS PROVIDED BY THE COPYRIGHT HOLDERS AND CONTRIBUTORS -# "AS IS" AND ANY EXPRESS OR IMPLIED WARRANTIES, INCLUDING, BUT NOT -# LIMITED TO, THE IMPLIED WARRANTIES OF MERCHANTABILITY AND FITNESS FOR -# A PARTICULAR PURPOSE ARE DISCLAIMED. 
IN NO EVENT SHALL THE COPYRIGHT -# HOLDER OR CONTRIBUTORS BE LIABLE FOR ANY DIRECT, INDIRECT, INCIDENTAL, -# SPECIAL, EXEMPLARY, OR CONSEQUENTIAL DAMAGES (INCLUDING, BUT NOT -# LIMITED TO, PROCUREMENT OF SUBSTITUTE GOODS OR SERVICES; LOSS OF USE, -# DATA, OR PROFITS; OR BUSINESS INTERRUPTION) HOWEVER CAUSED AND ON ANY -# THEORY OF LIABILITY, WHETHER IN CONTRACT, STRICT LIABILITY, OR TORT -# (INCLUDING NEGLIGENCE OR OTHERWISE) ARISING IN ANY WAY OUT OF THE USE -# OF THIS SOFTWARE, EVEN IF ADVISED OF THE POSSIBILITY OF SUCH DAMAGE. - -import ctypes +# Forked from then pending https://github.com/NVIDIA/numba-cuda/pull/155 on 2025-03-19: +# https://github.com/brandon-b-miller/numba-cuda/blob/d4bf1137e6fcc65f6e2ba2be6f723e69b357b798/numba_cuda/numba/cuda/cuda_paths.py + import os import platform import re @@ -45,7 +14,7 @@ from collections import namedtuple from pathlib import Path -from .findlib import find_lib +from .findlib import find_file, find_lib IS_WIN32 = sys.platform.startswith("win32") @@ -85,37 +54,6 @@ def _readenv(name, ctor, default): config_CUDA_INCLUDE_PATH = _get_numba_CUDA_INCLUDE_PATH() -SEARCH_PRIORITY = [ - "Conda environment", - "Conda environment (NVIDIA package)", - "NVIDIA NVCC Wheel", - "CUDA_HOME", - "System", - "Debian package", -] - - -def _priority_index(label): - if label in SEARCH_PRIORITY: - return SEARCH_PRIORITY.index(label) - else: - raise ValueError(f"Can't determine search priority for {label}") - - -def _find_first_valid_lazy(options): - sorted_options = sorted(options, key=lambda x: _priority_index(x[0])) - for label, fn in sorted_options: - value = fn() - if value: - return label, value - return "", None - - -def _build_options(pairs): - """Sorts and returns a list of (label, value) tuples according to SEARCH_PRIORITY.""" - priority_index = {label: i for i, label in enumerate(SEARCH_PRIORITY)} - return sorted(pairs, key=lambda pair: priority_index.get(pair[0], float("inf"))) - def _find_valid_path(options): """Find valid path 
from *options*, which is a list of 2-tuple of @@ -130,17 +68,19 @@ def _find_valid_path(options): def _get_libdevice_path_decision(): - options = _build_options( - [ - ("Conda environment", get_conda_ctk), - ("Conda environment (NVIDIA package)", get_nvidia_libdevice_ctk), - ("CUDA_HOME", lambda: get_cuda_home("nvvm", "libdevice")), - ("NVIDIA NVCC Wheel", get_libdevice_wheel), - ("System", lambda: get_system_ctk("nvvm", "libdevice")), - ("Debian package", get_debian_pkg_libdevice), - ] - ) - return _find_first_valid_lazy(options) + options = [ + ("Conda environment", get_conda_ctk()), + ("Conda environment (NVIDIA package)", get_nvidia_libdevice_ctk()), + ("CUDA_HOME", get_cuda_home("nvvm", "libdevice")), + ("Debian package", get_debian_pkg_libdevice()), + ("NVIDIA NVCC Wheel", get_libdevice_wheel()), + ] + libdevice_ctk_dir = get_system_ctk("nvvm", "libdevice") + if libdevice_ctk_dir and os.path.exists(libdevice_ctk_dir): + options.append(("System", libdevice_ctk_dir)) + + by, libdir = _find_valid_path(options) + return by, libdir def _nvvm_lib_dir(): @@ -152,113 +92,53 @@ def _nvvm_lib_dir(): def _get_nvvm_path_decision(): options = [ - ("Conda environment", get_conda_ctk), - ("Conda environment (NVIDIA package)", get_nvidia_nvvm_ctk), - ("NVIDIA NVCC Wheel", _get_nvvm_wheel), - ("CUDA_HOME", lambda: get_cuda_home(*_nvvm_lib_dir())), - ("System", lambda: get_system_ctk(*_nvvm_lib_dir())), + ("Conda environment", get_conda_ctk()), + ("Conda environment (NVIDIA package)", get_nvidia_nvvm_ctk()), + ("CUDA_HOME", get_cuda_home(*_nvvm_lib_dir())), + ("NVIDIA NVCC Wheel", _get_nvvm_wheel()), ] - return _find_first_valid_lazy(options) - - -def _get_nvrtc_system_ctk(): - sys_path = get_system_ctk("bin" if IS_WIN32 else "lib64") - candidates = find_lib("nvrtc", sys_path) - if candidates: - return max(candidates) + # need to ensure nvvm dir actually exists + nvvm_ctk_dir = get_system_ctk(*_nvvm_lib_dir()) + if nvvm_ctk_dir and os.path.exists(nvvm_ctk_dir): + 
options.append(("System", nvvm_ctk_dir)) - -def _get_nvrtc_path_decision(): - options = _build_options( - [ - ("CUDA_HOME", lambda: get_cuda_home("nvrtc")), - ("Conda environment", get_conda_ctk), - ("Conda environment (NVIDIA package)", get_nvidia_cudalib_ctk), - ("NVIDIA NVCC Wheel", _get_nvrtc_wheel), - ("System", _get_nvrtc_system_ctk), - ] - ) - return _find_first_valid_lazy(options) + by, path = _find_valid_path(options) + return by, path def _get_nvvm_wheel(): - platform_map = { - "linux": ("lib64", "libnvvm.so"), - "win32": ("bin", "nvvm64_40_0.dll"), - } - - for plat, (dso_dir, dso_path) in platform_map.items(): - if sys.platform.startswith(plat): - break - else: - raise NotImplementedError("Unsupported platform") - - site_paths = [site.getusersitepackages()] + site.getsitepackages() - - for sp in filter(None, site_paths): - nvvm_path = Path(sp, "nvidia", "cuda_nvcc", "nvvm", dso_dir, dso_path) - if nvvm_path.exists(): - return str(nvvm_path.parent) - - return None - - -def get_nvrtc_dso_path(): - site_paths = [site.getusersitepackages()] + site.getsitepackages() + site_paths = [site.getusersitepackages()] + site.getsitepackages() + ["conda", None] for sp in site_paths: - lib_dir = os.path.join( - sp, - "nvidia", - "cuda_nvrtc", - ("bin" if IS_WIN32 else "lib") if sp else None, - ) - if lib_dir and os.path.exists(lib_dir): - for major in (12, 11): - if major == 11: - cu_ver = "112" if IS_WIN32 else "11.2" - elif major == 12: - cu_ver = "120" if IS_WIN32 else "12" - else: - raise NotImplementedError(f"CUDA {major} is not supported") - - dso_path = os.path.join( - lib_dir, - f"nvrtc64_{cu_ver}_0.dll" if IS_WIN32 else f"libnvrtc.so.{cu_ver}", - ) - if os.path.isfile(dso_path): - return dso_path - return None - - -def _get_nvrtc_wheel(): - dso_path = get_nvrtc_dso_path() - if dso_path: - try: - result = ctypes.CDLL(dso_path, mode=ctypes.RTLD_GLOBAL) - except OSError: - pass + # The SONAME is taken based on public CTK 12.x releases + if 
sys.platform.startswith("linux"): + dso_dir = "lib64" + # Hack: libnvvm from Linux wheel + # does not have any soname (CUDAINST-3183) + dso_path = "libnvvm.so" + elif sys.platform.startswith("win32"): + dso_dir = "bin" + dso_path = "nvvm64_40_0.dll" else: - if IS_WIN32: - import win32api - - # This absolute path will - # always be correct regardless of the package source - nvrtc_path = win32api.GetModuleFileNameW(result._handle) - dso_dir = os.path.dirname(nvrtc_path) - builtins_path = os.path.join( - dso_dir, - [f for f in os.listdir(dso_dir) if re.match("^nvrtc-builtins.*.dll$", f)][0], - ) - if not os.path.exists(builtins_path): - raise RuntimeError(f'Path does not exist: "{builtins_path}"') - return Path(dso_path) + raise AssertionError() + + if sp is not None: + dso_dir = os.path.join(sp, "nvidia", "cuda_nvcc", "nvvm", dso_dir) + dso_path = os.path.join(dso_dir, dso_path) + if os.path.exists(dso_path): + return str(Path(dso_path).parent) def _get_libdevice_paths(): by, libdir = _get_libdevice_path_decision() - if not libdir: - return _env_path_tuple(by, None) - out = os.path.join(libdir, "libdevice.10.bc") + if by == "NVIDIA NVCC Wheel": + # The NVVM path is a directory, not a file + out = os.path.join(libdir, "libdevice.10.bc") + else: + # Search for pattern + pat = r"libdevice(\.\d+)*\.bc$" + candidates = find_file(re.compile(pat), libdir) + # Keep only the max (most recent version) of the bitcode files. 
+ out = max(candidates, default=None) return _env_path_tuple(by, out) @@ -276,46 +156,26 @@ def _cuda_home_static_cudalib_path(): return ("lib64",) -def _get_cudalib_wheel(): - """Get the cudalib path from the NVCC wheel.""" - site_paths = [site.getusersitepackages()] + site.getsitepackages() - libdir = "bin" if IS_WIN32 else "lib" - for sp in filter(None, site_paths): - cudalib_path = Path(sp, "nvidia", "cuda_runtime", libdir) - if cudalib_path.exists(): - return str(cudalib_path) - return None - - def _get_cudalib_dir_path_decision(): - options = _build_options( - [ - ("Conda environment", get_conda_ctk), - ("Conda environment (NVIDIA package)", get_nvidia_cudalib_ctk), - ("NVIDIA NVCC Wheel", _get_cudalib_wheel), - ("CUDA_HOME", lambda: get_cuda_home(_cudalib_path())), - ("System", lambda: get_system_ctk(_cudalib_path())), - ] - ) - return _find_first_valid_lazy(options) + options = [ + ("Conda environment", get_conda_ctk()), + ("Conda environment (NVIDIA package)", get_nvidia_cudalib_ctk()), + ("CUDA_HOME", get_cuda_home(_cudalib_path())), + ("System", get_system_ctk(_cudalib_path())), + ] + by, libdir = _find_valid_path(options) + return by, libdir def _get_static_cudalib_dir_path_decision(): - options = _build_options( - [ - ("Conda environment", get_conda_ctk), - ( - "Conda environment (NVIDIA package)", - get_nvidia_static_cudalib_ctk, - ), - ( - "CUDA_HOME", - lambda: get_cuda_home(*_cuda_home_static_cudalib_path()), - ), - ("System", lambda: get_system_ctk(_cudalib_path())), - ] - ) - return _find_first_valid_lazy(options) + options = [ + ("Conda environment", get_conda_ctk()), + ("Conda environment (NVIDIA package)", get_nvidia_static_cudalib_ctk()), + ("CUDA_HOME", get_cuda_home(*_cuda_home_static_cudalib_path())), + ("System", get_system_ctk(_cudalib_path())), + ] + by, libdir = _find_valid_path(options) + return by, libdir def _get_cudalib_dir(): @@ -331,12 +191,12 @@ def _get_static_cudalib_dir(): def get_system_ctk(*subdirs): """Return path to 
system-wide cudatoolkit; or, None if it doesn't exist.""" # Linux? - if not IS_WIN32: + if sys.platform.startswith("linux"): # Is cuda alias to /usr/local/cuda? # We are intentionally not getting versioned cuda installation. - result = os.path.join("/usr/local/cuda", *subdirs) - if os.path.exists(result): - return result + base = "/usr/local/cuda" + if os.path.exists(base): + return os.path.join(base, *subdirs) def get_conda_ctk(): @@ -429,38 +289,15 @@ def get_cuda_home(*subdirs): def _get_nvvm_path(): by, path = _get_nvvm_path_decision() - if by == "NVIDIA NVCC Wheel": - platform_map = { - "linux": "libnvvm.so", - "win32": "nvvm64_40_0.dll", - } - - for plat, dso_name in platform_map.items(): - if sys.platform.startswith(plat): - break - else: - raise NotImplementedError("Unsupported platform") - - path = os.path.join(path, dso_name) + # The NVVM path is a directory, not a file + path = os.path.join(path, "libnvvm.so") else: candidates = find_lib("nvvm", path) path = max(candidates) if candidates else None return _env_path_tuple(by, path) -def _get_nvrtc_path(): - by, path = _get_nvrtc_path_decision() - if by == "NVIDIA NVCC Wheel": - path = str(path) - elif by == "System": - return _env_path_tuple(by, path) - else: - candidates = find_lib("nvrtc", path) - path = max(candidates) if candidates else None - return _env_path_tuple(by, path) - - def get_cuda_paths(): """Returns a dictionary mapping component names to a 2-tuple of (source_variable, info). @@ -479,7 +316,6 @@ def get_cuda_paths(): # Not in cache d = { "nvvm": _get_nvvm_path(), - "nvrtc": _get_nvrtc_path(), "libdevice": _get_libdevice_paths(), "cudalib_dir": _get_cudalib_dir(), "static_cudalib_dir": _get_static_cudalib_dir(), From db79ec3240e5d33154d5a5b5775646b7fdc79da1 Mon Sep 17 00:00:00 2001 From: "Ralf W. 
Grosse-Kunstleve" Date: Wed, 30 Apr 2025 08:41:38 -0700 Subject: [PATCH 33/52] Revert "Restore cuda_path.py AS-IT-WAS at commit 2279bda65640b73a9a5632df878f52aedcbbd642" This reverts commit 1b88ec27fa0f714643a139670aefabcaf89ff1b6. --- .../cuda/bindings/_path_finder/cuda_paths.py | 314 +++++++++++++----- 1 file changed, 239 insertions(+), 75 deletions(-) diff --git a/cuda_bindings/cuda/bindings/_path_finder/cuda_paths.py b/cuda_bindings/cuda/bindings/_path_finder/cuda_paths.py index d3f266521..9d441a4d3 100644 --- a/cuda_bindings/cuda/bindings/_path_finder/cuda_paths.py +++ b/cuda_bindings/cuda/bindings/_path_finder/cuda_paths.py @@ -1,9 +1,40 @@ # Copyright 2025 NVIDIA Corporation. All rights reserved. # SPDX-License-Identifier: LicenseRef-NVIDIA-SOFTWARE-LICENSE -# Forked from then pending https://github.com/NVIDIA/numba-cuda/pull/155 on 2025-03-19: -# https://github.com/brandon-b-miller/numba-cuda/blob/d4bf1137e6fcc65f6e2ba2be6f723e69b357b798/numba_cuda/numba/cuda/cuda_paths.py - +# Forked from: +# https://github.com/NVIDIA/numba-cuda/blob/8c9c9d0cb901c06774a9abea6d12b6a4b0287e5e/numba_cuda/numba/cuda/cuda_paths.py + +# The numba-cuda version in turn was forked from: +# https://github.com/numba/numba/blob/6c8a71ffc3eaa1c68e1bac927b80ee7469002b3f/numba/cuda/cuda_paths.py +# SPDX-License-Identifier: BSD-2-Clause +# +# Original Numba LICENSE: +# Copyright (c) 2012, Anaconda, Inc. +# All rights reserved. +# +# Redistribution and use in source and binary forms, with or without +# modification, are permitted provided that the following conditions are +# met: +# +# Redistributions of source code must retain the above copyright notice, +# this list of conditions and the following disclaimer. +# +# Redistributions in binary form must reproduce the above copyright +# notice, this list of conditions and the following disclaimer in the +# documentation and/or other materials provided with the distribution. 
+# THIS SOFTWARE IS PROVIDED BY THE COPYRIGHT HOLDERS AND CONTRIBUTORS +# "AS IS" AND ANY EXPRESS OR IMPLIED WARRANTIES, INCLUDING, BUT NOT +# LIMITED TO, THE IMPLIED WARRANTIES OF MERCHANTABILITY AND FITNESS FOR +# A PARTICULAR PURPOSE ARE DISCLAIMED. IN NO EVENT SHALL THE COPYRIGHT +# HOLDER OR CONTRIBUTORS BE LIABLE FOR ANY DIRECT, INDIRECT, INCIDENTAL, +# SPECIAL, EXEMPLARY, OR CONSEQUENTIAL DAMAGES (INCLUDING, BUT NOT +# LIMITED TO, PROCUREMENT OF SUBSTITUTE GOODS OR SERVICES; LOSS OF USE, +# DATA, OR PROFITS; OR BUSINESS INTERRUPTION) HOWEVER CAUSED AND ON ANY +# THEORY OF LIABILITY, WHETHER IN CONTRACT, STRICT LIABILITY, OR TORT +# (INCLUDING NEGLIGENCE OR OTHERWISE) ARISING IN ANY WAY OUT OF THE USE +# OF THIS SOFTWARE, EVEN IF ADVISED OF THE POSSIBILITY OF SUCH DAMAGE. + +import ctypes import os import platform import re @@ -14,7 +45,7 @@ from collections import namedtuple from pathlib import Path -from .findlib import find_file, find_lib +from .findlib import find_lib IS_WIN32 = sys.platform.startswith("win32") @@ -54,6 +85,37 @@ def _readenv(name, ctor, default): config_CUDA_INCLUDE_PATH = _get_numba_CUDA_INCLUDE_PATH() +SEARCH_PRIORITY = [ + "Conda environment", + "Conda environment (NVIDIA package)", + "NVIDIA NVCC Wheel", + "CUDA_HOME", + "System", + "Debian package", +] + + +def _priority_index(label): + if label in SEARCH_PRIORITY: + return SEARCH_PRIORITY.index(label) + else: + raise ValueError(f"Can't determine search priority for {label}") + + +def _find_first_valid_lazy(options): + sorted_options = sorted(options, key=lambda x: _priority_index(x[0])) + for label, fn in sorted_options: + value = fn() + if value: + return label, value + return "", None + + +def _build_options(pairs): + """Sorts and returns a list of (label, value) tuples according to SEARCH_PRIORITY.""" + priority_index = {label: i for i, label in enumerate(SEARCH_PRIORITY)} + return sorted(pairs, key=lambda pair: priority_index.get(pair[0], float("inf"))) + def 
_find_valid_path(options): """Find valid path from *options*, which is a list of 2-tuple of @@ -68,19 +130,17 @@ def _find_valid_path(options): def _get_libdevice_path_decision(): - options = [ - ("Conda environment", get_conda_ctk()), - ("Conda environment (NVIDIA package)", get_nvidia_libdevice_ctk()), - ("CUDA_HOME", get_cuda_home("nvvm", "libdevice")), - ("Debian package", get_debian_pkg_libdevice()), - ("NVIDIA NVCC Wheel", get_libdevice_wheel()), - ] - libdevice_ctk_dir = get_system_ctk("nvvm", "libdevice") - if libdevice_ctk_dir and os.path.exists(libdevice_ctk_dir): - options.append(("System", libdevice_ctk_dir)) - - by, libdir = _find_valid_path(options) - return by, libdir + options = _build_options( + [ + ("Conda environment", get_conda_ctk), + ("Conda environment (NVIDIA package)", get_nvidia_libdevice_ctk), + ("CUDA_HOME", lambda: get_cuda_home("nvvm", "libdevice")), + ("NVIDIA NVCC Wheel", get_libdevice_wheel), + ("System", lambda: get_system_ctk("nvvm", "libdevice")), + ("Debian package", get_debian_pkg_libdevice), + ] + ) + return _find_first_valid_lazy(options) def _nvvm_lib_dir(): @@ -92,53 +152,113 @@ def _nvvm_lib_dir(): def _get_nvvm_path_decision(): options = [ - ("Conda environment", get_conda_ctk()), - ("Conda environment (NVIDIA package)", get_nvidia_nvvm_ctk()), - ("CUDA_HOME", get_cuda_home(*_nvvm_lib_dir())), - ("NVIDIA NVCC Wheel", _get_nvvm_wheel()), + ("Conda environment", get_conda_ctk), + ("Conda environment (NVIDIA package)", get_nvidia_nvvm_ctk), + ("NVIDIA NVCC Wheel", _get_nvvm_wheel), + ("CUDA_HOME", lambda: get_cuda_home(*_nvvm_lib_dir())), + ("System", lambda: get_system_ctk(*_nvvm_lib_dir())), ] - # need to ensure nvvm dir actually exists - nvvm_ctk_dir = get_system_ctk(*_nvvm_lib_dir()) - if nvvm_ctk_dir and os.path.exists(nvvm_ctk_dir): - options.append(("System", nvvm_ctk_dir)) + return _find_first_valid_lazy(options) + + +def _get_nvrtc_system_ctk(): + sys_path = get_system_ctk("bin" if IS_WIN32 else "lib64") + 
candidates = find_lib("nvrtc", sys_path) + if candidates: + return max(candidates) - by, path = _find_valid_path(options) - return by, path + +def _get_nvrtc_path_decision(): + options = _build_options( + [ + ("CUDA_HOME", lambda: get_cuda_home("nvrtc")), + ("Conda environment", get_conda_ctk), + ("Conda environment (NVIDIA package)", get_nvidia_cudalib_ctk), + ("NVIDIA NVCC Wheel", _get_nvrtc_wheel), + ("System", _get_nvrtc_system_ctk), + ] + ) + return _find_first_valid_lazy(options) def _get_nvvm_wheel(): - site_paths = [site.getusersitepackages()] + site.getsitepackages() + ["conda", None] + platform_map = { + "linux": ("lib64", "libnvvm.so"), + "win32": ("bin", "nvvm64_40_0.dll"), + } + + for plat, (dso_dir, dso_path) in platform_map.items(): + if sys.platform.startswith(plat): + break + else: + raise NotImplementedError("Unsupported platform") + + site_paths = [site.getusersitepackages()] + site.getsitepackages() + + for sp in filter(None, site_paths): + nvvm_path = Path(sp, "nvidia", "cuda_nvcc", "nvvm", dso_dir, dso_path) + if nvvm_path.exists(): + return str(nvvm_path.parent) + + return None + + +def get_nvrtc_dso_path(): + site_paths = [site.getusersitepackages()] + site.getsitepackages() for sp in site_paths: - # The SONAME is taken based on public CTK 12.x releases - if sys.platform.startswith("linux"): - dso_dir = "lib64" - # Hack: libnvvm from Linux wheel - # does not have any soname (CUDAINST-3183) - dso_path = "libnvvm.so" - elif sys.platform.startswith("win32"): - dso_dir = "bin" - dso_path = "nvvm64_40_0.dll" + lib_dir = os.path.join( + sp, + "nvidia", + "cuda_nvrtc", + ("bin" if IS_WIN32 else "lib") if sp else None, + ) + if lib_dir and os.path.exists(lib_dir): + for major in (12, 11): + if major == 11: + cu_ver = "112" if IS_WIN32 else "11.2" + elif major == 12: + cu_ver = "120" if IS_WIN32 else "12" + else: + raise NotImplementedError(f"CUDA {major} is not supported") + + dso_path = os.path.join( + lib_dir, + f"nvrtc64_{cu_ver}_0.dll" if 
IS_WIN32 else f"libnvrtc.so.{cu_ver}", + ) + if os.path.isfile(dso_path): + return dso_path + return None + + +def _get_nvrtc_wheel(): + dso_path = get_nvrtc_dso_path() + if dso_path: + try: + result = ctypes.CDLL(dso_path, mode=ctypes.RTLD_GLOBAL) + except OSError: + pass else: - raise AssertionError() - - if sp is not None: - dso_dir = os.path.join(sp, "nvidia", "cuda_nvcc", "nvvm", dso_dir) - dso_path = os.path.join(dso_dir, dso_path) - if os.path.exists(dso_path): - return str(Path(dso_path).parent) + if IS_WIN32: + import win32api + + # This absolute path will + # always be correct regardless of the package source + nvrtc_path = win32api.GetModuleFileNameW(result._handle) + dso_dir = os.path.dirname(nvrtc_path) + builtins_path = os.path.join( + dso_dir, + [f for f in os.listdir(dso_dir) if re.match("^nvrtc-builtins.*.dll$", f)][0], + ) + if not os.path.exists(builtins_path): + raise RuntimeError(f'Path does not exist: "{builtins_path}"') + return Path(dso_path) def _get_libdevice_paths(): by, libdir = _get_libdevice_path_decision() - if by == "NVIDIA NVCC Wheel": - # The NVVM path is a directory, not a file - out = os.path.join(libdir, "libdevice.10.bc") - else: - # Search for pattern - pat = r"libdevice(\.\d+)*\.bc$" - candidates = find_file(re.compile(pat), libdir) - # Keep only the max (most recent version) of the bitcode files. 
- out = max(candidates, default=None) + if not libdir: + return _env_path_tuple(by, None) + out = os.path.join(libdir, "libdevice.10.bc") return _env_path_tuple(by, out) @@ -156,26 +276,46 @@ def _cuda_home_static_cudalib_path(): return ("lib64",) +def _get_cudalib_wheel(): + """Get the cudalib path from the NVCC wheel.""" + site_paths = [site.getusersitepackages()] + site.getsitepackages() + libdir = "bin" if IS_WIN32 else "lib" + for sp in filter(None, site_paths): + cudalib_path = Path(sp, "nvidia", "cuda_runtime", libdir) + if cudalib_path.exists(): + return str(cudalib_path) + return None + + def _get_cudalib_dir_path_decision(): - options = [ - ("Conda environment", get_conda_ctk()), - ("Conda environment (NVIDIA package)", get_nvidia_cudalib_ctk()), - ("CUDA_HOME", get_cuda_home(_cudalib_path())), - ("System", get_system_ctk(_cudalib_path())), - ] - by, libdir = _find_valid_path(options) - return by, libdir + options = _build_options( + [ + ("Conda environment", get_conda_ctk), + ("Conda environment (NVIDIA package)", get_nvidia_cudalib_ctk), + ("NVIDIA NVCC Wheel", _get_cudalib_wheel), + ("CUDA_HOME", lambda: get_cuda_home(_cudalib_path())), + ("System", lambda: get_system_ctk(_cudalib_path())), + ] + ) + return _find_first_valid_lazy(options) def _get_static_cudalib_dir_path_decision(): - options = [ - ("Conda environment", get_conda_ctk()), - ("Conda environment (NVIDIA package)", get_nvidia_static_cudalib_ctk()), - ("CUDA_HOME", get_cuda_home(*_cuda_home_static_cudalib_path())), - ("System", get_system_ctk(_cudalib_path())), - ] - by, libdir = _find_valid_path(options) - return by, libdir + options = _build_options( + [ + ("Conda environment", get_conda_ctk), + ( + "Conda environment (NVIDIA package)", + get_nvidia_static_cudalib_ctk, + ), + ( + "CUDA_HOME", + lambda: get_cuda_home(*_cuda_home_static_cudalib_path()), + ), + ("System", lambda: get_system_ctk(_cudalib_path())), + ] + ) + return _find_first_valid_lazy(options) def _get_cudalib_dir(): @@ 
-191,12 +331,12 @@ def _get_static_cudalib_dir(): def get_system_ctk(*subdirs): """Return path to system-wide cudatoolkit; or, None if it doesn't exist.""" # Linux? - if sys.platform.startswith("linux"): + if not IS_WIN32: # Is cuda alias to /usr/local/cuda? # We are intentionally not getting versioned cuda installation. - base = "/usr/local/cuda" - if os.path.exists(base): - return os.path.join(base, *subdirs) + result = os.path.join("/usr/local/cuda", *subdirs) + if os.path.exists(result): + return result def get_conda_ctk(): @@ -289,15 +429,38 @@ def get_cuda_home(*subdirs): def _get_nvvm_path(): by, path = _get_nvvm_path_decision() + if by == "NVIDIA NVCC Wheel": - # The NVVM path is a directory, not a file - path = os.path.join(path, "libnvvm.so") + platform_map = { + "linux": "libnvvm.so", + "win32": "nvvm64_40_0.dll", + } + + for plat, dso_name in platform_map.items(): + if sys.platform.startswith(plat): + break + else: + raise NotImplementedError("Unsupported platform") + + path = os.path.join(path, dso_name) else: candidates = find_lib("nvvm", path) path = max(candidates) if candidates else None return _env_path_tuple(by, path) +def _get_nvrtc_path(): + by, path = _get_nvrtc_path_decision() + if by == "NVIDIA NVCC Wheel": + path = str(path) + elif by == "System": + return _env_path_tuple(by, path) + else: + candidates = find_lib("nvrtc", path) + path = max(candidates) if candidates else None + return _env_path_tuple(by, path) + + def get_cuda_paths(): """Returns a dictionary mapping component names to a 2-tuple of (source_variable, info). @@ -316,6 +479,7 @@ def get_cuda_paths(): # Not in cache d = { "nvvm": _get_nvvm_path(), + "nvrtc": _get_nvrtc_path(), "libdevice": _get_libdevice_paths(), "cudalib_dir": _get_cudalib_dir(), "static_cudalib_dir": _get_static_cudalib_dir(), From 2bc7ef61632b0d88b6875f7371da99c3a49c4525 Mon Sep 17 00:00:00 2001 From: "Ralf W. 
Grosse-Kunstleve" Date: Wed, 30 Apr 2025 08:44:42 -0700 Subject: [PATCH 34/52] Force compute-sanitizer off unconditionally --- .github/workflows/test-wheel-linux.yml | 2 +- 1 file changed, 1 insertion(+), 1 deletion(-) diff --git a/.github/workflows/test-wheel-linux.yml b/.github/workflows/test-wheel-linux.yml index 1e2a33400..af5c1fd45 100644 --- a/.github/workflows/test-wheel-linux.yml +++ b/.github/workflows/test-wheel-linux.yml @@ -193,7 +193,7 @@ jobs: # We don't test compute-sanitizer on CTK<12 because backporting fixes is too much effort # We only test compute-sanitizer on python 3.12 arbitrarily; we don't need to use sanitizer on the entire matrix # Only local ctk installs have compute-sanitizer; there is not wheel for it - if [[ "${{ inputs.python-version }}" == "3.12" && "${{ inputs.cuda-version }}" != "11.8.0" && "${{ inputs.local-ctk }}" == 1 ]]; then + if [[ "${{ inputs.python-version }}" == "9.99" && "${{ inputs.cuda-version }}" != "11.8.0" && "${{ inputs.local-ctk }}" == 1 ]]; then COMPUTE_SANITIZER="${CUDA_HOME}/bin/compute-sanitizer" COMPUTE_SANITIZER_VERSION=$(${COMPUTE_SANITIZER} --version | grep -Eo "[0-9]{4}\.[0-9]\.[0-9]" | sed -e 's/\.//g') SANITIZER_CMD="${COMPUTE_SANITIZER} --target-processes=all --launch-timeout=0 --tool=memcheck --error-exitcode=1" From 7650b2e0e04e0ad0fe18a8ce00cbc18425cb4a69 Mon Sep 17 00:00:00 2001 From: "Ralf W. Grosse-Kunstleve" Date: Wed, 30 Apr 2025 09:25:20 -0700 Subject: [PATCH 35/52] Revert "Force compute-sanitizer off unconditionally" This reverts commit 2bc7ef61632b0d88b6875f7371da99c3a49c4525. 
--- .github/workflows/test-wheel-linux.yml | 2 +- 1 file changed, 1 insertion(+), 1 deletion(-) diff --git a/.github/workflows/test-wheel-linux.yml b/.github/workflows/test-wheel-linux.yml index af5c1fd45..1e2a33400 100644 --- a/.github/workflows/test-wheel-linux.yml +++ b/.github/workflows/test-wheel-linux.yml @@ -193,7 +193,7 @@ jobs: # We don't test compute-sanitizer on CTK<12 because backporting fixes is too much effort # We only test compute-sanitizer on python 3.12 arbitrarily; we don't need to use sanitizer on the entire matrix # Only local ctk installs have compute-sanitizer; there is not wheel for it - if [[ "${{ inputs.python-version }}" == "9.99" && "${{ inputs.cuda-version }}" != "11.8.0" && "${{ inputs.local-ctk }}" == 1 ]]; then + if [[ "${{ inputs.python-version }}" == "3.12" && "${{ inputs.cuda-version }}" != "11.8.0" && "${{ inputs.local-ctk }}" == 1 ]]; then COMPUTE_SANITIZER="${CUDA_HOME}/bin/compute-sanitizer" COMPUTE_SANITIZER_VERSION=$(${COMPUTE_SANITIZER} --version | grep -Eo "[0-9]{4}\.[0-9]\.[0-9]" | sed -e 's/\.//g') SANITIZER_CMD="${COMPUTE_SANITIZER} --target-processes=all --launch-timeout=0 --tool=memcheck --error-exitcode=1" From b79e85ba63f15a9f466457679961fd955763aae2 Mon Sep 17 00:00:00 2001 From: "Ralf W. Grosse-Kunstleve" Date: Wed, 30 Apr 2025 09:20:04 -0700 Subject: [PATCH 36/52] Add timeout=10 seconds to test_path_finder.py subprocess.run() invocations. --- cuda_bindings/tests/test_path_finder.py | 1 + 1 file changed, 1 insertion(+) diff --git a/cuda_bindings/tests/test_path_finder.py b/cuda_bindings/tests/test_path_finder.py index 65aa86f4f..964e708dc 100644 --- a/cuda_bindings/tests/test_path_finder.py +++ b/cuda_bindings/tests/test_path_finder.py @@ -88,6 +88,7 @@ def test_find_or_load_nvidia_dynamic_library(info_summary_append, api, libname): stdout=subprocess.PIPE, stderr=subprocess.PIPE, encoding="utf-8", + timeout=10, # Ensure CI testing does not hang for an excessive amount of time. 
) if result.returncode == 0: info_summary_append(f"abs_path={result.stdout.rstrip()}") From f9a9e9f3aa9094c189e1e7419698751271e0c8e5 Mon Sep 17 00:00:00 2001 From: "Ralf W. Grosse-Kunstleve" Date: Wed, 30 Apr 2025 09:58:25 -0700 Subject: [PATCH 37/52] Increase test_path_finder.py subprocess.run() timeout to 30 seconds: Under Windows, loading cublas or cusolver may exceed the 10 second timeout: https://github.com/NVIDIA/cuda-python/pull/578#issuecomment-2842638872 --- cuda_bindings/tests/test_path_finder.py | 2 +- 1 file changed, 1 insertion(+), 1 deletion(-) diff --git a/cuda_bindings/tests/test_path_finder.py b/cuda_bindings/tests/test_path_finder.py index 964e708dc..2a5f887fd 100644 --- a/cuda_bindings/tests/test_path_finder.py +++ b/cuda_bindings/tests/test_path_finder.py @@ -88,7 +88,7 @@ def test_find_or_load_nvidia_dynamic_library(info_summary_append, api, libname): stdout=subprocess.PIPE, stderr=subprocess.PIPE, encoding="utf-8", - timeout=10, # Ensure CI testing does not hang for an excessive amount of time. + timeout=30, # Ensure CI testing does not hang for an excessive amount of time. ) if result.returncode == 0: info_summary_append(f"abs_path={result.stdout.rstrip()}") From 7f76683605ee8c85be9c43b37ec00addb775babb Mon Sep 17 00:00:00 2001 From: "Ralf W. Grosse-Kunstleve" Date: Wed, 30 Apr 2025 10:59:00 -0700 Subject: [PATCH 38/52] Revert "Temporarily restore debug changes under .github/workflows, for expanded path_finder test coverage" This reverts commit 47ad79f317c30423a023bfef28f68163360728b6. 
--- .github/workflows/build-and-test.yml | 10 ++++++++++ .github/workflows/test-wheel-linux.yml | 12 ++++-------- .github/workflows/test-wheel-windows.yml | 9 ++++----- cuda_bindings/pyproject.toml | 10 ---------- 4 files changed, 18 insertions(+), 23 deletions(-) diff --git a/.github/workflows/build-and-test.yml b/.github/workflows/build-and-test.yml index 6ece1b235..a00a94028 100644 --- a/.github/workflows/build-and-test.yml +++ b/.github/workflows/build-and-test.yml @@ -25,9 +25,14 @@ jobs: matrix: host-platform: - linux-64 + - linux-aarch64 - win-64 python-version: + - "3.13" - "3.12" + - "3.11" + - "3.10" + - "3.9" cuda-version: # Note: this is for build-time only. - "12.8.0" @@ -210,8 +215,13 @@ jobs: matrix: host-platform: - linux-64 + - linux-aarch64 python-version: + - "3.13" - "3.12" + - "3.11" + - "3.10" + - "3.9" cuda-version: # Note: this is for test-time only. - "12.8.0" diff --git a/.github/workflows/test-wheel-linux.yml b/.github/workflows/test-wheel-linux.yml index 1e2a33400..83dad5cec 100644 --- a/.github/workflows/test-wheel-linux.yml +++ b/.github/workflows/test-wheel-linux.yml @@ -78,10 +78,6 @@ jobs: fi fi - if [[ "${{ inputs.local-ctk }}" != 1 ]]; then - echo "CUDA_BINDINGS_PATH_FINDER_TEST_ALL_LIBNAMES=1" >> $GITHUB_ENV - fi - # make outputs from the previous job as env vars CUDA_CORE_ARTIFACT_BASENAME="cuda-core-python${PYTHON_VERSION_FORMATTED}-${{ inputs.host-platform }}" echo "PYTHON_VERSION_FORMATTED=${PYTHON_VERSION_FORMATTED}" >> $GITHUB_ENV @@ -220,7 +216,7 @@ jobs: pushd ./cuda_bindings pip install -r requirements.txt - ${SANITIZER_CMD} pytest -ra -s -v tests/ + ${SANITIZER_CMD} pytest -rxXs -v tests/ # It is a bit convoluted to run the Cython tests against CTK wheels, # so let's just skip them. 
@@ -231,7 +227,7 @@ jobs: # TODO: enable this once win-64 runners are up exit 1 fi - ${SANITIZER_CMD} pytest -ra -s -v tests/cython + ${SANITIZER_CMD} pytest -rxXs -v tests/cython fi popd @@ -255,7 +251,7 @@ jobs: pushd ./cuda_core pip install -r "tests/requirements-cu${TEST_CUDA_MAJOR}.txt" - ${SANITIZER_CMD} pytest -ra -s -v tests/ + ${SANITIZER_CMD} pytest -rxXs -v tests/ # It is a bit convoluted to run the Cython tests against CTK wheels, # so let's just skip them. Also, currently our CI always installs the @@ -269,7 +265,7 @@ jobs: # TODO: enable this once win-64 runners are up exit 1 fi - ${SANITIZER_CMD} pytest -ra -s -v tests/cython + ${SANITIZER_CMD} pytest -rxXs -v tests/cython fi popd diff --git a/.github/workflows/test-wheel-windows.yml b/.github/workflows/test-wheel-windows.yml index c51f9d439..95a9cbe6f 100644 --- a/.github/workflows/test-wheel-windows.yml +++ b/.github/workflows/test-wheel-windows.yml @@ -65,8 +65,6 @@ jobs: } } - "CUDA_BINDINGS_PATH_FINDER_TEST_ALL_LIBNAMES=1" >> $env:GITHUB_ENV - # Make outputs from the previous job as env vars $CUDA_CORE_ARTIFACT_BASENAME = "cuda-core-python${PYTHON_VERSION_FORMATTED}-${{ inputs.host-platform }}" "PYTHON_VERSION_FORMATTED=${PYTHON_VERSION_FORMATTED}" >> $env:GITHUB_ENV @@ -167,7 +165,8 @@ jobs: uses: Jimver/cuda-toolkit@v0.2.21 with: cuda: ${{ inputs.cuda-version }} - method: 'local' + method: 'network' + sub-packages: ${{ env.MINI_CTK_DEPS }} - name: Update PATH if: ${{ inputs.local-ctk == '1' }} @@ -191,7 +190,7 @@ jobs: Push-Location ./cuda_bindings pip install -r requirements.txt - pytest -ra -s -v tests/ + pytest -rxXs -v tests/ # skip Cython tests for now (NVIDIA/cuda-python#466) Pop-Location @@ -215,7 +214,7 @@ jobs: Push-Location ./cuda_core pip install -r "tests/requirements-cu${TEST_CUDA_MAJOR}.txt" - pytest -ra -s -v tests/ + pytest -rxXs -v tests/ Pop-Location - name: Ensure cuda-python installable diff --git a/cuda_bindings/pyproject.toml b/cuda_bindings/pyproject.toml index 
48186137f..875547033 100644 --- a/cuda_bindings/pyproject.toml +++ b/cuda_bindings/pyproject.toml @@ -42,16 +42,6 @@ all = [ "nvidia-cuda-nvcc-cu12", "nvidia-cuda-nvrtc-cu12", "nvidia-nvjitlink-cu12>=12.3", - "nvidia-cuda-runtime-cu12", - "nvidia-cublas-cu12", - "nvidia-cufft-cu12", - "nvidia-curand-cu12", - "nvidia-cusolver-cu12", - "nvidia-cusparse-cu12", - "nvidia-npp-cu12", - "nvidia-nvjpeg-cu12", - "nvidia-nvfatbin-cu12", - "nvidia-cufile-cu12; sys_platform != 'win32'", ] [project.urls] From aeaf4f02278b62befb0e380e9f6f97a50b848fb3 Mon Sep 17 00:00:00 2001 From: "Ralf W. Grosse-Kunstleve" Date: Wed, 30 Apr 2025 08:44:42 -0700 Subject: [PATCH 39/52] Force compute-sanitizer off unconditionally --- .github/workflows/test-wheel-linux.yml | 2 +- 1 file changed, 1 insertion(+), 1 deletion(-) diff --git a/.github/workflows/test-wheel-linux.yml b/.github/workflows/test-wheel-linux.yml index 83dad5cec..705f09be1 100644 --- a/.github/workflows/test-wheel-linux.yml +++ b/.github/workflows/test-wheel-linux.yml @@ -189,7 +189,7 @@ jobs: # We don't test compute-sanitizer on CTK<12 because backporting fixes is too much effort # We only test compute-sanitizer on python 3.12 arbitrarily; we don't need to use sanitizer on the entire matrix # Only local ctk installs have compute-sanitizer; there is not wheel for it - if [[ "${{ inputs.python-version }}" == "3.12" && "${{ inputs.cuda-version }}" != "11.8.0" && "${{ inputs.local-ctk }}" == 1 ]]; then + if [[ "${{ inputs.python-version }}" == "9.99" && "${{ inputs.cuda-version }}" != "11.8.0" && "${{ inputs.local-ctk }}" == 1 ]]; then COMPUTE_SANITIZER="${CUDA_HOME}/bin/compute-sanitizer" COMPUTE_SANITIZER_VERSION=$(${COMPUTE_SANITIZER} --version | grep -Eo "[0-9]{4}\.[0-9]\.[0-9]" | sed -e 's/\.//g') SANITIZER_CMD="${COMPUTE_SANITIZER} --target-processes=all --launch-timeout=0 --tool=memcheck --error-exitcode=1" From 6a60161d8ab6773295cd856d643880624b12d1ec Mon Sep 17 00:00:00 2001 From: "Ralf W. 
Grosse-Kunstleve" Date: Wed, 30 Apr 2025 13:58:45 -0700 Subject: [PATCH 40/52] Add: Note that the search is done on a per-library basis. --- cuda_bindings/cuda/bindings/_path_finder/README.md | 3 +++ 1 file changed, 3 insertions(+) diff --git a/cuda_bindings/cuda/bindings/_path_finder/README.md b/cuda_bindings/cuda/bindings/_path_finder/README.md index e66668fd6..75fead38b 100644 --- a/cuda_bindings/cuda/bindings/_path_finder/README.md +++ b/cuda_bindings/cuda/bindings/_path_finder/README.md @@ -33,6 +33,9 @@ strategy for locating NVIDIA shared libraries: - `dlopen()` on Linux - `LoadLibraryW()` on Windows +Note that the search is done on a per-library basis. There is no centralized +mechanism that ensures all libraries are found in the same way. + ## Implementation Philosophy The current implementation balances stability and evolution: From 3277ac52e0f79cdd3a1b19d39f828aeb992a2457 Mon Sep 17 00:00:00 2001 From: "Ralf W. Grosse-Kunstleve" Date: Wed, 30 Apr 2025 14:08:17 -0700 Subject: [PATCH 41/52] Add Note for CUDA_HOME / CUDA_PATH --- cuda_bindings/cuda/bindings/_path_finder/README.md | 8 +++++--- 1 file changed, 5 insertions(+), 3 deletions(-) diff --git a/cuda_bindings/cuda/bindings/_path_finder/README.md b/cuda_bindings/cuda/bindings/_path_finder/README.md index 75fead38b..9fa6f0b30 100644 --- a/cuda_bindings/cuda/bindings/_path_finder/README.md +++ b/cuda_bindings/cuda/bindings/_path_finder/README.md @@ -20,13 +20,15 @@ The `load_nvidia_dynamic_library()` function implements a hierarchical search strategy for locating NVIDIA shared libraries: 1. **Python Package Ecosystem** - - Scans `sys.path` to find libraries installed via NVIDIA Python wheels + - Scans `sys.path` to find libraries installed via NVIDIA Python wheels. 2. **Conda Environments** - - Leverages Conda-specific paths through our fork of `get_cuda_paths()` from Numba + - Leverages Conda-specific paths through our fork of `get_cuda_paths()` from Numba. 3. 
**System Installations** - - Checks traditional system locations via the same `get_cuda_paths()` implementation + - Checks traditional system locations via the same `get_cuda_paths()` implementation. + — Note that `get_cuda_paths()` references `CUDA_HOME` and `CUDA_PATH`. The existing + mechanism are used as-is (see Implementation Philosophy below). 4. **OS Default Mechanisms** - Falls back to native loader: From 1d4420b32ef73905067394676e4ed542d69e2422 Mon Sep 17 00:00:00 2001 From: "Ralf W. Grosse-Kunstleve" Date: Wed, 30 Apr 2025 14:17:27 -0700 Subject: [PATCH 42/52] Add 0. **Check if a library was loaded into the process already by some other means.** --- cuda_bindings/cuda/bindings/_path_finder/README.md | 5 +++++ 1 file changed, 5 insertions(+) diff --git a/cuda_bindings/cuda/bindings/_path_finder/README.md b/cuda_bindings/cuda/bindings/_path_finder/README.md index 9fa6f0b30..b4f73c64a 100644 --- a/cuda_bindings/cuda/bindings/_path_finder/README.md +++ b/cuda_bindings/cuda/bindings/_path_finder/README.md @@ -19,6 +19,11 @@ reasonably well-tested through CI pipelines. The `load_nvidia_dynamic_library()` function implements a hierarchical search strategy for locating NVIDIA shared libraries: +0. **Check if a library was loaded into the process already by some other means.** + - If yes, there is no alternative to skipping the rest of the search logic. + The absolute path of the already loaded library will be returned, along + with the handle to the library. + 1. **Python Package Ecosystem** - Scans `sys.path` to find libraries installed via NVIDIA Python wheels. From 4437fcca99d7f58ebdc519f2afe37d2fdb059bf1 Mon Sep 17 00:00:00 2001 From: "Ralf W. 
Grosse-Kunstleve" Date: Wed, 30 Apr 2025 14:30:30 -0700 Subject: [PATCH 43/52] _find_dll_using_nvidia_bin_dirs(): reuse lib_searched_for in place of file_wild --- .../_path_finder/find_nvidia_dynamic_library.py | 13 +++++++------ 1 file changed, 7 insertions(+), 6 deletions(-) diff --git a/cuda_bindings/cuda/bindings/_path_finder/find_nvidia_dynamic_library.py b/cuda_bindings/cuda/bindings/_path_finder/find_nvidia_dynamic_library.py index a0131a9cd..a04c56dc9 100644 --- a/cuda_bindings/cuda/bindings/_path_finder/find_nvidia_dynamic_library.py +++ b/cuda_bindings/cuda/bindings/_path_finder/find_nvidia_dynamic_library.py @@ -48,17 +48,16 @@ def _find_dll_under_dir(dirpath, file_wild): return None -def _find_dll_using_nvidia_bin_dirs(libname, error_messages, attachments): +def _find_dll_using_nvidia_bin_dirs(libname, lib_searched_for, error_messages, attachments): if libname == "nvvm": # noqa: SIM108 nvidia_sub_dirs = ("nvidia", "*", "nvvm", "bin") else: nvidia_sub_dirs = ("nvidia", "*", "bin") - file_wild = libname + "*.dll" for bin_dir in sys_path_find_sub_dirs(nvidia_sub_dirs): - dll_name = _find_dll_under_dir(bin_dir, file_wild) + dll_name = _find_dll_under_dir(bin_dir, lib_searched_for) if dll_name is not None: return dll_name - _no_such_file_in_sub_dirs(nvidia_sub_dirs, file_wild, error_messages, attachments) + _no_such_file_in_sub_dirs(nvidia_sub_dirs, lib_searched_for, error_messages, attachments) return None @@ -123,13 +122,15 @@ def __init__(self, libname: str): self.abs_path = None if sys.platform == "win32": - self.abs_path = _find_dll_using_nvidia_bin_dirs(libname, self.error_messages, self.attachments) + self.lib_searched_for = f"{libname}*.dll" + self.abs_path = _find_dll_using_nvidia_bin_dirs( + libname, self.lib_searched_for, self.error_messages, self.attachments + ) if self.abs_path is None: if libname == "nvvm": self.abs_path = _get_cuda_paths_info("nvvm", self.error_messages) else: self.abs_path = _find_dll_using_cudalib_dir(libname, 
self.error_messages, self.attachments) - self.lib_searched_for = f"{libname}*.dll" else: self.lib_searched_for = f"lib{libname}.so" self.abs_path = _find_so_using_nvidia_lib_dirs( From fd20253b63ca42d82bf7a5785ef1271513f53738 Mon Sep 17 00:00:00 2001 From: "Ralf W. Grosse-Kunstleve" Date: Wed, 30 Apr 2025 14:40:54 -0700 Subject: [PATCH 44/52] Systematically replace all relative imports with absolute imports. --- cuda_bindings/cuda/bindings/_path_finder/cuda_paths.py | 2 +- .../bindings/_path_finder/find_nvidia_dynamic_library.py | 6 +++--- cuda_bindings/cuda/bindings/_path_finder/load_dl_common.py | 2 +- cuda_bindings/cuda/bindings/_path_finder/load_dl_linux.py | 2 +- cuda_bindings/cuda/bindings/_path_finder/load_dl_windows.py | 2 +- .../bindings/_path_finder/load_nvidia_dynamic_library.py | 4 ++-- 6 files changed, 9 insertions(+), 9 deletions(-) diff --git a/cuda_bindings/cuda/bindings/_path_finder/cuda_paths.py b/cuda_bindings/cuda/bindings/_path_finder/cuda_paths.py index 9d441a4d3..80f4e0149 100644 --- a/cuda_bindings/cuda/bindings/_path_finder/cuda_paths.py +++ b/cuda_bindings/cuda/bindings/_path_finder/cuda_paths.py @@ -45,7 +45,7 @@ from collections import namedtuple from pathlib import Path -from .findlib import find_lib +from cuda.bindings._path_finder.findlib import find_lib IS_WIN32 = sys.platform.startswith("win32") diff --git a/cuda_bindings/cuda/bindings/_path_finder/find_nvidia_dynamic_library.py b/cuda_bindings/cuda/bindings/_path_finder/find_nvidia_dynamic_library.py index a04c56dc9..b735054bf 100644 --- a/cuda_bindings/cuda/bindings/_path_finder/find_nvidia_dynamic_library.py +++ b/cuda_bindings/cuda/bindings/_path_finder/find_nvidia_dynamic_library.py @@ -6,9 +6,9 @@ import os import sys -from .cuda_paths import get_cuda_paths -from .supported_libs import is_suppressed_dll_file -from .sys_path_find_sub_dirs import sys_path_find_sub_dirs +from cuda.bindings._path_finder.cuda_paths import get_cuda_paths +from 
cuda.bindings._path_finder.supported_libs import is_suppressed_dll_file +from cuda.bindings._path_finder.sys_path_find_sub_dirs import sys_path_find_sub_dirs def _no_such_file_in_sub_dirs(sub_dirs, file_wild, error_messages, attachments): diff --git a/cuda_bindings/cuda/bindings/_path_finder/load_dl_common.py b/cuda_bindings/cuda/bindings/_path_finder/load_dl_common.py index 2b8b1b69f..4592f6c33 100644 --- a/cuda_bindings/cuda/bindings/_path_finder/load_dl_common.py +++ b/cuda_bindings/cuda/bindings/_path_finder/load_dl_common.py @@ -4,7 +4,7 @@ from dataclasses import dataclass from typing import Callable, Optional -from .supported_libs import DIRECT_DEPENDENCIES +from cuda.bindings._path_finder.supported_libs import DIRECT_DEPENDENCIES @dataclass diff --git a/cuda_bindings/cuda/bindings/_path_finder/load_dl_linux.py b/cuda_bindings/cuda/bindings/_path_finder/load_dl_linux.py index 27b7f39fb..3646cf78f 100644 --- a/cuda_bindings/cuda/bindings/_path_finder/load_dl_linux.py +++ b/cuda_bindings/cuda/bindings/_path_finder/load_dl_linux.py @@ -6,7 +6,7 @@ import os from typing import Optional -from .load_dl_common import LoadedDL +from cuda.bindings._path_finder.load_dl_common import LoadedDL CDLL_MODE = os.RTLD_NOW | os.RTLD_GLOBAL diff --git a/cuda_bindings/cuda/bindings/_path_finder/load_dl_windows.py b/cuda_bindings/cuda/bindings/_path_finder/load_dl_windows.py index 59194f97d..acc3cf459 100644 --- a/cuda_bindings/cuda/bindings/_path_finder/load_dl_windows.py +++ b/cuda_bindings/cuda/bindings/_path_finder/load_dl_windows.py @@ -8,7 +8,7 @@ import pywintypes import win32api -from .load_dl_common import LoadedDL +from cuda.bindings._path_finder.load_dl_common import LoadedDL # Mirrors WinBase.h (unfortunately not defined already elsewhere) WINBASE_LOAD_LIBRARY_SEARCH_DLL_LOAD_DIR = 0x00000100 diff --git a/cuda_bindings/cuda/bindings/_path_finder/load_nvidia_dynamic_library.py b/cuda_bindings/cuda/bindings/_path_finder/load_nvidia_dynamic_library.py index 
c6353cb74..6b8c6c16e 100644 --- a/cuda_bindings/cuda/bindings/_path_finder/load_nvidia_dynamic_library.py +++ b/cuda_bindings/cuda/bindings/_path_finder/load_nvidia_dynamic_library.py @@ -4,8 +4,8 @@ import functools import sys -from .find_nvidia_dynamic_library import _find_nvidia_dynamic_library -from .load_dl_common import LoadedDL, load_dependencies +from cuda.bindings._path_finder.find_nvidia_dynamic_library import _find_nvidia_dynamic_library +from cuda.bindings._path_finder.load_dl_common import LoadedDL, load_dependencies if sys.platform == "win32": from .load_dl_windows import check_if_already_loaded_from_elsewhere, load_with_abs_path, load_with_system_search From 703988c1c8be897b3cb40f0b4d7d55c1df2f41da Mon Sep 17 00:00:00 2001 From: "Ralf W. Grosse-Kunstleve" Date: Wed, 30 Apr 2025 14:44:28 -0700 Subject: [PATCH 45/52] =?UTF-8?q?handle:=20int=20=E2=86=92=20ctypes.CDLL?= =?UTF-8?q?=20fix?= MIME-Version: 1.0 Content-Type: text/plain; charset=UTF-8 Content-Transfer-Encoding: 8bit --- cuda_bindings/cuda/bindings/_path_finder/load_dl_linux.py | 2 +- 1 file changed, 1 insertion(+), 1 deletion(-) diff --git a/cuda_bindings/cuda/bindings/_path_finder/load_dl_linux.py b/cuda_bindings/cuda/bindings/_path_finder/load_dl_linux.py index 3646cf78f..427e337de 100644 --- a/cuda_bindings/cuda/bindings/_path_finder/load_dl_linux.py +++ b/cuda_bindings/cuda/bindings/_path_finder/load_dl_linux.py @@ -27,7 +27,7 @@ class Dl_info(ctypes.Structure): ] -def abs_path_for_dynamic_library(libname: str, handle: int) -> Optional[str]: +def abs_path_for_dynamic_library(libname: str, handle: ctypes.CDLL) -> Optional[str]: """Get the absolute path of a loaded dynamic library on Linux. Args: From 28349a73d69a0653e8f5163a33ee77e35c08b73a Mon Sep 17 00:00:00 2001 From: "Ralf W. Grosse-Kunstleve" Date: Wed, 30 Apr 2025 15:07:57 -0700 Subject: [PATCH 46/52] Make load_dl_windows.py abs_path_for_dynamic_library() implementation maximally robust. 
--- .../bindings/_path_finder/load_dl_windows.py | 33 ++++++++++++++++--- 1 file changed, 28 insertions(+), 5 deletions(-) diff --git a/cuda_bindings/cuda/bindings/_path_finder/load_dl_windows.py b/cuda_bindings/cuda/bindings/_path_finder/load_dl_windows.py index acc3cf459..610e0d26d 100644 --- a/cuda_bindings/cuda/bindings/_path_finder/load_dl_windows.py +++ b/cuda_bindings/cuda/bindings/_path_finder/load_dl_windows.py @@ -46,12 +46,35 @@ def abs_path_for_dynamic_library(handle: int) -> str: Raises: OSError: If GetModuleFileNameW fails + RuntimeError: If the required path length is unreasonably long """ - buf = ctypes.create_unicode_buffer(260) - n_chars = ctypes.windll.kernel32.GetModuleFileNameW(ctypes.wintypes.HMODULE(handle), buf, len(buf)) - if n_chars == 0: - raise OSError("GetModuleFileNameW failed") - return buf.value + MAX_ITERATIONS = 10 # Allows for extremely long paths (largest buffer tried: 260 * 2**9 = 133,120 chars) + buf_size = 260 # Start with traditional MAX_PATH + + for _ in range(MAX_ITERATIONS): + buf = ctypes.create_unicode_buffer(buf_size) + n_chars = ctypes.windll.kernel32.GetModuleFileNameW(ctypes.wintypes.HMODULE(handle), buf, buf_size) + + if n_chars == 0: + raise OSError( + "GetModuleFileNameW failed. Long paths may require enabling the " + "Windows 10+ long path registry setting. See: " + "https://docs.python.org/3/using/windows.html#removing-the-max-path-limitation" + ) + if n_chars < buf_size - 1: + return buf.value + + buf_size *= 2 # Double the buffer size and try again + + raise RuntimeError( + f"Failed to retrieve the full path after {MAX_ITERATIONS} attempts " + f"(final buffer size: {buf_size} characters). " + "This may indicate:\n" + " 1. An extremely long path requiring Windows long path support, or\n" + " 2. An invalid or corrupt library handle, or\n" + " 3. 
An unexpected system error.\n" + "See: https://docs.python.org/3/using/windows.html#removing-the-max-path-limitation" + ) def check_if_already_loaded_from_elsewhere(libname: str) -> Optional[LoadedDL]: From c55104c434fdc97e1d8b349284db148e705524e7 Mon Sep 17 00:00:00 2001 From: "Ralf W. Grosse-Kunstleve" Date: Wed, 30 Apr 2025 15:13:15 -0700 Subject: [PATCH 47/52] =?UTF-8?q?Change=20argument=20name=20=E2=86=92=20li?= =?UTF-8?q?bname=20for=20self-consistency?= MIME-Version: 1.0 Content-Type: text/plain; charset=UTF-8 Content-Transfer-Encoding: 8bit --- cuda_bindings/cuda/bindings/_path_finder/load_dl_windows.py | 6 +++--- 1 file changed, 3 insertions(+), 3 deletions(-) diff --git a/cuda_bindings/cuda/bindings/_path_finder/load_dl_windows.py b/cuda_bindings/cuda/bindings/_path_finder/load_dl_windows.py index 610e0d26d..0f88f26b2 100644 --- a/cuda_bindings/cuda/bindings/_path_finder/load_dl_windows.py +++ b/cuda_bindings/cuda/bindings/_path_finder/load_dl_windows.py @@ -103,11 +103,11 @@ def check_if_already_loaded_from_elsewhere(libname: str) -> Optional[LoadedDL]: return None -def load_with_system_search(name: str, _unused: str) -> Optional[LoadedDL]: +def load_with_system_search(libname: str, _unused: str) -> Optional[LoadedDL]: """Try to load a DLL using system search paths. Args: - name: The name of the library to load + libname: The name of the library to load _unused: Unused parameter (kept for interface consistency) Returns: @@ -115,7 +115,7 @@ def load_with_system_search(name: str, _unused: str) -> Optional[LoadedDL]: """ from .supported_libs import SUPPORTED_WINDOWS_DLLS - dll_names = SUPPORTED_WINDOWS_DLLS.get(name) + dll_names = SUPPORTED_WINDOWS_DLLS.get(libname) if dll_names is None: return None From b32ed1339bfca8bee3cc6adedf255c888119b4ca Mon Sep 17 00:00:00 2001 From: "Ralf W. Grosse-Kunstleve" Date: Wed, 30 Apr 2025 15:16:53 -0700 Subject: [PATCH 48/52] Systematically replace previously overlooked relative imports with absolute imports. 
--- .../cuda/bindings/_path_finder/load_dl_linux.py | 4 ++-- .../cuda/bindings/_path_finder/load_dl_windows.py | 6 +++--- .../_path_finder/load_nvidia_dynamic_library.py | 12 ++++++++++-- 3 files changed, 15 insertions(+), 7 deletions(-) diff --git a/cuda_bindings/cuda/bindings/_path_finder/load_dl_linux.py b/cuda_bindings/cuda/bindings/_path_finder/load_dl_linux.py index 427e337de..b9f3839e1 100644 --- a/cuda_bindings/cuda/bindings/_path_finder/load_dl_linux.py +++ b/cuda_bindings/cuda/bindings/_path_finder/load_dl_linux.py @@ -40,7 +40,7 @@ def abs_path_for_dynamic_library(libname: str, handle: ctypes.CDLL) -> Optional[ Raises: OSError: If dladdr fails to get information about the symbol """ - from .supported_libs import EXPECTED_LIB_SYMBOLS + from cuda.bindings._path_finder.supported_libs import EXPECTED_LIB_SYMBOLS for symbol_name in EXPECTED_LIB_SYMBOLS[libname]: symbol = getattr(handle, symbol_name, None) @@ -70,7 +70,7 @@ def check_if_already_loaded_from_elsewhere(libname: str) -> Optional[LoadedDL]: >>> if loaded is not None: ... print(f"Library already loaded from {loaded.abs_path}") """ - from .supported_libs import SUPPORTED_LINUX_SONAMES + from cuda.bindings._path_finder.supported_libs import SUPPORTED_LINUX_SONAMES for soname in SUPPORTED_LINUX_SONAMES.get(libname, ()): try: diff --git a/cuda_bindings/cuda/bindings/_path_finder/load_dl_windows.py b/cuda_bindings/cuda/bindings/_path_finder/load_dl_windows.py index 0f88f26b2..b375a21f7 100644 --- a/cuda_bindings/cuda/bindings/_path_finder/load_dl_windows.py +++ b/cuda_bindings/cuda/bindings/_path_finder/load_dl_windows.py @@ -91,7 +91,7 @@ def check_if_already_loaded_from_elsewhere(libname: str) -> Optional[LoadedDL]: >>> if loaded is not None: ... 
print(f"Library already loaded from {loaded.abs_path}") """ - from .supported_libs import SUPPORTED_WINDOWS_DLLS + from cuda.bindings._path_finder.supported_libs import SUPPORTED_WINDOWS_DLLS for dll_name in SUPPORTED_WINDOWS_DLLS.get(libname, ()): try: @@ -113,7 +113,7 @@ def load_with_system_search(libname: str, _unused: str) -> Optional[LoadedDL]: Returns: A LoadedDL object if successful, None if the library cannot be loaded """ - from .supported_libs import SUPPORTED_WINDOWS_DLLS + from cuda.bindings._path_finder.supported_libs import SUPPORTED_WINDOWS_DLLS dll_names = SUPPORTED_WINDOWS_DLLS.get(libname) if dll_names is None: @@ -140,7 +140,7 @@ def load_with_abs_path(libname: str, found_path: str) -> LoadedDL: Raises: RuntimeError: If the DLL cannot be loaded """ - from .supported_libs import LIBNAMES_REQUIRING_OS_ADD_DLL_DIRECTORY + from cuda.bindings._path_finder.supported_libs import LIBNAMES_REQUIRING_OS_ADD_DLL_DIRECTORY if libname in LIBNAMES_REQUIRING_OS_ADD_DLL_DIRECTORY: add_dll_directory(found_path) diff --git a/cuda_bindings/cuda/bindings/_path_finder/load_nvidia_dynamic_library.py b/cuda_bindings/cuda/bindings/_path_finder/load_nvidia_dynamic_library.py index 6b8c6c16e..015c4cdf8 100644 --- a/cuda_bindings/cuda/bindings/_path_finder/load_nvidia_dynamic_library.py +++ b/cuda_bindings/cuda/bindings/_path_finder/load_nvidia_dynamic_library.py @@ -8,9 +8,17 @@ from cuda.bindings._path_finder.load_dl_common import LoadedDL, load_dependencies if sys.platform == "win32": - from .load_dl_windows import check_if_already_loaded_from_elsewhere, load_with_abs_path, load_with_system_search + from cuda.bindings._path_finder.load_dl_windows import ( + check_if_already_loaded_from_elsewhere, + load_with_abs_path, + load_with_system_search, + ) else: - from .load_dl_linux import check_if_already_loaded_from_elsewhere, load_with_abs_path, load_with_system_search + from cuda.bindings._path_finder.load_dl_linux import ( + check_if_already_loaded_from_elsewhere, + 
load_with_abs_path, + load_with_system_search, + ) def _load_nvidia_dynamic_library_no_cache(libname: str) -> LoadedDL: From 92e7b42a3e43328b9da3d6944c8ce15f6645aa3f Mon Sep 17 00:00:00 2001 From: "Ralf W. Grosse-Kunstleve" Date: Wed, 30 Apr 2025 15:20:38 -0700 Subject: [PATCH 49/52] Simplify code (also for self-consistency) --- cuda_bindings/cuda/bindings/_path_finder/load_dl_windows.py | 6 +----- 1 file changed, 1 insertion(+), 5 deletions(-) diff --git a/cuda_bindings/cuda/bindings/_path_finder/load_dl_windows.py b/cuda_bindings/cuda/bindings/_path_finder/load_dl_windows.py index b375a21f7..1f0c9c7e2 100644 --- a/cuda_bindings/cuda/bindings/_path_finder/load_dl_windows.py +++ b/cuda_bindings/cuda/bindings/_path_finder/load_dl_windows.py @@ -115,11 +115,7 @@ def load_with_system_search(libname: str, _unused: str) -> Optional[LoadedDL]: """ from cuda.bindings._path_finder.supported_libs import SUPPORTED_WINDOWS_DLLS - dll_names = SUPPORTED_WINDOWS_DLLS.get(libname) - if dll_names is None: - return None - - for dll_name in dll_names: + for dll_name in SUPPORTED_WINDOWS_DLLS.get(libname, ()): handle = ctypes.windll.kernel32.LoadLibraryW(ctypes.c_wchar_p(dll_name)) if handle: return LoadedDL(handle, abs_path_for_dynamic_library(handle), False) From 5a835d7576b95104723d60bb0e90503883cc4686 Mon Sep 17 00:00:00 2001 From: "Ralf W. Grosse-Kunstleve" Date: Thu, 1 May 2025 09:41:10 -0700 Subject: [PATCH 50/52] Expand the 3. 
**System Installations** section with information produced by perplexity --- cuda_bindings/cuda/bindings/_path_finder/README.md | 13 ++++++++++--- 1 file changed, 10 insertions(+), 3 deletions(-) diff --git a/cuda_bindings/cuda/bindings/_path_finder/README.md b/cuda_bindings/cuda/bindings/_path_finder/README.md index b4f73c64a..a1fac27da 100644 --- a/cuda_bindings/cuda/bindings/_path_finder/README.md +++ b/cuda_bindings/cuda/bindings/_path_finder/README.md @@ -31,9 +31,16 @@ strategy for locating NVIDIA shared libraries: - Leverages Conda-specific paths through our fork of `get_cuda_paths()` from Numba. 3. **System Installations** - - Checks traditional system locations via the same `get_cuda_paths()` implementation. - — Note that `get_cuda_paths()` references `CUDA_HOME` and `CUDA_PATH`. The existing - mechanism are used as-is (see Implementation Philosophy below). + - Checks traditional system locations through these paths: + - Linux: `/usr/local/cuda/lib64` + - Windows: `C:\Program Files\NVIDIA GPU Computing Toolkit\CUDA\vX.Y\bin` + (where X.Y is the CTK version) + - **Notably does NOT search**: + - Versioned CUDA directories like `/usr/local/cuda-12.3` + - Distribution-specific packages (RPM/DEB) + EXCEPT Debian's `nvidia-cuda-toolkit` + - Relies on `CUDA_HOME`/`CUDA_PATH` environment variables if set, but falls + back to hardcoded paths when unset 4. **OS Default Mechanisms** - Falls back to native loader: From b910a6b10e0515e10591850ffb82bb0726b61b5e Mon Sep 17 00:00:00 2001 From: "Ralf W. Grosse-Kunstleve" Date: Thu, 1 May 2025 10:15:06 -0700 Subject: [PATCH 51/52] Pull out `**Environment variables**` into an added section, after manual inspection of cuda_paths.py. Minor additional edits. 
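The environment-variable step that this change pulls out into its own README section can be illustrated with a short sketch. It is an illustration under stated assumptions, not the code in `cuda_paths.py`: the helper name `cuda_root_from_env` is hypothetical, and checking `CUDA_HOME` before `CUDA_PATH` is an assumed order.

```python
import os


def cuda_root_from_env(environ=None):
    """Return the CUDA root named by CUDA_HOME or CUDA_PATH, or None if neither is set.

    Hypothetical helper; the CUDA_HOME-before-CUDA_PATH order is an assumption.
    """
    if environ is None:
        environ = os.environ
    for var in ("CUDA_HOME", "CUDA_PATH"):
        value = environ.get(var)
        if value:
            return value
    # Neither variable is set: this step yields nothing, and a caller would
    # move on to the next search strategy.
    return None
```

Passing a plain dictionary instead of `os.environ` keeps the sketch easy to exercise, for example `cuda_root_from_env({"CUDA_PATH": "/opt/cuda"})`.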
--- .../cuda/bindings/_path_finder/README.md | 16 +++++++++------- 1 file changed, 9 insertions(+), 7 deletions(-) diff --git a/cuda_bindings/cuda/bindings/_path_finder/README.md b/cuda_bindings/cuda/bindings/_path_finder/README.md index a1fac27da..94b80499f 100644 --- a/cuda_bindings/cuda/bindings/_path_finder/README.md +++ b/cuda_bindings/cuda/bindings/_path_finder/README.md @@ -28,9 +28,13 @@ strategy for locating NVIDIA shared libraries: - Scans `sys.path` to find libraries installed via NVIDIA Python wheels. 2. **Conda Environments** - - Leverages Conda-specific paths through our fork of `get_cuda_paths()` from Numba. + - Leverages Conda-specific paths through our fork of `get_cuda_paths()` + from numba-cuda. -3. **System Installations** +3. **Environment variables** + - Relies on `CUDA_HOME`/`CUDA_PATH` environment variables if set. + +4. **System Installations** - Checks traditional system locations through these paths: - Linux: `/usr/local/cuda/lib64` - Windows: `C:\Program Files\NVIDIA GPU Computing Toolkit\CUDA\vX.Y\bin` @@ -39,10 +43,8 @@ strategy for locating NVIDIA shared libraries: - Versioned CUDA directories like `/usr/local/cuda-12.3` - Distribution-specific packages (RPM/DEB) EXCEPT Debian's `nvidia-cuda-toolkit` - - Relies on `CUDA_HOME`/`CUDA_PATH` environment variables if set, but falls - back to hardcoded paths when unset -4. **OS Default Mechanisms** +5. **OS Default Mechanisms** - Falls back to native loader: - `dlopen()` on Linux - `LoadLibraryW()` on Windows @@ -54,8 +56,8 @@ mechanism that ensures all libraries are found in the same way. The current implementation balances stability and evolution: -- **Baseline Foundation:** Uses a fork of Numba's `cuda_paths.py` that has been - battle-tested in production environments +- **Baseline Foundation:** Uses a fork of numba-cuda's `cuda_paths.py` that has been + battle-tested in production environments. 
- **Validation Infrastructure:** Comprehensive CI testing matrix being developed to cover: - Various Linux/Windows environments From fc22b1d296016212c587cbd6e2f82f4b9f2b4cd5 Mon Sep 17 00:00:00 2001 From: "Ralf W. Grosse-Kunstleve" Date: Thu, 1 May 2025 10:51:23 -0700 Subject: [PATCH 52/52] Revert "Force compute-sanitizer off unconditionally" This reverts commit aeaf4f02278b62befb0e380e9f6f97a50b848fb3. --- .github/workflows/test-wheel-linux.yml | 2 +- 1 file changed, 1 insertion(+), 1 deletion(-) diff --git a/.github/workflows/test-wheel-linux.yml b/.github/workflows/test-wheel-linux.yml index 705f09be1..83dad5cec 100644 --- a/.github/workflows/test-wheel-linux.yml +++ b/.github/workflows/test-wheel-linux.yml @@ -189,7 +189,7 @@ jobs: # We don't test compute-sanitizer on CTK<12 because backporting fixes is too much effort # We only test compute-sanitizer on python 3.12 arbitrarily; we don't need to use sanitizer on the entire matrix # Only local ctk installs have compute-sanitizer; there is not wheel for it - if [[ "${{ inputs.python-version }}" == "9.99" && "${{ inputs.cuda-version }}" != "11.8.0" && "${{ inputs.local-ctk }}" == 1 ]]; then + if [[ "${{ inputs.python-version }}" == "3.12" && "${{ inputs.cuda-version }}" != "11.8.0" && "${{ inputs.local-ctk }}" == 1 ]]; then COMPUTE_SANITIZER="${CUDA_HOME}/bin/compute-sanitizer" COMPUTE_SANITIZER_VERSION=$(${COMPUTE_SANITIZER} --version | grep -Eo "[0-9]{4}\.[0-9]\.[0-9]" | sed -e 's/\.//g') SANITIZER_CMD="${COMPUTE_SANITIZER} --target-processes=all --launch-timeout=0 --tool=memcheck --error-exitcode=1"