Description
Is this a duplicate?
- I confirmed there appear to be no duplicate issues for this bug and that I agree to the Code of Conduct
Type of Bug
Silent Failure
Component
cuda.bindings
Describe the bug
@leofang wrote:
I noticed that on the system I’m on, which has a system CTK 12.3 plus CTK 12.9 installed from conda, the pathfinder from either cuda.bindings 12.9.0 or cuda.pathfinder 1.0.0 picks up nvJitLink 12.3 (the system one) instead of 12.9 (the conda one), which does not match the behavior that we documented.
I suspect that the logic in _load_nvidia_dynamic_library_no_cache might be wrong:
# Find the library path
found = _find_nvidia_dynamic_library(libname)
if found.abs_path is None:
    loaded = load_with_system_search(libname, found.lib_searched_for)
because in _find_nvidia_dynamic_library we always do this on Linux:
self.lib_searched_for = f"lib{libname}.so"
meaning we don’t search with the full soname (libnvJitLink.so.12), but with the symlink name (libnvJitLink.so), which conda does not provide if we only install the libnvjitlink package and not the libnvjitlink-dev package.
Therefore, the load_with_system_search function behaves incorrectly because we feed it the wrong soname.
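To illustrate the suspected mismatch, here is a minimal, self-contained sketch. It does not use the pathfinder internals; find_exact and find_versioned are hypothetical helpers, and the temporary directory only mimics a runtime-only package layout (versioned file present, unversioned symlink absent) as described above.

```python
import tempfile
from pathlib import Path


def find_exact(lib_dir, libname):
    """Look only for the unversioned symlink name, e.g. libnvJitLink.so."""
    candidate = Path(lib_dir) / f"lib{libname}.so"
    return candidate if candidate.exists() else None


def find_versioned(lib_dir, libname):
    """Hypothetical fallback: also accept versioned names such as libnvJitLink.so.12."""
    matches = sorted(Path(lib_dir).glob(f"lib{libname}.so*"))
    return matches[0] if matches else None


def demo():
    with tempfile.TemporaryDirectory() as tmp:
        # Assumed layout of a runtime-only package: no libnvJitLink.so symlink.
        (Path(tmp) / "libnvJitLink.so.12").touch()
        exact = find_exact(tmp, "nvJitLink")
        versioned = find_versioned(tmp, "nvJitLink")
        # Return the file name only, since the directory is deleted on exit.
        return exact, versioned.name if versioned else None
```

In this layout the exact-name lookup finds nothing while the versioned lookup finds libnvJitLink.so.12, which is consistent with the system search then falling through to whatever libnvJitLink.so the system CTK provides.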
How to Reproduce
I think a simple reproducer would be:
- Launch a vanilla Ubuntu container.
- Install miniforge and then create a new conda env with only cuda-pathfinder (from pip) and libnvjitlink (from conda-forge) installed.
- Run the pathfinder:
from cuda import pathfinder
pathfinder.load_nvidia_dynamic_lib("nvJitLink")
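To confirm which copy was actually loaded after the call above, one option on Linux is to inspect the process's memory mappings. This is a sketch that relies only on /proc/self/maps, not on any pathfinder API; paths_from_maps and loaded_library_paths are hypothetical helper names.

```python
def paths_from_maps(maps_text, needle):
    """Return the sorted, unique mapped file paths whose path contains `needle`."""
    paths = set()
    for line in maps_text.splitlines():
        # Format: address perms offset dev inode [pathname]
        fields = line.split(None, 5)
        if len(fields) == 6 and needle in fields[5]:
            paths.add(fields[5].strip())
    return sorted(paths)


def loaded_library_paths(needle):
    """Linux only: list libraries mapped into the current process matching `needle`."""
    with open("/proc/self/maps") as f:
        return paths_from_maps(f.read(), needle)
```

Calling loaded_library_paths("nvJitLink") in the same Python session, right after the load call, should show whether the mapped file lives under the conda env prefix or under the system CTK directory.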
Expected behavior
The conda .so should be found.
Operating System
No response
nvidia-smi output
No response