
[BUG][pathfinder]: nvJitLink loaded from system library instead of conda #763

@rwgk

Is this a duplicate?

Type of Bug

Silent Failure

Component

cuda.bindings

Describe the bug

@leofang wrote:

I noticed that on the system I’m on, which has a system CTK 12.3 plus a CTK 12.9 installed from conda, the pathfinder from either cuda.bindings 12.9.0 or cuda.pathfinder 1.0.0 picks up nvJitLink 12.3 (the system one) instead of 12.9 (the conda one), which does not follow the behavior we documented.
I suspect that the logic in _load_nvidia_dynamic_library_no_cache might be wrong:

    # Find the library path
    found = _find_nvidia_dynamic_library(libname)
    if found.abs_path is None:
        loaded = load_with_system_search(libname, found.lib_searched_for)

because in _find_nvidia_dynamic_library we always do this on Linux:

    self.lib_searched_for = f"lib{libname}.so"

meaning we don’t search with the full soname (libnvJitLink.so.12) but with the unversioned symlink name (libnvJitLink.so), which conda does not provide if only the libnvjitlink package is installed without the libnvjitlink-dev package.
Therefore, the conda copy is never found, and load_with_system_search is then called with the wrong name, so it picks up whatever libnvJitLink.so the system provides (here, the CTK 12.3 one).
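To make the failure mode concrete, here is a minimal, self-contained simulation of the lookup described above. The helper `resolve` and its name patterns are hypothetical and are not the pathfinder API: a temporary directory stands in for the conda env's `lib` directory containing only `libnvJitLink.so.12` (a runtime-only install), and the string `"system search"` stands in for the call to `load_with_system_search`. The major version 12 is hardcoded purely for illustration.

```python
import tempfile
from pathlib import Path
from typing import Sequence


def resolve(lib_dir: Path, libname: str, patterns: Sequence[str]) -> str:
    """Hypothetical model of the find-then-fallback flow (not the pathfinder API).

    Each pattern is tried in order inside lib_dir (standing in for the conda
    env's lib directory); the first existing file wins. If none exists, report
    that the loader would fall back to the system search.
    """
    for pattern in patterns:
        candidate = lib_dir / pattern.format(libname=libname)
        if candidate.exists():
            return f"conda: {candidate.name}"
    # Nothing found in the conda prefix: the real code would now call
    # load_with_system_search, which on the reporter's machine finds CTK 12.3.
    return "system search"


with tempfile.TemporaryDirectory() as tmp:
    conda_lib = Path(tmp)
    # Simulate libnvjitlink installed without libnvjitlink-dev: only the
    # versioned soname exists, not the unversioned libnvJitLink.so symlink.
    (conda_lib / "libnvJitLink.so.12").touch()

    # Behavior described in the issue: only the unversioned name is searched.
    current = resolve(conda_lib, "nvJitLink", ["lib{libname}.so"])
    # Soname first (major version hardcoded for illustration), then the symlink name.
    soname_first = resolve(conda_lib, "nvJitLink", ["lib{libname}.so.12", "lib{libname}.so"])

print(current)       # system search
print(soname_first)  # conda: libnvJitLink.so.12
```

Searching for the versioned soname first matches what the runtime package actually ships, while keeping the unversioned name as a fallback still covers environments where the -dev symlink is present.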

How to Reproduce

I think a simple reproducer would be:

  • Launch a vanilla Ubuntu container.
  • Install miniforge, then create a new conda env with only cuda-pathfinder (from pip) and libnvjitlink (from conda-forge) installed.
  • Run the pathfinder:

        from cuda import pathfinder
        pathfinder.load_nvidia_dynamic_lib("nvJitLink")
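To confirm which copy was actually loaded in the reproducer, it can help to inspect the process's memory mappings directly rather than rely on pathfinder's own reporting. The helper below, `loaded_library_paths`, is a hypothetical, Linux-only sketch that scans `/proc/self/maps`; it does not depend on cuda.pathfinder.

```python
import os
from typing import List


def loaded_library_paths(name_fragment: str) -> List[str]:
    """Return sorted paths of files mapped into this process whose base name
    contains name_fragment. Linux only: reads /proc/self/maps and returns an
    empty list where that file does not exist."""
    maps_file = "/proc/self/maps"
    if not os.path.exists(maps_file):
        return []
    paths = set()
    with open(maps_file) as f:
        for line in f:
            # Fields: address perms offset dev inode [pathname]
            fields = line.split(maxsplit=5)
            if len(fields) == 6:
                path = fields[5].strip()
                if name_fragment in os.path.basename(path):
                    paths.add(path)
    return sorted(paths)
```

Calling `loaded_library_paths("libnvJitLink")` in the same process right after `pathfinder.load_nvidia_dynamic_lib("nvJitLink")` should list a path under the conda env's `lib` directory if the expected behavior holds, and a path from the system CTK if this bug is present.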

Expected behavior

The conda-provided nvJitLink 12.9 (libnvJitLink.so.12) should be found and loaded instead of the system copy.

Operating System

No response

nvidia-smi output

No response

Labels

bug, cuda.pathfinder, triage

Status

Done
