=== Issue ===
Starting with nixl 0.5.1, UCX is built with CUDA support and links against libcuda.so: https://github.com/ai-dynamo/nixl/blob/main/contrib/Dockerfile#L128
That UCX build directory is then used to build the wheel for each nixl release.
This causes every nixl>=0.5.1 release to fail on non-CUDA platforms.
Tested with both 0.5.1 and 0.6.0; the same error appears in each:
E0930 00:41:21.351561 199 nixl_plugin_manager.cpp:122] Failed to load plugin from /usr/local/lib/python3.12/dist-packages/.nixl.mesonpy.libs/plugins/libplugin_GDS_MT.so: libcudart.so.12: cannot open shared object file: No such file or directory
E0930 00:41:21.351608 199 nixl_plugin_manager.cpp:288] Failed to load plugin 'GDS_MT' from any directory
E0930 00:41:21.351958 199 nixl_plugin_manager.cpp:122] Failed to load plugin from /usr/local/lib/python3.12/dist-packages/.nixl.mesonpy.libs/plugins/libplugin_GDS.so: libcudart.so.12: cannot open shared object file: No such file or directory
E0930 00:41:21.351969 199 nixl_plugin_manager.cpp:288] Failed to load plugin 'GDS' from any directory
E0930 00:41:21.352478 199 nixl_plugin_manager.cpp:122] Failed to load plugin from /usr/local/lib/python3.12/dist-packages/.nixl.mesonpy.libs/plugins/libplugin_GPUNETIO.so: libcuda.so.1: cannot open shared object file: No such file or directory
E0930 00:41:21.352488 199 nixl_plugin_manager.cpp:288] Failed to load plugin 'GPUNETIO' from any directory
E0930 00:41:21.358970 199 nixl_plugin_manager.cpp:122] Failed to load plugin from /usr/local/lib/python3.12/dist-packages/.nixl.mesonpy.libs/plugins/libplugin_UCX.so: libcuda.so.1: cannot open shared object file: No such file or directory
E0930 00:41:21.358985 199 nixl_plugin_manager.cpp:288] Failed to load plugin 'UCX' from any directory
E0930 00:41:21.359532 199 nixl_plugin_manager.cpp:122] Failed to load plugin from /usr/local/lib/python3.12/dist-packages/.nixl.mesonpy.libs/plugins/libplugin_UCX_MO.so: libcudart.so.12: cannot open shared object file: No such file or directory
E0930 00:41:21.359542 199 nixl_plugin_manager.cpp:288] Failed to load plugin 'UCX_MO' from any directory
2025-09-30 00:41:21 NIXL WARNING _api.py:216 Skipping backend registration UCX due to the missing plugin.
...
[ERROR][Sender][2025-09-30 00:41:17,239] Sender process failed
Traceback (most recent call last):
  File "/workspace/vllm-gaudi/examples/nixl_api_test.py", line 291, in sender_process
    agent.register_memory(reg_descs, backends=[args.nixl_backend])
  File "/usr/local/lib/python3.12/dist-packages/nixl/_api.py", line 376, in register_memory
    handle_list.append(self.backends[backend_string])
                       ~~~~~~~~~~~~~^^^^^^^^^^^^^^^^
KeyError: 'UCX'
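To confirm the diagnosis on a given machine, one can check whether the dynamic loader is able to open the two libraries named in the plugin errors above. The following is a minimal sketch that uses only Python's standard ctypes module; the helper name missing_shared_libs is hypothetical and is not part of nixl.

```python
import ctypes

# Library names copied from the plugin load errors in the log above.
CUDA_LIBS = ("libcuda.so.1", "libcudart.so.12")


def missing_shared_libs(names=CUDA_LIBS):
    """Return the entries of `names` that the dynamic loader cannot open.

    Hypothetical helper for illustration only; it is not a nixl API.
    """
    missing = []
    for name in names:
        try:
            # CDLL raises OSError when the shared object cannot be loaded.
            ctypes.CDLL(name)
        except OSError:
            missing.append(name)
    return missing


if __name__ == "__main__":
    missing = missing_shared_libs()
    if missing:
        print("Not loadable:", ", ".join(missing))
    else:
        print("All listed CUDA libraries are loadable")
```

On a non-CUDA host, both names are expected to be reported as not loadable, which matches the "cannot open shared object file" messages from nixl_plugin_manager.cpp.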
=== Why important ===
vLLM on non-CUDA platforms uses CPU tensors for nixl memory registration (nixl_memory_registration), so these platforms need a nixl build that loads without CUDA libraries.
=== Proposal ===
- Would it be possible to publish two variants for each nixl release: nixl and nixl[cpu]?