[ISSUE] [>=0.5.1] Failed on non-cuda platform, requesting for non-cuda UCX built-in wheel support #838

@xuechendi

=== Issue ===
Starting with nixl 0.5.1, UCX is compiled against libcuda.so: https://github.com/ai-dynamo/nixl/blob/main/contrib/Dockerfile#L128
That UCX build is then used to build the wheel for each nixl release.

As a result, every nixl>=0.5.1 release fails on non-CUDA platforms.

Tested with both 0.5.1 and 0.6.0; the same error appears in both.

E0930 00:41:21.351561     199 nixl_plugin_manager.cpp:122] Failed to load plugin from /usr/local/lib/python3.12/dist-packages/.nixl.mesonpy.libs/plugins/libplugin_GDS_MT.so: libcudart.so.12: cannot open shared object file: No such file or directory
E0930 00:41:21.351608     199 nixl_plugin_manager.cpp:288] Failed to load plugin 'GDS_MT' from any directory
E0930 00:41:21.351958     199 nixl_plugin_manager.cpp:122] Failed to load plugin from /usr/local/lib/python3.12/dist-packages/.nixl.mesonpy.libs/plugins/libplugin_GDS.so: libcudart.so.12: cannot open shared object file: No such file or directory
E0930 00:41:21.351969     199 nixl_plugin_manager.cpp:288] Failed to load plugin 'GDS' from any directory
E0930 00:41:21.352478     199 nixl_plugin_manager.cpp:122] Failed to load plugin from /usr/local/lib/python3.12/dist-packages/.nixl.mesonpy.libs/plugins/libplugin_GPUNETIO.so: libcuda.so.1: cannot open shared object file: No such file or directory
E0930 00:41:21.352488     199 nixl_plugin_manager.cpp:288] Failed to load plugin 'GPUNETIO' from any directory
E0930 00:41:21.358970     199 nixl_plugin_manager.cpp:122] Failed to load plugin from /usr/local/lib/python3.12/dist-packages/.nixl.mesonpy.libs/plugins/libplugin_UCX.so: libcuda.so.1: cannot open shared object file: No such file or directory
E0930 00:41:21.358985     199 nixl_plugin_manager.cpp:288] Failed to load plugin 'UCX' from any directory
E0930 00:41:21.359532     199 nixl_plugin_manager.cpp:122] Failed to load plugin from /usr/local/lib/python3.12/dist-packages/.nixl.mesonpy.libs/plugins/libplugin_UCX_MO.so: libcudart.so.12: cannot open shared object file: No such file or directory
E0930 00:41:21.359542     199 nixl_plugin_manager.cpp:288] Failed to load plugin 'UCX_MO' from any directory
2025-09-30 00:41:21 NIXL WARNING _api.py:216 Skipping backend registration UCX due to the missing plugin.

...

[ERROR][Sender][2025-09-30 00:41:17,239] Sender process failed
Traceback (most recent call last):
  File "/workspace/vllm-gaudi/examples/nixl_api_test.py", line 291, in sender_process
    agent.register_memory(reg_descs, backends=[args.nixl_backend])
  File "/usr/local/lib/python3.12/dist-packages/nixl/_api.py", line 376, in register_memory
    handle_list.append(self.backends[backend_string])
                       ~~~~~~~~~~~~~^^^^^^^^^^^^^^^^
KeyError: 'UCX'
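
To confirm on a given machine that the failure comes from the missing CUDA libraries rather than from nixl itself, a minimal stdlib-only sketch like the following can be used. The library names are copied from the loader errors in the log above; the helper name `probe_shared_libs` is hypothetical and is not part of nixl.

```python
import ctypes

# Library names copied from the "cannot open shared object file" errors above.
REQUIRED_LIBS = ("libcuda.so.1", "libcudart.so.12")


def probe_shared_libs(names=REQUIRED_LIBS):
    """Try to load each shared library with the system loader.

    Returns a dict mapping each name to None if it loaded,
    or to the loader's error message if it did not.
    """
    results = {}
    for name in names:
        try:
            ctypes.CDLL(name)
            results[name] = None
        except OSError as exc:
            results[name] = str(exc)
    return results


if __name__ == "__main__":
    for lib, err in probe_shared_libs().items():
        status = "ok" if err is None else "MISSING (" + err + ")"
        print(lib + ": " + status)
```

On a non-CUDA host both libraries would be expected to show up as missing, which matches the plugin load failures and the resulting KeyError: 'UCX'.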

=== Why important ===

vLLM on non-CUDA platforms uses CPU tensors for nixl_memory_registration.

vllm-project/vllm#25911

=== Proposal ===

  • Is it possible to publish two variants of each nixl release: nixl and nixl[cpu]?
