Skip to content

Conversation

@xuechendi
Copy link
Contributor

@xuechendi xuechendi commented Sep 29, 2025

Purpose

With num_threads added from PR#25844, nixl requirememt will be >= 0.5.1
However, if do pip install nixl>=0.5.1, libcuda.so will be hard-dependency.

For CPU offloading path in nixl, libcuda.so is not must to have

This PR is to provide a version check to make nixl_connector compatible with early nixl version

Alternative solution:

Test Plan

Test Result


Essential Elements of an Effective PR Description Checklist
  • The purpose of the PR, such as "Fix some issue (link existing issues this PR will resolve)".
  • The test plan, such as providing test command.
  • The test results, such as pasting the results comparison before and after, or e2e results
  • (Optional) The necessary documentation update, such as updating supported_models.md and examples for a new model.
  • (Optional) Release notes update. If your change is user facing, please update the release notes draft in the Google Doc.

Signed-off-by: Chendi Xue <Chendi.Xue@intel.com>
Copy link
Contributor

@gemini-code-assist gemini-code-assist bot left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Code Review

This pull request aims to add backward compatibility for older versions of the nixl package by conditionally adding the num_threads argument. However, the implementation has a critical flaw in how it compares version strings, which can lead to incorrect behavior. My review provides a fix for this version comparison logic.

Comment on lines +493 to +495
ucx_args = {
'num_threads': 8,
} if get_nixl_version() >= '0.5.1' else {}
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

critical

Comparing versions as strings is error-prone because it uses lexicographical ordering. For example, '0.10.0' would be considered less than '0.5.1'. To ensure correct version comparison, you should parse the version string into a comparable format, like a tuple of integers. This approach is more robust than string comparison for simple version numbers. For full compliance with PEP 440 versioning, consider using the packaging library if nixl adopts more complex versioning schemes in the future.

Suggested change
ucx_args = {
'num_threads': 8,
} if get_nixl_version() >= '0.5.1' else {}
ucx_args = {
'num_threads': 8,
} if tuple(map(int, get_nixl_version().split('.'))) >= (0, 5, 1) else {}

@robertgshaw2-redhat
Copy link
Collaborator

offline discussion:

Robert Shaw
5 minutes ago
just curious, why do you need prior nixl version?

7:19
it seems to me that nixl + vllm should be in version sync
7:20
there is critical performance reasons for this change

Chendi Xue
4 minutes ago
Because nixl>=0.5.1 uses built-in UCX compiled with libcuda.so
New
7:21
But any nixl<=0.5.0 does not have the hard-dependency on libcuda.so; And it makes sense for any non-cuda platform which uses host_buffer

Robert Shaw
3 minutes ago
is this a long term thing? I think that having to maintain compatibility with old versions of new libraries is not ideal

Chendi Xue
2 minutes ago
I will also submit an issue to NIXL repo to request for non-cuda-UCX wheel

Robert Shaw
2 minutes ago
e.g. we are going to start using new nixl features for telemetry
7:22
So I dont think that we can maintain support for multiple versions of nixl.

Chendi Xue
1 minute ago
Once that gets resolved, we can remove the version check; But currently, I think there will be a instance broken for all non-cuda platform

Robert Shaw
1 minute ago
this case is small, but I think that we need to use the newer versions for other features that are harder to isolate
7:23
Well is that really true? Its true that this will break if you install the nixl wheel provided by nvidia
7:23
But you can build a nixl wheel + ucx yourself (edited)

Chendi Xue
Just now
I see, will try to have non-cuda support from nixl wheel as well. It will be great if we can use this PR to workaround firstly

@xuechendi
Copy link
Contributor Author

Resolve this issue by providing a build_from_source script instead, close

@xuechendi xuechendi closed this Oct 8, 2025
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Projects

None yet

Development

Successfully merging this pull request may close these issues.

2 participants