@panpan0000
Contributor

@panpan0000 panpan0000 commented Sep 4, 2025

Purpose

NixlConnector lacks tutorial documentation, and the only reference,
per https://github.com/vllm-project/vllm/blob/main/docs/features/disagg_prefill.md?plain=1#L26
is the test code.

So it would be better to make this code clearer.


  • (1) make the consumer and producer roles distinguishable
  • (2) tell users that NCCL_* environment variables no longer apply to NixlConnector; UCX replaces NCCL, so UCX_* variables should be used instead.

For example, UCX_TLS and UCX_NET_DEVICES are the way to configure the underlying communication device or transport; NCCL_IB_HCA and NCCL_SOCKET_IFNAME are not applicable.

So in my PR, UCX_NET_DEVICES=all is just a hint to make people aware of that.
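
To make the two points above concrete, here is a minimal sketch of how a launch command with an explicit role and a UCX hint could be assembled. It is not the PR's actual test script: the helper names `kv_transfer_config` and `serve_command`, the model name, and the port are hypothetical placeholders, and only `kv_connector`, `kv_role`, and `UCX_NET_DEVICES` come from the discussion.

```python
import json
import shlex

# Roles discussed in this PR; kv_both is the symmetric setting used previously.
VALID_ROLES = {"kv_producer", "kv_consumer", "kv_both"}


def kv_transfer_config(role: str) -> str:
    """Return the JSON string passed to `--kv-transfer-config`."""
    if role not in VALID_ROLES:
        raise ValueError(f"unexpected kv_role: {role}")
    return json.dumps({"kv_connector": "NixlConnector", "kv_role": role})


def serve_command(model: str, role: str, port: int,
                  ucx_net_devices: str = "all") -> str:
    """Build a shell command line with a UCX_* hint instead of NCCL_* variables."""
    # Assumption: UCX is the NIXL backend in use, so UCX_* variables apply.
    env_prefix = f"UCX_NET_DEVICES={shlex.quote(ucx_net_devices)}"
    args = ["vllm", "serve", model, "--port", str(port),
            "--kv-transfer-config", kv_transfer_config(role)]
    return f"{env_prefix} {shlex.join(args)}"


if __name__ == "__main__":
    # "my-model" and the ports are placeholders, not values from the PR.
    print(serve_command("my-model", "kv_producer", 8100))
    print(serve_command("my-model", "kv_consumer", 8200))
```

Printing the command rather than running it keeps the sketch safe to try on a machine without vLLM installed.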


If you would like, I can add two more tips:

  • add the VLLM_NIXL_SIDE_CHANNEL_HOST variable, which is helpful when P and D are on different machines
  • remove VLLM_NIXL_SIDE_CHANNEL_* for the decoder, since it is only needed for the prefiller.

I'm glad to follow up.
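
As a rough illustration of the two proposed tips, the sketch below builds a per-role environment in which only the prefiller carries the side-channel variables. This follows the suggestion in this comment and is an assumption rather than confirmed connector behavior; the helper `side_channel_env`, the role labels, the address, and the port are hypothetical.

```python
def side_channel_env(role: str, host: str, port: int) -> dict[str, str]:
    """Return the side-channel environment variables for one instance.

    Assumption (from the comment above): only the prefiller needs
    VLLM_NIXL_SIDE_CHANNEL_*; the decoder gets none.
    """
    if role == "prefiller":
        return {
            # Address the other machine should use to reach this prefiller.
            "VLLM_NIXL_SIDE_CHANNEL_HOST": host,
            "VLLM_NIXL_SIDE_CHANNEL_PORT": str(port),
        }
    if role == "decoder":
        return {}
    raise ValueError(f"unexpected role: {role}")


if __name__ == "__main__":
    # 192.0.2.10 is a documentation-only address; 5600 is a placeholder port.
    for role in ("prefiller", "decoder"):
        print(role, side_channel_env(role, "192.0.2.10", 5600))
```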

@mergify mergify bot added the v1 label Sep 4, 2025
Contributor

@gemini-code-assist gemini-code-assist bot left a comment

Code Review

This pull request improves the clarity of the NixlConnector integration test script. The changes correctly distinguish between kv_producer and kv_consumer roles, which is more explicit than the previous kv_both. Additionally, the inclusion of the UCX_NET_DEVICES=all environment variable serves as a useful hint for users transitioning from NCCL to UCX for communication configuration. The formatting of the command construction has also been improved for better readability. The changes are sound and achieve the stated goal of making the example clearer.

@panpan0000
Contributor Author

@chaunceyjiang

Collaborator

@chaunceyjiang chaunceyjiang left a comment

Thanks~

#21800 (comment)
It seems there’s still a lack of tutorials on NixlConnector’s xPyD. Could you help add one?

Collaborator

@NickLucche NickLucche left a comment

Thanks for the PR @panpan0000 !

I agree with you that the lack of documentation for the NixlConnector can be frustrating, but I don't think editing this script is the best way to improve clarity.

I would be happier to see a more specific docs page on the topic, with a list of references, e.g. nixl unit tests, llm-d guides (https://github.com/llm-d/llm-d/tree/dev/guides/pd-disaggregation), and basic nixl setup.
I can help with some Justfiles to get started too.

(1) we can tell between consumer and producer role
(2) tell user NCCL_* environment variables are no longer applicable to NixlConnector, but UCX replaces NCCL, so UCX_* variable should be used instead.

1 - The fact that you can specify kv_both here is a "feature, not a bug", as the connector makes no assumption about its role and provides symmetric functionality.
2 - This isn't generally true, as nixl supports backends other than UCX, although UCX is indeed the main transport library.

@panpan0000 panpan0000 requested a review from hmellor as a code owner September 17, 2025 09:15
@mergify mergify bot added the documentation Improvements or additions to documentation label Sep 17, 2025
@panpan0000
Contributor Author

Thanks @NickLucche, the doc has already been added per @chaunceyjiang's suggestion. Could you please review again? Thanks.

@panpan0000 panpan0000 changed the title [test] make NixlConnector example more clear [test/doc] make NixlConnector example more clear Sep 17, 2025
@panpan0000 panpan0000 force-pushed the nixl-test branch 3 times, most recently from c89986a to 2af5028 Compare September 17, 2025 11:42
@panpan0000
Contributor Author

Thank you @hmellor , all fixed

Collaborator

@NickLucche NickLucche left a comment

Thank you for the doc page!!
Left some comments

@Alan-D-Chen

Thank you very much for the work of all the experts. I have tried to understand your work, but I find it somewhat difficult. SGLang and Dynamo have already provided PD-disaggregated inference tutorials that are very user-friendly for readers. Perhaps these can bring better inspiration to everyone.

https://docs.sglang.ai/advanced_features/pd_disaggregation.html
https://github.com/ai-dynamo/dynamo/blob/v0.3.2/examples/sglang/multinode-examples.md

@NickLucche
Collaborator

Great point @Alan-D-Chen !

@panpan0000
Contributor Author

Hi @NickLucche @hmellor, I'm looking forward to your second review. Thank you for your time :-)

Collaborator

@NickLucche NickLucche left a comment

This is mostly good; I just think the installation instructions should be more straightforward, given that users will just want to try it out and get going quickly.

panpan0000 and others added 8 commits September 23, 2025 10:36
Signed-off-by: Peter Pan <Peter.Pan@daocloud.io>
Signed-off-by: Peter Pan <Peter.Pan@daocloud.io>
Co-authored-by: Harry Mellor <19981378+hmellor@users.noreply.github.com>
Signed-off-by: Peter Pan <peter.pan@daocloud.io>
Signed-off-by: Peter Pan <Peter.Pan@daocloud.io>
Signed-off-by: Peter Pan <Peter.Pan@daocloud.io>
Signed-off-by: Peter Pan <Peter.Pan@daocloud.io>
Signed-off-by: Peter Pan <Peter.Pan@daocloud.io>
Signed-off-by: Peter Pan <Peter.Pan@daocloud.io>
Signed-off-by: Nicolò Lucchesi<nicolo.lucchesi@gmail.com>
@panpan0000
Contributor Author

Thank you for your time again, @NickLucche. Your comments have all been addressed :-)

Collaborator

@NickLucche NickLucche left a comment

lgtm, thanks for contributing @panpan0000 !

@github-project-automation github-project-automation bot moved this from To Triage to Ready in gpt-oss Issues & Enhancements Sep 23, 2025
@NickLucche NickLucche enabled auto-merge (squash) September 23, 2025 13:36
@github-actions github-actions bot added the ready ONLY add when PR is ready to merge/full CI is needed label Sep 23, 2025
@NickLucche NickLucche merged commit da5e7e4 into vllm-project:main Sep 23, 2025
29 checks passed
FeiDaLI pushed a commit to FeiDaLI/vllm that referenced this pull request Sep 25, 2025
Signed-off-by: Peter Pan <Peter.Pan@daocloud.io>
Signed-off-by: Peter Pan <peter.pan@daocloud.io>
Signed-off-by: Nicolò Lucchesi<nicolo.lucchesi@gmail.com>
Co-authored-by: Nicolò Lucchesi <nicolo.lucchesi@gmail.com>
charlifu pushed a commit to ROCm/vllm that referenced this pull request Sep 25, 2025
Signed-off-by: Peter Pan <Peter.Pan@daocloud.io>
Signed-off-by: Peter Pan <peter.pan@daocloud.io>
Signed-off-by: Nicolò Lucchesi<nicolo.lucchesi@gmail.com>
Co-authored-by: Nicolò Lucchesi <nicolo.lucchesi@gmail.com>
Signed-off-by: charlifu <charlifu@amd.com>
yewentao256 pushed a commit that referenced this pull request Oct 3, 2025
Signed-off-by: Peter Pan <Peter.Pan@daocloud.io>
Signed-off-by: Peter Pan <peter.pan@daocloud.io>
Signed-off-by: Nicolò Lucchesi<nicolo.lucchesi@gmail.com>
Co-authored-by: Nicolò Lucchesi <nicolo.lucchesi@gmail.com>
Signed-off-by: yewentao256 <zhyanwentao@126.com>
gjc0824 pushed a commit to gjc0824/vllm that referenced this pull request Oct 10, 2025
Signed-off-by: Peter Pan <Peter.Pan@daocloud.io>
Signed-off-by: Peter Pan <peter.pan@daocloud.io>
Signed-off-by: Nicolò Lucchesi<nicolo.lucchesi@gmail.com>
Co-authored-by: Nicolò Lucchesi <nicolo.lucchesi@gmail.com>
Signed-off-by: gaojc <1055866782@qq.com>
xuebwang-amd pushed a commit to xuebwang-amd/vllm that referenced this pull request Oct 10, 2025
Signed-off-by: Peter Pan <Peter.Pan@daocloud.io>
Signed-off-by: Peter Pan <peter.pan@daocloud.io>
Signed-off-by: Nicolò Lucchesi<nicolo.lucchesi@gmail.com>
Co-authored-by: Nicolò Lucchesi <nicolo.lucchesi@gmail.com>
Signed-off-by: xuebwang-amd <xuebwang@amd.com>
choprahetarth pushed a commit to Tandemn-Labs/vllm that referenced this pull request Oct 11, 2025
Signed-off-by: Peter Pan <Peter.Pan@daocloud.io>
Signed-off-by: Peter Pan <peter.pan@daocloud.io>
Signed-off-by: Nicolò Lucchesi<nicolo.lucchesi@gmail.com>
Co-authored-by: Nicolò Lucchesi <nicolo.lucchesi@gmail.com>
lywa1998 pushed a commit to lywa1998/vllm that referenced this pull request Oct 20, 2025
Signed-off-by: Peter Pan <Peter.Pan@daocloud.io>
Signed-off-by: Peter Pan <peter.pan@daocloud.io>
Signed-off-by: Nicolò Lucchesi<nicolo.lucchesi@gmail.com>
Co-authored-by: Nicolò Lucchesi <nicolo.lucchesi@gmail.com>
xuebwang-amd pushed a commit to xuebwang-amd/vllm that referenced this pull request Oct 24, 2025
Signed-off-by: Peter Pan <Peter.Pan@daocloud.io>
Signed-off-by: Peter Pan <peter.pan@daocloud.io>
Signed-off-by: Nicolò Lucchesi<nicolo.lucchesi@gmail.com>
Co-authored-by: Nicolò Lucchesi <nicolo.lucchesi@gmail.com>
Signed-off-by: xuebwang-amd <xuebwang@amd.com>