-
-
Notifications
You must be signed in to change notification settings - Fork 11.1k
[test/doc] make NixlConnector example more clear #24249
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Conversation
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Code Review
This pull request improves the clarity of the NixlConnector integration test script. The changes correctly distinguish between kv_producer and kv_consumer roles, which is more explicit than the previous kv_both. Additionally, the inclusion of the UCX_NET_DEVICES=all environment variable serves as a useful hint for users transitioning from NCCL to UCX for communication configuration. The formatting of the command construction has also been improved for better readability. The changes are sound and achieve the stated goal of making the example clearer.
chaunceyjiang
left a comment
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Thanks~
#21800 (comment)
It seems there’s still a lack of tutorials on NixlConnector’s xPyD. Could you help add one?
NickLucche
left a comment
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Thanks for the PR @panpan0000 !
I agree with you the lack of documentation for the NixlConnector can be frustrating, but I don't think editing this script is the best way towards clarity.
I would be happier to see a more specific docs page on the topic, with a list of references eg nixl unit tests, llm-d guides https://github.com/llm-d/llm-d/tree/dev/guides/pd-disaggregation and basic nixl setup.
I can help with some Justfiles to get started too.
(1) we can tell between consumer and producer role
(2) tell user NCCL_* environment variables are no longer applicable to NixlConnector, but UCX replaces NCCL, so UCX_* variable should be used instead.
1 - the fact you can specify kv_both here is a "feature not a bug" as the connector makes no assumption about its role providing symmetric functionality.
2 - This isn't generally true as nixl supports backends other than ucx, although ucx is indeed the main transport library.
d4a3eeb to
85a463f
Compare
|
Thanks @NickLucche , the doc already added per @chaunceyjiang's suggestion. can you please help to review again, thanks |
85a463f to
76b39e0
Compare
c89986a to
2af5028
Compare
|
Thank you @hmellor , all fixed |
NickLucche
left a comment
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Thank you for the doc page!!
Left some comments
|
Thank you very much for the work of all the experts. I have tried to understand your work, but I find it somewhat difficult. SGLang and Dynamo have already provided PD-disaggregated inference tutorials that are very user-friendly for readers. Perhaps these can bring better inspiration to everyone. https://docs.sglang.ai/advanced_features/pd_disaggregation.html |
|
Great point @Alan-D-Chen ! |
165858c to
4c44ac0
Compare
|
Hi, @NickLucche @hmellor Looking forward to your double check, thank you for your time :-) |
NickLucche
left a comment
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This is mostly good, I just think the installation instructions should be more straightforward given users will just want to try it out and get into action quickly
Signed-off-by: Peter Pan <Peter.Pan@daocloud.io>
Signed-off-by: Peter Pan <Peter.Pan@daocloud.io>
Co-authored-by: Harry Mellor <19981378+hmellor@users.noreply.github.com> Signed-off-by: Peter Pan <peter.pan@daocloud.io>
Signed-off-by: Peter Pan <Peter.Pan@daocloud.io>
Signed-off-by: Peter Pan <Peter.Pan@daocloud.io>
Signed-off-by: Peter Pan <Peter.Pan@daocloud.io>
Signed-off-by: Peter Pan <Peter.Pan@daocloud.io> Signed-off-by: Nicolò Lucchesi<nicolo.lucchesi@gmail.com>
4c44ac0 to
cd77ea7
Compare
|
Thank you for your time again, @NickLucche , your comments are all fixed :-) |
NickLucche
left a comment
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
lgtm, thanks for contributing @panpan0000 !
Signed-off-by: Peter Pan <Peter.Pan@daocloud.io> Signed-off-by: Peter Pan <peter.pan@daocloud.io> Signed-off-by: Nicolò Lucchesi<nicolo.lucchesi@gmail.com> Co-authored-by: Nicolò Lucchesi <nicolo.lucchesi@gmail.com>
Signed-off-by: Peter Pan <Peter.Pan@daocloud.io> Signed-off-by: Peter Pan <peter.pan@daocloud.io> Signed-off-by: Nicolò Lucchesi<nicolo.lucchesi@gmail.com> Co-authored-by: Nicolò Lucchesi <nicolo.lucchesi@gmail.com> Signed-off-by: charlifu <charlifu@amd.com>
Signed-off-by: Peter Pan <Peter.Pan@daocloud.io> Signed-off-by: Peter Pan <peter.pan@daocloud.io> Signed-off-by: Nicolò Lucchesi<nicolo.lucchesi@gmail.com> Co-authored-by: Nicolò Lucchesi <nicolo.lucchesi@gmail.com> Signed-off-by: yewentao256 <zhyanwentao@126.com>
Signed-off-by: Peter Pan <Peter.Pan@daocloud.io> Signed-off-by: Peter Pan <peter.pan@daocloud.io> Signed-off-by: Nicolò Lucchesi<nicolo.lucchesi@gmail.com> Co-authored-by: Nicolò Lucchesi <nicolo.lucchesi@gmail.com> Signed-off-by: gaojc <1055866782@qq.com>
Signed-off-by: Peter Pan <Peter.Pan@daocloud.io> Signed-off-by: Peter Pan <peter.pan@daocloud.io> Signed-off-by: Nicolò Lucchesi<nicolo.lucchesi@gmail.com> Co-authored-by: Nicolò Lucchesi <nicolo.lucchesi@gmail.com> Signed-off-by: xuebwang-amd <xuebwang@amd.com>
Signed-off-by: Peter Pan <Peter.Pan@daocloud.io> Signed-off-by: Peter Pan <peter.pan@daocloud.io> Signed-off-by: Nicolò Lucchesi<nicolo.lucchesi@gmail.com> Co-authored-by: Nicolò Lucchesi <nicolo.lucchesi@gmail.com>
Signed-off-by: Peter Pan <Peter.Pan@daocloud.io> Signed-off-by: Peter Pan <peter.pan@daocloud.io> Signed-off-by: Nicolò Lucchesi<nicolo.lucchesi@gmail.com> Co-authored-by: Nicolò Lucchesi <nicolo.lucchesi@gmail.com>
Signed-off-by: Peter Pan <Peter.Pan@daocloud.io> Signed-off-by: Peter Pan <peter.pan@daocloud.io> Signed-off-by: Nicolò Lucchesi<nicolo.lucchesi@gmail.com> Co-authored-by: Nicolò Lucchesi <nicolo.lucchesi@gmail.com> Signed-off-by: xuebwang-amd <xuebwang@amd.com>
Purpose
NixlConnector are lack of tutorial documents, and the only reference,
per https://github.com/vllm-project/vllm/blob/main/docs/features/disagg_prefill.md?plain=1#L26
is the test code.
So it's better to make this code more clear.
NCCL_*environment variables are no longer applicable to NixlConnector, but UCX replaces NCCL, soUCX_*variable should be used instead.example,
UCX_TLSorUCX_NET_DEVICESare the way to configurate underlaying communication device or method,NCCL_IB_HCANCCL_SOCKET_IFNAMEare not applicable.So in my PR,
UCX_NET_DEVICES=allis just a "Hint" to people to be aware of that.If you want me to add 2 more additional Tips:
VLLM_NIXL_SIDE_CHANNEL_HOSTvariables , which is helpful when P and D are in diff machinesVLLM_NIXL_SIDE_CHANNEL_*for Decoder , since it's just needed for Prefiller.I'm glad to follow up