
Conversation

@chenxi-yang (Contributor) commented Sep 11, 2025

Accidentally closed #24619

Support PD disaggregation for CUDA:CPU, based on #22436: use CPU memory as a host buffer and perform point-to-point transfers via NIXL.

Use case: we want to support multi-turn conversations across multiple hosts. When the decode instance's KV cache is larger than GPU memory, or when another prefill's KV cache needs to be passed to the current prefill through the decode instance, we may want to use CPU memory as a cache buffer.

Purpose

Currently, cuda:cpu is not supported (#18293 (comment)). This PR adds support for a host (CPU) buffer on CUDA devices.

Test Plan

bash tests/v1/kv_connector/nixl_integration/run_accuracy_test.sh --kv_buffer_device cpu for the accuracy test.
bash tests/v1/kv_connector/nixl_integration/run_edge_case_test.sh --kv_buffer_device cpu for the edge-case test.

Test Result

All tests completed!

mergify bot commented Sep 11, 2025

This pull request has merge conflicts that must be resolved before it can be
merged. Please rebase the PR, @chenxi-yang.

https://docs.github.com/en/pull-requests/collaborating-with-pull-requests/working-with-forks/syncing-a-fork

@gemini-code-assist bot left a comment

Code Review

This pull request adds support for CUDA to CPU KV cache offloading via the NixlConnector, which is a valuable feature for handling large models. The changes include updates to the NixlConnector to recognize 'cpu' as a valid buffer device for CUDA, and new integration tests to validate this functionality. My review identified a critical bug in the block copying logic within the CUDA platform code, which would cause incorrect behavior or errors. Additionally, I've pointed out some high-severity issues in the new test scripts, such as the use of eval which poses a security risk, and incorrect variable usage in log messages.

Comment on lines 95 to 101
if [ -n "$model_args" ]; then
FULL_CMD="$BASE_CMD $model_args"
else
FULL_CMD="$BASE_CMD"
fi

eval "$FULL_CMD &"

high

The use of eval to execute the command is dangerous as it can lead to arbitrary code execution if variables in the command string are not properly sanitized. This occurs for both prefill (line 101) and decode (line 133) instance startup. While the risk is lower in a test script with controlled inputs, it's a bad practice. Please consider refactoring the command execution to use arrays, which is safer. This would involve changing get_model_args to output an array of arguments and then executing the command array directly.
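For illustration, a minimal sketch of the array-based approach the review suggests. Names here are hypothetical, and echo stands in for the real server launch; the point is that an argument array is executed directly, so the shell never re-parses the command string and eval is unnecessary.

```shell
#!/usr/bin/env bash
# Hypothetical sketch: build the launch command as an array instead of a string.
# 'echo' stands in for the real launch command; flag values are placeholders.
BASE_CMD=(echo vllm serve my-model --port 8100)
model_args=(--kv-transfer-config '{"kv_connector":"NixlConnector","kv_buffer_device":"cpu"}')

# Append optional arguments element-by-element; no string concatenation.
FULL_CMD=("${BASE_CMD[@]}")
if [ "${#model_args[@]}" -gt 0 ]; then
  FULL_CMD+=("${model_args[@]}")
fi

# Execute the array directly; arguments containing spaces or quotes survive intact.
"${FULL_CMD[@]}" &
wait
```

Under this scheme, get_model_args would print or populate an array of arguments rather than a single string.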

Comment on lines 63 to 69
if [ -n "$model_args" ]; then
FULL_CMD="$BASE_CMD $model_args"
else
FULL_CMD="$BASE_CMD"
fi

eval "$FULL_CMD &"

high

The use of eval to execute the command is dangerous as it can lead to arbitrary code execution if variables in the command string are not properly sanitized. This occurs for both prefill (line 69) and decode (line 87) instance startup. It's a best practice to avoid eval. Please consider refactoring the command execution to use arrays, which is safer. This would involve changing get_model_args to output an array of arguments and then executing the command array directly.

@mergify mergify bot added the needs-rebase label Sep 11, 2025
@NickLucche (Collaborator) left a comment

Will review asap, sorry!

@njhill (Member) commented Sep 12, 2025

Thanks @chenxi-yang! It looks good at first glance, but would you mind reverting all of the formatting changes? They make it difficult to see what the real changes are and are unrelated to the purpose of the PR.

Feel free to open a separate PR with formatting improvements!

@chenxi-yang (Contributor, Author)
> Thanks @chenxi-yang! It looks good at first glance, but would you mind reverting all of the formatting changes? They make it difficult to see what the real changes are and are unrelated to the purpose of the PR.
>
> Feel free to open a separate PR with formatting improvements!

Cleaned up the formatting edits from auto-save!

@NickLucche (Collaborator) left a comment

Looks clean, thanks @chenxi-yang!
I just left a few minor comments.
Also, would you mind updating the docstring of KVTransferConfig.kv_buffer_device? We do support cpu now.

@NickLucche (Collaborator)
Are you opening a new one, @chenxi-yang?

@chenxi-yang (Contributor, Author)
> Are you opening a new one, @chenxi-yang?

Sorry! I was swamped with other tasks. I am not opening a new one; I have pushed the requested changes to this PR. Let me know what you think. Thank you!

copy_kv_blocks)
kv_transfer_group = get_kv_transfer_group()
kv_transfer_group.register_kv_caches(kv_caches)
kv_transfer_group.set_host_xfer_buffer_ops(copy_kv_blocks)

How about moving the if self.kv_buffer_device != "cpu" condition here? I think it would make the code more straightforward to read.
Suggestion:

if self.vllm_config.kv_transfer_config.kv_buffer_device == "cpu":
    kv_transfer_group.set_host_xfer_buffer_ops(copy_kv_blocks)


Resolved in @NickLucche's comments below; please ignore.

if [[ "$KV_BUFFER_DEVICE" == "cuda" ]]; then
KV_CONFIG='{"kv_connector":"NixlConnector","kv_role":"kv_both"}'
else
KV_CONFIG="{\"kv_connector\":\"NixlConnector\",\"kv_role\":\"kv_both\",\"kv_buffer_device\":\"$KV_BUFFER_DEVICE\"}"
@xuechendi (Contributor) commented Sep 24, 2025

I added backends support in this PR: #25121.
Could you also provide this option in run_accuracy_test?

Suggested codes:

VLLM_NIXL_BACKEND=${VLLM_NIXL_BACKEND:-'["UCX"]'}

--kv-transfer-config "{\"kv_connector\":\"NixlConnector\",\"kv_role\":\"kv_both\",\"kv_buffer_device\":\"${KV_BUFFER_DEVICE}\",\"kv_connector_extra_config\":{\"backends\":${VLLM_NIXL_BACKEND}}}"
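For illustration, a runnable sketch of how the combined config string could be composed. Variable names follow the reviewer's suggestion; the kv_connector_extra_config/backends field comes from PR #25121 and is an assumption here, not something verified against this branch.

```shell
#!/usr/bin/env bash
# Hypothetical sketch: compose a NixlConnector config carrying both the
# buffer device and the backends list suggested in PR #25121.
KV_BUFFER_DEVICE=${KV_BUFFER_DEVICE:-cpu}
VLLM_NIXL_BACKEND=${VLLM_NIXL_BACKEND:-'["UCX"]'}

KV_CONFIG="{\"kv_connector\":\"NixlConnector\",\"kv_role\":\"kv_both\",\"kv_buffer_device\":\"${KV_BUFFER_DEVICE}\",\"kv_connector_extra_config\":{\"backends\":${VLLM_NIXL_BACKEND}}}"

# The result would then be passed as: --kv-transfer-config "$KV_CONFIG"
echo "$KV_CONFIG"
```

With the defaults above, this prints a single JSON object with kv_buffer_device set to cpu and backends set to ["UCX"].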


I think we should keep the scope of this PR focused; we can do that in a separate PR.


OK, I will take that task.

if [[ "$KV_BUFFER_DEVICE" == "cuda" ]]; then
KV_CONFIG='{"kv_connector":"NixlConnector","kv_role":"kv_both"}'
else
KV_CONFIG="{\"kv_connector\":\"NixlConnector\",\"kv_role\":\"kv_both\",\"kv_buffer_device\":\"$KV_BUFFER_DEVICE\"}"

Same ask: please also add

"kv_connector_extra_config":{"backends":${VLLM_NIXL_BACKEND}}


ditto


Will add it through a different PR; please ignore my comments above.

@@ -672,6 +675,9 @@ def initialize_host_xfer_buffer(

def set_host_xfer_buffer_ops(self, copy_operation: CopyBlocksOp):
"""Assign copy (d2h, h2d) operations when host buffer is used."""
# Set a no-op if the host buffer is not cpu.
if self.kv_buffer_device != "cpu":
return

OK to put the guard here; however, it is not very straightforward to me that for any non-CPU buffer, the GPU model runner will still always call set_host_xfer_buffer_ops. Would it be OK to move the condition check to gpu_model_runner?


You can refer to @njhill's earlier comment, but more broadly, this pair of functions makes sense when the selected buffer device is cpu, not when we're running on a particular platform.
And the GPU model runner arguably doesn't need to be aware of the selected buffer device, since that is part of the KV connector spec.


Resolved in @NickLucche's comments below; please ignore.

@NickLucche (Collaborator) left a comment

LGTM, thanks for contributing @chenxi-yang! Let's get the tests fixed (sync to main) and fix the DCO check.

if [[ "$KV_BUFFER_DEVICE" == "cuda" ]]; then
KV_CONFIG='{"kv_connector":"NixlConnector","kv_role":"kv_both"}'
else
KV_CONFIG="{\"kv_connector\":\"NixlConnector\",\"kv_role\":\"kv_both\",\"kv_buffer_device\":\"$KV_BUFFER_DEVICE\"}"

ditto


Chenxi Yang added 3 commits September 24, 2025 21:37
Signed-off-by: Chenxi Yang <cxyang@fb.com>
@NickLucche NickLucche enabled auto-merge (squash) September 29, 2025 12:48
@github-actions github-actions bot added the ready ONLY add when PR is ready to merge/full CI is needed label Sep 29, 2025
@NickLucche NickLucche merged commit d0d138b into vllm-project:main Sep 29, 2025
47 checks passed
pdasigi pushed a commit to pdasigi/vllm that referenced this pull request Oct 2, 2025
yewentao256 pushed a commit that referenced this pull request Oct 3, 2025
tomeras91 pushed a commit to tomeras91/vllm that referenced this pull request Oct 6, 2025
xuebwang-amd pushed a commit to xuebwang-amd/vllm that referenced this pull request Oct 10, 2025
lywa1998 pushed a commit to lywa1998/vllm that referenced this pull request Oct 20, 2025
alhridoy pushed a commit to alhridoy/vllm that referenced this pull request Oct 24, 2025
xuebwang-amd pushed a commit to xuebwang-amd/vllm that referenced this pull request Oct 24, 2025

Labels

kv-connector ready ONLY add when PR is ready to merge/full CI is needed v1
