Skip to content

Conversation

@Csrayz
Copy link
Contributor

@Csrayz Csrayz commented Aug 30, 2025

Purpose

Fix several issues with p2p xPyD in GET type:

  1. When using GET mode, such a situation can occur. The GPU buffer size configuration is just at the critical value of the KV Cache size. Some requests larger than the buffer size will clear all tensors, causing next(iter(self.send_store)) to raise StopIteration.
  2. When calling recv_tensor, the remote_address of the p node is missing from the parameters, which causes kv_cache to always be None.

Test Plan

Test Result


Essential Elements of an Effective PR Description Checklist
  • The purpose of the PR, such as "Fix some issue (link existing issues this PR will resolve)".
  • The test plan, such as providing test command.
  • The test results, such as pasting the results comparison before and after, or e2e results
  • (Optional) The necessary documentation update, such as updating supported_models.md and examples for a new model.
  • (Optional) Release notes update. If your change is user facing, please update the release notes draft in the Google Doc.

Csrayz added 3 commits August 30, 2025 23:11
…ensor

Signed-off-by: Csrayz <jover@cmbchina.com>
Signed-off-by: Csrayz <jover@cmbchina.com>
buffer_size_threshold

This can avoid clearing the buffer due to the tensor being too large.

Signed-off-by: Csrayz <jover@cmbchina.com>
Copy link
Contributor

@gemini-code-assist gemini-code-assist bot left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Code Review

This pull request effectively addresses a crash caused by an empty send_store during tensor eviction in GET mode. The core of the fix is the introduction of a pre-check that rejects tensors larger than the buffer threshold, which correctly prevents the problematic eviction loop. The addition of an assert to ensure the store is not empty during eviction is a good defensive measure. Furthermore, the code is cleaner due to the correction of several typos in variable names. The changes are logical and correctly resolve the bug.

@Csrayz
Copy link
Contributor Author

Csrayz commented Aug 30, 2025

@Abatom Please review.

Additionally, I have a question. While we understand the GET type is not recommended due to performance concerns, I've observed that when using the GET type, TensorMemoryPool is not utilized as a secondary cache for GPU buffers. Furthermore, in this configuration, TensorMemoryPool is still instantiated, consuming some memory. Could you please let me know if there are any plans to implement TensorMemoryPool as a secondary cache for GPU buffers in future releases?

@Abatom
Copy link
Contributor

Abatom commented Aug 31, 2025

Thanks, I'll review this PR shortly.

@Abatom
Copy link
Contributor

Abatom commented Aug 31, 2025

@Abatom Please review.

Additionally, I have a question. While we understand the GET type is not recommended due to performance concerns, I've observed that when using the GET type, TensorMemoryPool is not utilized as a secondary cache for GPU buffers. Furthermore, in this configuration, TensorMemoryPool is still instantiated, consuming some memory. Could you please let me know if there are any plans to implement TensorMemoryPool as a secondary cache for GPU buffers in future releases?

Currently, the memory pool size can be reduced via the mem_pool_size_gb configuration. Because the performance of the GET model is inferior to that of PUT_ASYNC, we haven’t prioritized letting P instances use the memory pool yet; we’ll add this capability when time allows.

Signed-off-by: ivyilike <pww123@cmbchina.com>
@Csrayz Csrayz changed the title [Bugfix] Fix the program crash caused by send_store being empty. [Bugfix] Fix several issues with p2p xPyD in GET type Sep 1, 2025
Signed-off-by: ivyilike <pww123@cmbchina.com>
@Csrayz
Copy link
Contributor Author

Csrayz commented Sep 3, 2025

any update?

@Abatom
Copy link
Contributor

Abatom commented Sep 3, 2025

@Csrayz Have you ever run the GET mode locally yourself?

Have you stress-tested it?

Did you encounter any garbled text?

During chunked prefill, was there any corruption?

When pre-emption happened, did you see garbled output or crashes?

If everything above looks good, just let me know and I’ll run the tests locally.

@Csrayz
Copy link
Contributor Author

Csrayz commented Sep 4, 2025

  1. The code was submitted only after running normally locally. The basic environment is NVIDIA A10 x4.
  2. We used vllm benchmark_server.py and tested for different request rates, including low load and overload situations.
  3. We did not encounter garbled characters when making curl requests. When using benchmark_server, we did not pay attention to the response content.
  4. No data corruption was found during chunked prefilling, and we did not notice if there was any preemption during the testing process.
  5. There was a problem in the original GET method that prevented KV Transfer between PDs. After fixing it, no crashes were found through curl and script testing.

@Abatom
Copy link
Contributor

Abatom commented Sep 4, 2025

@Csrayz I'll run this PR.

Signed-off-by: Csrayz <jover@cmbchina.com>
@Abatom
Copy link
Contributor

Abatom commented Sep 11, 2025

@Csrayz thanks, LGTM, I run this PR, it works, cc @simon-mo

@Csrayz Csrayz requested a review from NickLucche as a code owner September 15, 2025 14:18
@Csrayz
Copy link
Contributor Author

Csrayz commented Sep 16, 2025

All work is complete, waiting for your review. @NickLucche

Copy link
Collaborator

@NickLucche NickLucche left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Looks good, I would just point having tests here would go a long way in ensuring functionality for these changes.

@mergify mergify bot added the kv-connector label Sep 18, 2025
@Csrayz
Copy link
Contributor Author

Csrayz commented Sep 22, 2025

Yes, unit tests are needed to ensure that the program can correctly handle boundary cases. However, recent work has been quite busy, and I may need to supplement the relevant unit tests separately at a later time.

@NickLucche
Copy link
Collaborator

I think this is fine for now since it's a bugfix, but please review it at a later time when it's most suitable to you. Let's get this merged.
Thanks for contributing @Csrayz !

cc @Abatom

@NickLucche NickLucche enabled auto-merge (squash) September 22, 2025 13:05
@github-actions github-actions bot added the ready ONLY add when PR is ready to merge/full CI is needed label Sep 22, 2025
@NickLucche NickLucche merged commit c10101a into vllm-project:main Sep 22, 2025
59 checks passed
FeiDaLI pushed a commit to FeiDaLI/vllm that referenced this pull request Sep 25, 2025
…3993)

Signed-off-by: Csrayz <jover@cmbchina.com>
Signed-off-by: ivyilike <pww123@cmbchina.com>
Co-authored-by: ivyilike <pww123@cmbchina.com>
charlifu pushed a commit to ROCm/vllm that referenced this pull request Sep 25, 2025
…3993)

Signed-off-by: Csrayz <jover@cmbchina.com>
Signed-off-by: ivyilike <pww123@cmbchina.com>
Co-authored-by: ivyilike <pww123@cmbchina.com>
Signed-off-by: charlifu <charlifu@amd.com>
yewentao256 pushed a commit that referenced this pull request Oct 3, 2025
Signed-off-by: Csrayz <jover@cmbchina.com>
Signed-off-by: ivyilike <pww123@cmbchina.com>
Co-authored-by: ivyilike <pww123@cmbchina.com>
Signed-off-by: yewentao256 <zhyanwentao@126.com>
gjc0824 pushed a commit to gjc0824/vllm that referenced this pull request Oct 10, 2025
…3993)

Signed-off-by: Csrayz <jover@cmbchina.com>
Signed-off-by: ivyilike <pww123@cmbchina.com>
Co-authored-by: ivyilike <pww123@cmbchina.com>
Signed-off-by: gaojc <1055866782@qq.com>
xuebwang-amd pushed a commit to xuebwang-amd/vllm that referenced this pull request Oct 10, 2025
…3993)

Signed-off-by: Csrayz <jover@cmbchina.com>
Signed-off-by: ivyilike <pww123@cmbchina.com>
Co-authored-by: ivyilike <pww123@cmbchina.com>
Signed-off-by: xuebwang-amd <xuebwang@amd.com>
choprahetarth pushed a commit to Tandemn-Labs/vllm that referenced this pull request Oct 11, 2025
…3993)

Signed-off-by: Csrayz <jover@cmbchina.com>
Signed-off-by: ivyilike <pww123@cmbchina.com>
Co-authored-by: ivyilike <pww123@cmbchina.com>
lywa1998 pushed a commit to lywa1998/vllm that referenced this pull request Oct 20, 2025
…3993)

Signed-off-by: Csrayz <jover@cmbchina.com>
Signed-off-by: ivyilike <pww123@cmbchina.com>
Co-authored-by: ivyilike <pww123@cmbchina.com>
xuebwang-amd pushed a commit to xuebwang-amd/vllm that referenced this pull request Oct 24, 2025
…3993)

Signed-off-by: Csrayz <jover@cmbchina.com>
Signed-off-by: ivyilike <pww123@cmbchina.com>
Co-authored-by: ivyilike <pww123@cmbchina.com>
Signed-off-by: xuebwang-amd <xuebwang@amd.com>
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

kv-connector ready ONLY add when PR is ready to merge/full CI is needed

Projects

None yet

Development

Successfully merging this pull request may close these issues.

4 participants