Skip to content

Conversation

@underfituu
Copy link
Contributor

@underfituu underfituu commented Aug 2, 2025

What this PR does / why we need it?

This PR addresses a critical issue where Node D (Device) failures cause Node P (Processor) to hang due to inability to release KV cache.

Trigger Scenarios:

  1. Node D fails mid-inference (e.g., network disconnection)
  2. Node D rejects requests at a certain stage (e.g., via API server)
  3. Load-test script termination causes Node P or D to abort queued requests

Root Cause Analysis:

  1. Currently, Node D sends a "KV cache pull complete, release approved" message to Node P
  2. This message is transmitted via the worker connector. If PD connection breaks or requests are rejected upstream, Node D cannot send the message
  3. Node P will never release KV cache without receiving this message

Solution:
Following VLLM community's approach (NIXL connector timeout mechanism), we're implementing:

  • A timeout mechanism with comprehensive warnings
  • Updated README documentation
  • Reference: VLLM's optimization PR #20139

Does this PR introduce any user-facing change?

None

How was this patch tested?

None

@underfituu underfituu changed the title auto_clear_kv_cache [Bugfix][PD] Auto-clear producer KV cache if no pull notification #2085 Aug 2, 2025
@github-actions
Copy link

github-actions bot commented Aug 2, 2025

👋 Hi! Thank you for contributing to the vLLM Ascend project. The following points will speed up your PR merge:‌‌

  • A PR should do only one thing, smaller PRs enable faster reviews.
  • Every PR should include unit tests and end-to-end tests ‌to ensure it works and is not broken by other future PRs.
  • Write the commit message by fulfilling the PR description to help reviewer and future developers understand.

If CI fails, you can run linting and testing checks locally according Contributing and Testing.

@underfituu underfituu changed the title [Bugfix][PD] Auto-clear producer KV cache if no pull notification #2085 [Bugfix][PD] Auto-clear producer KV cache if no pull notification Aug 2, 2025
@codecov
Copy link

codecov bot commented Aug 2, 2025

Codecov Report

❌ Patch coverage is 20.00000% with 44 lines in your changes missing coverage. Please review.
✅ Project coverage is 75.96%. Comparing base (6e00aed) to head (32838d3).
⚠️ Report is 11 commits behind head on main.

Files with missing lines Patch % Lines
..._ascend/distributed/llmdatadist_c_mgr_connector.py 20.00% 44 Missing ⚠️

❌ Your patch check has failed because the patch coverage (20.00%) is below the target coverage (100.00%). You can increase the patch coverage or adjust the target coverage.

Additional details and impacted files
@@            Coverage Diff             @@
##             main    #2174      +/-   ##
==========================================
- Coverage   76.17%   75.96%   -0.22%     
==========================================
  Files         112      112              
  Lines       12459    12506      +47     
==========================================
+ Hits         9491     9500       +9     
- Misses       2968     3006      +38     
Flag Coverage Δ
unittests 75.96% <20.00%> (-0.22%) ⬇️

Flags with carried forward coverage won't be shown. Click here to find out more.

☔ View full report in Codecov by Sentry.
📢 Have feedback on the report? Share it here.

🚀 New features to boost your workflow:
  • ❄️ Test Analytics: Detect flaky tests, report on failures, and find test suite problems.

@github-actions
Copy link

github-actions bot commented Aug 4, 2025

This pull request has conflicts, please resolve those before we can evaluate the pull request.

ganyi1996ppo pushed a commit that referenced this pull request Aug 5, 2025
…tion (#2085)

### What this PR does / why we need it?

This PR addresses a critical issue where Node D (Device) failures cause
Node P (Processor) to hang due to inability to release KV cache.

**Trigger Scenarios:**  
1. Node D fails mid-inference (e.g., network disconnection)  
2. Node D rejects requests at a certain stage (e.g., via API server)  
3. Load-test script termination causes Node P or D to abort queued
requests

**Root Cause Analysis:**  
1. Currently, Node D sends a "KV cache pull complete, release approved"
message to Node P
2. This message is transmitted via the worker connector. If PD
connection breaks or requests are rejected upstream, Node D cannot send
the message
3. Node P will never release KV cache without receiving this message  

**Solution:**  
Following VLLM community's approach (NIXL connector timeout mechanism),
we're implementing:
- A timeout mechanism with comprehensive warnings  
- Updated README documentation  
- Reference: VLLM's optimization PR
[#20139](vllm-project/vllm#20139)

**Note:** The full disaster recovery solution is still in design. This
PR will be merged into v091-dev branch simply but will evolve in main
([PR #2174](#2174)).


### Does this PR introduce _any_ user-facing change?


### How was this patch tested?


---------

Signed-off-by: underfituu <hzhucong@163.com>
@weijinqian0
Copy link
Collaborator

approved

@wangxiyuan
Copy link
Collaborator

please rebase and make CI happy

Signed-off-by: underfituu <hzhucong@163.com>
Signed-off-by: underfituu <hzhucong@163.com>
Signed-off-by: underfituu <hzhucong@163.com>
Signed-off-by: underfituu <hzhucong@163.com>
@wangxiyuan wangxiyuan added ready read for review ready-for-test start test by label for PR labels Sep 22, 2025
@wangxiyuan wangxiyuan merged commit 8dd53c8 into vllm-project:main Sep 23, 2025
50 checks passed
Angazenn pushed a commit to Angazenn/vllm-ascend that referenced this pull request Oct 21, 2025
…lm-project#2174)

### What this PR does / why we need it?

This PR addresses a critical issue where Node D (Device) failures cause
Node P (Processor) to hang due to inability to release KV cache.

**Trigger Scenarios:**  
1. Node D fails mid-inference (e.g., network disconnection)  
2. Node D rejects requests at a certain stage (e.g., via API server)  
3. Load-test script termination causes Node P or D to abort queued
requests

**Root Cause Analysis:**  
1. Currently, Node D sends a "KV cache pull complete, release approved"
message to Node P
2. This message is transmitted via the worker connector. If PD
connection breaks or requests are rejected upstream, Node D cannot send
the message
3. Node P will never release KV cache without receiving this message  

**Solution:**  
Following VLLM community's approach (NIXL connector timeout mechanism),
we're implementing:
- A timeout mechanism with comprehensive warnings  
- Updated README documentation  
- Reference: VLLM's optimization PR
[#20139](vllm-project/vllm#20139)
### Does this PR introduce _any_ user-facing change?
None
### How was this patch tested?
None


- vLLM version: v0.10.2
- vLLM main:
vllm-project/vllm@9607d5e

---------

Signed-off-by: underfituu <hzhucong@163.com>
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

module:core ready read for review ready-for-test start test by label for PR

Projects

None yet

Development

Successfully merging this pull request may close these issues.

3 participants