[Bugfix][PD] Auto-clear producer KV cache if no pull notification #2174
👋 Hi! Thank you for contributing to the vLLM Ascend project. The following points will speed up your PR merge:
If CI fails, you can run linting and testing checks locally according to Contributing and Testing.
Codecov Report
❌ Your patch check has failed because the patch coverage (20.00%) is below the target coverage (100.00%). You can increase the patch coverage or adjust the target coverage.
Additional details and impacted files (Coverage Diff):

| Metric | main | #2174 | +/- |
|---|---|---|---|
| Coverage | 76.17% | 75.96% | -0.22% |
| Files | 112 | 112 | |
| Lines | 12459 | 12506 | +47 |
| Hits | 9491 | 9500 | +9 |
| Misses | 2968 | 3006 | +38 |
This pull request has conflicts, please resolve those before we can evaluate the pull request.
…tion (#2085)

### What this PR does / why we need it?
This PR addresses a critical issue where failures on Node D (Decode) cause Node P (Prefill) to hang because Node P cannot release its KV cache.

**Trigger Scenarios:**
1. Node D fails mid-inference (e.g., network disconnection)
2. Node D rejects requests at a certain stage (e.g., via API server)
3. Load-test script termination causes Node P or D to abort queued requests

**Root Cause Analysis:**
1. Currently, Node D sends a "KV cache pull complete, release approved" message to Node P
2. This message is transmitted via the worker connector. If the PD connection breaks or requests are rejected upstream, Node D cannot send the message
3. Node P will never release its KV cache without receiving this message

**Solution:**
Following the vLLM community's approach (the NIXL connector timeout mechanism), we're implementing:
- A timeout mechanism with comprehensive warnings
- Updated README documentation
- Reference: vLLM's optimization PR [#20139](vllm-project/vllm#20139)

**Note:** The full disaster recovery solution is still in design. This PR will be merged into the v091-dev branch as is, but will evolve in main ([PR #2174](#2174)).

### Does this PR introduce _any_ user-facing change?

### How was this patch tested?

---------

Signed-off-by: underfituu <hzhucong@163.com>
approved
please rebase and make CI happy
Signed-off-by: underfituu <hzhucong@163.com>
32838d3 to 79c1835
Signed-off-by: underfituu <hzhucong@163.com>
…lm-project#2174)

### What this PR does / why we need it?
This PR addresses a critical issue where failures on Node D (Decode) cause Node P (Prefill) to hang because Node P cannot release its KV cache.

**Trigger Scenarios:**
1. Node D fails mid-inference (e.g., network disconnection)
2. Node D rejects requests at a certain stage (e.g., via API server)
3. Load-test script termination causes Node P or D to abort queued requests

**Root Cause Analysis:**
1. Currently, Node D sends a "KV cache pull complete, release approved" message to Node P
2. This message is transmitted via the worker connector. If the PD connection breaks or requests are rejected upstream, Node D cannot send the message
3. Node P will never release its KV cache without receiving this message

**Solution:**
Following the vLLM community's approach (the NIXL connector timeout mechanism), we're implementing:
- A timeout mechanism with comprehensive warnings
- Updated README documentation
- Reference: vLLM's optimization PR [#20139](vllm-project/vllm#20139)

### Does this PR introduce _any_ user-facing change?
None

### How was this patch tested?
None

- vLLM version: v0.10.2
- vLLM main: vllm-project/vllm@9607d5e

---------

Signed-off-by: underfituu <hzhucong@163.com>
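What this PR does / why we need it?
This PR addresses a critical issue where failures on Node D (Decode) cause Node P (Prefill) to hang because Node P cannot release its KV cache.
Trigger Scenarios:
1. Node D fails mid-inference (e.g., network disconnection)
2. Node D rejects requests at a certain stage (e.g., via API server)
3. Load-test script termination causes Node P or D to abort queued requests
Root Cause Analysis:
1. Currently, Node D sends a "KV cache pull complete, release approved" message to Node P
2. This message is transmitted via the worker connector. If the PD connection breaks or requests are rejected upstream, Node D cannot send the message
3. Node P will never release its KV cache without receiving this message
Solution:
Following the vLLM community's approach (the NIXL connector timeout mechanism), we're implementing:
- A timeout mechanism with comprehensive warnings
- Updated README documentation
- Reference: vLLM's optimization PR #20139 (vllm-project/vllm#20139)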
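To illustrate the timeout idea, here is a minimal sketch of a producer-side tracker that releases KV cache entries when no pull notification arrives in time. It is not the code from this PR: the class `PendingKVRelease`, its method names, and the `timeout_s` parameter are all hypothetical, and the actual freeing of KV blocks is left to the caller.

```python
import logging
import time
from typing import Callable, Dict, List

logger = logging.getLogger("kv_release_sketch")


class PendingKVRelease:
    """Hypothetical tracker for requests whose KV cache awaits a pull notification."""

    def __init__(self, timeout_s: float,
                 clock: Callable[[], float] = time.monotonic) -> None:
        self.timeout_s = timeout_s
        self.clock = clock  # injectable so the behavior can be tested without sleeping
        self._deadlines: Dict[str, float] = {}

    def register(self, request_id: str) -> None:
        """Record that the producer finished a request and is holding its KV cache."""
        self._deadlines[request_id] = self.clock() + self.timeout_s

    def on_pull_complete(self, request_id: str) -> bool:
        """Handle the consumer's notification; return True if the request was pending."""
        return self._deadlines.pop(request_id, None) is not None

    def collect_expired(self) -> List[str]:
        """Return (and forget) requests whose deadline passed without a notification."""
        now = self.clock()
        expired = [rid for rid, deadline in self._deadlines.items() if deadline <= now]
        for rid in expired:
            del self._deadlines[rid]
            logger.warning(
                "No pull notification for request %s within %.1f s; "
                "releasing its KV cache", rid, self.timeout_s)
        return expired
```

In this sketch, the producer would call `register` when a request finishes, `on_pull_complete` when the consumer's message arrives, and `collect_expired` periodically, freeing the KV blocks of every returned request ID. Using a monotonic clock keeps the deadlines unaffected by wall-clock adjustments.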
Does this PR introduce any user-facing change?
None
How was this patch tested?
None