feat: vLLM abort on stream stop #2717
🧬 Code graph analysis (16)
lib/bindings/python/tests/test_cancellation/test_client_loop_break.py (2)
lib/runtime/src/engine.rs (4)
lib/bindings/python/tests/test_cancellation/test_client_context_cancel.py (5)
lib/llm/src/perf.rs (4)
lib/bindings/python/tests/test_cancellation/test_server_raise_cancelled.py (2)
lib/bindings/python/tests/test_cancellation/test_cancellation.py (4)
lib/bindings/python/src/dynamo/runtime/__init__.py (1)
lib/bindings/python/tests/test_cancellation/test_server_context_cancel.py (3)
components/backends/vllm/src/dynamo/vllm/handlers.py (3)
lib/llm/src/perf/logprobs.rs (4)
lib/bindings/python/rust/engine.rs (1)
lib/llm/src/http/client.rs (2)
lib/llm/src/migration.rs (2)
lib/bindings/python/tests/test_cancellation/conftest.py (4)
lib/bindings/python/rust/lib.rs (2)
lib/runtime/src/pipeline/context.rs (2)
Walkthrough
Adds per-request Context propagation and hierarchical cancellation across the runtime, Python bindings, HTTP client, and vLLM handlers. Updates method signatures to accept a context, introduces link_child on AsyncEngineContext, propagates stop/kill to children, and adds Python tests validating client/server cancellation paths and subprocess isolation.
Sequence Diagram(s)
sequenceDiagram
autonumber
participant PyClient as Python Client
participant Core as Rust _core Client
participant Ctx as Context (parent)
participant Rt as Runtime Pipeline
participant Eng as Engine/Router
participant Svc as Backend Handler
participant Pref as Prefill Worker
Note over PyClient,Core: Optional context passed (id or default)
PyClient->>Core: generate(request, context=None|Context)
Core->>Ctx: RsContext::with_id(...) + link_child(child)
Core->>Rt: ServerStreamingEngine.generate(request, context)
Rt->>Eng: forward(request, context)
Eng->>Svc: generate(request, context)
alt Prefill path
Svc->>Pref: round_robin(prefill_request, context)
Pref-->>Svc: prefill response / error
end
loop streaming tokens
Svc-->>Eng: chunk
Eng-->>Rt: chunk
Rt-->>Core: chunk
Core-->>PyClient: chunk
opt Cancellation detected
PyClient->>Ctx: stop_generating()/stop()/kill()
Ctx->>+Core: propagate via link_child
Core->>+Rt: propagate via link_child
Rt->>+Eng: propagate via link_child
Eng->>+Svc: abort/cancel
Svc-->>Eng: abort ack
Eng-->>Rt: terminate stream
Rt-->>Core: terminate
Core-->>PyClient: error/end
end
end
Note over all: link_child enables cascading lifecycle to all descendants
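The cascading behaviour described in the walkthrough and diagram can be illustrated with a small Python sketch. This is not the Dynamo implementation: `SketchContext` is a hypothetical stand-in, only the method names (`link_child`, `stop_generating`, `kill`) are taken from the diagram, and the choice to propagate an already-set state at link time is an assumption of the sketch.

```python
from __future__ import annotations

from typing import List


class SketchContext:
    """Hypothetical per-request context; a toy model, not the Dynamo API."""

    def __init__(self, request_id: str) -> None:
        self.request_id = request_id
        self._stopped = False
        self._killed = False
        self._children: List[SketchContext] = []

    def link_child(self, child: SketchContext) -> None:
        """Register a child so that later stop/kill calls reach it."""
        self._children.append(child)
        # Assumption of this sketch: a child linked after the parent was
        # already stopped or killed inherits that state immediately.
        if self._killed:
            child.kill()
        elif self._stopped:
            child.stop_generating()

    def stop_generating(self) -> None:
        """Mark this context stopped and cascade to all descendants."""
        self._stopped = True
        for child in self._children:
            child.stop_generating()

    def kill(self) -> None:
        """Mark this context killed (which implies stopped) and cascade."""
        self._killed = True
        self._stopped = True
        for child in self._children:
            child.kill()

    def is_stopped(self) -> bool:
        return self._stopped

    def is_killed(self) -> bool:
        return self._killed
```

Propagation in this sketch is one-directional: stopping a child leaves its parent untouched, which matches the parent-to-descendant direction drawn in the diagram.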
Estimated code review effort
🎯 4 (Complex) | ⏱️ ~60–90 minutes
Overview:
Abort the in-flight vLLM request when the stream context moves to the stopped or killed state.
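A minimal sketch of one way such an abort could be wired up, assuming an asyncio-based handler. `FakeContext`, `FakeEngine`, `handle`, and `demo` are hypothetical names for illustration; the actual handler lives in components/backends/vllm/src/dynamo/vllm/handlers.py and is not reproduced here. The idea shown is a background task that waits for the context to stop and then asks the engine to abort the request by id.

```python
import asyncio


class FakeContext:
    """Hypothetical context exposing an awaitable stop signal."""

    def __init__(self) -> None:
        self._stopped = asyncio.Event()

    def stop_generating(self) -> None:
        self._stopped.set()

    async def wait_stopped(self) -> None:
        await self._stopped.wait()


class FakeEngine:
    """Hypothetical engine: streams integers until the request is aborted."""

    def __init__(self) -> None:
        self.aborted = []

    async def generate(self, request_id: str, n_tokens: int):
        for token in range(n_tokens):
            await asyncio.sleep(0)  # give other tasks a chance to run
            if request_id in self.aborted:
                return  # stop producing once the request was aborted
            yield token

    async def abort(self, request_id: str) -> None:
        self.aborted.append(request_id)


async def handle(engine, context, request_id: str, n_tokens: int):
    """Stream tokens, aborting the engine request if the context stops."""

    async def monitor() -> None:
        await context.wait_stopped()
        await engine.abort(request_id)

    task = asyncio.create_task(monitor())
    try:
        async for token in engine.generate(request_id, n_tokens):
            yield token
    finally:
        # Always clean up the monitor, whether the stream ended or was aborted.
        task.cancel()
        try:
            await task
        except asyncio.CancelledError:
            pass


async def demo():
    engine, context = FakeEngine(), FakeContext()
    received = []
    async for token in handle(engine, context, "req-1", 100):
        received.append(token)
        if token == 2:
            context.stop_generating()  # simulate the client stopping the stream
    return received, engine.aborted
```

Running `asyncio.run(demo())` ends the stream well before all 100 tokens are produced and records `"req-1"` as aborted in the fake engine.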
Details:
N/A
Where should the reviewer start?
N/A
Related Issues: (use one of the action keywords Closes / Fixes / Resolves / Relates to)
N/A
Summary by CodeRabbit
New Features
Refactor
Tests