Contributor

@kthui kthui commented Aug 26, 2025

Overview:

Abort the in-flight vLLM request when the stream context transitions to stopped or killed.

Details:

N/A

Where should the reviewer start?

N/A

Related Issues: (use one of the action keywords Closes / Fixes / Resolves / Relates to)

N/A

Summary by CodeRabbit

  • New Features

    • Per-request context propagation across the stack enables precise cancellation and early termination of streams.
    • Hierarchical cascading: stopping/killing a parent context now stops linked child operations.
    • Generation and routing paths accept an optional context; streaming halts promptly when cancelled.
  • Refactor

    • Python API: PyContext renamed to Context. Client methods now accept context=None, and generators may receive a context argument.
  • Tests

    • Added end-to-end cancellation suite covering client-initiated, server-initiated, and exception-driven cancellations.
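
The note that generators "may receive a context argument" implies the binding has to decide, per callable, whether to pass `context` at all. The following is a minimal Python sketch of that kind of gating, assuming it is done by signature inspection. The helper names `accepts_kwarg` and `call_generator` are hypothetical and are not taken from the PR; the actual check lives in the Rust bindings and may differ.

```python
import inspect


def accepts_kwarg(fn, name):
    """Return True if `fn` can be called with keyword argument `name`.

    Hypothetical helper for illustration only.
    """
    try:
        params = inspect.signature(fn).parameters.values()
    except (TypeError, ValueError):
        # Some builtins expose no signature; assume no support.
        return False
    for p in params:
        if p.kind is inspect.Parameter.VAR_KEYWORD:
            return True  # **kwargs swallows any keyword
        if p.name == name and p.kind in (
            inspect.Parameter.POSITIONAL_OR_KEYWORD,
            inspect.Parameter.KEYWORD_ONLY,
        ):
            return True
    return False


def call_generator(fn, request, context):
    """Pass `context` only to callables that declare it (assumed behaviour)."""
    if accepts_kwarg(fn, "context"):
        return fn(request, context=context)
    return fn(request)
```

With this shape, existing generators that take only `request` keep working unchanged, while newer ones opt in simply by adding a `context` parameter.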


copy-pr-bot bot commented Aug 26, 2025

This pull request requires additional validation before any workflows can run on NVIDIA's runners.

Pull request vetters can view their responsibilities here.

Contributors can view more details about this message here.

@github-actions github-actions bot added the feat label Aug 26, 2025
@kthui kthui changed the base branch from main to ft-request-cancel-0.5.0 August 26, 2025 19:45
@kthui kthui merged commit b4603fa into ft-request-cancel-0.5.0 Aug 26, 2025
12 of 13 checks passed
@kthui kthui deleted the jacky-ft-vllm-cancel branch August 26, 2025 19:50
Contributor

coderabbitai bot commented Aug 26, 2025

Caution

Review failed

Failed to post review comments.

Configuration used: Path: .coderabbit.yaml

Review profile: CHILL

Plan: Pro

📥 Commits

Reviewing files that changed from the base of the PR and between 36df35e and 80e7169.

📒 Files selected for processing (17)
  • components/backends/vllm/src/dynamo/vllm/handlers.py (5 hunks)
  • lib/bindings/python/rust/context.rs (1 hunks)
  • lib/bindings/python/rust/engine.rs (6 hunks)
  • lib/bindings/python/rust/lib.rs (10 hunks)
  • lib/bindings/python/src/dynamo/runtime/__init__.py (1 hunks)
  • lib/bindings/python/tests/test_cancellation/conftest.py (1 hunks)
  • lib/bindings/python/tests/test_cancellation/test_cancellation.py (1 hunks)
  • lib/bindings/python/tests/test_cancellation/test_client_context_cancel.py (1 hunks)
  • lib/bindings/python/tests/test_cancellation/test_client_loop_break.py (1 hunks)
  • lib/bindings/python/tests/test_cancellation/test_server_context_cancel.py (1 hunks)
  • lib/bindings/python/tests/test_cancellation/test_server_raise_cancelled.py (1 hunks)
  • lib/llm/src/http/client.rs (8 hunks)
  • lib/llm/src/migration.rs (18 hunks)
  • lib/llm/src/perf.rs (1 hunks)
  • lib/llm/src/perf/logprobs.rs (1 hunks)
  • lib/runtime/src/engine.rs (1 hunks)
  • lib/runtime/src/pipeline/context.rs (4 hunks)
🧰 Additional context used
🧠 Learnings (2)
📚 Learning: 2025-07-14T21:25:56.930Z
Learnt from: ryanolson
PR: ai-dynamo/dynamo#1919
File: lib/runtime/src/engine.rs:168-168
Timestamp: 2025-07-14T21:25:56.930Z
Learning: The AsyncEngineContextProvider trait in lib/runtime/src/engine.rs was intentionally changed from `Send + Sync + Debug` to `Send + Debug` because the Sync bound was overly constraining. The trait should only require Send + Debug as designed.

Applied to files:

  • lib/llm/src/perf.rs
  • lib/llm/src/perf/logprobs.rs
📚 Learning: 2025-06-17T00:50:44.845Z
Learnt from: PeaBrane
PR: ai-dynamo/dynamo#1236
File: lib/llm/src/mocker/engine.rs:140-161
Timestamp: 2025-06-17T00:50:44.845Z
Learning: In Rust async code, when an Arc<Mutex<_>> is used solely to transfer ownership of a resource (like a channel receiver) into a spawned task rather than for sharing between multiple tasks, holding the mutex lock across an await is not problematic since there's no actual contention.

Applied to files:

  • lib/llm/src/http/client.rs
🧬 Code graph analysis (16)
lib/bindings/python/tests/test_cancellation/test_client_loop_break.py (2)
lib/bindings/python/tests/test_cancellation/conftest.py (3)
  • server (166-196)
  • client (200-207)
  • generate (36-46)
lib/bindings/python/rust/engine.rs (2)
  • generate (107-109)
  • generate (162-295)
lib/runtime/src/engine.rs (4)
lib/llm/src/http/client.rs (2)
  • link_child (204-209)
  • child (96-104)
lib/llm/src/perf.rs (1)
  • link_child (556-558)
lib/llm/src/perf/logprobs.rs (1)
  • link_child (1617-1619)
lib/runtime/src/pipeline/context.rs (2)
  • link_child (304-306)
  • link_child (417-422)
lib/bindings/python/tests/test_cancellation/test_client_context_cancel.py (5)
lib/bindings/python/rust/lib.rs (4)
  • _core (65-123)
  • client (546-560)
  • generate (702-714)
  • data (958-960)
lib/bindings/python/tests/test_cancellation/test_cancellation.py (1)
  • test_client_context_cancel (43-44)
lib/bindings/python/tests/test_cancellation/conftest.py (3)
  • server (166-196)
  • client (200-207)
  • generate (36-46)
lib/bindings/python/rust/engine.rs (2)
  • generate (107-109)
  • generate (162-295)
lib/bindings/python/rust/context.rs (1)
  • stop_generating (54-56)
lib/llm/src/perf.rs (4)
lib/llm/src/http/client.rs (1)
  • link_child (204-209)
lib/llm/src/perf/logprobs.rs (1)
  • link_child (1617-1619)
lib/runtime/src/engine.rs (1)
  • link_child (167-167)
lib/runtime/src/pipeline/context.rs (2)
  • link_child (304-306)
  • link_child (417-422)
lib/bindings/python/tests/test_cancellation/test_server_raise_cancelled.py (2)
lib/bindings/python/tests/test_cancellation/conftest.py (3)
  • server (166-196)
  • client (200-207)
  • generate (36-46)
lib/bindings/python/rust/engine.rs (3)
  • generate (107-109)
  • generate (162-295)
  • e (309-309)
lib/bindings/python/tests/test_cancellation/test_cancellation.py (4)
lib/bindings/python/tests/test_cancellation/test_client_context_cancel.py (1)
  • test_client_context_cancel (24-52)
lib/bindings/python/tests/test_cancellation/test_client_loop_break.py (1)
  • test_client_loop_break (22-49)
lib/bindings/python/tests/test_cancellation/test_server_context_cancel.py (1)
  • test_server_context_cancel (21-40)
lib/bindings/python/tests/test_cancellation/test_server_raise_cancelled.py (1)
  • test_server_raise_cancelled (21-44)
lib/bindings/python/src/dynamo/runtime/__init__.py (1)
lib/bindings/python/rust/lib.rs (1)
  • _core (65-123)
lib/bindings/python/tests/test_cancellation/test_server_context_cancel.py (3)
lib/bindings/python/tests/test_cancellation/test_cancellation.py (1)
  • test_server_context_cancel (51-52)
lib/bindings/python/tests/test_cancellation/conftest.py (3)
  • server (166-196)
  • client (200-207)
  • generate (36-46)
lib/bindings/python/rust/engine.rs (3)
  • generate (107-109)
  • generate (162-295)
  • e (309-309)
components/backends/vllm/src/dynamo/vllm/handlers.py (3)
lib/bindings/python/rust/lib.rs (2)
  • generate (702-714)
  • round_robin (718-747)
lib/bindings/python/src/dynamo/_core.pyi (1)
  • round_robin (273-277)
lib/runtime/src/utils/tasks/tracker.rs (1)
  • abort (523-525)
lib/llm/src/perf/logprobs.rs (4)
lib/llm/src/http/client.rs (1)
  • link_child (204-209)
lib/llm/src/perf.rs (1)
  • link_child (556-558)
lib/runtime/src/engine.rs (1)
  • link_child (167-167)
lib/runtime/src/pipeline/context.rs (2)
  • link_child (304-306)
  • link_child (417-422)
lib/bindings/python/rust/engine.rs (1)
lib/bindings/python/rust/context.rs (2)
  • callable_accepts_kwarg (83-94)
  • new (21-23)
lib/llm/src/http/client.rs (2)
lib/runtime/src/engine.rs (4)
  • new (231-233)
  • stop_generating (153-153)
  • kill (161-161)
  • link_child (167-167)
lib/runtime/src/pipeline/context.rs (9)
  • new (42-49)
  • new (239-245)
  • new (342-350)
  • stop_generating (284-286)
  • stop_generating (395-397)
  • kill (280-282)
  • kill (408-415)
  • link_child (304-306)
  • link_child (417-422)
lib/llm/src/migration.rs (2)
lib/llm/src/local_model.rs (2)
  • display_name (319-321)
  • migration_limit (150-153)
lib/bindings/python/rust/lib.rs (1)
  • id (970-972)
lib/bindings/python/tests/test_cancellation/conftest.py (4)
lib/bindings/python/rust/lib.rs (11)
  • random (751-780)
  • _core (65-123)
  • generate (702-714)
  • shutdown (325-327)
  • namespace (311-316)
  • create_service (506-512)
  • endpoint (498-504)
  • serve_endpoint (518-544)
  • cancel (483-485)
  • client (546-560)
  • wait_for_instances (689-698)
lib/bindings/python/src/dynamo/_core.pyi (1)
  • DistributedRuntime (30-53)
lib/bindings/python/rust/engine.rs (2)
  • generate (107-109)
  • generate (162-295)
lib/bindings/python/rust/context.rs (3)
  • is_stopped (45-47)
  • is_killed (50-52)
  • stop_generating (54-56)
lib/bindings/python/rust/lib.rs (2)
lib/runtime/src/pipeline/context.rs (3)
  • context (226-228)
  • context (310-312)
  • with_id (69-76)
lib/bindings/python/rust/engine.rs (2)
  • generate (107-109)
  • generate (162-295)
lib/runtime/src/pipeline/context.rs (2)
lib/llm/src/http/client.rs (5)
  • link_child (204-209)
  • child (96-104)
  • new (73-81)
  • id (148-150)
  • kill (175-185)
lib/runtime/src/engine.rs (3)
  • link_child (167-167)
  • id (126-126)
  • kill (161-161)
⏰ Context from checks skipped due to timeout of 90000ms. You can increase the timeout in your CodeRabbit configuration to a maximum of 15 minutes (900000ms). (4)
  • GitHub Check: Build and Test - dynamo
  • GitHub Check: pre-merge-rust (lib/runtime/examples)
  • GitHub Check: pre-merge-rust (.)
  • GitHub Check: pre-merge-rust (lib/bindings/python)

Walkthrough

Adds per-request Context propagation and hierarchical cancellation across runtime, Python bindings, HTTP client, and vLLM handlers. Updates method signatures to accept context, introduces link_child on AsyncEngineContext, propagates stop/kill to children, and adds Python tests validating client/server cancellation paths and subprocess isolation.
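
To make the cascading behaviour concrete, here is a small pure-Python model of a context that keeps a registry of linked children and forwards stop/kill to them before updating its own state. It is a sketch only: `SketchContext` is a hypothetical stand-in, not the Rust `AsyncEngineContext` or the Python `Context` binding, and treating "stopped" and "killed" as independent flags is an assumption made for simplicity.

```python
import threading


class SketchContext:
    """Toy model of a cancellable context with linked children (hypothetical)."""

    def __init__(self, id=None):
        self.id = id
        self._stopped = False
        self._killed = False
        self._children = []
        self._lock = threading.Lock()  # guards the child registry

    def link_child(self, child):
        with self._lock:
            self._children.append(child)

    def _snapshot_children(self):
        # Copy under the lock so propagation runs without holding it.
        with self._lock:
            return list(self._children)

    def stop_generating(self):
        # Propagate to children first, then update own state.
        for child in self._snapshot_children():
            child.stop_generating()
        self._stopped = True

    def kill(self):
        for child in self._snapshot_children():
            child.kill()
        self._killed = True

    def is_stopped(self):
        return self._stopped

    def is_killed(self):
        return self._killed
```

Linking `root -> mid -> leaf` and calling `root.stop_generating()` marks all three as stopped, while stopping `leaf` alone leaves `mid` and `root` untouched, since propagation in this model only flows from parent to child.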

Changes

| Cohort / File(s) | Summary |
| --- | --- |
| vLLM handlers context-aware generate: `components/backends/vllm/src/dynamo/vllm/handlers.py` | Handlers’ generate signatures now accept (request, context). Decode/Prefill flows check context.is_stopped()/is_killed() to abort early; prefill client call updated to pass context. Early exits and abort signaling added. |
| Python bindings: Context type and generator kwargs: `lib/bindings/python/rust/context.rs`, `lib/bindings/python/rust/engine.rs`, `lib/bindings/python/rust/lib.rs`, `lib/bindings/python/src/dynamo/runtime/__init__.py` | Replace PyContext with Context. Add Python ctor with optional id. Expose inner() and derive Clone. Engine passes optional context kwarg to Python generators gated by has_context. Client APIs accept context=None across generate/round_robin/random/direct/static and link child contexts. Update public exports. |
| Runtime engine trait and pipeline controller: `lib/runtime/src/engine.rs`, `lib/runtime/src/pipeline/context.rs` | Adds AsyncEngineContext::link_child. Controller/StreamContext maintain child_context registry via Mutex<Vec<...>>. stop/kill/stop_generating propagate to children before updating own state. |
| HTTP client hierarchical cancellation: `lib/llm/src/http/client.rs` | HttpRequestContext gains child_context and implements link_child. stop/stop_generating/kill propagate to children prior to self cancellation. |
| Migration flow context propagation: `lib/llm/src/migration.rs` | Thread context id through migration/retry: RetryManager::build(context_id, ...), requests wrapped with Context::with_id(...), stream checks engine context for early termination. Tests/mocks updated to assert id propagation. |
| Perf test mocks adapt to trait: `lib/llm/src/perf.rs`, `lib/llm/src/perf/logprobs.rs` | Test MockContext implements no-op link_child to satisfy updated trait. |
| Cancellation tests and harness: `lib/bindings/python/tests/test_cancellation/*` | Add fixtures (runtime, server/client, NATS/etcd) and MockServer with multiple cancellation behaviors. Four async tests covering client-driven cancel, loop break, server-driven cancel, and server raising CancelledError. Subprocess runner added for isolation. |
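
The handler-side pattern described in the first row (check the context between chunks and exit early) can be sketched with a plain async generator. This is an illustrative model under stated assumptions, not the code in `handlers.py`: `FakeContext`, the request shape, and the token loop are all hypothetical, and a real handler would additionally abort the underlying engine request where the comment indicates.

```python
import asyncio


class FakeContext:
    """Hypothetical stand-in exposing the methods the handlers are said to check."""

    def __init__(self):
        self._stopped = False
        self._killed = False

    def stop_generating(self):
        self._stopped = True

    def is_stopped(self):
        return self._stopped

    def is_killed(self):
        return self._killed


async def generate(request, context):
    """Stream chunks, checking the context before producing each one."""
    for i in range(request["max_tokens"]):
        if context.is_stopped() or context.is_killed():
            # A real handler would abort the engine request here.
            return
        await asyncio.sleep(0)  # yield control, as real token generation would
        yield {"token": i}


async def main():
    ctx = FakeContext()
    received = []
    async for chunk in generate({"max_tokens": 10}, ctx):
        received.append(chunk["token"])
        if len(received) == 3:
            ctx.stop_generating()  # client-initiated cancellation
    return received
```

Running `asyncio.run(main())` returns `[0, 1, 2]`: the generator sees the stop flag on its next iteration and ends the stream instead of producing the remaining seven chunks.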

Sequence Diagram(s)

sequenceDiagram
  autonumber
  participant PyClient as Python Client
  participant Core as Rust _core Client
  participant Ctx as Context (parent)
  participant Rt as Runtime Pipeline
  participant Eng as Engine/Router
  participant Svc as Backend Handler
  participant Pref as Prefill Worker

  Note over PyClient,Core: Optional context passed (id or default)
  PyClient->>Core: generate(request, context=None|Context)
  Core->>Ctx: RsContext::with_id(...) + link_child(child)
  Core->>Rt: ServerStreamingEngine.generate(request, context)
  Rt->>Eng: forward(request, context)
  Eng->>Svc: generate(request, context)
  alt Prefill path
    Svc->>Pref: round_robin(prefill_request, context)
    Pref-->>Svc: prefill response / error
  end
  loop streaming tokens
    Svc-->>Eng: chunk
    Eng-->>Rt: chunk
    Rt-->>Core: chunk
    Core-->>PyClient: chunk
    opt Cancellation detected
      PyClient->>Ctx: stop_generating()/stop()/kill()
      Ctx->>+Core: propagate via link_child
      Core->>+Rt: propagate via link_child
      Rt->>+Eng: propagate via link_child
      Eng->>+Svc: abort/cancel
      Svc-->>Eng: abort ack
      Eng-->>Rt: terminate stream
      Rt-->>Core: terminate
      Core-->>PyClient: error/end
    end
  end

  Note over all: link_child enables cascading lifecycle to all descendants

Estimated code review effort

🎯 4 (Complex) | ⏱️ ~60–90 minutes

Poem

A context hops from node to node,
With gentle paws it shares the load.
One stop, and all its kits comply—
Streams grow quiet, tokens sigh.
In burrows deep, the links entwine,
Cancel cascades, by careful design.
🥕✨
