feat: migrate requests when planner shutdown decode engine (vllm) #2280

tedzhouhk · 2025-08-04T21:10:47Z

Add an option to define shutdown behavior when serving an endpoint.
Rust change to migrate requests for Python engines; depends on feat: Allow Python Engine to end stream before final #2270.
Modify vLLM's P/D worker implementation and the k8s disagg_planner.yaml.

Summary by CodeRabbit

New Features
- Added configurable graceful shutdown for endpoints, allowing control over whether in-flight requests are completed during shutdown.
- Introduced a migration limit setting for worker components to manage workload transitions.
Bug Fixes
- Improved handling of generator termination events to provide clearer error messaging and prevent unintended completion signals when streams end prematurely.
Documentation
- Updated method and function docstrings to clarify shutdown behaviors and new parameters.

copy-pr-bot · 2025-08-04T21:10:50Z

This pull request requires additional validation before any workflows can run on NVIDIA's runners.

Pull request vetters can view their responsibilities here.

Contributors can view more details about this message here.

coderabbitai · 2025-08-04T21:16:26Z

Walkthrough

This change introduces configurable graceful shutdown behavior for endpoints in the vLLM backend, allowing selective waiting for in-flight requests during shutdown. It adds a --migration-limit argument to deployment YAML, updates endpoint serving logic and signatures to support the new shutdown flag, and improves error handling for Python generator exits across Python and Rust layers.

Changes

| Cohort / File(s) | Change Summary |
|---|---|
| Deployment Configuration `components/backends/vllm/deploy/disagg_planner.yaml` | Added `--migration-limit=3` argument to both `VllmDecodeWorker` and `VllmPrefillWorker` command invocations. |
| Python Async Generator Error Handling `components/backends/vllm/src/dynamo/vllm/handlers.py` | Added explicit handling for `asyncio.CancelledError` in async generators, raising `GeneratorExit` with context-specific messages. |
| Endpoint Shutdown Control (Python) `components/backends/vllm/src/dynamo/vllm/main.py` | Clarified shutdown docstrings; set `graceful_shutdown=True` for prefill and `False` for decode endpoints; added debug print for `migration_limit`. |
| Python-Rust FFI & API Surface `lib/bindings/python/rust/engine.rs`, `lib/bindings/python/rust/lib.rs`, `lib/bindings/python/src/dynamo/_core.pyi` | Added `PyGeneratorExit` error variant; updated `serve_endpoint` to accept `graceful_shutdown` parameter (default `True`) in both Rust and Python stubs. |
| Endpoint and Pipeline Shutdown (Rust) `lib/runtime/src/component/endpoint.rs`, `lib/runtime/src/pipeline/network/ingress/push_endpoint.rs` | Added `graceful_shutdown` field to endpoint config and push endpoint; shutdown logic now conditionally waits for in-flight requests based on this flag. |
| Stream Error Propagation (Rust) `lib/runtime/src/pipeline/network/ingress/push_handler.rs` | Enhanced error handling: detects generator exit errors, prevents sending final completion messages when stream ends prematurely. |

Sequence Diagram(s)

sequenceDiagram
    participant Client
    participant Endpoint
    participant Worker

    Client->>Endpoint: Send request
    Endpoint->>Worker: Dispatch request
    Worker-->>Endpoint: Stream responses (async)
    Endpoint-->>Client: Forward responses

    Note over Endpoint: During shutdown:
    alt graceful_shutdown = True
        Endpoint->>Worker: Wait for in-flight requests to finish
        Worker-->>Endpoint: Complete all responses
        Endpoint-->>Client: All responses delivered before shutdown
    else graceful_shutdown = False
        Endpoint-->>Client: Immediately stop accepting new requests
        Worker--x Endpoint: In-flight requests may be migrated or terminated
    end

Estimated code review effort

🎯 3 (Moderate) | ⏱️ ~20 minutes

Possibly related PRs

feat: add graceful shutdown in vllm_1 #1562: Implements graceful shutdown in the vllm_v1 worker, handling signals and runtime shutdown. Both PRs address shutdown behavior, but in different components of the vLLM system.

Poem

A rabbit hops with gentle might,
Tweaking shutdowns left and right—
With endpoints now both swift and kind,
Some wait, some leave requests behind.
Migration limits set with care,
Async streams now well aware—
The warren’s code runs smooth and bright! 🐇✨

coderabbitai

Actionable comments posted: 0

🧹 Nitpick comments (1)

components/backends/vllm/src/dynamo/vllm/main.py (1)
149-151: Replace debug print with proper logging.

The debug print statements with excessive exclamation marks appear temporary and are not suitable for production code. Consider using the existing logger instead.
-        print("!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!")
-        print(f"Migration limit: {config.migration_limit}")
-        print("!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!")
+        logger.info(f"Migration limit: {config.migration_limit}")

📜 Review details

Configuration used: .coderabbit.yaml
Review profile: CHILL
Plan: Pro

📥 Commits

Reviewing files that changed from the base of the PR and between 26dc628 and c29a591.

📒 Files selected for processing (9)

components/backends/vllm/deploy/disagg_planner.yaml (2 hunks)
components/backends/vllm/src/dynamo/vllm/handlers.py (2 hunks)
components/backends/vllm/src/dynamo/vllm/main.py (4 hunks)
lib/bindings/python/rust/engine.rs (3 hunks)
lib/bindings/python/rust/lib.rs (1 hunks)
lib/bindings/python/src/dynamo/_core.pyi (1 hunks)
lib/runtime/src/component/endpoint.rs (3 hunks)
lib/runtime/src/pipeline/network/ingress/push_endpoint.rs (2 hunks)
lib/runtime/src/pipeline/network/ingress/push_handler.rs (3 hunks)

🧰 Additional context used

🧠 Learnings (10)

📓 Common learnings

Learnt from: julienmancuso
PR: ai-dynamo/dynamo#2012
File: deploy/cloud/helm/crds/templates/nvidia.com_dynamocomponentdeployments.yaml:1178-1180
Timestamp: 2025-07-18T16:05:05.534Z
Learning: Kubernetes v1.33 introduced the stopSignal field as part of the official container lifecycle specification, allowing customization of termination signals without rebuilding container images. This field is legitimately placed under lifecycle and is autogenerated correctly by controller-gen when upgrading from older Kubernetes API versions.

Learnt from: nnshah1
PR: ai-dynamo/dynamo#2124
File: components/backends/vllm/deploy/disagg.yaml:54-60
Timestamp: 2025-07-25T22:34:11.384Z
Learning: In vLLM worker deployments, startup probes (with longer periods and higher failure thresholds like periodSeconds: 10, failureThreshold: 60) are used to handle the slow model loading startup phase, while liveness probes are intentionally kept aggressive (periodSeconds: 5, failureThreshold: 1) for quick failure detection once the worker is operational. This pattern separates startup concerns from operational health monitoring in GPU-heavy workloads.

📚 Learning: the `create_endpoint` method in `workermetricspublisher` has backward compatibility maintained throu...

Learnt from: PeaBrane
PR: ai-dynamo/dynamo#1392
File: launch/dynamo-run/src/subprocess/vllm_v1_inc.py:71-71
Timestamp: 2025-06-05T01:04:24.775Z
Learning: The `create_endpoint` method in `WorkerMetricsPublisher` has backward compatibility maintained through pyo3 signature annotation `#[pyo3(signature = (component, dp_rank = None))]`, making the `dp_rank` parameter optional with a default value of `None`.

Applied to files:

lib/runtime/src/component/endpoint.rs
lib/bindings/python/src/dynamo/_core.pyi
lib/bindings/python/rust/lib.rs

📚 Learning: the sglang `async_encode` method does not support streaming options, so collecting all embeddings be...

Learnt from: t-ob
PR: ai-dynamo/dynamo#1290
File: launch/dynamo-run/src/subprocess/sglang_inc.py:80-110
Timestamp: 2025-06-03T10:17:51.711Z
Learning: The sglang `async_encode` method does not support streaming options, so collecting all embeddings before yielding is the correct approach for embedding requests.

Applied to files:

components/backends/vllm/src/dynamo/vllm/handlers.py
components/backends/vllm/src/dynamo/vllm/main.py

📚 Learning: the asyncenginecontextprovider trait in lib/runtime/src/engine.rs was intentionally changed from `se...

Learnt from: ryanolson
PR: ai-dynamo/dynamo#1919
File: lib/runtime/src/engine.rs:168-168
Timestamp: 2025-07-14T21:25:56.930Z
Learning: The AsyncEngineContextProvider trait in lib/runtime/src/engine.rs was intentionally changed from `Send + Sync + Debug` to `Send + Debug` because the Sync bound was overly constraining. The trait should only require Send + Debug as designed.

Applied to files:

lib/runtime/src/pipeline/network/ingress/push_handler.rs
lib/bindings/python/rust/engine.rs
lib/bindings/python/rust/lib.rs

📚 Learning: in async-nats, the "no responders" error is represented as async_nats::client::requesterrorkind::nor...

Learnt from: kthui
PR: ai-dynamo/dynamo#1424
File: lib/runtime/src/pipeline/network/egress/push_router.rs:204-209
Timestamp: 2025-06-13T22:32:05.022Z
Learning: In async-nats, the "no responders" error is represented as async_nats::client::RequestErrorKind::NoResponders, not async_nats::Error::NoResponders. Use err.downcast_ref::<async_nats::client::RequestError>() and then check request_err.kind() against RequestErrorKind::NoResponders.

Applied to files:

lib/runtime/src/pipeline/network/ingress/push_handler.rs

📚 Learning: in lib/llm/src/kv_router/scoring.rs, peabrane prefers panic-based early failure over result-based er...

Learnt from: PeaBrane
PR: ai-dynamo/dynamo#1392
File: lib/llm/src/kv_router/scoring.rs:35-46
Timestamp: 2025-06-05T01:02:15.318Z
Learning: In lib/llm/src/kv_router/scoring.rs, PeaBrane prefers panic-based early failure over Result-based error handling for the worker_id() method to catch invalid data early during development.

Applied to files:

lib/runtime/src/pipeline/network/ingress/push_handler.rs

📚 Learning: the codebase uses async-nats version 0.40, not the older nats crate. error handling should use async...

Learnt from: kthui
PR: ai-dynamo/dynamo#1424
File: lib/runtime/src/pipeline/network/egress/push_router.rs:204-209
Timestamp: 2025-06-13T22:07:24.843Z
Learning: The codebase uses async-nats version 0.40, not the older nats crate. Error handling should use async_nats::error::Error variants, not nats::Error variants.

Applied to files:

lib/runtime/src/pipeline/network/ingress/push_handler.rs
lib/bindings/python/rust/engine.rs

📚 Learning: in async-nats, the "no responders" error is represented as async_nats::error::requesterrorkind::nore...

Learnt from: kthui
PR: ai-dynamo/dynamo#1424
File: lib/runtime/src/pipeline/network/egress/push_router.rs:204-209
Timestamp: 2025-06-13T22:32:05.022Z
Learning: In async-nats, the "no responders" error is represented as async_nats::error::RequestErrorKind::NoResponders. Use err.downcast_ref::<async_nats::error::RequestError>() and then check req_err.kind() against RequestErrorKind::NoResponders to handle this error properly.

Applied to files:

lib/runtime/src/pipeline/network/ingress/push_handler.rs

📚 Learning: in lib/llm/src/kv_router/scoring.rs, the user prefers to keep the panic behavior when calculating lo...

Learnt from: PeaBrane
PR: ai-dynamo/dynamo#1285
File: lib/llm/src/kv_router/scoring.rs:58-63
Timestamp: 2025-05-30T06:38:09.630Z
Learning: In lib/llm/src/kv_router/scoring.rs, the user prefers to keep the panic behavior when calculating load_avg and variance with empty endpoints rather than adding guards for division by zero. They want the code to fail fast on this error condition.

Applied to files:

lib/runtime/src/pipeline/network/ingress/push_handler.rs

📚 Learning: the `@dynamo_worker()` decorator in the dynamo codebase returns a wrapper that automatically injects...

Learnt from: nnshah1
PR: ai-dynamo/dynamo#1444
File: tests/fault_tolerance/utils/metrics.py:30-32
Timestamp: 2025-07-01T13:55:03.940Z
Learning: The `@dynamo_worker()` decorator in the dynamo codebase returns a wrapper that automatically injects the `runtime` parameter before calling the wrapped function. This means callers only need to provide the non-runtime parameters, while the decorator handles injecting the runtime argument automatically. For example, a function with signature `async def get_metrics(runtime, log_dir)` decorated with `@dynamo_worker()` can be called as `get_metrics(log_dir)` because the decorator wrapper injects the runtime parameter.

Applied to files:

components/backends/vllm/src/dynamo/vllm/main.py

⏰ Context from checks skipped due to timeout of 90000ms. You can increase the timeout in your CodeRabbit configuration to a maximum of 15 minutes (900000ms). (4)

GitHub Check: Build and Test - vllm
GitHub Check: pre-merge-rust (lib/runtime/examples)
GitHub Check: pre-merge-rust (lib/bindings/python)
GitHub Check: pre-merge-rust (.)

🔇 Additional comments (21)

components/backends/vllm/deploy/disagg_planner.yaml (2)

193-193: LGTM! Migration limit configuration added correctly.

The --migration-limit=3 parameter is properly added to the VllmDecodeWorker command line arguments, aligning with the PR objective to introduce configurable graceful shutdown behavior.

243-243: LGTM! Consistent migration limit configuration.

The --migration-limit=3 parameter is properly added to the VllmPrefillWorker, maintaining consistency with the VllmDecodeWorker configuration.
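As a rough illustration of what a migration limit could mean for a single request, the sketch below re-dispatches a partially generated request to another worker when the stream ends early, up to a fixed number of times. Everything here is hypothetical: `StreamEndedEarly`, `generate_with_migration`, and the toy workers are made-up names, and the assumption that the limit caps per-request re-dispatches is not confirmed by this PR.

```python
from typing import Callable, Iterator, List

TARGET = ["a", "b", "c", "d"]  # the full response a healthy worker would produce


class StreamEndedEarly(Exception):
    """Hypothetical signal that a worker stopped before finishing the response."""


def healthy_worker(generated: List[str]) -> Iterator[str]:
    # Resume from however many tokens were already produced elsewhere.
    yield from TARGET[len(generated):]


def dying_worker(generated: List[str]) -> Iterator[str]:
    # Produce one more token, then simulate the worker being shut down.
    yield TARGET[len(generated)]
    raise StreamEndedEarly("worker shut down")


Worker = Callable[[List[str]], Iterator[str]]


def generate_with_migration(workers: List[Worker], migration_limit: int) -> List[str]:
    tokens: List[str] = []
    migrations = 0
    while True:
        worker = workers[migrations % len(workers)]
        try:
            for tok in worker(tokens):
                tokens.append(tok)
            return tokens
        except StreamEndedEarly:
            if migrations >= migration_limit:
                raise  # assumption: give up once the limit is exhausted
            migrations += 1  # move the request to the next worker


print(generate_with_migration([dying_worker, healthy_worker], migration_limit=3))
```

With a limit of 3, the request survives one worker shutdown and completes on the second worker; with a limit of 0 the first early end would surface to the caller.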

lib/runtime/src/component/endpoint.rs (3)

44-46: LGTM! Well-structured graceful shutdown configuration.

The graceful_shutdown field is properly added with clear documentation and a sensible default value of true. The builder pattern integration follows the existing conventions.

62-62: LGTM! Proper field extraction following established patterns.

The graceful_shutdown field is correctly extracted using the dissolve pattern, maintaining consistency with the existing codebase structure.

116-116: LGTM! Proper integration with PushEndpoint builder.

The graceful_shutdown flag is correctly passed to the PushEndpoint builder, ensuring the configuration flows through to the endpoint implementation.

lib/bindings/python/src/dynamo/_core.pyi (1)

219-227: LGTM! Type stub properly updated with backward compatibility.

The serve_endpoint method signature is correctly updated with the optional graceful_shutdown: bool = True parameter. The documentation is clear and maintains consistency with the existing style. The default value ensures backward compatibility.

lib/bindings/python/rust/lib.rs (2)

478-491: LGTM! Proper pyo3 parameter handling with sensible defaults.

The graceful_shutdown parameter is correctly implemented using pyo3 conventions:
- Proper signature annotation with default value
- Option type for optional parameter handling
- Appropriate default value fallback using unwrap_or(true)

This maintains backward compatibility while enabling explicit control over graceful shutdown behavior.

493-493: LGTM! Proper builder chain integration.

The graceful_shutdown flag is correctly integrated into the builder chain, ensuring the configuration flows through to the underlying endpoint implementation.

lib/runtime/src/pipeline/network/ingress/push_endpoint.rs (2)

34-35: LGTM! Consistent struct field addition.

The graceful_shutdown field is properly added with appropriate builder default value and follows the established codebase patterns.

121-133: LGTM! Robust conditional shutdown implementation.

The graceful shutdown logic is well-implemented with:
- Thread-safe atomic operations for tracking inflight requests
- Proper coordination using notify/wait pattern
- Clear logging for both graceful and immediate shutdown paths
- Sensible conditional logic that respects the graceful_shutdown flag

This provides the desired flexibility in shutdown behavior while maintaining system reliability.
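The conditional wait described above can be sketched in Python with asyncio. This is a toy model, not the Rust code in push_endpoint.rs: the `Endpoint` class, its `handle` and `shutdown` methods, and the use of task cancellation for the non-graceful path are all assumptions made for illustration.

```python
import asyncio
from typing import List


class Endpoint:
    """Toy endpoint that tracks in-flight requests (hypothetical, not Dynamo's API)."""

    def __init__(self, graceful_shutdown: bool = True) -> None:
        self.graceful_shutdown = graceful_shutdown
        self.inflight = 0
        self.completed: List[str] = []
        self._idle = asyncio.Event()
        self._idle.set()  # no requests yet, so the endpoint starts idle

    async def handle(self, name: str, duration: float) -> None:
        self.inflight += 1
        self._idle.clear()
        try:
            await asyncio.sleep(duration)  # stand-in for streaming a response
            self.completed.append(name)
        finally:
            self.inflight -= 1
            if self.inflight == 0:
                self._idle.set()  # notify anyone waiting for the endpoint to drain

    async def shutdown(self, tasks: List["asyncio.Task[None]"]) -> None:
        if self.graceful_shutdown:
            # Graceful path: wait until every in-flight request has finished.
            await self._idle.wait()
        else:
            # Immediate path: stop in-flight work right away (cancellation is
            # a stand-in for "migrated or terminated").
            for task in tasks:
                task.cancel()
            await asyncio.gather(*tasks, return_exceptions=True)


async def run(graceful: bool) -> List[str]:
    ep = Endpoint(graceful_shutdown=graceful)
    tasks = [asyncio.create_task(ep.handle(f"req-{i}", 0.05)) for i in range(3)]
    await asyncio.sleep(0)  # let the handlers start before shutting down
    await ep.shutdown(tasks)
    return ep.completed


print(sorted(asyncio.run(run(True))))
print(asyncio.run(run(False)))
```

In the graceful run all three requests complete before shutdown returns; in the non-graceful run none do, which is the case where a router would need to re-dispatch them.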

components/backends/vllm/src/dynamo/vllm/handlers.py (2)

53-80: LGTM! Proper cancellation handling for graceful shutdown.

The try/except block correctly handles asyncio.CancelledError during token generation and converts it to a GeneratorExit with a descriptive message. This aligns with the broader graceful shutdown mechanism and will be properly propagated through the Rust error handling layer.

182-199: LGTM! Consistent cancellation handling for prefill workers.

The implementation mirrors the decode worker pattern with appropriate prefill-specific messaging. The comment explaining that prefill requests cannot be migrated provides valuable context for the error handling behavior.
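A minimal sketch of the cancellation-to-GeneratorExit pattern follows. The function names and the error message are illustrative, not copied from handlers.py, and the cancellation is simulated by raising `asyncio.CancelledError` directly rather than by a real engine shutdown.

```python
import asyncio
from typing import AsyncIterator, List, Optional, Tuple


async def engine_tokens(n: int, cancel_at: Optional[int]) -> AsyncIterator[int]:
    """Stand-in for an engine stream that is cancelled partway through."""
    for i in range(n):
        if i == cancel_at:
            raise asyncio.CancelledError()  # simulated engine shutdown
        yield i


async def generate(n: int, cancel_at: Optional[int] = None) -> AsyncIterator[int]:
    try:
        async for tok in engine_tokens(n, cancel_at):
            yield tok
    except asyncio.CancelledError:
        # Signal "stream ended early" so the caller can react, for example
        # by migrating the request, instead of treating it as completed.
        raise GeneratorExit("engine was shut down during token generation") from None


async def collect(n: int, cancel_at: Optional[int] = None) -> Tuple[List[int], Optional[str]]:
    tokens: List[int] = []
    try:
        async for tok in generate(n, cancel_at):
            tokens.append(tok)
    except GeneratorExit as exc:
        return tokens, str(exc)
    return tokens, None


print(asyncio.run(collect(5, cancel_at=2)))
print(asyncio.run(collect(3)))
```

Note that `GeneratorExit` derives from `BaseException`, so a plain `except Exception` in the consumer would not catch it; the consumer has to handle it explicitly.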

lib/bindings/python/rust/engine.rs (3)

137-138: LGTM! New error variant for Python generator exit.

The PyGeneratorExit(String) variant is properly added to the ResponseProcessingError enum, maintaining consistency with existing error handling patterns.

231-233: LGTM! Consistent error message for downstream detection.

The hardcoded message "Stream ended before generation completed" provides a consistent way for downstream components to detect generator exit conditions, as referenced in the push handler logic.

285-294: LGTM! Proper Python exception type detection.

The implementation correctly uses PyO3's is_instance_of to distinguish between GeneratorExit and other Python exceptions, ensuring proper error categorization for downstream handling.

lib/runtime/src/pipeline/network/ingress/push_handler.rs (3)

17-17: LGTM! Import for error inspection capability.

The MaybeError trait import enables error checking on response items in the stream processing logic.

109-109: LGTM! Trait bound for error checking.

Adding the MaybeError trait bound to generic type U enables inspection of response errors in the stream processing loop.

224-231: LGTM! Proper stream termination handling.

The logic correctly detects the "Stream ended before generation completed" error and appropriately suppresses the final completion message. The warning log provides good visibility into this shutdown scenario.
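The suppression of the final completion message can be modeled in a few lines of Python. This is a sketch under assumptions: the `Item` type, the `forward` function, and the `"<final>"` marker are hypothetical, and whether the real handler forwards or drops the error item itself is not stated in this review; only the sentinel message string comes from the comments above.

```python
from dataclasses import dataclass
from typing import Iterable, List, Optional

# Sentinel message quoted in the review of engine.rs.
STREAM_ENDED_EARLY = "Stream ended before generation completed"


@dataclass
class Item:
    data: Optional[str] = None
    error: Optional[str] = None


def forward(stream: Iterable[Item]) -> List[str]:
    """Forward stream items and append a final marker only on normal completion."""
    sent: List[str] = []
    send_final = True
    for item in stream:
        if item.error is not None and STREAM_ENDED_EARLY in item.error:
            # The generator exited early: stop without the completion marker
            # so the receiver does not mistake this for a finished response.
            send_final = False
            break
        sent.append(item.data if item.error is None else f"error: {item.error}")
    if send_final:
        sent.append("<final>")
    return sent


print(forward([Item("a"), Item("b")]))
print(forward([Item("a"), Item(error=STREAM_ENDED_EARLY)]))
```

A receiver that sees the stream close without the final marker can then distinguish an interrupted response from a completed one.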

components/backends/vllm/src/dynamo/vllm/main.py (3)

33-36: LGTM! Clear documentation of shutdown behavior.

The updated docstring clearly explains how the graceful_shutdown flag affects endpoint behavior during shutdown, improving code maintainability and understanding.

116-120: LGTM! Appropriate graceful shutdown for prefill workers.

The configuration correctly sets graceful_shutdown=True for prefill endpoints with clear justification: prefill requests cannot be re-routed and should complete quickly due to their nature.

198-200: LGTM! Appropriate non-graceful shutdown for decode workers.

The configuration correctly sets graceful_shutdown=False for decode endpoints with clear justification: decode requests support migration and can be long-running, making immediate shutdown with request transfer the preferred approach.

components/backends/vllm/src/dynamo/vllm/main.py

kthui

The correct GeneratorExit exception is raised when the request needs to be migrated to another instance.

Note: The migration requires #2270 to work.

tedzhouhk and others added 9 commits July 30, 2025 15:14

stage

4d597e2

change to engine shutdown error

a6a79be

quit silently

510cc77

raise GeneratorExit

0552068

[WIP] Allow PyGeneratorExit to end stream without final flag

be1a631

Merge branch 'main' of https://github.com/ai-dynamo/dynamo into hzhou…

1dbf28b

…/planner-migrate-shutdown

Merge branch 'jacky-ft-migrate-py-migrate' of https://github.com/ai-d…

32b0238

…ynamo/dynamo into hzhou/planner-migrate-shutdown

update knobs

97cd70c

pc

c29a591

tedzhouhk requested review from a team, GuanLuo, PeaBrane, alec-flowers, biswapanda, grahamking, ishandhanani, jthomson04, kkranen, nnshah1, paulhendricks, piotrm-nvidia, ptarasiewiczNV, rmccorm4, ryanolson, tanmayv25 and tmonty12 as code owners August 4, 2025 21:10

pull-request-size bot added the size/L label Aug 4, 2025

github-actions bot added the feat label Aug 4, 2025

coderabbitai bot reviewed Aug 4, 2025

kthui mentioned this pull request Aug 4, 2025

feat: Allow Python Engine to end stream before final #2270

Merged

hhzhang16 reviewed Aug 4, 2025

components/backends/vllm/src/dynamo/vllm/main.py (outdated, resolved)

components/backends/vllm/src/dynamo/vllm/main.py (outdated, resolved)

kthui approved these changes Aug 4, 2025

tedzhouhk and others added 4 commits August 4, 2025 21:27

Update components/backends/vllm/src/dynamo/vllm/main.py

623636c

Co-authored-by: hhzhang16 <54051230+hhzhang16@users.noreply.github.com> Signed-off-by: Hongkuan Zhou <tedzhouhk@gmail.com>

fmt

49413f4

Merge branch 'hzhou/planner-migrate-shutdown' of https://github.com/a…

e408889

…i-dynamo/dynamo into hzhou/planner-migrate-shutdown

fmt

216e5d1

hhzhang16 approved these changes Aug 5, 2025

Merge branch 'main' of https://github.com/ai-dynamo/dynamo into hzhou…

66a9217

…/planner-migrate-shutdown

rmccorm4 reviewed Aug 5, 2025

components/backends/vllm/src/dynamo/vllm/handlers.py (resolved)

tedzhouhk mentioned this pull request Aug 5, 2025

[FEATURE]: Request Migration when Decode Worker Failed/Shutdown #2310

Open

rmccorm4 approved these changes Aug 5, 2025

tedzhouhk merged commit 36c4ef5 into main Aug 5, 2025
10 checks passed

tedzhouhk deleted the hzhou/planner-migrate-shutdown branch August 5, 2025 19:24

coderabbitai bot mentioned this pull request Aug 7, 2025

feat: Request Migration when Decode Worker Failed/Shutdown (sglang) #2352

Closed

This was referenced Aug 21, 2025

feat: delay python stream until yield #2592

Closed

feat: Shutdown DRT when vLLM engine fails #2698

Merged

This was referenced Sep 8, 2025

feat: Canary Health Check. #2903

Merged

HttpAsyncEngine awaits the first result from PythonAsyncEngine to check for errors #2974

Closed
