feat: don't modify kv scheduler states on query + more python binding #2798

PeaBrane · 2025-08-30T23:27:42Z

Overview:

As titled: if a request only queries the best worker id, do not add its tokens to the slot manager/tracker; otherwise the accounting is thrown off.

Also exposed (recovered) more functionality to Python:

- query/probe best_worker_id without routing
- direct a request to a worker by specifying the worker_id
- get the potential engine loads (prefill tokens + active blocks) so users can write their own routing decisions

Summary by CodeRabbit

New Features
- Added support for a query_instance_id request annotation to retrieve the target worker instance and token metadata without dispatching the request or altering routing state.
Refactor
- Routing now conditionally updates internal state only when appropriate, preventing side effects from query-only requests.
Documentation
- Clarified behavior: query-like requests can avoid scheduling side effects.
Notes
- No changes to public APIs; existing integrations continue to work as before.

Signed-off-by: PeaBrane <yanrpei@gmail.com>

coderabbitai · 2025-08-30T23:34:42Z

Walkthrough

Adds an update_states flag across routing and scheduling paths, propagating into KvScheduler::schedule. Introduces query_instance_id annotation handling that bypasses routing and state updates, returning instance metadata early. Non-annotated flows continue to route as before, now optionally updating states based on the flag.

Changes

| Cohort / File(s) | Summary |
|---|---|
| Router state-update gating and query_instance_id handling `lib/llm/src/kv_router.rs` | KvRouter::find_best_match gains update_states: bool and passes it to scheduler.schedule; RouterRequest path calls find_best_match with update_states = true; PreprocessedRequest and KvPushRouter detect has_annotation("query_instance_id"); if present, call find_best_match with update_states = false and return early with worker_instance_id and token_data; otherwise proceed with routing and set estimated_prefix_hit_num_blocks |
| Scheduler API and conditional state updates `lib/llm/src/kv_router/scheduler.rs` | SchedulingRequest adds pub update_states: bool; KvScheduler::schedule signature adds update_states: bool and populates it; state update via add_request is conditional on request.update_states; SchedulingRequest::respond now takes &mut self and sends via taken resp_tx |

Sequence Diagram(s)

sequenceDiagram
  autonumber
  participant Client
  participant KvRouter
  participant Scheduler
  participant Worker

  rect rgba(230,240,255,0.5)
  note over Client,KvRouter: PreprocessedRequest path
  Client->>KvRouter: request (may have annotation)
  alt has annotation "query_instance_id"
    KvRouter->>Scheduler: find_best_match(update_states=false)
    Scheduler-->>KvRouter: worker_instance_id, token_data
    KvRouter-->>Client: early response (no routing)
  else no annotation
    KvRouter->>Scheduler: find_best_match(update_states=true)
    Scheduler-->>KvRouter: best match (instance, overlap)
    KvRouter->>Worker: route request (with estimated_prefix_hit_num_blocks)
    Worker-->>KvRouter: stream/response
    KvRouter-->>Client: forward response
  end
  end
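
The PreprocessedRequest path in the diagram above can be sketched roughly as follows. This is a minimal, std-only illustration, not the code in `kv_router.rs`: `Request`, `Scheduler`, `RouteOutcome`, and `handle` are hypothetical names, and the worker selection is a placeholder. Only `has_annotation`, `"query_instance_id"`, `find_best_match`, `update_states`, `worker_instance_id`, and `token_data` are taken from the PR summary.

```rust
/// Hypothetical stand-in for a preprocessed request (not the real Dynamo type).
pub struct Request {
    pub token_ids: Vec<u32>,
    pub annotations: Vec<String>,
}

impl Request {
    /// Returns true if the request carries the named annotation.
    pub fn has_annotation(&self, name: &str) -> bool {
        self.annotations.iter().any(|a| a == name)
    }
}

/// Hypothetical scheduler that only counts how many requests it has recorded.
#[derive(Default)]
pub struct Scheduler {
    pub tracked_requests: usize,
}

impl Scheduler {
    /// Picks a worker; records the request only when `update_states` is true.
    pub fn find_best_match(&mut self, _tokens: &[u32], update_states: bool) -> i64 {
        let worker_id = 0; // placeholder: real selection logic is out of scope
        if update_states {
            self.tracked_requests += 1;
        }
        worker_id
    }
}

#[derive(Debug, PartialEq)]
pub enum RouteOutcome {
    /// Early response: nothing was dispatched and no state was touched.
    QueryOnly { worker_instance_id: i64, token_data: Vec<u32> },
    /// Normal path: the request would be forwarded to this worker.
    Routed { worker_instance_id: i64 },
}

pub fn handle(scheduler: &mut Scheduler, request: &Request) -> RouteOutcome {
    let query_only = request.has_annotation("query_instance_id");
    // The flag is simply the negation of the annotation check.
    let worker = scheduler.find_best_match(&request.token_ids, !query_only);
    if query_only {
        return RouteOutcome::QueryOnly {
            worker_instance_id: worker,
            token_data: request.token_ids.clone(),
        };
    }
    RouteOutcome::Routed { worker_instance_id: worker }
}
```

In this sketch a query-annotated request leaves `tracked_requests` unchanged, while an ordinary request increments it.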

sequenceDiagram
  autonumber
  participant KvRouter
  participant Scheduler
  participant StateStore as SchedulerState

  KvRouter->>Scheduler: schedule(..., update_states)
  Scheduler-->>KvRouter: SchedulingResponse
  alt update_states == true
    Scheduler->>StateStore: add_request(...)
    StateStore-->>Scheduler: ok
  else update_states == false
    note over Scheduler: Skip state update
  end
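
To illustrate why the gate matters for accounting, here is a small sketch under stated assumptions: the scheduler picks the least-loaded worker and tracks only prefill tokens per worker. `SchedulerState`, its `HashMap` layout, and the least-loaded policy are hypothetical simplifications; the real `KvScheduler::schedule` has a different signature and cost function.

```rust
use std::collections::HashMap;

/// Hypothetical per-worker load tracker (stand-in for the slot manager).
#[derive(Default)]
pub struct SchedulerState {
    prefill_tokens: HashMap<i64, usize>,
}

impl SchedulerState {
    /// Records the tokens of a request assigned to `worker_id`.
    pub fn add_request(&mut self, worker_id: i64, num_tokens: usize) {
        *self.prefill_tokens.entry(worker_id).or_insert(0) += num_tokens;
    }

    /// Current tracked load for a worker (0 if never seen).
    pub fn load(&self, worker_id: i64) -> usize {
        self.prefill_tokens.get(&worker_id).copied().unwrap_or(0)
    }
}

pub struct KvScheduler {
    workers: Vec<i64>,
    state: SchedulerState,
}

impl KvScheduler {
    pub fn new(workers: Vec<i64>) -> Self {
        Self { workers, state: SchedulerState::default() }
    }

    /// Returns the least-loaded worker (ties broken by lowest id).
    /// State is mutated only when `update_states` is true.
    pub fn schedule(&mut self, num_tokens: usize, update_states: bool) -> Option<i64> {
        let best = self
            .workers
            .iter()
            .copied()
            .min_by_key(|w| (self.state.load(*w), *w))?;
        if update_states {
            self.state.add_request(best, num_tokens);
        }
        Some(best)
    }

    pub fn load(&self, worker_id: i64) -> usize {
        self.state.load(worker_id)
    }
}
```

With this model, repeated query-only calls (`update_states = false`) keep returning the same answer and leave every load at its previous value, whereas an unconditional `add_request` would inflate the load of a worker that never received the request.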

Estimated code review effort

🎯 3 (Moderate) | ⏱️ ~25 minutes

Possibly related PRs

feat: reduce / revert routing overheads, do not consider output tokens #2182 — Adjusts kv routing logic and scheduler interfaces, overlapping with find_best_match changes and update_states propagation.

Poem

A thump of paws, a flag set true,
I sniff the routes for something new.
Query whispers, “shh—don’t change the slate,”
I hop back quick, avoid the state.
When paths are clear, I leap and send—
Carrots queued, requests ascend. 🥕✨

coderabbitai

Actionable comments posted: 0

Caution

Some comments are outside the diff and can’t be posted inline due to platform limitations.

⚠️ Outside diff range comments (1)

lib/llm/src/kv_router/scheduler.rs (1)
31-34: Typo in user-facing error message.

"aviailable" → "available".

Apply:
-    #[error("no endpoints aviailable to route work")]
+    #[error("no endpoints available to route work")]

🧹 Nitpick comments (3)

lib/llm/src/kv_router/scheduler.rs (1)

313-315: Use thread_rng() with gen_range/gen for randomness

Workspace depends on rand v0.9.0, so rng.random_range and rng.random compile, but for consistency with the current rand API, replace with thread_rng() + gen_range/gen:
-        let mut rng = rand::rng();
-        let index = rng.random_range(0..min_keys.len());
+        let mut rng = rand::thread_rng();
+        let index = rand::Rng::gen_range(&mut rng, 0..min_keys.len());
-    let mut rng = rand::rng();
-    let sample: f64 = rng.random();
+    let mut rng = rand::thread_rng();
+    let sample: f64 = rand::Rng::gen(&mut rng);

lib/llm/src/kv_router.rs (2)

371-374: Annotation string: de-dupe and centralize to avoid typos.

Use a single constant for "query_instance_id".

Example:

+const QUERY_INSTANCE_ID: &str = "query_instance_id";
...
-                let query_instance_id = request.has_annotation("query_instance_id");
+                let query_instance_id = request.has_annotation(QUERY_INSTANCE_ID);

399-407: Minimize sensitive token logging.

Even at trace, logging raw token IDs can be sensitive and noisy.

Apply:

-                    tracing::trace!(
-                        "Tokens requested in the response through the query_instance_id annotation: {:?}",
-                        response_tokens
-                    );
+                    tracing::trace!("query_instance_id: returning worker_instance_id and token_data (len={})",
+                        request.token_ids.len());

📜 Review details

Configuration used: Path: .coderabbit.yaml

Review profile: CHILL

Plan: Pro

📥 Commits

Reviewing files that changed from the base of the PR and between 3f09c39 and 7c0468e.

📒 Files selected for processing (2)

lib/llm/src/kv_router.rs (5 hunks)
lib/llm/src/kv_router/scheduler.rs (4 hunks)

🧰 Additional context used

🧠 Learnings (2)

📓 Common learnings

Learnt from: PeaBrane
PR: ai-dynamo/dynamo#2756
File: lib/llm/src/kv_router/subscriber.rs:36-44
Timestamp: 2025-08-29T10:03:48.303Z
Learning: PeaBrane prefers to keep PRs contained in scope and is willing to defer technical improvements to future PRs when the current implementation works for the immediate use case. They acknowledge technical debt but prioritize deliverability over completeness in individual PRs.

📚 Learning: 2025-08-29T10:08:18.414Z

Learnt from: PeaBrane
PR: ai-dynamo/dynamo#2756
File: lib/bindings/python/rust/llm/kv.rs:401-436
Timestamp: 2025-08-29T10:08:18.414Z
Learning: In the Python KvIndexer bindings (lib/bindings/python/rust/llm/kv.rs), the hardcoded reset_states=true parameter passed to start_kv_router_background is intentional behavior, not an oversight that needs to be made configurable.

Applied to files:

lib/llm/src/kv_router.rs

⏰ Context from checks skipped due to timeout of 90000ms. You can increase the timeout in your CodeRabbit configuration to a maximum of 15 minutes (900000ms). (5)

GitHub Check: Build and Test - vllm
GitHub Check: Build and Test - dynamo
GitHub Check: pre-merge-rust (.)
GitHub Check: pre-merge-rust (lib/runtime/examples)
GitHub Check: pre-merge-rust (lib/bindings/python)

🔇 Additional comments (8)

lib/llm/src/kv_router/scheduler.rs (4)

58-60: Good: explicit gate for state updates.

Adding update_states on the request clearly separates "query-only" from "mutating" flows.

65-75: Correct fix: make respond idempotent.

Switching to &mut self with resp_tx.take() prevents double-send and preserves ownership.
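
The `take()` pattern described here can be sketched as below. This is an assumption-laden illustration: the struct layout, the `new` constructor, the `bool` return value, and the use of `std::sync::mpsc` (chosen only to avoid external dependencies) are hypothetical; the actual `SchedulingRequest` in `scheduler.rs` may use a different channel type and response payload.

```rust
use std::sync::mpsc::{channel, Receiver, Sender};

/// Hypothetical, simplified request carrying a one-time response channel.
pub struct SchedulingRequest {
    pub update_states: bool,
    // `Option` lets `respond` move the sender out while only borrowing `self`.
    resp_tx: Option<Sender<i64>>,
}

impl SchedulingRequest {
    /// Creates a request together with the receiving end of its channel.
    pub fn new(update_states: bool) -> (Self, Receiver<i64>) {
        let (tx, rx) = channel();
        (Self { update_states, resp_tx: Some(tx) }, rx)
    }

    /// Sends the response at most once.
    /// Returns true if this call delivered the response, false if a response
    /// was already sent (or the receiver is gone).
    pub fn respond(&mut self, worker_id: i64) -> bool {
        match self.resp_tx.take() {
            Some(tx) => tx.send(worker_id).is_ok(),
            None => false,
        }
    }
}
```

Because the sender is moved out on the first call, a second `respond` becomes a no-op rather than a double send, and the request itself remains usable afterwards (for example, to read `update_states`).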

209-220: Meets PR objective: skip state mutations on query.

Conditionally calling add_request only when update_states is true avoids corrupting slot accounting.

If helpful, I can add a unit/integration test that asserts add_request is not called when update_states=false using a test double for ActiveSequencesMultiWorker.
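
One possible shape for such a test double, as a sketch only: the `SequenceTracker` trait, `CountingTracker`, and the free function `schedule` are hypothetical and do not reflect the real `ActiveSequencesMultiWorker` interface, which this PR page does not show.

```rust
/// Hypothetical abstraction over the state tracker so a double can be injected.
pub trait SequenceTracker {
    fn add_request(&mut self, worker_id: i64, num_tokens: usize);
}

/// Test double that records every `add_request` call it receives.
#[derive(Default)]
pub struct CountingTracker {
    pub calls: Vec<(i64, usize)>,
}

impl SequenceTracker for CountingTracker {
    fn add_request(&mut self, worker_id: i64, num_tokens: usize) {
        self.calls.push((worker_id, num_tokens));
    }
}

/// Simplified scheduling step: the worker is already chosen, and the only
/// question is whether the tracker is told about the request.
pub fn schedule<T: SequenceTracker>(
    tracker: &mut T,
    worker_id: i64,
    num_tokens: usize,
    update_states: bool,
) -> i64 {
    if update_states {
        tracker.add_request(worker_id, num_tokens);
    }
    worker_id
}
```

A test would then call `schedule` with `update_states = false` and assert that `calls` is still empty, and with `true` and assert that exactly one call was recorded.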

248-256: Signature change verified: the only schedule() call site, in lib/llm/src/kv_router.rs, was updated and correctly passes update_states.

lib/llm/src/kv_router.rs (4)

271-277: API: explicit update_states parameter is appropriate.

Threading update_states into find_best_match aligns the router with scheduler behavior.

381-386: Correct: do not update states for query-only requests.

!query_instance_id cleanly flips the flag based on the annotation.

297-303: Confirm intent: ApproxKvIndexer still updates on query.

process_routing_decision(...) runs regardless of update_states. If the PR’s “don’t modify states on query” only targets slot accounting, this is fine. If it also aims to avoid indexer feedback on queries, guard this with update_states.

Do you want me to gate this call behind if update_states { ... }?

333-336: Use update_states=false in deprecated generate path.

Passing true to find_best_match here allocates slot state that is never freed. Change to false to prevent leaking state in this deprecated AsyncEngine implementation.

Location: lib/llm/src/kv_router.rs:333

Please confirm no downstream consumers rely on state mutation in this path.

⛔ Skipped due to learnings
Learnt from: PeaBrane
PR: ai-dynamo/dynamo#2756
File: lib/bindings/python/rust/llm/kv.rs:401-436
Timestamp: 2025-08-29T10:08:18.414Z
Learning: In the Python KvIndexer bindings (lib/bindings/python/rust/llm/kv.rs), the hardcoded reset_states=true parameter passed to start_kv_router_background is intentional behavior, not an oversight that needs to be made configurable.

PeaBrane added 2 commits August 30, 2025 16:25

first commit

96788de

Signed-off-by: PeaBrane <yanrpei@gmail.com>

forgot to push kv_router.rs

7c0468e

Signed-off-by: PeaBrane <yanrpei@gmail.com>

PeaBrane requested a review from a team as a code owner August 30, 2025 23:27

pull-request-size bot added the size/M label Aug 30, 2025

PeaBrane requested a review from atchernych August 30, 2025 23:27

github-actions bot added the feat label Aug 30, 2025

PeaBrane requested a review from biswapanda August 30, 2025 23:27

PeaBrane self-assigned this Aug 30, 2025

coderabbitai bot reviewed Aug 30, 2025

Merge branch 'main' into rupei/no-modify-states

6dcdadc

copy-pr-bot bot temporarily deployed to GITLAB August 31, 2025 00:17 Inactive

copy-pr-bot bot temporarily deployed to GITLAB August 31, 2025 00:22 Inactive

Merge branch 'main' into rupei/no-modify-states

826c6e3

copy-pr-bot bot temporarily deployed to GITLAB September 1, 2025 22:30 Inactive

copy-pr-bot bot temporarily deployed to GITLAB September 1, 2025 22:32 Inactive

atchernych approved these changes Sep 2, 2025

Merge branch 'main' into rupei/no-modify-states

65fb074

copy-pr-bot bot temporarily deployed to GITLAB September 2, 2025 17:51 Inactive

PeaBrane enabled auto-merge (squash) September 2, 2025 17:51

copy-pr-bot bot temporarily deployed to GITLAB September 2, 2025 17:54 Inactive

Merge branch 'main' into rupei/no-modify-states

f7d6493

copy-pr-bot bot temporarily deployed to GITLAB September 2, 2025 20:47 Inactive

copy-pr-bot bot temporarily deployed to GITLAB September 2, 2025 20:48 Inactive

expose find_best_worker_id and direct request to router binding

5ccc57e

Signed-off-by: PeaBrane <yanrpei@gmail.com>

pull-request-size bot added size/L and removed size/M labels Sep 2, 2025

copy-pr-bot bot temporarily deployed to GITLAB September 2, 2025 22:21 Inactive

copy-pr-bot bot temporarily deployed to GITLAB September 2, 2025 22:22 Inactive

update docs

03c3dfd

Signed-off-by: PeaBrane <yanrpei@gmail.com>

potential load bindings

18075dd

Signed-off-by: PeaBrane <yanrpei@gmail.com>

copy-pr-bot bot temporarily deployed to GITLAB September 2, 2025 22:56 Inactive

PeaBrane changed the title ~~feat: don't modify kv scheduler states on query~~ feat: don't modify kv scheduler states on query + more python binding Sep 2, 2025

copy-pr-bot bot temporarily deployed to GITLAB September 2, 2025 22:57 Inactive

copy-pr-bot bot temporarily deployed to GITLAB September 2, 2025 23:02 Inactive

PeaBrane disabled auto-merge September 2, 2025 23:03

copy-pr-bot bot temporarily deployed to GITLAB September 2, 2025 23:05 Inactive

copy-pr-bot bot temporarily deployed to GITLAB September 2, 2025 23:07 Inactive

PeaBrane force-pushed the rupei/no-modify-states branch from 894e1d4 to 18075dd Compare September 2, 2025 23:07

cleanup imports

492f94c

Signed-off-by: PeaBrane <yanrpei@gmail.com>

copy-pr-bot bot temporarily deployed to GITLAB September 2, 2025 23:09 Inactive

Merge branch 'main' into rupei/no-modify-states

9920254

copy-pr-bot bot temporarily deployed to GITLAB September 2, 2025 23:10 Inactive

PeaBrane requested a review from alec-flowers September 2, 2025 23:12

copy-pr-bot bot temporarily deployed to GITLAB September 2, 2025 23:13 Inactive

tedzhouhk approved these changes Sep 3, 2025

PeaBrane merged commit 383e3b3 into main Sep 3, 2025
16 of 18 checks passed

PeaBrane deleted the rupei/no-modify-states branch September 3, 2025 00:09

This was referenced Sep 3, 2025

[FEATURE]: Give users ability to write custom Routing logic in Python #2824

Closed

[FEATURE]: Router robustness #2889

Closed

dillon-cullinan pushed a commit that referenced this pull request Sep 5, 2025

feat: don't modify kv scheduler states on query + more python binding (…

ae894d0

…#2798) Signed-off-by: PeaBrane <yanrpei@gmail.com>

nnshah1 pushed a commit that referenced this pull request Sep 8, 2025

feat: don't modify kv scheduler states on query + more python binding (…

ad5469c

…#2798) Signed-off-by: PeaBrane <yanrpei@gmail.com> Signed-off-by: nnshah1 <neelays@nvidia.com>

This was referenced Sep 14, 2025

feat: kv commit router #3024

Merged

feat: allow router to not track active blocks (prefill), and to not track cached blocks (decode) #3135

Merged

harryskim mentioned this pull request Sep 30, 2025

[Roadmap]: 0.5.1 and 0.6.0 roadmaps and key dates #3323

Open
