@PeaBrane PeaBrane commented Sep 23, 2025

Overview:

Added Mermaid diagrams to kv_cache_routing.md to illustrate persistent KV events, radix snapshotting, and router replica syncing.

Summary by CodeRabbit

  • Documentation
    • Rewrote the architecture overview to introduce two pillars: global persistent cache state and local active block management.
    • Added diagrams explaining event flow and replica synchronization.
    • Expanded guidance on routing decisions, differentiating persistent prefix blocks from ephemeral active blocks.
    • Clarified options for enabling/disabling replica sync and described persistence and recovery behaviors.
    • Removed outdated ASCII diagram and consolidated explanations for easier navigation.
    • Improved narrative on how global and per-replica states interact to influence routing.

Signed-off-by: PeaBrane <yanrpei@gmail.com>

coderabbitai bot commented Sep 23, 2025

Walkthrough

Rewrites the KV cache routing architecture doc to describe two layers: global persistent KV state via NATS JetStream and local per-router active block management with replica sync. Adds multiple Mermaid diagrams, clarifies flows, separates persistent prefix blocks from ephemeral active blocks, and removes an old ASCII diagram.

Changes

| Cohort / File(s) | Summary of Changes |
| --- | --- |
| Docs: KV cache routing architecture (docs/architecture/kv_cache_routing.md) | Replaced Architecture section with Overview; added detailed descriptions of global KV state (JetStream, Object Store, durable consumers) and local slot management with replica sync; added Mermaid diagrams; removed legacy ASCII diagram; clarified persistent vs. ephemeral blocks and replica sync/persistence behavior. |

Sequence Diagram(s)

sequenceDiagram
  autonumber
  participant Engine
  participant NATS as NATS JetStream
  participant Obj as NATS Object Store
  participant RouterA as Router Replica A
  participant RouterB as Router Replica B

  rect rgb(235, 245, 255)
    note over Engine,NATS: Global KV event publishing and consumption
    Engine->>NATS: Publish KV block events
    NATS-->>RouterA: Deliver to durable consumer
    NATS-->>RouterB: Deliver to durable consumer
    RouterA->>Obj: Periodic snapshot (prefix blocks)
    RouterB->>Obj: Periodic snapshot (prefix blocks)
  end
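
The flow in the diagram above (publish events, consume them per replica, snapshot periodically) can be illustrated with a minimal in-memory sketch. This is not the Dynamo implementation and it does not use NATS: `EventLog`, `RouterIndex`, and `KvEvent` are hypothetical names, the append-only list stands in for a persistent stream, and the stored offset stands in for a durable consumer's position.

```python
from dataclasses import dataclass, field


@dataclass(frozen=True)
class KvEvent:
    kind: str        # "stored" or "removed" (assumed event kinds)
    worker_id: int
    block_hash: int


@dataclass
class EventLog:
    """In-memory stand-in for a persistent event stream (hypothetical)."""
    events: list = field(default_factory=list)

    def publish(self, event):
        self.events.append(event)

    def read_from(self, offset):
        return self.events[offset:]


@dataclass
class RouterIndex:
    """Hypothetical per-replica view: block hash -> set of workers holding it."""
    blocks: dict = field(default_factory=dict)
    offset: int = 0  # index of the next unread event in the log

    def catch_up(self, log):
        # Apply every event this replica has not seen yet.
        for event in log.read_from(self.offset):
            holders = self.blocks.setdefault(event.block_hash, set())
            if event.kind == "stored":
                holders.add(event.worker_id)
            else:
                holders.discard(event.worker_id)
            if not holders:
                del self.blocks[event.block_hash]
            self.offset += 1

    def snapshot(self):
        # Plain data that could be written to an object store.
        return {
            "offset": self.offset,
            "blocks": {h: sorted(w) for h, w in self.blocks.items()},
        }

    @classmethod
    def restore(cls, snap):
        blocks = {h: set(w) for h, w in snap["blocks"].items()}
        return cls(blocks=blocks, offset=snap["offset"])
```

A replica that restarts can call `RouterIndex.restore(snap)` and then `catch_up(log)`; only events published after the snapshot are replayed, so it converges to the same state as a replica that never stopped.
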
sequenceDiagram
  autonumber
  participant Client
  participant RouterA as Router Replica A
  participant RouterB as Router Replica B
  participant Core as NATS Core Messaging

  rect rgb(240, 255, 240)
    note over Client,RouterA: Local active block management timeline
    Client->>RouterA: Request received
    RouterA->>RouterA: Predict active blocks (t0)
    RouterA-->>Core: Broadcast active block update
    RouterB-->>Core: Broadcast own active blocks
    Core-->>RouterA: Replica updates (sync)
    RouterA->>RouterA: Adjust on first token (t1)
    RouterA->>RouterA: Finalize on completion (t2)
  end
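
The t0/t1/t2 timeline above can be sketched as a small bookkeeping class. This is an illustration under assumptions, not the router's actual slot manager: the class shape, the method names, and the idea that the t1 adjustment replaces the estimate with an observed block count are all hypothetical.

```python
from dataclasses import dataclass, field


@dataclass
class SlotManager:
    """Hypothetical tracker of predicted active blocks per request."""
    # request_id -> (worker_id, active block count)
    active: dict = field(default_factory=dict)

    def on_request(self, request_id, worker_id, predicted_blocks):
        # t0: record a prediction as soon as the request is routed.
        self.active[request_id] = (worker_id, predicted_blocks)

    def on_first_token(self, request_id, observed_blocks):
        # t1: replace the prediction with an observed value (assumed semantics).
        worker_id, _ = self.active[request_id]
        self.active[request_id] = (worker_id, observed_blocks)

    def on_complete(self, request_id):
        # t2: the request is finished, so its blocks are no longer active.
        self.active.pop(request_id, None)

    def load(self, worker_id):
        # Total active blocks currently attributed to one worker.
        return sum(n for wid, n in self.active.values() if wid == worker_id)
```

For example, after `on_request("r1", 0, 8)` and `on_request("r2", 0, 4)`, `load(0)` is 12; it drops to 9 after `on_first_token("r1", 5)` and to 4 after `on_complete("r1")`.
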

Estimated code review effort

🎯 1 (Trivial) | ⏱️ ~2 minutes

Poem

I thump my paws on clustered ground,
Two layers hum—what tidy sound!
JetStream clouds keep blocks in line,
While local slots sync just in time.
Snapshots, tokens—hop, don’t lag—
A rabbit routes without a snag. 🐇✨

Pre-merge checks

❌ Failed checks (1 warning)

| Check name | Status | Explanation | Resolution |
| --- | --- | --- | --- |
| Description Check | ⚠️ Warning | The PR description contains only a brief Overview sentence and omits the required template sections 'Details', 'Where should the reviewer start?', and 'Related Issues', so it does not follow the repository's PR template and lacks actionable reviewer guidance. | Please expand the PR description to follow the template by adding a 'Details' section summarizing the specific edits (e.g., files changed and what each diagram illustrates), a 'Where should the reviewer start?' section that points to docs/architecture/kv_cache_routing.md, and a 'Related Issues' entry or explicit "none"; include any verification steps or screenshots if applicable. |

✅ Passed checks (2 passed)

| Check name | Status | Explanation |
| --- | --- | --- |
| Title Check | ✅ Passed | The PR title succinctly and accurately describes the primary change: adding Mermaid diagrams that illustrate KV router features; it directly maps to modifications in docs/architecture/kv_cache_routing.md and is specific and readable for history scanning. |
| Docstring Coverage | ✅ Passed | No functions found in the changes. Docstring coverage check skipped. |


@coderabbitai coderabbitai bot left a comment

Actionable comments posted: 1

🧹 Nitpick comments (3)
docs/architecture/kv_cache_routing.md (3)

62-64: Naming consistency: KVIndexer → KvIndexer (matches code/docs elsewhere).

Use “KvIndexer” casing in the diagram labels to align with references in flags and code.

Apply this diff inside the mermaid block:

-        R1[Router 1<br/>KVIndexer]
-        R2[Router 2<br/>KVIndexer]
+        R1[Router 1<br/>KvIndexer]
+        R2[Router 2<br/>KvIndexer]

90-90: Capitalize NATS Core (proper noun).

Minor wording tweak for product name clarity.

-This is managed locally in each router via a "slot manager". To maintain consistency across the system, router replicas synchronize these local predictions with each other through NATS core messaging.
+This is managed locally in each router via a "slot manager". To maintain consistency across the system, router replicas synchronize these local predictions with each other through NATS Core messaging.

92-129: Clarify broadcast semantics in replica sync sequence.

The arrows suggest point-to-point; in practice routers publish to a shared subject and all replicas receive. Add a brief note to prevent misinterpretation.

Apply this small tweak in the sequence diagram:

-    Note over R1,R2: Router Replica Sync Enabled
+    Note over R1,R2: Router Replica Sync Enabled (pub-sub on shared subject; all replicas receive)
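
The broadcast semantics described in this comment (every replica publishes to one shared subject and all replicas receive) can be sketched with an in-memory bus. This is only an illustration: `Bus` and `Replica` are hypothetical, the subject name is made up, and no NATS client is used. Skipping a replica's own messages is also an assumption about how a subscriber might treat its own broadcasts.

```python
from collections import defaultdict


class Bus:
    """In-memory stand-in for a pub-sub broker (hypothetical, not NATS)."""

    def __init__(self):
        self.subscribers = defaultdict(list)

    def subscribe(self, subject, callback):
        self.subscribers[subject].append(callback)

    def publish(self, subject, message):
        # Fan out to every subscriber on the subject, not to a single peer.
        for callback in list(self.subscribers[subject]):
            callback(message)


class Replica:
    SUBJECT = "router.active_blocks"  # hypothetical subject name

    def __init__(self, name, bus):
        self.name = name
        self.bus = bus
        self.peer_blocks = {}  # peer name -> last broadcast block count
        bus.subscribe(self.SUBJECT, self.on_update)

    def broadcast(self, active_blocks):
        self.bus.publish(
            self.SUBJECT, {"from": self.name, "active_blocks": active_blocks}
        )

    def on_update(self, message):
        if message["from"] == self.name:
            return  # ignore our own broadcast (assumed behavior)
        self.peer_blocks[message["from"]] = message["active_blocks"]
```

With three replicas on one bus, a single `broadcast` from replica A updates the peer view of both B and C, which is the fan-out behavior the suggested note is meant to make explicit.
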
📜 Review details

Configuration used: Path: .coderabbit.yaml

Review profile: CHILL

Plan: Pro

📥 Commits

Reviewing files that changed from the base of the PR and between 13a5d61 and d7f2174.

📒 Files selected for processing (1)
  • docs/architecture/kv_cache_routing.md (1 hunks)
🧰 Additional context used
🧠 Learnings (1)
📓 Common learnings
Learnt from: PeaBrane
PR: ai-dynamo/dynamo#2756
File: lib/llm/src/kv_router/subscriber.rs:36-44
Timestamp: 2025-08-29T10:03:48.330Z
Learning: PeaBrane prefers to keep PRs contained in scope and is willing to defer technical improvements to future PRs when the current implementation works for the immediate use case. They acknowledge technical debt but prioritize deliverability over completeness in individual PRs.
Learnt from: PeaBrane
PR: ai-dynamo/dynamo#3095
File: lib/llm/src/kv_router/subscriber.rs:200-223
Timestamp: 2025-09-17T20:55:41.392Z
Learning: In the dynamo codebase, PeaBrane prefers to maintain consistency with existing etcd key parsing patterns (like splitting on '/' and parsing the last segment) rather than introducing more robust parsing approaches, even when the current approach might be brittle, to keep the codebase aligned and avoid divergent patterns.
Learnt from: PeaBrane
PR: ai-dynamo/dynamo#3095
File: lib/llm/src/kv_router/indexer.rs:0-0
Timestamp: 2025-09-17T20:55:06.313Z
Learning: When PeaBrane encounters a complex implementation issue that would significantly expand PR scope (like the remove_worker_sender method in lib/llm/src/kv_router/indexer.rs that required thread-safe map updates and proper shard targeting), they prefer to remove the problematic implementation entirely rather than rush a partial fix, deferring the proper solution to a future PR.
⏰ Context from checks skipped due to timeout of 90000ms. You can increase the timeout in your CodeRabbit configuration to a maximum of 15 minutes (900000ms). (4)
  • GitHub Check: Build and Test - trtllm
  • GitHub Check: Build and Test - vllm
  • GitHub Check: Build and Test - sglang
  • GitHub Check: Build and Test - dynamo
🔇 Additional comments (1)
docs/architecture/kv_cache_routing.md (1)

37-44: Nice addition: clear two-layer overview and context.

The new Overview and framing are clear and helpful. Please confirm the docs toolchain renders Mermaid fences (```mermaid) in this path so diagrams show up on the site.

@PeaBrane PeaBrane merged commit d54f6fe into main Sep 24, 2025
16 checks passed
@PeaBrane PeaBrane deleted the rupei/router-mermaids branch September 24, 2025 18:55
jasonqinzhou pushed a commit that referenced this pull request Sep 24, 2025
Signed-off-by: PeaBrane <yanrpei@gmail.com>
Signed-off-by: Jason Zhou <jasonzho@nvidia.com>
kylehh pushed a commit that referenced this pull request Sep 25, 2025
Signed-off-by: PeaBrane <yanrpei@gmail.com>
Signed-off-by: Kyle H <kylhuang@nvidia.com>