-
Notifications
You must be signed in to change notification settings - Fork 708
docs: mermaid diagrams showcasing various KV router features #3184
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Conversation
Signed-off-by: PeaBrane <yanrpei@gmail.com>
WalkthroughRewrites the KV cache routing architecture doc to describe two layers: global persistent KV state via NATS JetStream and local per-router active block management with replica sync. Adds multiple Mermaid diagrams, clarifies flows, separates persistent prefix blocks from ephemeral active blocks, and removes an old ASCII diagram. Changes
Sequence Diagram(s)sequenceDiagram
autonumber
participant Engine
participant NATS as NATS JetStream
participant Obj as NATS Object Store
participant RouterA as Router Replica A
participant RouterB as Router Replica B
rect rgb(235, 245, 255)
note over Engine,NATS: Global KV event publishing and consumption
Engine->>NATS: Publish KV block events
NATS-->>RouterA: Deliver to durable consumer
NATS-->>RouterB: Deliver to durable consumer
RouterA->>Obj: Periodic snapshot (prefix blocks)
RouterB->>Obj: Periodic snapshot (prefix blocks)
end
sequenceDiagram
autonumber
participant Client
participant RouterA as Router Replica A
participant RouterB as Router Replica B
participant Core as NATS Core Messaging
rect rgb(240, 255, 240)
note over Client,RouterA: Local active block management timeline
Client->>RouterA: Request received
RouterA->>RouterA: Predict active blocks (t0)
RouterA-->>Core: Broadcast active block update
RouterB-->>Core: Broadcast own active blocks
Core-->>RouterA: Replica updates (sync)
RouterA->>RouterA: Adjust on first token (t1)
RouterA->>RouterA: Finalize on completion (t2)
end
Estimated code review effort🎯 1 (Trivial) | ⏱️ ~2 minutes Poem
Pre-merge checks❌ Failed checks (1 warning)
✅ Passed checks (2 passed)
Tip 👮 Agentic pre-merge checks are now available in preview!Pro plan users can now enable pre-merge checks in their settings to enforce checklists before merging PRs.
Please see the documentation for more information. Example: reviews:
pre_merge_checks:
custom_checks:
- name: "Undocumented Breaking Changes"
mode: "warning"
instructions: |
Pass/fail criteria: All breaking changes to public APIs, CLI flags, environment variables, configuration keys, database schemas, or HTTP/GraphQL endpoints must be documented in the "Breaking Change" section of the PR description and in CHANGELOG.md. Exclude purely internal or private changes (e.g., code not exported from package entry points or explicitly marked as internal).Please share your feedback with us on this Discord post. Thanks for using CodeRabbit! It's free for OSS, and your support helps us grow. If you like it, consider giving us a shout-out. Comment |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Actionable comments posted: 1
🧹 Nitpick comments (3)
docs/architecture/kv_cache_routing.md (3)
62-64: Naming consistency: KVIndexer → KvIndexer (matches code/docs elsewhere).Use “KvIndexer” casing in the diagram labels to align with references in flags and code.
Apply this diff inside the mermaid block:
- R1[Router 1<br/>KVIndexer] - R2[Router 2<br/>KVIndexer] + R1[Router 1<br/>KvIndexer] + R2[Router 2<br/>KvIndexer]
90-90: Capitalize NATS Core (proper noun).Minor wording tweak for product name clarity.
-This is managed locally in each router via a "slot manager". To maintain consistency across the system, router replicas synchronize these local predictions with each other through NATS core messaging. +This is managed locally in each router via a "slot manager". To maintain consistency across the system, router replicas synchronize these local predictions with each other through NATS Core messaging.
92-129: Clarify broadcast semantics in replica sync sequence.The arrows suggest point-to-point; in practice routers publish to a shared subject and all replicas receive. Add a brief note to prevent misinterpretation.
Apply this small tweak in the sequence diagram:
- Note over R1,R2: Router Replica Sync Enabled + Note over R1,R2: Router Replica Sync Enabled (pub-sub on shared subject; all replicas receive)
📜 Review details
Configuration used: Path: .coderabbit.yaml
Review profile: CHILL
Plan: Pro
📒 Files selected for processing (1)
docs/architecture/kv_cache_routing.md(1 hunks)
🧰 Additional context used
🧠 Learnings (1)
📓 Common learnings
Learnt from: PeaBrane
PR: ai-dynamo/dynamo#2756
File: lib/llm/src/kv_router/subscriber.rs:36-44
Timestamp: 2025-08-29T10:03:48.330Z
Learning: PeaBrane prefers to keep PRs contained in scope and is willing to defer technical improvements to future PRs when the current implementation works for the immediate use case. They acknowledge technical debt but prioritize deliverability over completeness in individual PRs.
Learnt from: PeaBrane
PR: ai-dynamo/dynamo#3095
File: lib/llm/src/kv_router/subscriber.rs:200-223
Timestamp: 2025-09-17T20:55:41.392Z
Learning: In the dynamo codebase, PeaBrane prefers to maintain consistency with existing etcd key parsing patterns (like splitting on '/' and parsing the last segment) rather than introducing more robust parsing approaches, even when the current approach might be brittle, to keep the codebase aligned and avoid divergent patterns.
Learnt from: PeaBrane
PR: ai-dynamo/dynamo#3095
File: lib/llm/src/kv_router/indexer.rs:0-0
Timestamp: 2025-09-17T20:55:06.313Z
Learning: When PeaBrane encounters a complex implementation issue that would significantly expand PR scope (like the remove_worker_sender method in lib/llm/src/kv_router/indexer.rs that required thread-safe map updates and proper shard targeting), they prefer to remove the problematic implementation entirely rather than rush a partial fix, deferring the proper solution to a future PR.
⏰ Context from checks skipped due to timeout of 90000ms. You can increase the timeout in your CodeRabbit configuration to a maximum of 15 minutes (900000ms). (4)
- GitHub Check: Build and Test - trtllm
- GitHub Check: Build and Test - vllm
- GitHub Check: Build and Test - sglang
- GitHub Check: Build and Test - dynamo
🔇 Additional comments (1)
docs/architecture/kv_cache_routing.md (1)
37-44: Nice addition: clear two-layer overview and context.The new Overview and framing are clear and helpful. Please confirm the docs toolchain renders Mermaid fences (```mermaid) in this path so diagrams show up on the site.
Signed-off-by: PeaBrane <yanrpei@gmail.com> Signed-off-by: Jason Zhou <jasonzho@nvidia.com>
Signed-off-by: PeaBrane <yanrpei@gmail.com> Signed-off-by: Jason Zhou <jasonzho@nvidia.com>
Signed-off-by: PeaBrane <yanrpei@gmail.com> Signed-off-by: Kyle H <kylhuang@nvidia.com>
Overview:
Added mermaid diagrams to
kv_cache_routing.mdto showcase persistent KV events, radix snapshotting, and router replica syncing.Summary by CodeRabbit