-
Notifications
You must be signed in to change notification settings - Fork 688
fix: do not delete KV events jetstream #2800
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Conversation
Signed-off-by: PeaBrane <yanrpei@gmail.com>
Signed-off-by: PeaBrane <yanrpei@gmail.com>
Walkthrough
Changes
Sequence Diagram(s)sequenceDiagram
participant App as Runtime/NATS Transport
participant NATS as NATS JetStream
Note over App: Reset/Connect sequence
App->>NATS: create_stream()
alt create succeeds
NATS-->>App: OK
else create fails
App-->>App: check reset_stream == true
alt reset_stream == true
App->>NATS: get_stream()
NATS-->>App: Stream handle or error
opt on handle
App->>NATS: purge_stream()
NATS-->>App: PurgeResult(count) or Error
Note over App,NATS: Log purge outcome (debug on success, warn on failure)
end
else reset_stream == false
Note over App: Log at debug and continue
end
end
rect rgb(245,245,255)
Note over App,NATS: Consumer setup (unchanged flow with tweak)
App->>NATS: create/configure pull consumer<br/>(inactive_threshold = 3600s)
NATS-->>App: OK or Error
end
Estimated code review effort🎯 3 (Moderate) | ⏱️ ~25 minutes Possibly related PRs
Poem
Tip 🔌 Remote MCP (Model Context Protocol) integration is now available!Pro plan users can now connect to remote MCP servers from the Integrations page. Connect with popular remote MCPs such as Notion and Linear to add more context to your reviews and chats. Thanks for using CodeRabbit! It's free for OSS, and your support helps us grow. If you like it, consider giving us a shout-out. 🪧 TipsChatThere are 3 ways to chat with CodeRabbit:
SupportNeed help? Create a ticket on our support page for assistance with any issues or questions. CodeRabbit Commands (Invoked using PR/Issue comments)Type Other keywords and placeholders
Status, Documentation and Community
|
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Actionable comments posted: 0
🧹 Nitpick comments (4)
lib/llm/src/kv_router/subscriber.rs (1)
181-188: Clarify cleanup guarantees on crash vs. graceful shutdownGood note. Since the consumer now has an inactive_threshold (1h) on the server side, consider referencing that here to make the cleanup story explicit (graceful: delete consumer; ungraceful: rely on inactivity_threshold and/or etcd-driven cleanup). No functional change requested.
Would you confirm that 3600s is acceptable for worst-case replica downtime before auto-cleanup kicks in?
lib/runtime/src/transports/nats.rs (3)
544-547: Reword debug to avoid over-asserting root causeThe create_stream failure isn’t always “already exists.” Minor wording tweak keeps logs accurate while preserving intent.
Apply:
- log::debug!( - "Failed to create NATS stream '{}': {e}. Stream likely already exists, continuing...", - self.stream_name - ); + log::debug!( + "Failed to create NATS stream '{}': {e}. Assuming it may already exist; continuing (if reset_stream=true we'll attempt a purge).", + self.stream_name + );
549-572: Comment nit: say “existing stream,” not “newly created”We purge only when create_stream fails (i.e., an existing stream is present).
Apply:
- // If reset_stream is true, purge all messages from the newly created stream + // If reset_stream is true, purge all messages from the existing stream (create_stream failed)
580-580: Review inactivity threshold SLA and guard destructive stream deletions
- Confirm 1 h
inactive_thresholdis acceptable in production; consider making it configurable viaNATS_CONSUMER_INACTIVE_SECS.- Ensure
delete_streamcalls at lib/runtime/src/transports/nats.rs:1124 and 1307 (which drop entire streams) are limited to tests/dev or gated by configuration to avoid accidental data loss.
📜 Review details
Configuration used: Path: .coderabbit.yaml
Review profile: CHILL
Plan: Pro
💡 Knowledge Base configuration:
- MCP integration is disabled by default for public repositories
- Jira integration is disabled by default for public repositories
- Linear integration is disabled by default for public repositories
You can enable these sources in your CodeRabbit configuration.
📒 Files selected for processing (2)
lib/llm/src/kv_router/subscriber.rs(1 hunks)lib/runtime/src/transports/nats.rs(1 hunks)
🧰 Additional context used
🧠 Learnings (5)
📓 Common learnings
Learnt from: PeaBrane
PR: ai-dynamo/dynamo#2756
File: lib/llm/src/kv_router/subscriber.rs:36-44
Timestamp: 2025-08-29T10:03:48.303Z
Learning: PeaBrane prefers to keep PRs contained in scope and is willing to defer technical improvements to future PRs when the current implementation works for the immediate use case. They acknowledge technical debt but prioritize deliverability over completeness in individual PRs.
📚 Learning: 2025-08-27T17:56:14.648Z
Learnt from: kthui
PR: ai-dynamo/dynamo#2500
File: lib/llm/src/migration.rs:58-77
Timestamp: 2025-08-27T17:56:14.648Z
Learning: In lib/llm/src/migration.rs, the cancellation visibility in the Migration operator is intentionally one-way - it checks engine_ctx.is_stopped()/is_killed() to stop pulling from streams but doesn't link newly created streams as child contexts to the parent. This is a conscious architectural decision with plans for future enhancement.
Applied to files:
lib/llm/src/kv_router/subscriber.rs
📚 Learning: 2025-05-29T00:02:35.018Z
Learnt from: alec-flowers
PR: ai-dynamo/dynamo#1181
File: lib/llm/src/kv_router/publisher.rs:379-425
Timestamp: 2025-05-29T00:02:35.018Z
Learning: In lib/llm/src/kv_router/publisher.rs, the functions `create_stored_blocks` and `create_stored_block_from_parts` are correctly implemented and not problematic duplications of existing functionality elsewhere in the codebase.
Applied to files:
lib/llm/src/kv_router/subscriber.rs
📚 Learning: 2025-06-02T19:37:27.666Z
Learnt from: oandreeva-nv
PR: ai-dynamo/dynamo#1195
File: lib/llm/tests/block_manager.rs:150-152
Timestamp: 2025-06-02T19:37:27.666Z
Learning: In Rust/Tokio applications, when background tasks use channels for communication, dropping the sender automatically signals task termination when the receiver gets `None`. The `start_batching_publisher` function in `lib/llm/tests/block_manager.rs` demonstrates this pattern: when the `KVBMDynamoRuntimeComponent` is dropped, its `batch_tx` sender is dropped, causing `rx.recv()` to return `None`, which triggers cleanup and task termination.
Applied to files:
lib/llm/src/kv_router/subscriber.rs
📚 Learning: 2025-06-13T22:07:24.843Z
Learnt from: kthui
PR: ai-dynamo/dynamo#1424
File: lib/runtime/src/pipeline/network/egress/push_router.rs:204-209
Timestamp: 2025-06-13T22:07:24.843Z
Learning: The codebase uses async-nats version 0.40, not the older nats crate. Error handling should use async_nats::error::Error variants, not nats::Error variants.
Applied to files:
lib/runtime/src/transports/nats.rs
⏰ Context from checks skipped due to timeout of 90000ms. You can increase the timeout in your CodeRabbit configuration to a maximum of 15 minutes (900000ms). (5)
- GitHub Check: Build and Test - vllm
- GitHub Check: Build and Test - dynamo
- GitHub Check: pre-merge-rust (lib/runtime/examples)
- GitHub Check: pre-merge-rust (lib/bindings/python)
- GitHub Check: pre-merge-rust (.)
Signed-off-by: PeaBrane <yanrpei@gmail.com>
Signed-off-by: PeaBrane <yanrpei@gmail.com>
Signed-off-by: PeaBrane <yanrpei@gmail.com>
Signed-off-by: PeaBrane <yanrpei@gmail.com>
5752b34 to
2a33d1f
Compare
This reverts commit 524ff4c. Signed-off-by: PeaBrane <yanrpei@gmail.com>
2a33d1f to
2890fb4
Compare
Signed-off-by: PeaBrane <yanrpei@gmail.com>
Signed-off-by: PeaBrane <yanrpei@gmail.com> Signed-off-by: Krishnan Prashanth <kprashanth@nvidia.com>
Signed-off-by: PeaBrane <yanrpei@gmail.com>
Signed-off-by: PeaBrane <yanrpei@gmail.com> Signed-off-by: nnshah1 <neelays@nvidia.com>
Overview:
tokio::spawninstead of the secondary runtime to prevent CI flakiness. These tasks are also fairly critical.The only substantive change to review is
nats.rsSummary by CodeRabbit
New Features
Chores