@PeaBrane PeaBrane commented Aug 27, 2025

Add broadcast pattern and purge support to NatsQueue

Summary

Extends NatsQueue to support multiple consumers independently receiving all messages (broadcast pattern) and adds stream purging to prevent unbounded storage growth.

Motivation

  • Multiple router replicas need to receive all KV events independently for redundancy
  • Old KV events must be purged after radix tree snapshots to prevent unbounded storage

Changes

1. Configurable Consumer Names

  • Added optional consumer_name field to NatsQueue
  • Added new_with_consumer() constructor for creating queues with unique consumer names
  • Each consumer with a unique name receives all messages (broadcast instead of work-queue)
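The difference between sharing a consumer name and using unique names can be illustrated with a toy in-memory model. This is only a sketch of the semantics described above, not the async-nats API: ToyStream, pull, and drain are hypothetical names, and the model assumes each durable consumer name owns exactly one read cursor into the same message log.

```rust
use std::collections::HashMap;

/// Toy in-memory stand-in for a JetStream stream (NOT the async-nats API).
/// Assumption: each durable consumer name owns one read cursor into the log.
struct ToyStream {
    messages: Vec<String>,
    cursors: HashMap<String, usize>,
}

impl ToyStream {
    fn new() -> Self {
        Self {
            messages: Vec::new(),
            cursors: HashMap::new(),
        }
    }

    /// Append a message to the shared log.
    fn publish(&mut self, payload: &str) {
        self.messages.push(payload.to_string());
    }

    /// Pull the next message for `consumer_name`, advancing only that cursor.
    fn pull(&mut self, consumer_name: &str) -> Option<String> {
        let cursor = self.cursors.entry(consumer_name.to_string()).or_insert(0);
        let msg = self.messages.get(*cursor).cloned();
        if msg.is_some() {
            *cursor += 1;
        }
        msg
    }
}

/// Drain everything currently available to one consumer name.
fn drain(stream: &mut ToyStream, consumer_name: &str) -> Vec<String> {
    let mut out = Vec::new();
    while let Some(msg) = stream.pull(consumer_name) {
        out.push(msg);
    }
    out
}
```

In this model, draining as "router-a" and then as "router-b" yields the full log both times (broadcast), while two handles that both pull as "worker-group" advance one shared cursor and therefore split the messages between them (work queue).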

2. Stream Purging

  • Added purge_up_to_sequence() to permanently remove all messages below a specified sequence (the message at that sequence is kept)
  • Enables cleanup of old KV events after radix tree snapshots
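The purge boundary is easy to get wrong by one, so here is a minimal in-memory sketch of the exclusive behavior noted in the code comment ("this purge excludes the sequence itself"). ToyLog and its methods are hypothetical and only model the semantics. They do not call NATS, and the 1-based sequence numbering is an assumption of the model.

```rust
use std::collections::VecDeque;

/// Toy model of a stream whose messages carry 1-based sequence numbers.
/// This only illustrates the purge boundary; it is not the NATS client.
struct ToyLog {
    entries: VecDeque<(u64, String)>,
    next_sequence: u64,
}

impl ToyLog {
    fn new() -> Self {
        Self {
            entries: VecDeque::new(),
            next_sequence: 1,
        }
    }

    /// Append a payload and return the sequence number assigned to it.
    fn publish(&mut self, payload: &str) -> u64 {
        let seq = self.next_sequence;
        self.entries.push_back((seq, payload.to_string()));
        self.next_sequence += 1;
        seq
    }

    /// Drop every entry with a sequence strictly below `sequence`.
    /// Returns how many entries were removed.
    fn purge_up_to_sequence(&mut self, sequence: u64) -> usize {
        let before = self.entries.len();
        self.entries.retain(|(seq, _)| *seq >= sequence);
        before - self.entries.len()
    }

    /// Sequence number of the oldest entry still stored, if any.
    fn first_sequence(&self) -> Option<u64> {
        self.entries.front().map(|(seq, _)| *seq)
    }
}
```

With four published entries, purging up to sequence 3 removes entries 1 and 2 and leaves 3 as the first remaining sequence. Purging up to sequence 1 removes nothing.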

Testing

Added test_nats_queue_broadcast_with_purge integration test verifying:

  • Multiple consumers with unique names receive all messages
  • Purging removes messages for all consumers
  • Only messages at or after the purge sequence are delivered

Backward Compatibility

Existing code using NatsQueue::new() continues to work unchanged, using the default "worker-group" consumer.

Summary by CodeRabbit

  • New Features
    • Per-consumer broadcast support: run multiple named consumers that each receive the full message stream independently.
    • Ability to purge messages up to a specified sequence for easier stream maintenance.
  • Bug Fixes
    • Queue size now correctly reflects the selected consumer rather than a fixed default.
  • Tests
    • Added integration tests covering per-consumer broadcasts and purge behavior to ensure correct delivery and cleanup across multiple consumers.

@PeaBrane PeaBrane self-assigned this Aug 27, 2025
@PeaBrane PeaBrane marked this pull request as ready for review August 27, 2025 02:14
@PeaBrane PeaBrane requested a review from a team as a code owner August 27, 2025 02:14

coderabbitai bot commented Aug 27, 2025

Walkthrough

Implements per-consumer delivery in NATS by adding an optional consumer name to NatsQueue, updates durable consumer creation and queue size lookup to respect it, adds a purge_up_to_sequence API affecting the stream, and introduces integration tests validating broadcast semantics and purge behavior.

Changes

Cohort / File(s): NATS transport implementation (lib/runtime/src/transports/nats.rs)

Summary:

  • Added field consumer_name: Option<String> to NatsQueue.
  • Added new_with_consumer(stream_name, nats_server, dequeue_timeout, consumer_name); new() now sets consumer_name to None.
  • Durable consumer name now uses self.consumer_name.unwrap_or("worker-group").
  • get_queue_size() queries the derived consumer name instead of a fixed value.
  • Added purge_up_to_sequence(sequence) to purge stream messages up to a sequence.
  • Added integration test scaffolding for two consumers with distinct names, publishing, purging, verifying receipt, and stream cleanup.

Sequence Diagram(s)

sequenceDiagram
    autonumber
    actor App as App
    participant Q_A as NatsQueue(A)<br/><i>consumer=A</i>
    participant Q_B as NatsQueue(B)<br/><i>consumer=B</i>
    participant JS as NATS JetStream

    rect rgb(235, 245, 255)
    note over Q_A,JS: Setup
    App->>Q_A: new_with_consumer("stream", "nats", timeout, "A")
    App->>Q_B: new_with_consumer("stream", "nats", timeout, "B")
    Q_A->>JS: Ensure pull consumer durable="A"
    Q_B->>JS: Ensure pull consumer durable="B"
    end

    rect rgb(240, 255, 240)
    note over App,JS: Publish
    App->>JS: Publish messages M1..Mn to stream
    end

    rect rgb(255, 250, 235)
    note over App,JS: Purge up to sequence S
    App->>Q_A: purge_up_to_sequence(S)
    Q_A->>JS: Purge stream messages < S
    JS-->>Q_A: Purge result
    end

    rect rgb(245, 245, 255)
    note over Q_A,Q_B: Consume post-purge
    Q_A->>JS: Pull (durable="A")
    JS-->>Q_A: Deliver messages >= S
    Q_B->>JS: Pull (durable="B")
    JS-->>Q_B: Deliver messages >= S
    end

Estimated code review effort

🎯 3 (Moderate) | ⏱️ ~25 minutes

Poem

I tuned my ears to NATS’s breeze,
Two queues, two buns, on separate trees.
We nibble streams with names our own,
Old leaves purged, fresh carrots shown.
Thump-thump! Broadcasts hop in line—
Each rabbit gets a crunchy sign.
Queue-cheers! The garden’s fine. 🥕🐇

@coderabbitai coderabbitai bot left a comment

Actionable comments posted: 0

🧹 Nitpick comments (8)
lib/runtime/src/transports/nats.rs (8)

446-448: Promote default consumer name to a single constant; avoid magic string duplication.

"worker-group" appears in multiple places. Define a DEFAULT_CONSUMER_NAME and reuse to prevent drift.

Apply within changed ranges:

-                .unwrap_or_else(|| "worker-group".to_string()),
+                .unwrap_or_else(|| DEFAULT_CONSUMER_NAME.to_string()),

-            let consumer_name = self
-                .consumer_name
-                .clone()
-                .unwrap_or_else(|| "worker-group".to_string());
+            let consumer_name = self
+                .consumer_name
+                .clone()
+                .unwrap_or_else(|| DEFAULT_CONSUMER_NAME.to_string());

Add this constant near other top-level consts:

const DEFAULT_CONSUMER_NAME: &str = "worker-group";
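One possible way to take this a step further is to resolve the name in a single helper, so both call sites share the fallback logic as well as the constant. The sketch below is an assumption, not code from the PR: effective_consumer_name is a hypothetical function, and it borrows the name instead of cloning it.

```rust
const DEFAULT_CONSUMER_NAME: &str = "worker-group";

/// Hypothetical helper (not part of the PR): resolve the durable consumer
/// name in one place, falling back to the default when none was configured.
/// Borrows the stored name rather than cloning it.
fn effective_consumer_name(consumer_name: &Option<String>) -> &str {
    consumer_name.as_deref().unwrap_or(DEFAULT_CONSUMER_NAME)
}
```

Both the consumer creation path and get_queue_size() could then call effective_consumer_name(&self.consumer_name), which would keep the two lookups from drifting apart.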

469-488: Sanitize/validate consumer_name to avoid invalid durable names.

Mirror the stream_name sanitization to prevent slashes/backslashes (and potential server-side rejections).

 pub fn new_with_consumer(
     stream_name: String,
     nats_server: String,
     dequeue_timeout: time::Duration,
     consumer_name: String,
 ) -> Self {
     let sanitized_stream_name = stream_name.replace(['/', '\\'], "_");
     let subject = format!("{}.*", sanitized_stream_name);
+    let sanitized_consumer_name = consumer_name.replace(['/', '\\'], "_");

     Self {
         stream_name: sanitized_stream_name,
         nats_server,
         dequeue_timeout,
         client: None,
         subject,
         subscriber: None,
-        consumer_name: Some(consumer_name),
+        consumer_name: Some(sanitized_consumer_name),
     }
 }
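Since the same replacement would then appear for both names, it could be factored into a small shared function. The following is only a sketch: sanitize_nats_name is a hypothetical name, and it covers just the two slash characters handled in the diff above. NATS may reject other characters in durable names, which this sketch does not attempt to handle.

```rust
/// Hypothetical shared helper mirroring the stream-name rule in the diff:
/// forward slashes and backslashes are replaced with underscores.
/// Other characters are passed through unchanged.
fn sanitize_nats_name(name: &str) -> String {
    name.replace(['/', '\\'], "_")
}
```

Both new() and new_with_consumer() could route stream_name and consumer_name through this one function, so any future tightening of the rule lands in a single place.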

514-518: Durable name fallback is good; consider pinning key consumer policies explicitly.

Relying on crate defaults (deliver/ack policy) can be brittle. Explicitly set deliver=All and ack=Explicit in the pull consumer config to lock semantics.

If you want to verify exact field names for async-nats pull consumer config in your current version, I can look it up and propose a concrete diff. Do you want me to fetch the latest docs?


596-602: Per-consumer queue size makes sense; add minor resilience.

If the consumer was deleted out-of-band, get_consumer will error. Optionally, return 0 in that case or recreate the consumer to keep this API robust.


612-633: Guard no-op purges and fix doc link; purge builder reference points to sync crate.

  • Add an early return for sequence <= 1 (no messages before the first).
  • The comment links to nats (sync) docs; switch to async-nats or drop the link.

Apply within changed range:

 pub async fn purge_up_to_sequence(&self, sequence: u64) -> Result<()> {
     if let Some(client) = &self.client {
         let stream = client.jetstream().get_stream(&self.stream_name).await?;

+        if sequence <= 1 {
+            log::debug!(
+                "purge_up_to_sequence: sequence <= 1; nothing to purge for stream {}",
+                self.stream_name
+            );
+            return Ok(());
+        }
 
-        // NOTE: this purge excludes the sequence itself
-        // https://docs.rs/nats/latest/nats/jetstream/struct.PurgeRequest.html
+        // NOTE: this purge excludes the sequence itself (see async-nats purge builder docs)
         stream.purge().sequence(sequence).await.map_err(|e| {
             anyhow::anyhow!("Failed to purge stream up to sequence {}: {}", sequence, e)
         })?;

846-965: Stabilize the integration test to avoid flakiness with zero fetch expiry.

Use a small positive timeout and pass it to dequeue_task to avoid prematurely breaking the drain loop when messages are pending but not returned immediately.

-        let timeout = time::Duration::from_secs(0);
+        let timeout = time::Duration::from_millis(50);
-        while let Some(msg) = queue1
-            .dequeue_task(None)
+        while let Some(msg) = queue1
+            .dequeue_task(Some(timeout))
             .await
             .expect("Failed to dequeue from queue1")
         {
             consumer1_messages.push(msg);
         }
-        while let Some(msg) = queue2
-            .dequeue_task(None)
+        while let Some(msg) = queue2
+            .dequeue_task(Some(timeout))
             .await
             .expect("Failed to dequeue from queue2")
         {
             consumer2_messages.push(msg);
         }

897-915: Clarify misleading comment; no stream info used for the purge decision.

The code purges using a fixed sequence (3) and doesn’t read stream info here. Update the comment to reflect the actual behavior.

-        // Get stream info to find the sequence numbers
-        // We need to know the sequence of message 2 to purge up to it
+        // Purge the first two messages by passing sequence=3 (purge is exclusive)

959-965: Optional cleanup: close queues before deleting the stream.

Closing avoids dangling consumers on teardown (especially useful when un-ignoring this test).

-        // Clean up by deleting the stream
+        // Clean up by closing consumers and deleting the stream
+        queue1.close().await.ok();
+        queue2.close().await.ok();
         client
             .jetstream()
             .delete_stream(&stream_name)
             .await
             .expect("Failed to delete test stream");
📜 Review details

Configuration used: Path: .coderabbit.yaml

Review profile: CHILL

Plan: Pro

📥 Commits

Reviewing files that changed from the base of the PR and between 6f8ce17 and d153d94.

📒 Files selected for processing (1)
  • lib/runtime/src/transports/nats.rs (6 hunks)
🧰 Additional context used
🧬 Code graph analysis (1)
lib/runtime/src/transports/nats.rs (2)
lib/bindings/python/src/dynamo/_core.pyi (2)
  • NatsQueue (876-939)
  • new (107-125)
lib/bindings/python/rust/llm/nats.rs (1)
  • new (27-34)
⏰ Context from checks skipped due to timeout of 90000ms. You can increase the timeout in your CodeRabbit configuration to a maximum of 15 minutes (900000ms). (4)
  • GitHub Check: pre-merge-rust (lib/bindings/python)
  • GitHub Check: Build and Test - dynamo
  • GitHub Check: pre-merge-rust (.)
  • GitHub Check: pre-merge-rust (lib/runtime/examples)
🔇 Additional comments (1)
lib/runtime/src/transports/nats.rs (1)

465-467: LGTM: Backward-compatible default preserved.

Defaulting to None keeps existing worker-queue behavior intact.

@PeaBrane PeaBrane merged commit b9640e5 into main Aug 27, 2025
13 checks passed
@PeaBrane PeaBrane deleted the rupei/nats-purge branch August 27, 2025 04:18
krishung5 pushed a commit that referenced this pull request Aug 27, 2025
…old messages (#2740)

Signed-off-by: krishung5 <krish@nvidia.com>
jasonqinzhou pushed a commit that referenced this pull request Aug 30, 2025
…old messages (#2740)

Signed-off-by: Jason Zhou <jasonzho@jasonzho-mlt.client.nvidia.com>
KrishnanPrash pushed a commit that referenced this pull request Sep 2, 2025
…old messages (#2740)

Signed-off-by: Krishnan Prashanth <kprashanth@nvidia.com>
nnshah1 pushed a commit that referenced this pull request Sep 8, 2025
…old messages (#2740)

Signed-off-by: nnshah1 <neelays@nvidia.com>