@PeaBrane PeaBrane commented Aug 28, 2025

Motivation

We need a way for the KV events/block states to persist when the Router goes down, so the Router (replica) can warm restart/revive. Closes #1056

Overview:

Core changes are mostly contained in kv_router/subscriber.rs.

  1. KV events are now published over NATS JetStream for persistence. Each worker is a publisher and each router/indexer is a durable consumer, all tied to the same stream.
  2. To prevent unbounded growth of the stream, the stream is purged periodically, and the radix tree state is saved to the NATS object store, from which new Router replicas can load it.
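
The two points above can be sketched with an in-memory model. This is only an illustration, not the code in kv_router/subscriber.rs: `KvEvent`, `Index`, and `Persistence` are hypothetical names, a `BTreeSet` stands in for the radix tree, and a `BTreeMap` plus an `Option` stand in for the JetStream stream and the object-store snapshot.

```rust
use std::collections::{BTreeMap, BTreeSet};

/// Hypothetical, simplified KV event: a worker stored or removed a block.
#[derive(Clone, Debug, PartialEq)]
pub enum KvEvent {
    Stored { worker: u32, block: u64 },
    Removed { worker: u32, block: u64 },
}

/// Toy stand-in for the radix tree: the set of (worker, block) pairs.
#[derive(Clone, Debug, Default, PartialEq)]
pub struct Index {
    blocks: BTreeSet<(u32, u64)>,
}

impl Index {
    /// Applying the same event twice leaves the index unchanged.
    pub fn apply(&mut self, ev: &KvEvent) {
        match ev {
            KvEvent::Stored { worker, block } => {
                self.blocks.insert((*worker, *block));
            }
            KvEvent::Removed { worker, block } => {
                self.blocks.remove(&(*worker, *block));
            }
        }
    }
}

/// Toy stand-in for the persisted stream plus the snapshot store.
#[derive(Default)]
pub struct Persistence {
    next_seq: u64,
    events: BTreeMap<u64, KvEvent>, // sequence number -> event
    snapshot: Option<Index>,
}

impl Persistence {
    /// A worker publishes an event; the stream assigns the next sequence number.
    pub fn publish(&mut self, ev: KvEvent) -> u64 {
        self.next_seq += 1;
        self.events.insert(self.next_seq, ev);
        self.next_seq
    }

    /// Save the snapshot first, then drop every event with seq <= up_to.
    pub fn snapshot_and_purge(&mut self, state: &Index, up_to: u64) {
        self.snapshot = Some(state.clone());
        self.events = self.events.split_off(&(up_to + 1));
    }

    /// Number of events still retained in the stream.
    pub fn retained(&self) -> usize {
        self.events.len()
    }

    /// Warm start for a new replica: load the snapshot, replay what is left.
    pub fn restore(&self) -> Index {
        let mut index = self.snapshot.clone().unwrap_or_default();
        for ev in self.events.values() {
            index.apply(ev);
        }
        index
    }
}
```

A new replica calling `restore()` ends up with the same state as a replica that consumed every event live, even though part of the stream has been purged.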

All other changes are boilerplate and tests, which are unlikely to need serious review.

Small algorithmic notes

KV events act idempotently on the Indexer, so it is fine for the snapshotted state to be ahead of the purged watermark: replaying an event to the indexer twice is harmless. This is what happens in our case, because we conservatively choose the purged watermark to be the minimum acked sequence number over all Router replicas.
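
As a rough illustration of this note (not the PR's implementation), the sketch below picks the watermark as the minimum acked sequence and shows that replaying events a snapshot already contains does not change it. The names `purge_watermark`, `Op`, and `replay_after` are hypothetical, and a set of block IDs stands in for the radix tree.

```rust
use std::collections::{BTreeSet, HashMap};

/// Conservative purge watermark: the smallest acked sequence number over
/// all replicas. Returns None when no replica has reported an ack yet.
pub fn purge_watermark(acked: &HashMap<String, u64>) -> Option<u64> {
    acked.values().copied().min()
}

/// Hypothetical event payload acting on a toy index of block IDs.
#[derive(Clone, Copy, Debug)]
pub enum Op {
    Store(u64),
    Remove(u64),
}

/// Set insert and remove are idempotent, which is the property relied on here.
pub fn apply(state: &mut BTreeSet<u64>, op: Op) {
    match op {
        Op::Store(block) => {
            state.insert(block);
        }
        Op::Remove(block) => {
            state.remove(&block);
        }
    }
}

/// Replay every logged event with a sequence number above the watermark.
pub fn replay_after(state: &mut BTreeSet<u64>, log: &[(u64, Op)], watermark: u64) {
    for &(seq, op) in log {
        if seq > watermark {
            apply(state, op);
        }
    }
}
```

If one replica has acked sequence 5 and another only 3, the watermark is 3, so events 4 and 5 stay in the stream. A snapshot taken by the faster replica already reflects them, and replaying them on top of that snapshot is a no-op.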

To prevent multiple replicas from racing to purge, a replica atomically puts a lock key into etcd while purging. The lock is tied to the replica's etcd lease (the replica's lifetime), so it is released if the process crashes while purging. Importantly, we save the radix tree state before purging the stream, which guarantees that the snapshotted state is never behind the purged watermark.
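
A minimal sketch of the locking and ordering described here, assuming an in-memory `LockStore` in place of etcd. All names, including the `purge-lock` key, are hypothetical. In real etcd the atomic put would typically be a transaction that succeeds only if the key does not exist yet, with the key attached to the replica's lease.

```rust
use std::collections::HashMap;

pub type LeaseId = u64;

/// Hypothetical key name for the purge lock.
pub const LOCK_KEY: &str = "purge-lock";

/// Toy stand-in for etcd: keys owned by leases.
#[derive(Default)]
pub struct LockStore {
    keys: HashMap<String, LeaseId>,
}

impl LockStore {
    /// Create-if-absent: succeeds only when nobody holds the key.
    pub fn try_acquire(&mut self, key: &str, lease: LeaseId) -> bool {
        if self.keys.contains_key(key) {
            return false;
        }
        self.keys.insert(key.to_string(), lease);
        true
    }

    /// Release the key, but only if this lease owns it.
    pub fn release(&mut self, key: &str, lease: LeaseId) -> bool {
        if self.keys.get(key) == Some(&lease) {
            self.keys.remove(key);
            true
        } else {
            false
        }
    }

    /// Lease expiry (for example after a crash) drops every key it owned.
    pub fn revoke_lease(&mut self, lease: LeaseId) {
        self.keys.retain(|_, owner| *owner != lease);
    }
}

/// Take the lock, save the snapshot, and purge only if the save succeeded.
/// Returns Ok(false) when another replica currently holds the lock.
pub fn snapshot_then_purge<S, P>(
    store: &mut LockStore,
    lease: LeaseId,
    save_snapshot: S,
    purge_stream: P,
) -> Result<bool, String>
where
    S: FnOnce() -> Result<(), String>,
    P: FnOnce() -> Result<(), String>,
{
    if !store.try_acquire(LOCK_KEY, lease) {
        return Ok(false);
    }
    // Order matters: the stream is purged only after the snapshot is saved.
    let result = save_snapshot().and_then(|()| purge_stream());
    store.release(LOCK_KEY, lease);
    result.map(|()| true)
}
```

Because the key belongs to a lease, a replica that crashes between the two steps does not block the others forever: once its lease is revoked, the lock disappears and another replica can retry. A failed snapshot skips the purge, so no events are dropped without a snapshot that covers them.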

Summary by CodeRabbit

  • New Features
    • Added router CLI options: --snapshot-threshold to enable periodic snapshots, and --reset-states/--persist-states to control startup state.
    • Routers now support background NATS-based event processing with durable consumer IDs and optional snapshotting for faster recovery.
    • Exposed APIs to dump router/indexer events for state inspection.
  • Documentation
    • Updated KV routing architecture with new flags, startup/persistence behavior, replica sync, and event persistence/recovery guidance.
  • Tests
    • Added end-to-end test ensuring router replicas converge on identical state across restarts.

Signed-off-by: PeaBrane <yanrpei@gmail.com>
… tests)

Signed-off-by: PeaBrane <yanrpei@gmail.com>
@PeaBrane PeaBrane force-pushed the rupei/router-warm-restarts branch from b893aa7 to d6386f1 Compare August 28, 2025 20:44
@PeaBrane PeaBrane merged commit 488c870 into main Aug 30, 2025
16 of 17 checks passed
@PeaBrane PeaBrane deleted the rupei/router-warm-restarts branch August 30, 2025 23:42
KrishnanPrash pushed a commit that referenced this pull request Sep 2, 2025
…napshotting (#2756)

Signed-off-by: PeaBrane <yanrpei@gmail.com>
Signed-off-by: Krishnan Prashanth <kprashanth@nvidia.com>
dillon-cullinan pushed a commit that referenced this pull request Sep 5, 2025
…napshotting (#2756)

Signed-off-by: PeaBrane <yanrpei@gmail.com>
nnshah1 pushed a commit that referenced this pull request Sep 8, 2025
…napshotting (#2756)

Signed-off-by: PeaBrane <yanrpei@gmail.com>
Signed-off-by: nnshah1 <neelays@nvidia.com>
Development

Successfully merging this pull request may close these issues.

Allow warm restarts of Router replicas via RadixTree snapshots
