Skip to content

Conversation

@PeaBrane
Copy link
Contributor

@PeaBrane PeaBrane commented Dec 1, 2025

Overview:

.. so it is safe to clone MetricsRegistry, hence Endpoint, hence Client

Summary by CodeRabbit

  • Refactor
    • Enhanced metrics registry infrastructure to support safe, shared state management across clones with improved callback handling.

✏️ Tip: You can customize this high-level summary in your review settings.

Signed-off-by: PeaBrane <yanrpei@gmail.com>
@PeaBrane PeaBrane requested a review from a team as a code owner December 1, 2025 23:39
@PeaBrane PeaBrane requested a review from keivenchang December 1, 2025 23:39
@github-actions github-actions bot added the chore label Dec 1, 2025
@coderabbitai
Copy link
Contributor

coderabbitai bot commented Dec 1, 2025

Walkthrough

The MetricsRegistry struct has been refactored to use Arc-wrapped fields for shared state management. The prometheus_registry is now wrapped in Arc<RwLock<...>>, two new Arc-wrapped callback vector fields have been added, and the struct now derives Clone instead of using a manual implementation.

Changes

Cohort / File(s) Summary
Arc-wrapped shared state and callback registries
lib/runtime/src/metrics.rs
Modified prometheus_registry field from RwLock<prometheus::Registry> to Arc<RwLock<prometheus::Registry>>; added two new Arc-wrapped callback registries (prometheus_update_callbacks and prometheus_expfmt_callbacks); added #[derive(Clone)] to MetricsRegistry; removed manual Clone impl; updated MetricsRegistry::new() initialization.

Estimated code review effort

🎯 3 (Moderate) | ⏱️ ~20 minutes

  • Verify all usages of prometheus_registry are compatible with Arc wrapping and that lock acquisition patterns remain correct
  • Confirm the new callback registry fields are properly initialized and used consistently throughout the codebase
  • Check for any potential deadlock risks introduced by nested Arc<RwLock<...>> patterns

Poem

🐰 A registry shared, wrapped in Arc so fine,
Clones dance freely, state now intertwine,
Callbacks collected in lockable towers tall,
Through RwLock's gates, all threads can call,
Shared it thrives—no clone left behind! ✨

Pre-merge checks

❌ Failed checks (1 warning)
Check name Status Explanation Resolution
Description check ⚠️ Warning The description is incomplete; it only contains the Overview section while missing Details, Where should the reviewer start, and Related Issues sections from the template. Add Details section explaining the changes, specify files for review focus, and add a Related Issues section with appropriate issue references.
✅ Passed checks (2 passed)
Check name Status Explanation
Title check ✅ Passed The title clearly and concisely summarizes the main change: wrapping prometheus_registry in Arc within MetricsRegistry.
Docstring Coverage ✅ Passed No functions found in the changed files to evaluate docstring coverage. Skipping docstring coverage check.

📜 Recent review details

Configuration used: Path: .coderabbit.yaml

Review profile: CHILL

Plan: Pro

📥 Commits

Reviewing files that changed from the base of the PR and between 5708b70 and 1a57083.

📒 Files selected for processing (1)
  • lib/runtime/src/metrics.rs (2 hunks)
🧰 Additional context used
🧠 Learnings (1)
📓 Common learnings
Learnt from: PeaBrane
Repo: ai-dynamo/dynamo PR: 3184
File: docs/architecture/kv_cache_routing.md:70-73
Timestamp: 2025-09-23T20:08:37.105Z
Learning: PeaBrane prefers to keep documentation diagrams simplified to avoid visual overload, even when this means sacrificing some technical precision for the sake of clarity and comprehension. They prioritize pedagogical effectiveness over exhaustive technical detail in architectural diagrams.
Learnt from: PeaBrane
Repo: ai-dynamo/dynamo PR: 2756
File: lib/llm/src/kv_router/subscriber.rs:36-44
Timestamp: 2025-08-29T10:03:48.330Z
Learning: PeaBrane prefers to keep PRs contained in scope and is willing to defer technical improvements to future PRs when the current implementation works for the immediate use case. They acknowledge technical debt but prioritize deliverability over completeness in individual PRs.
Learnt from: PeaBrane
Repo: ai-dynamo/dynamo PR: 3095
File: lib/llm/src/kv_router/subscriber.rs:200-223
Timestamp: 2025-09-17T20:55:41.416Z
Learning: In the dynamo codebase, PeaBrane prefers to maintain consistency with existing etcd key parsing patterns (like splitting on '/' and parsing the last segment) rather than introducing more robust parsing approaches, even when the current approach might be brittle, to keep the codebase aligned and avoid divergent patterns.
Learnt from: PeaBrane
Repo: ai-dynamo/dynamo PR: 3095
File: lib/llm/src/kv_router/indexer.rs:0-0
Timestamp: 2025-09-17T20:55:06.333Z
Learning: When PeaBrane encounters a complex implementation issue that would significantly expand PR scope (like the remove_worker_sender method in lib/llm/src/kv_router/indexer.rs that required thread-safe map updates and proper shard targeting), they prefer to remove the problematic implementation entirely rather than rush a partial fix, deferring the proper solution to a future PR.
🧬 Code graph analysis (1)
lib/runtime/src/metrics.rs (1)
lib/runtime/src/distributed.rs (1)
  • new (98-293)
⏰ Context from checks skipped due to timeout of 90000ms. You can increase the timeout in your CodeRabbit configuration to a maximum of 15 minutes (900000ms). (12)
  • GitHub Check: sglang (amd64)
  • GitHub Check: sglang (arm64)
  • GitHub Check: trtllm (amd64)
  • GitHub Check: vllm (amd64)
  • GitHub Check: operator (amd64)
  • GitHub Check: Build and Test - dynamo
  • GitHub Check: clippy (launch/dynamo-run)
  • GitHub Check: tests (.)
  • GitHub Check: tests (lib/runtime/examples)
  • GitHub Check: clippy (.)
  • GitHub Check: clippy (lib/bindings/python)
  • GitHub Check: tests (lib/bindings/python)
🔇 Additional comments (2)
lib/runtime/src/metrics.rs (2)

649-667: LGTM! Arc-wrapping enables safe shared state across clones.

The refactoring correctly wraps all fields in Arc, allowing #[derive(Clone)] to produce cheap, state-sharing clones. The documentation clearly explains the rationale. This is the right approach for enabling Endpoint and Client cloning with shared metrics visibility.

Note: This is a semantic change where clones now share state rather than being independent. The existing tests at lines 990-998 validate this new behavior works correctly.


691-699: LGTM! Initialization correctly matches the Arc-wrapped field types.

The new() method properly initializes all fields with Arc::new(...), aligning with the struct definition changes.


Thanks for using CodeRabbit! It's free for OSS, and your support helps us grow. If you like it, consider giving us a shout-out.

❤️ Share

Comment @coderabbitai help to get the list of available commands and usage tips.

@PeaBrane
Copy link
Contributor Author

PeaBrane commented Dec 3, 2025

@keivenchang can you take another look at this? All the integration tests pass now (locally) after merging in your changes from main

Copy link
Contributor

@keivenchang keivenchang left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

All the integration tests are passing?

@PeaBrane
Copy link
Contributor Author

PeaBrane commented Dec 3, 2025

All the integration tests are passing?

seems like it at least locally; there were some rust errors I need to fix to get the integration tests in lib/runtime going (needed on both main and this PR), but everything in lib/llm was green either way

@PeaBrane PeaBrane merged commit c5e8c4c into main Dec 3, 2025
39 of 40 checks passed
@PeaBrane PeaBrane deleted the rupei/arc-prometheus-registry branch December 3, 2025 22:24
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Projects

None yet

Development

Successfully merging this pull request may close these issues.

4 participants