
Conversation

@ziqifan617 (Contributor) commented Aug 21, 2025

Overview:

Set up the infrastructure to enable metrics in KVBM using the dynamo metrics mechanism.

In future PRs, I will add more concrete KVBM metrics. This PR contains only two metrics for demo purposes.

[Screenshot attached: 2025-08-21, 2:55 PM]

Details:

Note that there is a workaround (WAR) here: we have more than one DistributedRuntime (one for the KVBM leader and one for each KVBM worker), so to avoid metrics port conflicts we search for a new port for each DRT. In the future, @keivenchang might include this as a default feature of the metrics registry.
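
As an illustration of the approach, a minimal Rust sketch of this kind of port selection (the function name is hypothetical; the actual helper in this PR is the Python find_and_set_available_port_from_env discussed in the review below):

// Sketch only: bind to port 0 so the OS assigns a free ephemeral port.
use std::net::TcpListener;

fn find_available_metrics_port() -> std::io::Result<u16> {
    let listener = TcpListener::bind(("127.0.0.1", 0))?;
    let port = listener.local_addr()?.port();
    // Dropping the listener releases the port; a small TOCTOU window remains
    // before the metrics server re-binds it.
    Ok(port)
}

fn main() -> std::io::Result<()> {
    let port = find_available_metrics_port()?;
    // The real helper records the chosen port in DYN_SYSTEM_PORT so each
    // DistributedRuntime starts its metrics endpoint on a distinct port.
    println!("selected metrics port {port}");
    Ok(())
}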

Where should the reviewer start?

Related Issues: (use one of the action keywords Closes / Fixes / Resolves / Relates to)

  • closes GitHub issue: #xxx

Summary by CodeRabbit

  • New Features

    • Expose KVBM metrics (offload and save_kv_layer requests) via Prometheus.
    • Auto-select the metrics port to avoid conflicts when a detached runtime is used.
    • Add Grafana dashboard with leader/worker panels.
    • Enable Prometheus scraping for leader (9999) and worker (9998).
  • Documentation

    • New guide section on enabling and viewing KVBM metrics with Prometheus/Grafana.
  • Chores

    • Devcontainer: enable all Cargo features for Rust analysis; fix JSON commas.
    • Python bindings: add Prometheus dependency.

copy-pr-bot bot commented Aug 21, 2025

This pull request requires additional validation before any workflows can run on NVIDIA's runners.

Pull request vetters can view their responsibilities here.

Contributors can view more details about this message here.

coderabbitai bot (Contributor) commented Aug 21, 2025

Caution

Review failed

The head commit changed during the review from 9b0d9dd to e13000b.

Walkthrough

Adds KVBM metrics: new KvbmMetrics struct and integration across vLLM leader/worker paths; propagates metrics through offload/save flows. Introduces Prometheus and Grafana configs, plus documentation to enable/view metrics. Adds a port-selection helper and uses it in leader/worker Python initializers. Minor devcontainer setting updates.

Changes

Cohort / File(s) — Summary

  • Devcontainer config (.devcontainer/devcontainer.json): Enable rust-analyzer allFeatures; fix JSON commas; tidy containerEnv commas.
  • Metrics struct and crate deps (lib/llm/src/block_manager.rs, lib/llm/src/block_manager/metrics_kvbm.rs, lib/bindings/python/Cargo.toml): Add public KvbmMetrics (IntCounter fields, constructor); expose the module; add the prometheus crate dependency for the Python bindings crate.
  • Leader-side Rust wiring (lib/bindings/python/rust/llm/block_manager/vllm/connector/leader.rs, .../leader/recorder.rs, .../leader/slot.rs): Create KvbmMetrics via drt.namespace("kvbm_connector_leader"); pass it into ConnectorSlotManager and LocalTransferEngine; increment offload_requests in process_offload_request; add an iteration_counter field; update constructor signatures.
  • Worker-side Rust wiring (lib/bindings/python/rust/llm/block_manager/vllm/connector/worker.rs): Initialize KvbmMetrics via drt.namespace("kvbm_connector_worker"); store it in the struct; increment save_kv_layer_requests on the save path.
  • Python runtime port handling (lib/bindings/python/src/dynamo/llm/vllm_integration/connector_leader.py, .../connector_worker.py, .../kv_cache_utils.py): Add find_and_set_available_port_from_env and call it when drt is None to avoid DYN_SYSTEM_PORT conflicts; the utility searches for and binds an available localhost port and sets the env var.
  • Deploy: Prometheus and Grafana (deploy/metrics/prometheus.yml, deploy/metrics/grafana_dashboards/grafana-kvbm-dashboard.json): Add scrape jobs for the KVBM leader/worker (ports 9999/9998); add a Grafana dashboard with two panels for the offload and save_kv_layer request counters.
  • Docs (docs/guides/run_kvbm_in_vllm.md): Add a section with steps to enable metrics and view the KVBM dashboard in Grafana (ports, env vars, login).
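
To make that wiring concrete, here is a self-contained sketch of a KvbmMetrics-style struct written against the prometheus crate directly; the real code registers its counters through dynamo's MetricsRegistry namespaces (see the review excerpts below), so the plain Registry and the struct name here are stand-ins:

use prometheus::{IntCounter, Registry};

#[derive(Clone)]
pub struct KvbmMetricsSketch {
    pub offload_requests: IntCounter,
    pub save_kv_layer_requests: IntCounter,
}

impl KvbmMetricsSketch {
    pub fn new(registry: &Registry) -> prometheus::Result<Self> {
        let offload_requests =
            IntCounter::new("offload_requests", "The number of offload requests")?;
        let save_kv_layer_requests = IntCounter::new(
            "save_kv_layer_requests",
            "The number of save kv layer requests",
        )?;
        // Register clones; IntCounter is a cheap Arc-backed handle.
        registry.register(Box::new(offload_requests.clone()))?;
        registry.register(Box::new(save_kv_layer_requests.clone()))?;
        Ok(Self { offload_requests, save_kv_layer_requests })
    }
}

fn main() -> prometheus::Result<()> {
    let registry = Registry::new();
    let metrics = KvbmMetricsSketch::new(&registry)?;
    metrics.offload_requests.inc();       // leader offload path
    metrics.save_kv_layer_requests.inc(); // worker save path
    Ok(())
}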

Sequence Diagram(s)

sequenceDiagram
  autonumber
  participant Leader as KvConnectorLeader
  participant Worker as KvConnectorWorker
  participant DRT as DistributedRuntime<br/>(MetricsRegistry ns)
  participant Prom as Prometheus
  participant Graf as Grafana

  rect rgb(240,248,255)
  note over Leader,Worker: Metrics initialization
  Leader->>DRT: namespace("kvbm_connector_leader")
  Worker->>DRT: namespace("kvbm_connector_worker")
  DRT-->>Leader: MetricsRegistry (KvbmMetrics)
  DRT-->>Worker: MetricsRegistry (KvbmMetrics)
  end

  rect rgb(245,255,240)
  note over Worker: Save KV layer
  Worker->>Worker: kvbm_metrics.save_kv_layer_requests.inc()
  end

  rect rgb(255,250,240)
  note over Leader: Offload request
  Leader->>Leader: kvbm_metrics.offload_requests.inc()
  end

  Prom-->>Leader: scrape /metrics (9999)
  Prom-->>Worker: scrape /metrics (9998)
  Graf-->>Prom: query counters for dashboard
sequenceDiagram
  autonumber
  participant Client as vLLM
  participant Leader as ConnectorSlotManager/LocalTransferEngine
  participant BM as VllmBlockManager
  participant Metrics as KvbmMetrics

  Client->>Leader: enqueue offload request
  Leader->>Leader: process_offload_request(...)
  Leader->>Metrics: offload_requests.inc()
  Leader->>BM: perform offload ops
  BM-->>Leader: result
  Leader-->>Client: completion

Estimated code review effort

🎯 4 (Complex) | ⏱️ ~60 minutes

Possibly related PRs

Poem

A rabbit counts the keys and bytes,
Ticks for saves, for offload flights.
Prometheus hears each gentle beat,
Grafana paints the metric sheet.
Ports aligned, the burrow hums—
KVBM sings as data comes. 🐇📈



coderabbitai bot (Contributor) left a comment

Actionable comments posted: 8

Caution

Some comments are outside the diff and can’t be posted inline due to platform limitations.

⚠️ Outside diff range comments (3)
lib/bindings/python/src/dynamo/llm/vllm_integration/kv_cache_utils.py (1)

93-116: Simplify find_and_set_available_port_from_env by delegating ephemeral port selection to DRT

The runtime already treats DYN_SYSTEM_PORT=0 as “choose an ephemeral port” (see RuntimeConfig docs in lib/runtime/src/config.rs and its tests) and the system‐status server examples use DYN_SYSTEM_PORT=0 to allocate a random free port. We can therefore avoid the TOCTOU race, unbounded port scanning, and raw print calls by simply setting the env var to "0" and returning 0.

• Remove the manual socket‐bind loop and print statement.
• Delegate port choice to DRT by setting os.environ[env_var] = "0".
• Return the chosen value (0) so callers can log via vLLM’s logger if desired.

-def find_and_set_available_port_from_env(env_var="DYN_SYSTEM_PORT"):
-    """
-    Find an available port from the environment variable.
-    """
-    port = int(os.environ.get(env_var, "0"))
-    if port == 0:
-        # No port specified, let system pick
-        pass
-    while True:
-        s = socket.socket(socket.AF_INET, socket.SOCK_STREAM)
-        try:
-            # Port is available
-            s.bind(("127.0.0.1", port))
-            s.close()
-            os.environ[env_var] = str(port)
-            print(f"Port {port} is available, setting env var {env_var} to {port}")
-            break
-        except OSError:
-            # Port is in use, try next
-            port += 1
-            s.close()
-        except Exception as e:
-            raise RuntimeError(f"Error finding available port: {e}")
+def find_and_set_available_port_from_env(env_var: str = "DYN_SYSTEM_PORT") -> int:
+    """
+    Hint DRT to use an ephemeral port by setting env to "0".
+    Returns the value set (always 0).
+    """
+    os.environ[env_var] = "0"
+    return 0

(If the caller needs to log the actual bound port afterwards, they can read back the chosen port via DRT’s API and use the vLLM logger, e.g.:

from vllm.logger import init_logger
logger = init_logger(__name__)
port = find_and_set_available_port_from_env()
# After DRT starts, retrieve and log the real port:
real_port = get_system_server_info().port
logger.info("System port resolved to %d", real_port)

)

lib/bindings/python/rust/llm/block_manager/vllm/connector/leader/recorder.rs (1)

272-282: Propagate create_slot errors instead of swallowing

create_slot can fail; ignoring the Result and always returning Ok(()) masks real errors and can leave state inconsistent.

Apply:

-        let _ = self.connector_leader.create_slot(request, tokens);
-        let _ = self
-            .unbounded_tx
-            .send(Action::CreateSlot(input_copy, CreateSlotOutput {}));
-        Ok(())
+        self.connector_leader.create_slot(request, tokens)?;
+        let _ = self.unbounded_tx.send(Action::CreateSlot(input_copy, CreateSlotOutput {}));
+        Ok(())
lib/bindings/python/rust/llm/block_manager/vllm/connector/leader/slot.rs (1)

1152-1156: Avoid unwrap() on storage pools; return errors instead

host().unwrap() and host().unwrap().register_blocks(...) can panic if host storage isn’t configured. Return a structured error to avoid crashing the task.

-    let host_blocks = block_manager
-        .host()
-        .unwrap()
-        .allocate_blocks(offload_req.block_ids.len())
-        .await?;
+    let host = block_manager
+        .host()
+        .ok_or_else(|| anyhow::anyhow!("Host storage pool not configured"))?;
+    let host_blocks = host
+        .allocate_blocks(offload_req.block_ids.len())
+        .await?;
@@
-    let immutable_blocks = block_manager
-        .host()
-        .unwrap()
-        .register_blocks(blocks_to_register)
-        .await?;
+    let immutable_blocks = host.register_blocks(blocks_to_register).await?;

Also applies to: 1228-1233

🧹 Nitpick comments (12)
lib/bindings/python/Cargo.toml (1)

83-83: Gate prometheus behind the block-manager feature

We only see “prometheus” mentioned in comments under the block‐manager code (no direct use prometheus::… or Histogram/IntCounter types), so it’s safe—and desirable—to make this crate’s Prometheus dependency optional. This will avoid unused linkage and speed up builds for consumers who don’t need block‐manager metrics.

• In lib/bindings/python/Cargo.toml, change the dependency:

-[dependencies]
-prometheus = "0.14.0"
+[dependencies]
+prometheus = { version = "0.14.0", optional = true }

• In the same file, wire the prometheus dependency into the existing block-manager feature (or a new metrics feature). For example, under [features]:

[features]
# Option A: reuse block-manager
block-manager = [
  "dynamo-llm/block-manager",
  "dep:dlpark",
  "dep:cudarc",
  "dep:prometheus",   # added
]

# Option B: new feature
# metrics = ["dep:prometheus"]

This ensures that anyone pulling in the Python bindings without --features block-manager won’t pay the cost of building or linking Prometheus.

.devcontainer/devcontainer.json (1)

65-66: Enabling rust-analyzer “allFeatures” may hide feature-gated issues and increase analysis load. Consider scoping features.

If developers don’t normally build with all features, rust-analyzer with allFeatures: true can mask missing-gate issues or slow analysis. Suggest scoping to the features most used locally (e.g., default,block-manager) or keep allFeatures off and use linkedProjects + per-crate feature configs.

Possible tweak:

-                "rust-analyzer.cargo.allFeatures": true
+                "rust-analyzer.cargo.allFeatures": false
+                // Optionally: "rust-analyzer.cargo.features": ["default","block-manager"]
lib/llm/src/block_manager/metrics_kvbm.rs (3)

6-9: Constrain metric fields’ visibility to discourage misuse.

Exposing counters as public fields invites ad-hoc increments across the codebase. Prefer pub(super) or accessor methods to keep increments intentional and centralized.

Apply this diff:

 #[derive(Clone, Debug)]
 pub struct KvbmMetrics {
-    pub offload_requests: IntCounter,
-    pub save_kv_layer_requests: IntCounter,
+    pub(super) offload_requests: IntCounter,
+    pub(super) save_kv_layer_requests: IntCounter,
 }

Optionally add lightweight getters/increment helpers if needed.
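
For instance, a minimal sketch of such helpers, extending the KvbmMetrics struct shown above (method names are illustrative):

// Illustrative increment helpers so callers never touch the fields directly.
impl KvbmMetrics {
    pub fn inc_offload_requests(&self) {
        self.offload_requests.inc();
    }
    pub fn inc_save_kv_layer_requests(&self) {
        self.save_kv_layer_requests.inc();
    }
}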


11-17: Avoid unwraps in instrumentation initialization; add context and constants for names.

Use expect(...) with context or propagate a Result to avoid panics during startup. Also, hoist metric names to constants to prevent drift.

Apply this diff:

 impl KvbmMetrics {
-    pub fn new(mr: &dyn MetricsRegistry) -> Self {
-        let offload_requests = mr.create_intcounter("offload_requests", "The number of offload requests", &[]).unwrap();
-        let save_kv_layer_requests = mr.create_intcounter("save_kv_layer_requests", "The number of save kv layer requests", &[]).unwrap();
-        Self { offload_requests, save_kv_layer_requests }
-    }
+    pub fn new(mr: &dyn MetricsRegistry) -> Self {
+        const OFFLOAD_REQUESTS: &str = "offload_requests";
+        const SAVE_KV_LAYER_REQUESTS: &str = "save_kv_layer_requests";
+
+        let offload_requests = mr
+            .create_intcounter(
+                OFFLOAD_REQUESTS,
+                "Number of KVBM offload requests",
+                &[],
+            )
+            .expect("register IntCounter offload_requests");
+
+        let save_kv_layer_requests = mr
+            .create_intcounter(
+                SAVE_KV_LAYER_REQUESTS,
+                "Number of KVBM save KV-layer requests",
+                &[],
+            )
+            .expect("register IntCounter save_kv_layer_requests");
+
+        Self { offload_requests, save_kv_layer_requests }
+    }
 }

13-15: Consider labels for success/failure or outcome where useful.

Today these are global counters. If you need per-outcome breakdowns (e.g., success/failure), change to an IntCounterVec with a low-cardinality label like status. Avoid high-cardinality labels such as layer_name.
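
A hedged sketch of that shape with the prometheus crate (metric and label names are illustrative):

use prometheus::{IntCounterVec, Opts, Registry};

fn main() -> prometheus::Result<()> {
    let registry = Registry::new();
    // One counter family, broken down by a low-cardinality `status` label.
    let offload_requests = IntCounterVec::new(
        Opts::new("offload_requests", "The number of offload requests"),
        &["status"],
    )?;
    registry.register(Box::new(offload_requests.clone()))?;

    // On the offload path, record the outcome per attempt.
    offload_requests.with_label_values(&["success"]).inc();
    offload_requests.with_label_values(&["failure"]).inc();
    Ok(())
}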

deploy/metrics/prometheus.yml (1)

61-72: Static ports may drift from runtime if auto-port selection kicks in.

You’ve hard-coded 9998/9999, but the WAR auto-selects a free port when there are multiple DistributedRuntimes. If the leader/worker pick different ports, Prometheus won’t scrape them.

Two options:

  • Documented option (lightweight): Clearly state that when multiple DRs exist, you must pin ports via envs and keep these targets in sync.

  • Configurable option (preferred): Switch these scrape jobs to file-based SD so the runtime (or a small sidecar) can publish the active targets.

Patch sketch to add file_sd (keeps the existing static targets as examples):

   - job_name: 'kvbm-leader-metrics'
     scrape_interval: 2s
-    static_configs:
-      - targets: ['host.docker.internal:9999']
+    static_configs:
+      - targets: ['host.docker.internal:9999']
+    file_sd_configs:
+      - files:
+          - /etc/prometheus/kvbm-leader-targets.json
+        refresh_interval: 5s
@@
   - job_name: 'kvbm-worker-metrics'
     scrape_interval: 2s
-    static_configs:
-      - targets: ['host.docker.internal:9998']
+    static_configs:
+      - targets: ['host.docker.internal:9998']
+    file_sd_configs:
+      - files:
+          - /etc/prometheus/kvbm-worker-targets.json
+        refresh_interval: 5s

I can also provide a tiny script to emit those JSON files from env vars if helpful.
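
For instance, a hypothetical sketch of such a script (the output path, label, and env var are assumptions, and a real version would read the port the runtime actually bound):

// Emit a Prometheus file_sd target list from an env var; sketch only.
use std::{env, fs};

fn main() -> std::io::Result<()> {
    // Assume the runtime exported the port it actually bound.
    let port = env::var("DYN_SYSTEM_PORT").unwrap_or_else(|_| "9999".to_string());
    // file_sd expects a JSON array of {targets, labels} objects.
    let body = format!(
        r#"[{{"targets": ["host.docker.internal:{port}"], "labels": {{"component": "kvbm-leader"}}}}]"#
    );
    fs::write("/etc/prometheus/kvbm-leader-targets.json", body)?;
    Ok(())
}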

lib/bindings/python/rust/llm/block_manager/vllm/connector/worker.rs (1)

264-266: Increment metric on layer save: consider low-cardinality outcome label if you later track errors.

Right now you only count attempts. If you need success/failure or skipped layers later, switch to IntCounterVec with a status label.

deploy/metrics/grafana_dashboards/grafana-kvbm-dashboard.json (3)

69-80: Remove irrelevant absolute thresholds for timeseries

The absolute thresholds (green/red at 80) aren’t meaningful for these panels and visually distract. Remove them or set alerting thresholds deliberately based on SLOs.

-            "thresholds": {
-              "mode": "absolute",
-              "steps": [
-                { "color": "green" },
-                { "color": "red", "value": 80 }
-              ]
-            }
+            "thresholds": { "mode": "absolute", "steps": [ { "color": "green" } ] }

Also applies to: 165-176


219-219: Use an explicit refresh interval

"auto" depends on user settings and may not match the Prometheus scrape interval (2s per PR summary). Use 5s or 10s to balance load and responsiveness.

-    "refresh": "auto",
+    "refresh": "5s",

221-224: Add a namespace/job filter variable for multi-env use

Hardcoded dynamo_namespace filters lock this dashboard to KVBM only. Adding a variable (e.g., $namespace with default kvbm_connector_leader/worker) allows reuse across envs.

Happy to provide a patch to add:

  • $namespace (query label values for dynamo_namespace)
  • Use exprs like rate(metric{dynamo_namespace="$namespace"}[1m])

Also applies to: 107-115, 203-211

lib/bindings/python/rust/llm/block_manager/vllm/connector/leader/slot.rs (2)

1029-1033: Offload instrumentation placement is good; consider adding basic labels

Incrementing offload_requests at the start of processing is fine. If your MetricsRegistry supports labels, consider adding reason or outcome labels in the future (e.g., success/failure). Not blocking.

Also applies to: 1062-1070, 1134-1142


219-221: Mutex poisoning: handle lock errors explicitly

Using .unwrap() on Mutex::lock() would panic on poisoning; you already handle locks elsewhere with map_err. Consider consistent error mapping here to avoid panics.

-        self.slots.lock().unwrap().contains_key(request_id)
+        self.slots
+            .lock()
+            .map(|m| m.contains_key(request_id))
+            .unwrap_or(false)
@@
-        self.slots.lock().unwrap().remove(request_id);
+        if let Ok(mut slots) = self.slots.lock() {
+            slots.remove(request_id);
+        }

Also applies to: 243-251

📜 Review details


📥 Commits

Reviewing files that changed from the base of the PR and between 105436c and 830b5e0.

⛔ Files ignored due to path filters (1)
  • lib/bindings/python/Cargo.lock is excluded by !**/*.lock
📒 Files selected for processing (14)
  • .devcontainer/devcontainer.json (2 hunks)
  • deploy/metrics/grafana_dashboards/grafana-kvbm-dashboard.json (1 hunks)
  • deploy/metrics/prometheus.yml (1 hunks)
  • docs/guides/run_kvbm_in_vllm.md (1 hunks)
  • lib/bindings/python/Cargo.toml (1 hunks)
  • lib/bindings/python/rust/llm/block_manager/vllm/connector/leader.rs (3 hunks)
  • lib/bindings/python/rust/llm/block_manager/vllm/connector/leader/recorder.rs (1 hunks)
  • lib/bindings/python/rust/llm/block_manager/vllm/connector/leader/slot.rs (6 hunks)
  • lib/bindings/python/rust/llm/block_manager/vllm/connector/worker.rs (5 hunks)
  • lib/bindings/python/src/dynamo/llm/vllm_integration/connector_leader.py (2 hunks)
  • lib/bindings/python/src/dynamo/llm/vllm_integration/connector_worker.py (2 hunks)
  • lib/bindings/python/src/dynamo/llm/vllm_integration/kv_cache_utils.py (2 hunks)
  • lib/llm/src/block_manager.rs (1 hunks)
  • lib/llm/src/block_manager/metrics_kvbm.rs (1 hunks)
🧰 Additional context used
🧠 Learnings (1)
📚 Learning: 2025-06-17T00:50:44.845Z
Learnt from: PeaBrane
PR: ai-dynamo/dynamo#1236
File: lib/llm/src/mocker/engine.rs:140-161
Timestamp: 2025-06-17T00:50:44.845Z
Learning: In Rust async code, when an Arc<Mutex<_>> is used solely to transfer ownership of a resource (like a channel receiver) into a spawned task rather than for sharing between multiple tasks, holding the mutex lock across an await is not problematic since there's no actual contention.

Applied to files:

  • lib/bindings/python/rust/llm/block_manager/vllm/connector/leader.rs
🧬 Code graph analysis (8)
lib/bindings/python/src/dynamo/llm/vllm_integration/connector_worker.py (1)
lib/bindings/python/src/dynamo/llm/vllm_integration/kv_cache_utils.py (1)
  • find_and_set_available_port_from_env (93-115)
lib/bindings/python/src/dynamo/llm/vllm_integration/connector_leader.py (1)
lib/bindings/python/src/dynamo/llm/vllm_integration/kv_cache_utils.py (1)
  • find_and_set_available_port_from_env (93-115)
lib/bindings/python/rust/llm/block_manager/vllm/connector/worker.rs (4)
lib/bindings/python/rust/llm/block_manager/vllm/connector/leader.rs (2)
  • new (88-119)
  • new (482-508)
lib/bindings/python/rust/llm/block_manager/vllm/connector/leader/recorder.rs (1)
  • new (88-143)
lib/bindings/python/rust/llm/block_manager/vllm/connector/leader/slot.rs (7)
  • new (178-213)
  • new (321-352)
  • new (956-969)
  • new (980-993)
  • new (1003-1013)
  • new (1300-1305)
  • new (1309-1314)
lib/llm/src/block_manager/metrics_kvbm.rs (1)
  • new (12-16)
lib/bindings/python/src/dynamo/llm/vllm_integration/kv_cache_utils.py (1)
lib/runtime/src/system_status_server.rs (1)
  • port (51-53)
lib/llm/src/block_manager/metrics_kvbm.rs (4)
lib/bindings/python/rust/llm/block_manager/vllm/connector/leader.rs (2)
  • new (88-119)
  • new (482-508)
lib/bindings/python/rust/llm/block_manager/vllm/connector/leader/recorder.rs (1)
  • new (88-143)
lib/bindings/python/rust/llm/block_manager/vllm/connector/leader/slot.rs (7)
  • new (178-213)
  • new (321-352)
  • new (956-969)
  • new (980-993)
  • new (1003-1013)
  • new (1300-1305)
  • new (1309-1314)
lib/bindings/python/rust/llm/block_manager/vllm/connector/worker.rs (2)
  • new (77-116)
  • new (399-403)
lib/bindings/python/rust/llm/block_manager/vllm/connector/leader.rs (4)
lib/bindings/python/rust/llm/block_manager/vllm/connector/leader/recorder.rs (1)
  • new (88-143)
lib/bindings/python/rust/llm/block_manager/vllm/connector/leader/slot.rs (7)
  • new (178-213)
  • new (321-352)
  • new (956-969)
  • new (980-993)
  • new (1003-1013)
  • new (1300-1305)
  • new (1309-1314)
lib/bindings/python/rust/llm/block_manager/vllm/connector/worker.rs (2)
  • new (77-116)
  • new (399-403)
lib/llm/src/block_manager/metrics_kvbm.rs (1)
  • new (12-16)
lib/bindings/python/rust/llm/block_manager/vllm/connector/leader/recorder.rs (2)
lib/bindings/python/rust/llm/block_manager/vllm/connector/leader.rs (2)
  • new (88-119)
  • new (482-508)
lib/bindings/python/rust/llm/block_manager/vllm/connector/leader/slot.rs (7)
  • new (178-213)
  • new (321-352)
  • new (956-969)
  • new (980-993)
  • new (1003-1013)
  • new (1300-1305)
  • new (1309-1314)
lib/bindings/python/rust/llm/block_manager/vllm/connector/leader/slot.rs (1)
lib/runtime/src/utils/tasks/critical.rs (1)
  • new_with_runtime (68-168)
🪛 LanguageTool
docs/guides/run_kvbm_in_vllm.md

[grammar] ~65-~65: There might be a mistake here.
Context: ...s to enable metrics collection and view via Grafana dashboard: ```bash # Start the ...

(QB_NEW_EN)


[grammar] ~79-~79: There might be a mistake here.
Context: ...(default login: dynamo/dynamo) and look for KVBM Dashboard

(QB_NEW_EN)

🪛 markdownlint-cli2 (0.17.2)
docs/guides/run_kvbm_in_vllm.md

79-79: Bare URL used

(MD034, no-bare-urls)

🪛 GitHub Actions: Copyright Checks
lib/llm/src/block_manager/metrics_kvbm.rs

[error] 1-1: Copyright header check failed. Missing or invalid header in lib/llm/src/block_manager/metrics_kvbm.rs.

⏰ Context from checks skipped due to timeout of 90000ms. You can increase the timeout in your CodeRabbit configuration to a maximum of 15 minutes (900000ms). (4)
  • GitHub Check: pre-merge-rust (lib/runtime/examples)
  • GitHub Check: pre-merge-rust (lib/bindings/python)
  • GitHub Check: pre-merge-rust (.)
  • GitHub Check: Build and Test - dynamo
🔇 Additional comments (15)
.devcontainer/devcontainer.json (1)

78-79: Trailing comma fix LGTM.

Removing the trailing comma after CARGO_TARGET_DIR is correct JSON and prevents parse issues.

lib/bindings/python/src/dynamo/llm/vllm_integration/connector_leader.py (2)

33-35: Import for port setup looks good.

Importing find_and_set_available_port_from_env here mirrors the worker path and keeps leader/worker behavior consistent.


60-60: I've initiated a search for all definitions and references to find_and_set_available_port_from_env throughout the repository to confirm its default argument and usage. Once results are back, I'll verify if we can safely drop the explicit "DYN_SYSTEM_PORT" string and rely on the function’s default.

lib/bindings/python/src/dynamo/llm/vllm_integration/connector_worker.py (1)

31-33: Import mirrors leader path—good consistency.

Keeping both leader and worker on the same utility avoids drift between processes.

lib/llm/src/block_manager.rs (1)

31-31: Module export looks good.

pub mod metrics_kvbm; cleanly scopes the new metrics without touching existing public API surfaces beyond the module. No issues.

lib/bindings/python/rust/llm/block_manager/vllm/connector/worker.rs (3)

8-8: Import and usage of KvbmMetrics is correct.

Imports the new metrics type from metrics_kvbm and integrates it in the worker lifecycle as intended.


73-74: Worker-scoped metrics instance: initialization and storage are sound.

The KvbmMetrics is created in the worker namespace and stored on the struct; fits the pattern used on the leader path.

Also applies to: 94-95, 114-115


1-120: Offload request metric increment confirmed

  • Located kvbm_metrics.offload_requests.inc() in lib/bindings/python/rust/llm/block_manager/vllm/connector/leader/slot.rs:1140, confirming that offload requests are counted on the leader/offload path.

No additional increments are required.

deploy/metrics/grafana_dashboards/grafana-kvbm-dashboard.json (1)

22-23: Consider omitting fixed id or making uid stable by convention

  • Hardcoding "id": 4 can conflict on import; Grafana will assign an ID if omitted.
  • A stable "uid" is fine if you intend to provision this dashboard. If this might be imported into other stacks, either keep the uid stable by convention or omit it and let Grafana generate one.

If this file is used via provisioning, keep a stable uid but drop "id". Otherwise, drop both and let Grafana assign.

Also applies to: 231-233

lib/bindings/python/rust/llm/block_manager/vllm/connector/leader.rs (2)

10-11: Import of KvbmMetrics is correct and scoped

Import aligns with downstream usage in slot manager. No issues.


113-118: Signature update verified and all call sites updated

  • ConnectorSlotManager::new now requires kvbm_metrics as the fourth argument.
  • Both call sites in
    • lib/bindings/python/rust/llm/block_manager/vllm/connector/leader.rs (line 113)
    • lib/bindings/python/rust/llm/block_manager/vllm/connector/leader/recorder.rs (line 123)
      have been updated accordingly.
  • A recursive search for all uses of ConnectorSlotManager::new confirms there are no other occurrences.
lib/bindings/python/rust/llm/block_manager/vllm/connector/leader/recorder.rs (1)

122-128: Good: Recorder keeps iteration counter and wires metrics into SlotManager

This ensures the LocalTransferEngine path can instrument offloads. Looks solid.

lib/bindings/python/rust/llm/block_manager/vllm/connector/leader/slot.rs (3)

4-4: Arc import is necessary and correct

Required for sharing KvbmLeader across async tasks. Good.


178-205: Thread KvbmMetrics into LocalTransferEngine; ensure Clone semantics

Wiring looks right. You clone kvbm_metrics later inside the offload task; confirm KvbmMetrics implements Clone. If not, switch to Arc to avoid ownership friction across tasks.

If Clone is unavailable, change signatures to accept Arc:

-        kvbm_metrics: KvbmMetrics,
+        kvbm_metrics: Arc<KvbmMetrics>,
@@
-                xfer_engine.execute(cancellation_token, drt_for_task, kvbm_metrics).await
+                xfer_engine.execute(cancellation_token, drt_for_task, kvbm_metrics).await

and:

-        kvbm_metrics: KvbmMetrics,
+        kvbm_metrics: Arc<KvbmMetrics>,
@@
-                        process_offload_request(req, &block_manager_offload, &leader_offload, kvbm_metrics.clone()).await
+                        process_offload_request(req, &block_manager_offload, &leader_offload, kvbm_metrics.clone()).await

521-552: Solid offload request scheduling path

Policy evaluation, request construction, and pairing with worker operation IDs are coherent. Logs include request_id and operation_id for traceability.

Also applies to: 846-886

keivenchang (Contributor) left a comment

Great that you got it working! Just a few minor comments:

  • Use the prometheus_names.rs prefix constants throughout the code; it'll make future changes and updates much easier
  • I apologize for asking you to try out ports 9998 and 9999. Can you pick another range?
  • devcontainer.json changes should be in another PR

@ziqifan617 ziqifan617 requested a review from keivenchang August 22, 2025 17:24
keivenchang (Contributor) left a comment

LGTM

@rmccorm4 (Contributor) commented:

@CodeRabbit review

coderabbitai bot (Contributor) commented Aug 22, 2025

✅ Actions performed

Review triggered.

Note: CodeRabbit is an incremental review system and does not re-review already reviewed commits. This command is applicable only when automatic reviews are paused.

@ziqifan617 ziqifan617 enabled auto-merge (squash) August 22, 2025 19:19
@ziqifan617 ziqifan617 merged commit b658ba6 into main Aug 22, 2025
12 checks passed
@ziqifan617 ziqifan617 deleted the ziqif/enable-kvbm-metrics branch August 22, 2025 19:58
hhzhang16 pushed a commit that referenced this pull request Aug 27, 2025
Signed-off-by: Hannah Zhang <hannahz@nvidia.com>
nv-anants pushed a commit that referenced this pull request Aug 28, 2025
jasonqinzhou pushed a commit that referenced this pull request Aug 30, 2025
Signed-off-by: Jason Zhou <jasonzho@jasonzho-mlt.client.nvidia.com>
KrishnanPrash pushed a commit that referenced this pull request Sep 2, 2025
Signed-off-by: Krishnan Prashanth <kprashanth@nvidia.com>
nnshah1 pushed a commit that referenced this pull request Sep 8, 2025
Signed-off-by: nnshah1 <neelays@nvidia.com>
