feat: enable dynamo metrics on KVBM #2626
Conversation
Walkthrough
Adds KVBM metrics: a new KvbmMetrics struct integrated across the vLLM leader/worker paths, with metrics propagated through the offload/save flows. Introduces Prometheus and Grafana configs, plus documentation on enabling and viewing metrics. Adds a port-selection helper and uses it in the leader/worker Python initializers. Minor devcontainer setting updates.
Sequence Diagram(s)

```mermaid
sequenceDiagram
  autonumber
  participant Leader as KvConnectorLeader
  participant Worker as KvConnectorWorker
  participant DRT as DistributedRuntime<br/>(MetricsRegistry ns)
  participant Prom as Prometheus
  participant Graf as Grafana
  rect rgb(240,248,255)
    note over Leader,Worker: Metrics initialization
    Leader->>DRT: namespace("kvbm_connector_leader")
    Worker->>DRT: namespace("kvbm_connector_worker")
    DRT-->>Leader: MetricsRegistry (KvbmMetrics)
    DRT-->>Worker: MetricsRegistry (KvbmMetrics)
  end
  rect rgb(245,255,240)
    note over Worker: Save KV layer
    Worker->>Worker: kvbm_metrics.save_kv_layer_requests.inc()
  end
  rect rgb(255,250,240)
    note over Leader: Offload request
    Leader->>Leader: kvbm_metrics.offload_requests.inc()
  end
  Prom-->>Leader: scrape /metrics (9999)
  Prom-->>Worker: scrape /metrics (9998)
  Graf-->>Prom: query counters for dashboard
```

```mermaid
sequenceDiagram
  autonumber
  participant Client as vLLM
  participant Leader as ConnectorSlotManager/LocalTransferEngine
  participant BM as VllmBlockManager
  participant Metrics as KvbmMetrics
  Client->>Leader: enqueue offload request
  Leader->>Leader: process_offload_request(...)
  Leader->>Metrics: offload_requests.inc()
  Leader->>BM: perform offload ops
  BM-->>Leader: result
  Leader-->>Client: completion
```
Estimated code review effort: 🎯 4 (Complex) | ⏱️ ~60 minutes
Actionable comments posted: 8
Caution
Some comments are outside the diff and can’t be posted inline due to platform limitations.
⚠️ Outside diff range comments (3)
lib/bindings/python/src/dynamo/llm/vllm_integration/kv_cache_utils.py (1)
93-116: Simplify `find_and_set_available_port_from_env` by delegating ephemeral port selection to DRT

The runtime already treats `DYN_SYSTEM_PORT=0` as "choose an ephemeral port" (see the `RuntimeConfig` docs in `lib/runtime/src/config.rs` and its tests), and the system-status server examples use `DYN_SYSTEM_PORT=0` to allocate a random free port. We can therefore avoid the TOCTOU race and the unbounded port scanning:

• Remove the manual socket-bind loop and the raw `print`.
• Delegate port choice to DRT by setting `os.environ[env_var] = "0"`.
• Return the chosen value (0) so callers can log via vLLM's logger if desired.

```diff
-def find_and_set_available_port_from_env(env_var="DYN_SYSTEM_PORT"):
-    """
-    Find an available port from the environment variable.
-    """
-    port = int(os.environ.get(env_var, "0"))
-    if port == 0:
-        # No port specified, let system pick
-        pass
-    while True:
-        s = socket.socket(socket.AF_INET, socket.SOCK_STREAM)
-        try:
-            # Port is available
-            s.bind(("127.0.0.1", port))
-            s.close()
-            os.environ[env_var] = str(port)
-            print(f"Port {port} is available, setting env var {env_var} to {port}")
-            break
-        except OSError:
-            # Port is in use, try next
-            port += 1
-            s.close()
-        except Exception as e:
-            raise RuntimeError(f"Error finding available port: {e}")
+def find_and_set_available_port_from_env(env_var: str = "DYN_SYSTEM_PORT") -> int:
+    """
+    Hint DRT to use an ephemeral port by setting env to "0".
+    Returns the value set (always 0).
+    """
+    os.environ[env_var] = "0"
+    return 0
```

(If the caller needs to log the actual bound port afterwards, it can read the chosen port back via DRT's API and use the vLLM logger, e.g.:

```python
from vllm.logger import init_logger

logger = init_logger(__name__)
port = find_and_set_available_port_from_env()
# After DRT starts, retrieve and log the real port:
real_port = get_system_server_info().port
logger.info("System port resolved to %d", real_port)
```
)
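For context, a minimal Rust sketch of why binding to port 0 is the race-free way to obtain an ephemeral port (this illustrates the generic OS mechanism, not the DRT API):

```rust
use std::net::TcpListener;

fn main() -> std::io::Result<()> {
    // Ask the OS for any free port by binding to port 0.
    let listener = TcpListener::bind("127.0.0.1:0")?;
    // Read back the port the OS actually assigned.
    let port = listener.local_addr()?.port();
    println!("bound to ephemeral port {port}");
    // As long as `listener` stays alive, no other process can take the port.
    // A probe-and-close loop releases the port before it is rebound, which is
    // exactly the TOCTOU window the review comment describes.
    Ok(())
}
```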
lib/bindings/python/rust/llm/block_manager/vllm/connector/leader/recorder.rs (1)
272-282: Propagate create_slot errors instead of swallowing them

`create_slot` can fail; ignoring the `Result` and always returning `Ok(())` masks real errors and can leave state inconsistent.

Apply:

```diff
-        let _ = self.connector_leader.create_slot(request, tokens);
-        let _ = self
-            .unbounded_tx
-            .send(Action::CreateSlot(input_copy, CreateSlotOutput {}));
-        Ok(())
+        self.connector_leader.create_slot(request, tokens)?;
+        let _ = self.unbounded_tx.send(Action::CreateSlot(input_copy, CreateSlotOutput {}));
+        Ok(())
```

lib/bindings/python/rust/llm/block_manager/vllm/connector/leader/slot.rs (1)
1152-1156: Avoid unwrap() on storage pools; return errors instead

`host().unwrap()` and `host().unwrap().register_blocks(...)` can panic if host storage isn't configured. Return a structured error to avoid crashing the task.

```diff
-        let host_blocks = block_manager
-            .host()
-            .unwrap()
-            .allocate_blocks(offload_req.block_ids.len())
-            .await?;
+        let host = block_manager
+            .host()
+            .ok_or_else(|| anyhow::anyhow!("Host storage pool not configured"))?;
+        let host_blocks = host
+            .allocate_blocks(offload_req.block_ids.len())
+            .await?;
@@
-        let immutable_blocks = block_manager
-            .host()
-            .unwrap()
-            .register_blocks(blocks_to_register)
-            .await?;
+        let immutable_blocks = host.register_blocks(blocks_to_register).await?;
```

Also applies to: 1228-1233
🧹 Nitpick comments (12)
lib/bindings/python/Cargo.toml (1)
83-83: Gate `prometheus` behind the `block-manager` feature

We only see "prometheus" mentioned in comments under the block-manager code (no direct `use prometheus::…` or `Histogram`/`IntCounter` types), so it's safe, and desirable, to make this crate's Prometheus dependency optional. This avoids unused linkage and speeds up builds for consumers who don't need block-manager metrics.

• In `lib/bindings/python/Cargo.toml`, change the dependency:

```diff
-[dependencies]
-prometheus = "0.14.0"
+[dependencies]
+prometheus = { version = "0.14.0", optional = true }
```

• In the same file, wire the `prometheus` dependency into the existing `block-manager` feature (or a new `metrics` feature). For example, under `[features]`:

```toml
[features]
# Option A: reuse block-manager
block-manager = [
    "dynamo-llm/block-manager",
    "dep:dlpark",
    "dep:cudarc",
    "dep:prometheus",  # added
]
# Option B: new feature
# metrics = ["dep:prometheus"]
```

This ensures that anyone pulling in the Python bindings without `--features block-manager` won't pay the cost of building or linking Prometheus.

.devcontainer/devcontainer.json (1)
65-66: Enabling rust-analyzer "allFeatures" may hide feature-gated issues and increase analysis load; consider scoping features

If developers don't normally build with all features, rust-analyzer with `allFeatures: true` can mask missing-gate issues or slow analysis. Suggest scoping to the features most used locally (e.g., `default`, `block-manager`), or keeping `allFeatures` off and using `linkedProjects` plus per-crate feature configs.

Possible tweak:

```diff
-  "rust-analyzer.cargo.allFeatures": true
+  "rust-analyzer.cargo.allFeatures": false
+  // Optionally: "rust-analyzer.cargo.features": ["default","block-manager"]
```

lib/llm/src/block_manager/metrics_kvbm.rs (3)
6-9: Constrain metric fields' visibility to discourage misuse

Exposing counters as public fields invites ad-hoc increments across the codebase. Prefer `pub(super)` or accessor methods to keep increments intentional and centralized.

Apply this diff:

```diff
 #[derive(Clone, Debug)]
 pub struct KvbmMetrics {
-    pub offload_requests: IntCounter,
-    pub save_kv_layer_requests: IntCounter,
+    pub(super) offload_requests: IntCounter,
+    pub(super) save_kv_layer_requests: IntCounter,
 }
```

Optionally add lightweight getters/increment helpers if needed.
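If helpers are preferred over raw field access, a minimal sketch of what they could look like (the `inc_offload`/`inc_save_kv_layer` names are hypothetical, not part of the PR):

```rust
impl KvbmMetrics {
    /// Hypothetical helper; keeps increments centralized on the struct.
    pub fn inc_offload(&self) {
        self.offload_requests.inc();
    }

    /// Hypothetical helper for the save-KV-layer path.
    pub fn inc_save_kv_layer(&self) {
        self.save_kv_layer_requests.inc();
    }
}
```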
11-17: Avoid unwraps in instrumentation initialization; add context and constants for names

Use `expect(...)` with context or propagate a `Result` to avoid panics during startup. Also, hoist metric names to constants to prevent drift.

Apply this diff:

```diff
 impl KvbmMetrics {
-    pub fn new(mr: &dyn MetricsRegistry) -> Self {
-        let offload_requests = mr.create_intcounter("offload_requests", "The number of offload requests", &[]).unwrap();
-        let save_kv_layer_requests = mr.create_intcounter("save_kv_layer_requests", "The number of save kv layer requests", &[]).unwrap();
-        Self { offload_requests, save_kv_layer_requests }
-    }
+    pub fn new(mr: &dyn MetricsRegistry) -> Self {
+        const OFFLOAD_REQUESTS: &str = "offload_requests";
+        const SAVE_KV_LAYER_REQUESTS: &str = "save_kv_layer_requests";
+
+        let offload_requests = mr
+            .create_intcounter(
+                OFFLOAD_REQUESTS,
+                "Number of KVBM offload requests",
+                &[],
+            )
+            .expect("register IntCounter offload_requests");
+
+        let save_kv_layer_requests = mr
+            .create_intcounter(
+                SAVE_KV_LAYER_REQUESTS,
+                "Number of KVBM save KV-layer requests",
+                &[],
+            )
+            .expect("register IntCounter save_kv_layer_requests");
+
+        Self { offload_requests, save_kv_layer_requests }
+    }
 }
```
13-15: Consider labels for success/failure or outcome where useful

Today these are global counters. If you need per-outcome breakdowns (e.g., success/failure), change to an `IntCounterVec` with a low-cardinality label like `status`. Avoid high-cardinality labels such as `layer_name`.
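For illustration, a minimal sketch of the `IntCounterVec` shape using the prometheus crate directly; the PR registers counters through `MetricsRegistry`, whose vec-counter constructor (if any) may differ:

```rust
use prometheus::{IntCounterVec, Opts, Registry};

fn main() -> Result<(), prometheus::Error> {
    let registry = Registry::new();

    // One counter family, split by a low-cardinality `status` label.
    let offload_requests = IntCounterVec::new(
        Opts::new("offload_requests", "Number of KVBM offload requests"),
        &["status"],
    )?;
    registry.register(Box::new(offload_requests.clone()))?;

    // Only a handful of label values ever exist (success/failure),
    // unlike high-cardinality labels such as `layer_name`.
    offload_requests.with_label_values(&["success"]).inc();
    offload_requests.with_label_values(&["failure"]).inc();

    Ok(())
}
```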
deploy/metrics/prometheus.yml (1)
61-72: Static ports may drift from runtime if auto-port selection kicks in

You've hard-coded 9998/9999, but the WAR auto-selects a free port when there are multiple DistributedRuntimes. If the leader/worker pick different ports, Prometheus won't scrape them.

Two options:

- Documented option (lightweight): clearly state that when multiple DRTs exist, you must pin ports via envs and keep these targets in sync.
- Configurable option (preferred): switch these scrape jobs to file-based SD so the runtime (or a small sidecar) can publish the active targets.

Patch sketch to add file_sd (keeps the existing static targets as examples):

```diff
   - job_name: 'kvbm-leader-metrics'
     scrape_interval: 2s
-    static_configs:
-      - targets: ['host.docker.internal:9999']
+    static_configs:
+      - targets: ['host.docker.internal:9999']
+    file_sd_configs:
+      - files:
+          - /etc/prometheus/kvbm-leader-targets.json
+        refresh_interval: 5s
@@
   - job_name: 'kvbm-worker-metrics'
     scrape_interval: 2s
-    static_configs:
-      - targets: ['host.docker.internal:9998']
+    static_configs:
+      - targets: ['host.docker.internal:9998']
+    file_sd_configs:
+      - files:
+          - /etc/prometheus/kvbm-worker-targets.json
+        refresh_interval: 5s
```

I can also provide a tiny script to emit those JSON files from env vars if helpful.
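As a sketch of that emitter idea in Rust (the env var names and output paths are assumptions matching the patch above):

```rust
use std::{env, fs};

fn main() -> std::io::Result<()> {
    // Assumed env vars; the runtime would export whichever ports it bound.
    let leader_port = env::var("DYN_KVBM_LEADER_PORT").unwrap_or_else(|_| "9999".into());
    let worker_port = env::var("DYN_KVBM_WORKER_PORT").unwrap_or_else(|_| "9998".into());

    // Prometheus file_sd format: a JSON array of {"targets": [...]} groups.
    let leader = format!(r#"[{{"targets": ["host.docker.internal:{leader_port}"]}}]"#);
    let worker = format!(r#"[{{"targets": ["host.docker.internal:{worker_port}"]}}]"#);

    fs::write("/etc/prometheus/kvbm-leader-targets.json", leader)?;
    fs::write("/etc/prometheus/kvbm-worker-targets.json", worker)?;
    Ok(())
}
```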
lib/bindings/python/rust/llm/block_manager/vllm/connector/worker.rs (1)
264-266: Increment metric on layer save: consider a low-cardinality outcome label if you later track errors

Right now you only count attempts. If you need success/failure or skipped layers later, switch to `IntCounterVec` with a `status` label.

deploy/metrics/grafana_dashboards/grafana-kvbm-dashboard.json (3)
69-80: Remove irrelevant absolute thresholds for timeseries

The absolute thresholds (green/red at 80) aren't meaningful for these panels and visually distract. Remove them, or set alerting thresholds deliberately based on SLOs.

```diff
-          "thresholds": {
-            "mode": "absolute",
-            "steps": [
-              { "color": "green" },
-              { "color": "red", "value": 80 }
-            ]
-          }
+          "thresholds": { "mode": "absolute", "steps": [ { "color": "green" } ] }
```

Also applies to: 165-176
219-219: Use an explicit refresh interval

"auto" depends on user settings and may not match the Prometheus scrape interval (2s per PR summary). Use 5s or 10s to balance load and responsiveness.

```diff
-  "refresh": "auto",
+  "refresh": "5s",
```
221-224: Add a namespace/job filter variable for multi-env use

Hardcoded `dynamo_namespace` filters lock this dashboard to KVBM only. Adding a variable (e.g., `$namespace` with defaults kvbm_connector_leader/worker) allows reuse across environments.

Happy to provide a patch to add:

- `$namespace` (query label values for `dynamo_namespace`)
- Exprs like `rate(metric{dynamo_namespace="$namespace"}[1m])`

Also applies to: 107-115, 203-211
lib/bindings/python/rust/llm/block_manager/vllm/connector/leader/slot.rs (2)
1029-1033: Offload instrumentation placement is good; consider adding basic labels

Incrementing offload_requests at the start of processing is fine. If your MetricsRegistry supports labels, consider adding reason or outcome labels in the future (e.g., success/failure). Not blocking.
Also applies to: 1062-1070, 1134-1142
219-221: Mutex poisoning: handle lock errors explicitly

Using `.unwrap()` on `Mutex::lock()` would panic on poisoning; you already handle locks elsewhere with `map_err`. Consider consistent error mapping here to avoid panics.

```diff
-        self.slots.lock().unwrap().contains_key(request_id)
+        self.slots
+            .lock()
+            .map(|m| m.contains_key(request_id))
+            .unwrap_or(false)
@@
-        self.slots.lock().unwrap().remove(request_id);
+        if let Ok(mut slots) = self.slots.lock() {
+            slots.remove(request_id);
+        }
```

Also applies to: 243-251
📜 Review details
Configuration used: Path: .coderabbit.yaml
Review profile: CHILL
Plan: Pro
💡 Knowledge Base configuration:
- MCP integration is disabled by default for public repositories
- Jira integration is disabled by default for public repositories
- Linear integration is disabled by default for public repositories
You can enable these sources in your CodeRabbit configuration.
⛔ Files ignored due to path filters (1)
`lib/bindings/python/Cargo.lock` is excluded by `!**/*.lock`
📒 Files selected for processing (14)
- .devcontainer/devcontainer.json (2 hunks)
- deploy/metrics/grafana_dashboards/grafana-kvbm-dashboard.json (1 hunks)
- deploy/metrics/prometheus.yml (1 hunks)
- docs/guides/run_kvbm_in_vllm.md (1 hunks)
- lib/bindings/python/Cargo.toml (1 hunks)
- lib/bindings/python/rust/llm/block_manager/vllm/connector/leader.rs (3 hunks)
- lib/bindings/python/rust/llm/block_manager/vllm/connector/leader/recorder.rs (1 hunks)
- lib/bindings/python/rust/llm/block_manager/vllm/connector/leader/slot.rs (6 hunks)
- lib/bindings/python/rust/llm/block_manager/vllm/connector/worker.rs (5 hunks)
- lib/bindings/python/src/dynamo/llm/vllm_integration/connector_leader.py (2 hunks)
- lib/bindings/python/src/dynamo/llm/vllm_integration/connector_worker.py (2 hunks)
- lib/bindings/python/src/dynamo/llm/vllm_integration/kv_cache_utils.py (2 hunks)
- lib/llm/src/block_manager.rs (1 hunks)
- lib/llm/src/block_manager/metrics_kvbm.rs (1 hunks)
🧰 Additional context used
🧠 Learnings (1)
📚 Learning: 2025-06-17T00:50:44.845Z
Learnt from: PeaBrane
PR: ai-dynamo/dynamo#1236
File: lib/llm/src/mocker/engine.rs:140-161
Timestamp: 2025-06-17T00:50:44.845Z
Learning: In Rust async code, when an Arc<Mutex<_>> is used solely to transfer ownership of a resource (like a channel receiver) into a spawned task rather than for sharing between multiple tasks, holding the mutex lock across an await is not problematic since there's no actual contention.
Applied to files:
lib/bindings/python/rust/llm/block_manager/vllm/connector/leader.rs
🧬 Code graph analysis (8)
lib/bindings/python/src/dynamo/llm/vllm_integration/connector_worker.py (1)
- lib/bindings/python/src/dynamo/llm/vllm_integration/kv_cache_utils.py (1)
  - find_and_set_available_port_from_env (93-115)

lib/bindings/python/src/dynamo/llm/vllm_integration/connector_leader.py (1)
- lib/bindings/python/src/dynamo/llm/vllm_integration/kv_cache_utils.py (1)
  - find_and_set_available_port_from_env (93-115)

lib/bindings/python/rust/llm/block_manager/vllm/connector/worker.rs (4)
- lib/bindings/python/rust/llm/block_manager/vllm/connector/leader.rs (2)
  - new (88-119), new (482-508)
- lib/bindings/python/rust/llm/block_manager/vllm/connector/leader/recorder.rs (1)
  - new (88-143)
- lib/bindings/python/rust/llm/block_manager/vllm/connector/leader/slot.rs (7)
  - new (178-213), new (321-352), new (956-969), new (980-993), new (1003-1013), new (1300-1305), new (1309-1314)
- lib/llm/src/block_manager/metrics_kvbm.rs (1)
  - new (12-16)

lib/bindings/python/src/dynamo/llm/vllm_integration/kv_cache_utils.py (1)
- lib/runtime/src/system_status_server.rs (1)
  - port (51-53)

lib/llm/src/block_manager/metrics_kvbm.rs (4)
- lib/bindings/python/rust/llm/block_manager/vllm/connector/leader.rs (2)
  - new (88-119), new (482-508)
- lib/bindings/python/rust/llm/block_manager/vllm/connector/leader/recorder.rs (1)
  - new (88-143)
- lib/bindings/python/rust/llm/block_manager/vllm/connector/leader/slot.rs (7)
  - new (178-213), new (321-352), new (956-969), new (980-993), new (1003-1013), new (1300-1305), new (1309-1314)
- lib/bindings/python/rust/llm/block_manager/vllm/connector/worker.rs (2)
  - new (77-116), new (399-403)

lib/bindings/python/rust/llm/block_manager/vllm/connector/leader.rs (4)
- lib/bindings/python/rust/llm/block_manager/vllm/connector/leader/recorder.rs (1)
  - new (88-143)
- lib/bindings/python/rust/llm/block_manager/vllm/connector/leader/slot.rs (7)
  - new (178-213), new (321-352), new (956-969), new (980-993), new (1003-1013), new (1300-1305), new (1309-1314)
- lib/bindings/python/rust/llm/block_manager/vllm/connector/worker.rs (2)
  - new (77-116), new (399-403)
- lib/llm/src/block_manager/metrics_kvbm.rs (1)
  - new (12-16)

lib/bindings/python/rust/llm/block_manager/vllm/connector/leader/recorder.rs (2)
- lib/bindings/python/rust/llm/block_manager/vllm/connector/leader.rs (2)
  - new (88-119), new (482-508)
- lib/bindings/python/rust/llm/block_manager/vllm/connector/leader/slot.rs (7)
  - new (178-213), new (321-352), new (956-969), new (980-993), new (1003-1013), new (1300-1305), new (1309-1314)

lib/bindings/python/rust/llm/block_manager/vllm/connector/leader/slot.rs (1)
- lib/runtime/src/utils/tasks/critical.rs (1)
  - new_with_runtime (68-168)
🪛 LanguageTool
docs/guides/run_kvbm_in_vllm.md
[grammar] ~65-~65: There might be a mistake here.
Context: ...s to enable metrics collection and view via Grafana dashboard: ```bash # Start the ...
(QB_NEW_EN)
[grammar] ~79-~79: There might be a mistake here.
Context: ...(default login: dynamo/dynamo) and look for KVBM Dashboard
(QB_NEW_EN)
🪛 markdownlint-cli2 (0.17.2)
docs/guides/run_kvbm_in_vllm.md
79-79: Bare URL used
(MD034, no-bare-urls)
🪛 GitHub Actions: Copyright Checks
lib/llm/src/block_manager/metrics_kvbm.rs
[error] 1-1: Copyright header check failed. Missing or invalid header in lib/llm/src/block_manager/metrics_kvbm.rs.
⏰ Context from checks skipped due to timeout of 90000ms. You can increase the timeout in your CodeRabbit configuration to a maximum of 15 minutes (900000ms). (4)
- GitHub Check: pre-merge-rust (lib/runtime/examples)
- GitHub Check: pre-merge-rust (lib/bindings/python)
- GitHub Check: pre-merge-rust (.)
- GitHub Check: Build and Test - dynamo
🔇 Additional comments (15)
.devcontainer/devcontainer.json (1)
78-79: Trailing comma fix LGTM

Removing the trailing comma after `CARGO_TARGET_DIR` is correct JSON and prevents parse issues.

lib/bindings/python/src/dynamo/llm/vllm_integration/connector_leader.py (2)
33-35: Import for port setup looks good

Importing `find_and_set_available_port_from_env` here mirrors the worker path and keeps leader/worker behavior consistent.
60-60: I've initiated a search for all definitions and references to `find_and_set_available_port_from_env` throughout the repository to confirm its default argument and usage. Once results are back, I'll verify whether we can safely drop the explicit `"DYN_SYSTEM_PORT"` string and rely on the function's default.

lib/bindings/python/src/dynamo/llm/vllm_integration/connector_worker.py (2)
31-33: Import mirrors leader path; good consistency

Keeping both leader and worker on the same utility avoids drift between processes.
lib/llm/src/block_manager.rs (1)
31-31: Module export looks good

`pub mod metrics_kvbm;` cleanly scopes the new metrics without touching existing public API surfaces beyond the module. No issues.

lib/bindings/python/rust/llm/block_manager/vllm/connector/worker.rs (3)
8-8: Import and usage of KvbmMetrics is correct

Imports the new metrics type from `metrics_kvbm` and integrates it into the worker lifecycle as intended.
73-74: Worker-scoped metrics instance: initialization and storage are sound

The `KvbmMetrics` is created in the worker namespace and stored on the struct; this fits the pattern used on the leader path.

Also applies to: 94-95, 114-115
1-120: Offload request metric increment confirmed

- Located `kvbm_metrics.offload_requests.inc()` in `lib/bindings/python/rust/llm/block_manager/vllm/connector/leader/slot.rs:1140`, confirming that offload requests are counted on the leader/offload path.

No additional increments are required.
deploy/metrics/grafana_dashboards/grafana-kvbm-dashboard.json (1)
22-23: Consider omitting fixed id or making uid stable by convention
- Hardcoding "id": 4 can conflict on import; Grafana will assign an ID if omitted.
- A stable "uid" is fine if you intend to provision this dashboard. If this might be imported into other stacks, either keep the uid stable by convention or omit it and let Grafana generate one.
If this file is used via provisioning, keep a stable uid but drop "id". Otherwise, drop both and let Grafana assign.
Also applies to: 231-233
lib/bindings/python/rust/llm/block_manager/vllm/connector/leader.rs (2)
10-11: Import of KvbmMetrics is correct and scoped

Import aligns with downstream usage in the slot manager. No issues.
113-118: Signature update verified and all call sites updated
- `ConnectorSlotManager::new` now requires `kvbm_metrics` as the fourth argument.
- Both call sites, in `lib/bindings/python/rust/llm/block_manager/vllm/connector/leader.rs` (line 113) and `lib/bindings/python/rust/llm/block_manager/vllm/connector/leader/recorder.rs` (line 123), have been updated accordingly.
- A recursive search for all uses of `ConnectorSlotManager::new` confirms there are no other occurrences.

lib/bindings/python/rust/llm/block_manager/vllm/connector/leader/recorder.rs (1)
122-128: Good: Recorder keeps iteration counter and wires metrics into SlotManager

This ensures the LocalTransferEngine path can instrument offloads. Looks solid.
lib/bindings/python/rust/llm/block_manager/vllm/connector/leader/slot.rs (3)
4-4: Arc import is necessary and correct

Required for sharing KvbmLeader across async tasks. Good.
178-205: Thread KvbmMetrics into LocalTransferEngine; ensure Clone semantics

Wiring looks right. You clone kvbm_metrics later inside the offload task; confirm KvbmMetrics implements Clone. If not, switch to `Arc<KvbmMetrics>` to avoid ownership friction across tasks.

If Clone is unavailable, change signatures to accept `Arc<KvbmMetrics>`:

```diff
-        kvbm_metrics: KvbmMetrics,
+        kvbm_metrics: Arc<KvbmMetrics>,
@@
-        xfer_engine.execute(cancellation_token, drt_for_task, kvbm_metrics).await
+        xfer_engine.execute(cancellation_token, drt_for_task, kvbm_metrics).await
```

and:

```diff
-        kvbm_metrics: KvbmMetrics,
+        kvbm_metrics: Arc<KvbmMetrics>,
@@
-        process_offload_request(req, &block_manager_offload, &leader_offload, kvbm_metrics.clone()).await
+        process_offload_request(req, &block_manager_offload, &leader_offload, kvbm_metrics.clone()).await
```
521-552: Solid offload request scheduling path

Policy evaluation, request construction, and pairing with worker operation IDs are coherent. Logs include request_id and operation_id for traceability.
Also applies to: 846-886
keivenchang left a comment:
Great that you got it working! Just a few minor comments:
- Use the prometheus_names.rs prefix constants throughout the code; it'll make changes/updates much easier
- I apologize for asking you to try out ports 9998 and 9999. Can you pick another range?
- devcontainer.json changes should be in another PR
keivenchang left a comment:
LGTM
@CodeRabbit review

✅ Actions performed: Review triggered.
Signed-off-by: Hannah Zhang <hannahz@nvidia.com>
Signed-off-by: Jason Zhou <jasonzho@jasonzho-mlt.client.nvidia.com>
Signed-off-by: Krishnan Prashanth <kprashanth@nvidia.com>
Signed-off-by: nnshah1 <neelays@nvidia.com>
Overview:
Set up the infra to enable metrics in KVBM using the dynamo metrics mechanism.
In future PRs, I will add more concrete KVBM metrics. This PR only contains two metrics for demo purposes.

Details:
Note that there is a WAR since we have more than one DistributedRuntime: one for the KVBM leader and one or more for the KVBM workers. To avoid metrics port conflicts, we search for a new port for each DRT. In a future PR, @keivenchang might include this as a default feature of the metrics registry.
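A minimal sketch of that scan-for-a-free-port idea (the base port and scan bound are illustrative assumptions, not the PR's exact helper, which lives in the Python bindings):

```rust
use std::net::TcpListener;

/// Scan upward from `base` until a port binds successfully.
/// Illustrative only; note this probe is still subject to the
/// TOCTOU race discussed in the review.
fn find_available_port(base: u16) -> Option<u16> {
    (base..base.saturating_add(100)).find(|&port| {
        // If the bind succeeds, the port was free at this instant.
        TcpListener::bind(("127.0.0.1", port)).is_ok()
    })
}

fn main() {
    // Hypothetical base port; each DistributedRuntime would get its own.
    if let Some(port) = find_available_port(9998) {
        println!("metrics port for this DRT: {port}");
    }
}
```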
Where should the reviewer start?
Related Issues: (use one of the action keywords Closes / Fixes / Resolves / Relates to)
Summary by CodeRabbit
New Features
Documentation
Chores