feat: timing metrics reporting through nvext for vLLM backend #4661
base: main
Conversation
👋 Hi zhongxuanwang-nv! Thank you for contributing to ai-dynamo/dynamo.
Signed-off-by: Zhongxuan Wang <daniewang@nvidia.com>
Force-pushed from 9f7ee4e to 08d432b
feat: timing metrics reporting through nvext for vLLM backend
@coderabbitai review

✅ Actions performed: Review triggered.
Walkthrough

End-to-end timing metrics support is added across Python and Rust layers. The vLLM handlers collect timing data (request received, prefill/decode start/end), the HTTP service injects boundary timestamps, the preprocessor extracts timing fields, and delta response builders include timing metrics in nvext responses.
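For orientation, the dict below sketches which layer stamps each field, based on the walkthrough and the review comments that follow; it is illustrative, not code from the PR:

```python
# Illustrative mapping (not PR code): timing field -> the layer that stamps it.
TIMING_FIELD_WRITERS = {
    "request_received_seconds": "HTTP frontend (lib/llm/src/http/service/openai.rs), at ingress",
    "prefill_start_seconds": "vLLM worker (components/src/dynamo/vllm/handlers.py)",
    "prefill_end_seconds": "prefill worker; first-token time in aggregated mode",
    "decode_start_seconds": "vLLM worker, when decode begins",
    "decode_first_token_seconds": "vLLM worker, at the first generated token",
    "decode_end_seconds": "vLLM worker, when decode finishes",
    "request_completed_seconds": "HTTP frontend, stamped on the final content chunk",
}
```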
Estimated code review effort: 🎯 3 (Moderate) | ⏱️ ~28 minutes
Pre-merge checks: ✅ Passed checks (3 passed)
Actionable comments posted: 0
Caution
Some comments are outside the diff and can’t be posted inline due to platform limitations.
⚠️ Outside diff range comments (1)
components/src/dynamo/vllm/handlers.py (1)
1-1: Fix Black formatting before merge. The pipeline failure indicates that Black formatting failed. Please run `black` on this file to resolve the formatting issue. Based on the pipeline failure log, run:
```sh
black components/src/dynamo/vllm/handlers.py
```
🧹 Nitpick comments (1)
lib/llm/src/http/service/openai.rs (1)
59-90: Consider returning a success indicator. The function silently handles errors (line 64, `.ok()`) and structural mismatches (lines 69-88, nested `as_object_mut()`). While timing metrics are non-critical and silent failures may be acceptable, consider returning a `bool` to indicate success for better observability. If you want to track silent failures, you could refactor to:
```diff
-fn inject_request_completed_seconds(nvext: &mut Option<serde_json::Value>) {
+fn inject_request_completed_seconds(nvext: &mut Option<serde_json::Value>) -> bool {
     let ts = SystemTime::now()
         .duration_since(UNIX_EPOCH)
         .map(|d| d.as_secs_f64())
         .ok();
-    if let Some(ts) = ts {
+    let Some(ts) = ts else {
+        tracing::trace!("Failed to get current timestamp for request_completed_seconds");
+        return false;
+    };
+
     let nvext = nvext.get_or_insert_with(|| serde_json::Value::Object(serde_json::Map::new()));
     if let Some(obj) = nvext.as_object_mut() {
         if let Some(timing) = obj.get_mut("timing_metrics") {
             if let Some(timing_obj) = timing.as_object_mut() {
                 timing_obj.insert(
                     "request_completed_seconds".to_string(),
                     serde_json::Value::from(ts),
                 );
+                return true;
             }
         } else {
             let mut timing_obj = serde_json::Map::new();
             timing_obj.insert(
                 "request_completed_seconds".to_string(),
                 serde_json::Value::from(ts),
             );
             obj.insert(
                 "timing_metrics".to_string(),
                 serde_json::Value::Object(timing_obj),
             );
+            return true;
         }
     }
-    }
+    false
 }
```
📜 Review details
Configuration used: Path: .coderabbit.yaml
Review profile: CHILL
Plan: Pro
📒 Files selected for processing (7)
- components/src/dynamo/vllm/handlers.py (8 hunks)
- lib/llm/src/http/service/openai.rs (6 hunks)
- lib/llm/src/preprocessor.rs (1 hunk)
- lib/llm/src/protocols/common/preprocessor.rs (1 hunk)
- lib/llm/src/protocols/openai/chat_completions/delta.rs (2 hunks)
- lib/llm/src/protocols/openai/completions/delta.rs (2 hunks)
- lib/llm/src/protocols/openai/nvext.rs (4 hunks)
🧰 Additional context used
🧠 Learnings (6)
📚 Learning: 2025-09-02T16:46:54.015Z
Learnt from: GuanLuo
Repo: ai-dynamo/dynamo PR: 2714
File: lib/llm/src/discovery/model_entry.rs:38-42
Timestamp: 2025-09-02T16:46:54.015Z
Learning: In lib/llm/src/discovery/model_entry.rs, GuanLuo prefers not to add serde defaults for model_type and model_input fields to keep the specification explicit and avoid user errors, relying on atomic deployment strategy to avoid backward compatibility issues.
Applied to files:
lib/llm/src/protocols/openai/nvext.rs
📚 Learning: 2025-09-08T21:18:43.478Z
Learnt from: nachiketb-nvidia
Repo: ai-dynamo/dynamo PR: 2936
File: lib/parsers/src/reasoning/granite_parser.rs:42-46
Timestamp: 2025-09-08T21:18:43.478Z
Learning: In GraniteReasoningParser in lib/parsers/src/reasoning/granite_parser.rs, the think_start_tokens and think_end_tokens are hardcoded in the constructor with fixed values, so unwrap() calls on these vectors are safe and won't panic.
Applied to files:
lib/llm/src/protocols/openai/nvext.rs
📚 Learning: 2025-06-24T20:59:35.725Z
Learnt from: ishandhanani
Repo: ai-dynamo/dynamo PR: 1626
File: lib/llm/src/preprocessor.rs:238-239
Timestamp: 2025-06-24T20:59:35.725Z
Learning: In lib/llm/src/preprocessor.rs, the `sampling_options` call in the `preprocess_request` method is placed in the common section after the match statement on `request.prompt_input_type()`, meaning it applies to both `PromptInput::Tokens` and `PromptInput::Text` request types.
Applied to files:
lib/llm/src/protocols/common/preprocessor.rs
📚 Learning: 2025-08-22T19:55:41.608Z
Learnt from: nachiketb-nvidia
Repo: ai-dynamo/dynamo PR: 2656
File: lib/llm/src/protocols/openai/chat_completions/delta.rs:320-327
Timestamp: 2025-08-22T19:55:41.608Z
Learning: There are two separate DeltaGenerator classes in the codebase: one for chat completions (lib/llm/src/protocols/openai/chat_completions/delta.rs with object "chat.completion.chunk") and one for text completions (lib/llm/src/protocols/openai/completions/delta.rs with object "text_completion"). They have different create_choice method signatures and serve different OpenAI API endpoints. The reasoning parsing functionality is only relevant to the chat completions DeltaGenerator.
Applied to files:
lib/llm/src/protocols/openai/chat_completions/delta.rs
lib/llm/src/protocols/openai/completions/delta.rs
📚 Learning: 2025-08-22T19:55:41.608Z
Learnt from: nachiketb-nvidia
Repo: ai-dynamo/dynamo PR: 2656
File: lib/llm/src/protocols/openai/chat_completions/delta.rs:320-327
Timestamp: 2025-08-22T19:55:41.608Z
Learning: The create_choice method exists on multiple different objects in the codebase. The DeltaGenerator::create_choice in lib/llm/src/protocols/openai/chat_completions/delta.rs has its own signature that was updated to include reasoning_content, but other objects in lib/llm/src/engines.rs have their own separate create_choice methods with different signatures that are not related to chat completions.
Applied to files:
lib/llm/src/protocols/openai/chat_completions/delta.rs
lib/llm/src/protocols/openai/completions/delta.rs
📚 Learning: 2025-08-25T22:04:45.205Z
Learnt from: nachiketb-nvidia
Repo: ai-dynamo/dynamo PR: 2700
File: lib/llm/src/protocols/openai/chat_completions/delta.rs:19-28
Timestamp: 2025-08-25T22:04:45.205Z
Learning: The response_generator() method exists on multiple request types in the codebase: NvCreateChatCompletionRequest (for chat completions) and NvCreateCompletionRequest (for text completions). When making signature changes, it's important to distinguish between these different object types as they have separate implementations and call sites.
Applied to files:
lib/llm/src/http/service/openai.rs
🧬 Code graph analysis (6)
lib/llm/src/preprocessor.rs (4)

- lib/llm/src/protocols/openai/nvext.rs (2): nvext (9-9), builder (99-101)
- lib/llm/src/protocols/openai/responses.rs (3): nvext (44-46), nvext (104-106), nvext (151-153)
- lib/llm/src/protocols/openai/chat_completions.rs (3): nvext (74-76), nvext (134-136), nvext (238-240)
- lib/llm/src/protocols/openai.rs (2): nvext (50-50), nvext (60-60)

lib/llm/src/protocols/openai/nvext.rs (1)

- lib/llm/src/preprocessor.rs (1): builder (202-248)

lib/llm/src/protocols/common/preprocessor.rs (1)

- lib/llm/src/preprocessor.rs (1): builder (202-248)

lib/llm/src/protocols/openai/chat_completions/delta.rs (2)

- lib/llm/src/protocols/openai/chat_completions/aggregator.rs (2): new (92-104), default (63-65)
- lib/llm/src/protocols/openai/nvext.rs (2): default (93-95), nvext (9-9)

lib/llm/src/protocols/openai/completions/delta.rs (2)

- lib/llm/src/protocols/openai/nvext.rs (2): default (93-95), nvext (9-9)
- lib/llm/src/protocols/openai/responses.rs (3): nvext (44-46), nvext (104-106), nvext (151-153)

components/src/dynamo/vllm/handlers.py (1)

- components/src/dynamo/sglang/request_handlers/handler_base.py (1): BaseWorkerHandler (20-222)
🪛 GitHub Actions: Pre Merge Validation of (ai-dynamo/dynamo/refs/pull/4661/merge) by zhongxuanwang-nv.
components/src/dynamo/vllm/handlers.py
[error] 1-1: Black formatting failed. 1 file reformatted by the hook; please re-run the commit or CI to pass: 'pre-commit run --show-diff-on-failure --color=always --all-files'.
🔇 Additional comments (19)
lib/llm/src/preprocessor.rs (1)
240-244: LGTM! The extraction of `request_received_seconds` from nvext follows the established pattern for other nvext fields and maintains consistency with the timing metrics feature.

lib/llm/src/protocols/openai/nvext.rs (3)
84-89: LGTM! The new `request_received_seconds` field is properly integrated with appropriate serde attributes and clear documentation.
79-79: LGTM! The documentation correctly lists the expanded set of supported `extra_fields`.

136-160: LGTM! Tests properly validate the new field and the expanded `extra_fields` list.
lib/llm/src/protocols/common/preprocessor.rs (1)
105-109: LGTM! The `request_received_seconds` field is well-documented and follows the established patterns for optional fields in the struct.

components/src/dynamo/vllm/handlers.py (5)
77-87: LGTM! The helper functions provide clean abstractions for timing metrics detection and timestamp retrieval (a sketch of plausible shapes follows).
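The helper bodies are not shown in this review. A minimal sketch of what such helpers could look like, assuming wall-clock UNIX timestamps and an opt-in key inside `extra_fields` (both assumptions):

```python
import time
from typing import Optional


def _get_current_time_seconds() -> float:
    # Wall-clock UNIX timestamp in seconds, matching the frontend's
    # request_received_seconds units (assumed).
    return time.time()


def _should_include_timing_metrics(extra_fields: Optional[dict]) -> bool:
    # Hypothetical gate: emit timing metrics only when the client opted in
    # via nvext extra_fields; the exact key and shape are assumptions.
    return bool(extra_fields and extra_fields.get("timing_metrics"))
```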
313-393: LGTM! The timing metrics initialization and decode tracking logic is sound (a condensed sketch follows the list):
- Frontend timestamp is preserved when merging with prefill timing (lines 349-352)
- Aggregated mode (no prefill_result) correctly treats decode_start as prefill_start (line 380)
- First token time is used as prefill_end in aggregated mode (line 392), which aligns with the semantic that prefill completes when the first token is generated
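A condensed sketch of that precedence, with hypothetical names; only the field names and merge rules come from the bullets above:

```python
from typing import Optional


def merge_timing_metrics(
    frontend_received: Optional[float],
    prefill_timing: Optional[dict],
    now: float,
) -> dict:
    """Hypothetical condensation of the merge logic reviewed above."""
    timing: dict = {}
    if prefill_timing is not None:
        # Disaggregated mode: adopt the prefill worker's timestamps, but keep
        # the frontend's request_received_seconds rather than overwriting it.
        timing.update(prefill_timing)
    else:
        # Aggregated mode: decode start doubles as prefill start; prefill_end
        # is stamped later, at first-token time.
        timing["prefill_start_seconds"] = now
    if frontend_received is not None:
        timing["request_received_seconds"] = frontend_received
    timing["decode_start_seconds"] = now
    return timing
```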
402-413: LGTM! The conditional injection of timing_metrics is correct (sketched after the list):
- Only when finish_reason is present (the final chunk)
- Only when timing is requested
- Safely initializes disaggregated_params if needed
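In sketch form (the `tok` dict shape follows the excerpt quoted later in this thread; the helper name is hypothetical):

```python
def attach_timing_on_final_chunk(
    tok: dict, timing_metrics: dict, include_timing: bool
) -> None:
    # Only the final chunk (the one carrying finish_reason) receives the full
    # timing_metrics map, and only when the client requested timing; the
    # disaggregated_params container is created if missing.
    if include_timing and tok.get("finish_reason") is not None:
        tok.setdefault("disaggregated_params", {})["timing_metrics"] = timing_metrics
```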
267-271: LGTM! The refactored `completion_usage` construction is cleaner and preserves the existing structure while preparing for timing metrics augmentation.
447-524: LGTM! The prefill worker timing metrics logic correctly (see the outline after the list):
- Captures prefill_start_seconds at the start of processing (line 460)
- Records prefill_end_seconds when the result is available (lines 521-523)
- Builds disaggregated_params with both kv_transfer_params and timing_metrics (lines 515-524)
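An outline of that flow under the same assumptions (the engine call is a stand-in, not the real API):

```python
import time
from typing import Callable


def prefill_with_timing(
    run_prefill: Callable[[dict], dict],  # stand-in for the real engine call
    request: dict,
    include_timing: bool,
) -> dict:
    """Hypothetical outline of the prefill-side stamping reviewed above."""
    timing: dict = {}
    if include_timing:
        timing["prefill_start_seconds"] = time.time()
    kv_transfer_params = run_prefill(request)
    disaggregated_params: dict = {"kv_transfer_params": kv_transfer_params}
    if include_timing:
        timing["prefill_end_seconds"] = time.time()
        disaggregated_params["timing_metrics"] = timing
    return disaggregated_params
```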
lib/llm/src/protocols/openai/completions/delta.rs (2)
268-286: LGTM! The generic extraction of worker_id and timing_metrics from disaggregated_params into nvext is clean and extensible. The conditional check ensures nvext is only set when there's actual data to include.

304-350: LGTM! The test thoroughly validates that both timing_metrics and worker_id are correctly extracted from disaggregated_params and placed into nvext.
lib/llm/src/protocols/openai/chat_completions/delta.rs (2)
365-383: LGTM! The nvext construction from disaggregated_params follows the same clean pattern as the completions path, maintaining consistency across the codebase.

464-506: LGTM! The test provides thorough coverage of timing_metrics and worker_id extraction into nvext.
lib/llm/src/http/service/openai.rs (5)
317-329: LGTM! Capturing `request_received_seconds` immediately at handler entry is the right approach for accurate timing. The safe initialization with `get_or_insert_with(Default::default)` is appropriate.
453-468: LGTM! The injection on the final content chunk (with `finish_reason`) aligns with the backend's timing_metrics population strategy, avoiding injection on the separate usage-only chunk.
751-763: LGTM! Consistent timing capture across both completions and chat completions endpoints.

898-912: LGTM! The stream mapping for chat completions follows the same correct pattern as completions.

2149-2184: LGTM! The tests validate the two key scenarios:

- Preserving existing fields when injecting into existing timing_metrics
- Creating the structure from scratch when nvext is None

Both scenarios are well-covered.
Signed-off-by: Zhongxuan Wang <daniewang@nvidia.com>
/ok to test 8588689

Signed-off-by: Zhongxuan Wang <daniewang@nvidia.com>

/ok to test 0f1d019
```rust
let nvext_response = NvExtResponse {
    worker_id: Some(worker_id_info),
};
```
Is worker_id gone? I think AI Gateway uses it.
It's definitely not gone; I just removed the NvExtResponse and WorkerIdInfo structs and instead export the worker_id JSON directly, because I figured it would look less redundant :)
Do you think I should bring these two structs back, even though no Dynamo code is referencing them?
Thank you Graham!
```python
if include_timing and timing_metrics:
    timing_metrics[
        "prefill_end_seconds"
    ] = _get_current_time_seconds()
    disaggregated_params["timing_metrics"] = timing_metrics
```
What happens if the prefill and decode nodes are on different machines and have slightly different clocks due to time drift? Will the final values be wrong or not synchronized?
That's true, this is unfortunately not handled, so the final values would be wrong. I think we can instead add a note asking users to keep all the clocks synchronized if they choose to use the timing?
The good news is that we can also see the worker_id that handled the request, so we can tell how much time drift that worker had :) Thank you Anthony!
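To make the caveat concrete, here is an illustrative breakdown (not PR code) of which derived latencies stay within one clock domain and which cross machines:

```python
def derived_latencies(t: dict) -> dict:
    # Spans computed from a single machine's clock are immune to cross-node
    # drift; spans subtracting timestamps taken on different machines absorb
    # any clock offset between them.
    return {
        # Same clock (prefill worker): safe.
        "prefill_seconds": t["prefill_end_seconds"] - t["prefill_start_seconds"],
        # Same clock (frontend): safe.
        "e2e_seconds": t["request_completed_seconds"] - t["request_received_seconds"],
        # Frontend clock vs prefill-worker clock: includes any drift.
        "time_to_prefill_seconds": t["prefill_start_seconds"] - t["request_received_seconds"],
    }
```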
Overview:
This PR introduces end-to-end timing metrics through the entire Dynamo → vLLM → frontend stack.
Sample response:
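A hypothetical nvext payload consistent with the fields this PR adds might look as follows (every value is invented, and the exact shape of `worker_id` is an assumption):

```python
# Hypothetical nvext block on the final streamed chunk; all values invented.
sample_nvext = {
    "worker_id": 42,  # shape assumed; exported directly per the discussion above
    "timing_metrics": {
        "request_received_seconds": 1700000000.001,
        "prefill_start_seconds": 1700000000.010,
        "prefill_end_seconds": 1700000000.090,
        "decode_start_seconds": 1700000000.095,
        "decode_first_token_seconds": 1700000000.120,
        "decode_end_seconds": 1700000000.900,
        "request_completed_seconds": 1700000000.905,
    },
}
```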
Details:
Added:
- Timing instrumentation in `generate` / `prefill_generate`, including:
  - `_should_include_timing_metrics` gate on `extra_fields` and `_get_current_time_seconds` helper in `components/src/dynamo/vllm/handlers.py`.
  - `timing_metrics` map with `prefill_start_seconds`, `prefill_end_seconds`, `decode_start_seconds`, `decode_first_token_seconds`, `decode_end_seconds`.
- `request_received_seconds` recorded at HTTP ingress and stored into `request.nvext.request_received_seconds` in `lib/llm/src/http/service/openai.rs` for both completions and chat completions.
- `inject_request_completed_seconds` helper that adds `request_completed_seconds` under `timing_metrics` in `lib/llm/src/http/service/openai.rs`, plus unit tests.
- `request_received_seconds: Option<f64>` on `PreprocessedRequest` in `lib/llm/src/protocols/common/preprocessor.rs`.
- Extraction of `request_received_seconds` from `nvext` in `lib/llm/src/preprocessor.rs`.

Changed:
- `components/src/dynamo/vllm/handlers.py`:
  - In aggregated mode, stamp `prefill_start_seconds` and `decode_start_seconds` together, and use first-token time as `prefill_end_seconds`.
  - On the final chunk (the one with `finish_reason`), attach the full `timing_metrics` map into `tok["disaggregated_params"]["timing_metrics"]` so downstream layers can read it.
  - `completion_usage` construction is wrapped cleanly before further augmentation.
- `lib/llm/src/http/service/openai.rs`: wrapped the streams in `completions_single` and `chat_completions` with a `map` that injects `request_completed_seconds` only on the chunk whose choices have a `finish_reason` (not the usage-only chunk), matching when backend timing metrics are finalized.
- `OpenAIPreprocessor` now forwards `request_received_seconds` from `request.nvext()` into the `PreprocessedRequest` builder so workers can see frontend ingress time.

Removed:

- The `NvExtResponse` and `WorkerIdInfo` structs; the `worker_id` JSON is exported directly instead (see the review discussion above).
Why it matters:
- Propagating `timing_metrics` into `disaggregated_params` on the token stream makes these numbers available to observability / analytics stacks and any downstream Dynamo components without changing the public OpenAI surface.

Where should the reviewer start?
- `components/src/dynamo/vllm/handlers.py`: `_should_include_timing_metrics`, the new `timing_metrics` lifecycle in `BaseWorkerHandler.generate`, and how prefill vs aggregated decode modes are handled.
- `lib/llm/src/http/service/openai.rs`: `handler_completions` / `handler_chat_completions` for `request_received_seconds`; `inject_request_completed_seconds` and its use in `completions_single` / `chat_completions`, where the stream is mapped to stamp the final content chunk.
- `lib/llm/src/protocols/common/preprocessor.rs` (new `request_received_seconds` field on `PreprocessedRequest`).
- `lib/llm/src/preprocessor.rs` (`OpenAIPreprocessor` copying `request_received_seconds`).
- Tests in `lib/llm/src/http/service/openai.rs` to validate the egress stamping.

Related Issues: (use one of the action keywords Closes / Fixes / Resolves / Relates to)