
Commit 62a33c2

[Serve.LLM] Add avg prompt length metric (#58599)
## Description

Add an average prompt length metric. When prompt lengths are uniform (especially in testing), the P50 and P90 computations are skewed by the 1-2-5 histogram buckets used in vLLM. Average prompt length provides another useful dimension to inspect and validate against. For example, with a uniform ISL of 5000, P50 shows 7200 and P90 shows 9400, while the average accurately shows 5000.

(Screenshot: Serve LLM dashboard panel showing the new Average series alongside the P50 and P90 prompt-length series.)

---------

Signed-off-by: Rui Qiao <ruisearch42@gmail.com>
Signed-off-by: Rui Qiao <161574667+ruisearch42@users.noreply.github.com>
Co-authored-by: gemini-code-assist[bot] <176961590+gemini-code-assist[bot]@users.noreply.github.com>
1 parent 0c4dcb0 commit 62a33c2

File tree

1 file changed: +4 −0

python/ray/dashboard/modules/metrics/dashboards/serve_llm_dashboard_panels.py

Lines changed: 4 additions & 0 deletions
```diff
@@ -223,6 +223,10 @@
             expr='histogram_quantile(0.90, sum by(le, model_name, WorkerId) (rate(ray_vllm_request_prompt_tokens_bucket{{model_name=~"$vllm_model_name", WorkerId=~"$workerid", {global_filters}}}[$interval])))',
             legend="P90-{{model_name}}-{{WorkerId}}",
         ),
+        Target(
+            expr='(sum by(model_name, WorkerId) (rate(ray_vllm_request_prompt_tokens_sum{{model_name=~"$vllm_model_name", WorkerId=~"$workerid", {global_filters}}}[$interval]))\n/\nsum by(model_name, WorkerId) (rate(ray_vllm_request_prompt_tokens_count{{model_name=~"$vllm_model_name", WorkerId=~"$workerid", {global_filters}}}[$interval])))',
+            legend="Average-{{model_name}}-{{WorkerId}}",
+        ),
     ],
     fill=1,
     linewidth=1,
```
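The new Average target divides `rate(..._sum)` by `rate(..._count)`, which is exact regardless of bucket layout, while `histogram_quantile` can only interpolate linearly inside a bucket. A minimal sketch of that effect, assuming 100 identical prompts of 5200 tokens and an illustrative 1-2-5 bucket layout (these numbers are hypothetical, not taken from the PR):

```python
def histogram_quantile(q, buckets):
    """Prometheus-style quantile estimate from cumulative (le, count) buckets."""
    total = buckets[-1][1]
    rank = q * total
    prev_le, prev_count = 0.0, 0.0
    for le, count in buckets:
        if count >= rank:
            # Linear interpolation inside the bucket, as Prometheus does.
            return prev_le + (le - prev_le) * (rank - prev_count) / (count - prev_count)
        prev_le, prev_count = le, count
    return buckets[-1][0]

# All 100 observations land in the (5000, 10000] bucket.
buckets = [(1000, 0), (2000, 0), (5000, 0), (10000, 100)]
p50 = histogram_quantile(0.50, buckets)  # interpolates to 7500, far from 5200
p90 = histogram_quantile(0.90, buckets)  # interpolates to 9500, far from 5200
avg = (5200 * 100) / 100                 # sum / count recovers 5200 exactly
```

The `avg` line mirrors what the dashboard query computes from the `_sum` and `_count` series, which is why the Average panel stays accurate where the quantile panels drift.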
