[Serve][1/n] Add autoscaling prometheus metrics #59220

abrarsheikh · 2025-12-06T06:05:39Z

https://anyscale-ray--59220.com.readthedocs.build/en/59220/serve/monitoring.html#built-in-ray-serve-metrics

Docs changes

- Refactored the table of all metrics; IMO markdown is easier to read in code.
- Split the metrics table into ordered categories. The categories follow the typical request path.
- Included a stick diagram of the important metrics, showing where in the request lifecycle each metric is recorded.
- Ordered the metrics within each table by their position in the request path.

Adding the following new metrics

    - ray_serve_deployment_target_replicas: Target number of replicas
        Tags: deployment, application
    - ray_serve_autoscaling_decision_replicas: Raw decision before bounds
        Tags: deployment, application
    - ray_serve_autoscaling_total_requests: Total requests seen by autoscaler
        Tags: deployment, application
    - ray_serve_autoscaling_policy_execution_time_ms: Policy execution time
        Tags: deployment, application, policy_scope
    - ray_serve_autoscaling_replica_metrics_delay_ms: Replica metrics delay
        Tags: deployment, application, replica
    - ray_serve_autoscaling_handle_metrics_delay_ms: Handle metrics delay
        Tags: deployment, application, handle
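
One plausible use of the new metrics is to compare the raw autoscaling decision with the target replica count per deployment. The sketch below parses Prometheus text-format lines and reports deployments where the two differ. The metric names and the `deployment`/`application` tags come from the list above; the sample values, the helper names `parse_samples` and `differing_deployments`, and the assumption that a mismatch means the bounds were applied are illustrative only, and the names exported by a real cluster may differ.

```python
import re
from collections import defaultdict

# Hypothetical scrape output; the values are made up for illustration.
SAMPLE = """\
# TYPE ray_serve_deployment_target_replicas gauge
ray_serve_deployment_target_replicas{deployment="Model",application="app"} 10
ray_serve_autoscaling_decision_replicas{deployment="Model",application="app"} 14
ray_serve_autoscaling_total_requests{deployment="Model",application="app"} 280
"""

# Matches `name{label="value",...} number`; lines without labels are skipped.
LINE_RE = re.compile(
    r'^(?P<name>[a-zA-Z_:][a-zA-Z0-9_:]*)\{(?P<labels>[^}]*)\}\s+(?P<value>\S+)$'
)
LABEL_RE = re.compile(r'(\w+)="([^"]*)"')


def parse_samples(text):
    """Return {(deployment, application): {metric_name: value}}."""
    grouped = defaultdict(dict)
    for line in text.splitlines():
        line = line.strip()
        if not line or line.startswith("#"):
            continue  # skip blank lines and HELP/TYPE comments
        match = LINE_RE.match(line)
        if match is None:
            continue
        labels = dict(LABEL_RE.findall(match.group("labels")))
        key = (labels.get("deployment"), labels.get("application"))
        grouped[key][match.group("name")] = float(match.group("value"))
    return dict(grouped)


def differing_deployments(samples):
    """Return {(deployment, application): (decision, target)} where they differ."""
    result = {}
    for key, metrics in samples.items():
        decision = metrics.get("ray_serve_autoscaling_decision_replicas")
        target = metrics.get("ray_serve_deployment_target_replicas")
        if decision is not None and target is not None and decision != target:
            result[key] = (decision, target)
    return result


samples = parse_samples(SAMPLE)
mismatches = differing_deployments(samples)
```

In this made-up sample the policy asked for 14 replicas while the target is 10, so the deployment shows up in `mismatches`. In a real setup the same comparison would more likely be a PromQL expression or a dashboard panel than a script.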

Signed-off-by: abrar <abrar@anyscale.com>

gemini-code-assist

Code Review

This pull request introduces several new Prometheus metrics to improve the observability of Ray Serve's autoscaling functionality. The new metrics cover autoscaling decisions, policy execution time, and metric reporting delays, which will be very helpful for debugging and monitoring. The implementation looks solid, and it's great to see comprehensive tests added for the new metrics. I have one suggestion to refactor a small piece of duplicated code for better maintainability.

python/ray/serve/_private/autoscaling_state.py

python/ray/serve/_private/replica.py

python/ray/serve/_private/deployment_state.py

python/ray/serve/_private/autoscaling_state.py

akshay-anyscale · 2025-12-11T05:44:17Z

doc/source/serve/monitoring.md

+  │   ┌─────────────────────────────────────────────────────────────────────┐   │
+  │   │                          REPLICA                                    │   │
+  │   │                                                                     │   │
+  │   │  ○ ray_serve_replica_processing_queries     (while processing)      │   │


It would be good to rename this, in a backwards-compatible way, to ray_serve_replica_num_ongoing_requests, to align with the terminology used by autoscaling and other such metrics.
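
A backwards-compatible rename could, for example, report every value under both the new and the old name for a deprecation period. The sketch below shows that idea with an in-memory stand-in. `InMemoryGauge` and `AliasedGauge` are hypothetical classes invented for illustration; they are not Ray's metrics API and not necessarily how the rename was eventually implemented.

```python
class InMemoryGauge:
    """Stand-in for a real metrics gauge (hypothetical, not Ray's API)."""

    def __init__(self, name):
        self.name = name
        self.value = 0.0

    def set(self, value):
        self.value = float(value)


class AliasedGauge:
    """Writes each value under the new name and the deprecated old name."""

    def __init__(self, new_name, deprecated_name):
        self.primary = InMemoryGauge(new_name)
        self.legacy = InMemoryGauge(deprecated_name)

    def set(self, value):
        # Existing dashboards keep reading the legacy series while
        # new ones can move to the primary series.
        self.primary.set(value)
        self.legacy.set(value)


gauge = AliasedGauge(
    new_name="ray_serve_replica_num_ongoing_requests",
    deprecated_name="ray_serve_replica_processing_queries",
)
gauge.set(3)
```

The cost of this approach is a duplicated time series per renamed metric until the old name is dropped.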

akshay-anyscale · 2025-12-11T05:48:57Z

doc/source/serve/monitoring.md

+|--------|------|------|-------------|
+| `ray_serve_handle_request_counter_total` **[D]** | Counter | `handle`, `deployment`, `route`, `application` | Total number of requests processed by this `DeploymentHandle`. |
+| `ray_serve_num_router_requests_total` **[H]** | Counter | `deployment`, `route`, `application`, `handle`, `actor_id` | Total number of requests routed to a deployment. |
+| `ray_serve_deployment_queued_queries` **[H]** | Gauge | `deployment`, `application`, `handle`, `actor_id` | Current number of requests waiting to be assigned to a replica. High values indicate backpressure. |


It would be good to rename this too, to ray_serve_router_num_queued_requests.

akshay-anyscale · 2025-12-11T05:57:31Z

doc/source/serve/monitoring.md

+| `ray_serve_handle_request_counter_total` **[D]** | Counter | `handle`, `deployment`, `route`, `application` | Total number of requests processed by this `DeploymentHandle`. |
+| `ray_serve_num_router_requests_total` **[H]** | Counter | `deployment`, `route`, `application`, `handle`, `actor_id` | Total number of requests routed to a deployment. |
+| `ray_serve_deployment_queued_queries` **[H]** | Gauge | `deployment`, `application`, `handle`, `actor_id` | Current number of requests waiting to be assigned to a replica. High values indicate backpressure. |
+| `ray_serve_num_ongoing_requests_at_replicas` **[H]** | Gauge | `deployment`, `application`, `handle`, `actor_id` | Current number of requests assigned and sent to replicas but not yet completed. |


I think this metric name should explicitly say handle or router, or it'll be confusing.

abrarsheikh · 2025-12-11T06:55:52Z

@akshay-anyscale filed #59376

https://anyscale-ray--59220.com.readthedocs.build/en/59220/serve/monitoring.html#built-in-ray-serve-metrics

fixes ray-project#59218

docs changes

- [x] refactored the table with all metrics, IMO markdown is easier to read in code
- [x] split the table of metrics in ordered categories. categories are ordered by typical request path
- [x] included a stick diagram of important metrics, show where in the request lifecycle the metric is recorded
- [x] order metrics in table by order in request path

Adding the following new metrics

```
- ray_serve_deployment_target_replicas: Target number of replicas
    Tags: deployment, application
- ray_serve_autoscaling_decision_replicas: Raw decision before bounds
    Tags: deployment, application
- ray_serve_autoscaling_total_requests: Total requests seen by autoscaler
    Tags: deployment, application
- ray_serve_autoscaling_policy_execution_time_ms: Policy execution time
    Tags: deployment, application, policy_scope
- ray_serve_autoscaling_replica_metrics_delay_ms: Replica metrics delay
    Tags: deployment, application, replica
- ray_serve_autoscaling_handle_metrics_delay_ms: Handle metrics delay
    Tags: deployment, application, handle
```

---------

Signed-off-by: abrar <abrar@anyscale.com>

[1/n] Add autoscaling prometheus metrics

73aa8da

Signed-off-by: abrar <abrar@anyscale.com>

gemini-code-assist bot reviewed Dec 6, 2025

python/ray/serve/_private/autoscaling_state.py (outdated, resolved)

abrarsheikh added 2 commits December 6, 2025 06:51

update monitoring doc

cf31963

Signed-off-by: abrar <abrar@anyscale.com>

refactor

eea57ff

Signed-off-by: abrar <abrar@anyscale.com>

abrarsheikh added the "go" label (add ONLY when ready to merge, run all tests) Dec 6, 2025

abrarsheikh added 2 commits December 6, 2025 07:10

sort metrics by order

e50c250

Signed-off-by: abrar <abrar@anyscale.com>

fix replica autoscacling metrics

f2f828c

Signed-off-by: abrar <abrar@anyscale.com>

abrarsheikh changed the title ~~[1/n] Add autoscaling prometheus metrics~~ [Serve][1/n] Add autoscaling prometheus metrics Dec 7, 2025

abrarsheikh marked this pull request as ready for review December 8, 2025 17:55

abrarsheikh requested review from a team as code owners December 8, 2025 17:55

abrarsheikh requested review from akshay-anyscale and harshit-anyscale December 8, 2025 17:55

ray-gardener bot added the labels "serve" (Ray Serve Related Issue), "docs" (An issue or change related to documentation), and "observability" (Issues related to the Ray Dashboard, Logging, Metrics, Tracing, and/or Profiling) Dec 8, 2025

improve docs

6fafd6c

Signed-off-by: abrar <abrar@anyscale.com>

cursor bot reviewed Dec 10, 2025

python/ray/serve/_private/replica.py (resolved)

harshit-anyscale approved these changes Dec 10, 2025

python/ray/serve/_private/deployment_state.py (resolved)

abrarsheikh added 2 commits December 11, 2025 01:36

rename metrics

60d6812

Signed-off-by: abrar <abrar@anyscale.com>

Merge branch 'master' of github.com:ray-project/ray into 59218-abrar-…

e75cd5e

…autoscale

cursor bot reviewed Dec 11, 2025

python/ray/serve/_private/autoscaling_state.py (resolved)

python/ray/serve/_private/autoscaling_state.py (resolved)

remove multiple calls

74c68d8

Signed-off-by: abrar <abrar@anyscale.com>

akshay-anyscale reviewed Dec 11, 2025

abrarsheikh mentioned this pull request Dec 11, 2025

[Serve] rename some prom metrics #59376

Open

akshay-anyscale approved these changes Dec 12, 2025

abrarsheikh merged commit 04b2998 into master Dec 12, 2025
6 checks passed

abrarsheikh deleted the 59218-abrar-autoscale branch December 12, 2025 23:46
