
Conversation

@abrarsheikh (Contributor) commented Dec 6, 2025

https://anyscale-ray--59220.com.readthedocs.build/en/59220/serve/monitoring.html#built-in-ray-serve-metrics

fixes #59218

docs changes

  • refactored the table of all metrics; IMO markdown is easier to read in code
  • split the metrics table into categories, ordered by the typical request path
  • included a stick diagram of the important metrics, showing where in the request lifecycle each metric is recorded
  • ordered the metrics within each table by their position in the request path

Adding the following new metrics (a sketch of how gauges like these can be defined follows the list):

    - ray_serve_deployment_target_replicas: Target number of replicas
        Tags: deployment, application
    - ray_serve_autoscaling_decision_replicas: Raw decision before bounds
        Tags: deployment, application
    - ray_serve_autoscaling_total_requests: Total requests seen by autoscaler
        Tags: deployment, application
    - ray_serve_autoscaling_policy_execution_time_ms: Policy execution time
        Tags: deployment, application, policy_scope
    - ray_serve_autoscaling_replica_metrics_delay_ms: Replica metrics delay
        Tags: deployment, application, replica
    - ray_serve_autoscaling_handle_metrics_delay_ms: Handle metrics delay
        Tags: deployment, application, handle

@gemini-code-assist bot left a comment:

Code Review

This pull request introduces several new Prometheus metrics to improve the observability of Ray Serve's autoscaling functionality. The new metrics cover autoscaling decisions, policy execution time, and metric reporting delays, which will be very helpful for debugging and monitoring. The implementation looks solid, and it's great to see comprehensive tests added for the new metrics. I have one suggestion to refactor a small piece of duplicated code for better maintainability.

@abrarsheikh added the go (add ONLY when ready to merge, run all tests) label Dec 6, 2025
@abrarsheikh changed the title from "[1/n] Add autoscaling prometheus metrics" to "[Serve][1/n] Add autoscaling prometheus metrics" Dec 7, 2025
@abrarsheikh marked this pull request as ready for review December 8, 2025 17:55
@abrarsheikh requested review from a team as code owners December 8, 2025 17:55
@ray-gardener bot added the serve (Ray Serve Related Issue), docs (An issue or change related to documentation), and observability (Issues related to the Ray Dashboard, Logging, Metrics, Tracing, and/or Profiling) labels Dec 8, 2025
│ ┌─────────────────────────────────────────────────────────────────────┐ │
│ │ REPLICA │ │
│ │ │ │
│ │ ○ ray_serve_replica_processing_queries (while processing) │ │
Contributor commented:

It would be good to rename this (in a backwards-compatible way) to `ray_serve_replica_num_ongoing_requests` to align with the terminology of autoscaling and other such metrics.
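
A minimal sketch of one backwards-compatible approach, assuming dual emission through `ray.util.metrics` (the helper and gauge wiring here are illustrative, not Ray Serve's actual code): record the same value under both the old and the new metric name for a deprecation window, then drop the old name in a later release so existing dashboards keep working in the meantime.

```python
# Sketch of a backwards-compatible metric rename: record the same value
# under both the deprecated and the new name during a deprecation window.
# Names and wiring are illustrative; this is not the code in Ray Serve.
from ray.util import metrics

TAG_KEYS = ("deployment", "application", "replica")

_deprecated_gauge = metrics.Gauge(
    "serve_replica_processing_queries",  # old name, kept for existing dashboards
    description="Requests currently being processed by this replica (deprecated name).",
    tag_keys=TAG_KEYS,
)
_renamed_gauge = metrics.Gauge(
    "serve_replica_num_ongoing_requests",  # proposed new name
    description="Requests currently being processed by this replica.",
    tag_keys=TAG_KEYS,
)


def record_ongoing_requests(num_ongoing: int, tags: dict) -> None:
    """Record the replica's ongoing request count under both metric names."""
    _deprecated_gauge.set(num_ongoing, tags=tags)
    _renamed_gauge.set(num_ongoing, tags=tags)
```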

| Metric | Type | Tags | Description |
|--------|------|------|-------------|
| `ray_serve_handle_request_counter_total` **[D]** | Counter | `handle`, `deployment`, `route`, `application` | Total number of requests processed by this `DeploymentHandle`. |
| `ray_serve_num_router_requests_total` **[H]** | Counter | `deployment`, `route`, `application`, `handle`, `actor_id` | Total number of requests routed to a deployment. |
| `ray_serve_deployment_queued_queries` **[H]** | Gauge | `deployment`, `application`, `handle`, `actor_id` | Current number of requests waiting to be assigned to a replica. High values indicate backpressure. |
Contributor commented:

It would be good to rename this too: `ray_serve_router_num_queued_requests`.

| `ray_serve_handle_request_counter_total` **[D]** | Counter | `handle`, `deployment`, `route`, `application` | Total number of requests processed by this `DeploymentHandle`. |
| `ray_serve_num_router_requests_total` **[H]** | Counter | `deployment`, `route`, `application`, `handle`, `actor_id` | Total number of requests routed to a deployment. |
| `ray_serve_deployment_queued_queries` **[H]** | Gauge | `deployment`, `application`, `handle`, `actor_id` | Current number of requests waiting to be assigned to a replica. High values indicate backpressure. |
| `ray_serve_num_ongoing_requests_at_replicas` **[H]** | Gauge | `deployment`, `application`, `handle`, `actor_id` | Current number of requests assigned and sent to replicas but not yet completed. |
Contributor commented:

I think this metric name should explicitly say handle or router, or it'll be confusing.

@abrarsheikh (Contributor, Author) commented:
@akshay-anyscale filed #59376

@abrarsheikh merged commit 04b2998 into master Dec 12, 2025
6 checks passed
@abrarsheikh deleted the 59218-abrar-autoscale branch December 12, 2025 23:46
Yicheng-Lu-llll pushed a commit to Yicheng-Lu-llll/ray that referenced this pull request Dec 22, 2025

Labels

docs (An issue or change related to documentation), go (add ONLY when ready to merge, run all tests), observability (Issues related to the Ray Dashboard, Logging, Metrics, Tracing, and/or Profiling), serve (Ray Serve Related Issue)


Development

Successfully merging this pull request may close these issues.

[Serve] add debugging metrics to ray serve

4 participants