[Serve][1/n] Add autoscaling prometheus metrics #59220
Signed-off-by: abrar <abrar@anyscale.com>
Code Review
This pull request introduces several new Prometheus metrics to improve the observability of Ray Serve's autoscaling functionality. The new metrics cover autoscaling decisions, policy execution time, and metric reporting delays, which will be very helpful for debugging and monitoring. The implementation looks solid, and it's great to see comprehensive tests added for the new metrics. I have one suggestion to refactor a small piece of duplicated code for better maintainability.
```
│ ┌─────────────────────────────────────────────────────────────────────┐ │
│ │ REPLICA                                                             │ │
│ │                                                                     │ │
│ │ ○ ray_serve_replica_processing_queries (while processing)           │ │
```
It would be good to rename this (in a backwards-compatible way) to `ray_serve_replica_num_ongoing_requests`, to align with the terminology used by autoscaling and other similar metrics.
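To illustrate the backwards-compatible rename suggested above, here is a minimal pure-Python sketch, not Ray Serve's actual implementation: a single in-flight counter is published under both the existing and the proposed name during a deprecation window, so existing dashboards keep working while users migrate. `FakeRegistry` and `AliasedInFlightGauge` are hypothetical names; a real version would presumably record to a Ray metric such as `ray.util.metrics.Gauge` rather than to a dict.

```python
from contextlib import contextmanager
from typing import Dict, Iterator, List


class FakeRegistry:
    """Hypothetical in-memory stand-in for a metrics backend (name -> value)."""

    def __init__(self) -> None:
        self.values: Dict[str, float] = {}

    def set(self, name: str, value: float) -> None:
        self.values[name] = value


class AliasedInFlightGauge:
    """Hypothetical gauge that counts in-flight requests and publishes the
    count under every configured name (legacy name plus its replacement)."""

    def __init__(self, registry: FakeRegistry, names: List[str]) -> None:
        self._registry = registry
        self._names = names
        self._count = 0
        self._publish()

    def _publish(self) -> None:
        # Write the same value under each name so both series stay in sync.
        for name in self._names:
            self._registry.set(name, self._count)

    @contextmanager
    def track(self) -> Iterator[None]:
        # Increment when processing starts; decrement when it ends, even on error.
        self._count += 1
        self._publish()
        try:
            yield
        finally:
            self._count -= 1
            self._publish()


registry = FakeRegistry()
gauge = AliasedInFlightGauge(
    registry,
    [
        "ray_serve_replica_processing_queries",    # existing name
        "ray_serve_replica_num_ongoing_requests",  # proposed name
    ],
)
with gauge.track():
    during = dict(registry.values)  # snapshot while one request is "processing"
after = dict(registry.values)
```

Once the deprecation window ends, the legacy name is simply dropped from the list; the counting logic does not change.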
| Metric | Type | Tags | Description |
|--------|------|------|-------------|
| `ray_serve_handle_request_counter_total` **[D]** | Counter | `handle`, `deployment`, `route`, `application` | Total number of requests processed by this `DeploymentHandle`. |
| `ray_serve_num_router_requests_total` **[H]** | Counter | `deployment`, `route`, `application`, `handle`, `actor_id` | Total number of requests routed to a deployment. |
| `ray_serve_deployment_queued_queries` **[H]** | Gauge | `deployment`, `application`, `handle`, `actor_id` | Current number of requests waiting to be assigned to a replica. High values indicate backpressure. |
It would be good to rename this one too, to `ray_serve_router_num_queued_requests`.
| Metric | Type | Tags | Description |
|--------|------|------|-------------|
| `ray_serve_num_ongoing_requests_at_replicas` **[H]** | Gauge | `deployment`, `application`, `handle`, `actor_id` | Current number of requests assigned and sent to replicas but not yet completed. |
I think this metric name should explicitly say "handle" or "router"; otherwise it'll be confusing.
@akshay-anyscale filed #59376
https://anyscale-ray--59220.com.readthedocs.build/en/59220/serve/monitoring.html#built-in-ray-serve-metrics

fixes ray-project#59218

docs changes
- [x] refactored the table with all metrics; IMO markdown is easier to read in code
- [x] split the metrics table into ordered categories, with categories ordered by the typical request path
- [x] included a stick diagram of the important metrics, showing where in the request lifecycle each metric is recorded
- [x] ordered the metrics in each table by their position in the request path

Adding the following new metrics
```
- ray_serve_deployment_target_replicas: Target number of replicas
  Tags: deployment, application
- ray_serve_autoscaling_decision_replicas: Raw decision before bounds
  Tags: deployment, application
- ray_serve_autoscaling_total_requests: Total requests seen by autoscaler
  Tags: deployment, application
- ray_serve_autoscaling_policy_execution_time_ms: Policy execution time
  Tags: deployment, application, policy_scope
- ray_serve_autoscaling_replica_metrics_delay_ms: Replica metrics delay
  Tags: deployment, application, replica
- ray_serve_autoscaling_handle_metrics_delay_ms: Handle metrics delay
  Tags: deployment, application, handle
```
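The new autoscaling metrics relate to one another in a way a small sketch can make concrete. This is a minimal pure-Python illustration under stated assumptions, not the code in this PR. It assumes `ray_serve_deployment_target_replicas` is the raw decision clamped to `min_replicas`/`max_replicas` (inferred from "Raw decision before bounds"), that a metrics delay is the age of the most recent report, and it returns values in a dict instead of emitting them. `run_autoscaling_step` and `AutoscalingBounds` are hypothetical, the toy policy is made up, and the `policy_scope` tag is omitted.

```python
import math
import time
from dataclasses import dataclass
from typing import Callable, Dict, Union


@dataclass
class AutoscalingBounds:
    # Hypothetical container for the configured replica bounds.
    min_replicas: int
    max_replicas: int


def run_autoscaling_step(
    policy: Callable[[float], int],
    total_requests: float,
    bounds: AutoscalingBounds,
    replica_report_ts: Dict[str, float],
    now: float,
) -> Dict[str, Union[float, int, Dict[str, float]]]:
    """Hypothetical helper: runs one policy step and returns the values the
    new metrics would carry, keyed by metric name."""
    # Time only the policy call; perf_counter is monotonic and meant for durations.
    start = time.perf_counter()
    raw_decision = policy(total_requests)
    policy_ms = (time.perf_counter() - start) * 1000.0

    # Assumed relationship: the target is the raw decision clamped to the bounds.
    target = max(bounds.min_replicas, min(bounds.max_replicas, raw_decision))

    # Per-replica age of the latest metrics report in ms (one value per
    # `replica` tag). Handle delays would be computed the same way.
    delays_ms = {r: (now - ts) * 1000.0 for r, ts in replica_report_ts.items()}

    return {
        "ray_serve_autoscaling_total_requests": total_requests,
        "ray_serve_autoscaling_decision_replicas": raw_decision,
        "ray_serve_deployment_target_replicas": target,
        "ray_serve_autoscaling_policy_execution_time_ms": policy_ms,
        "ray_serve_autoscaling_replica_metrics_delay_ms": delays_ms,
    }


# Toy policy (an assumption): aim for 2 ongoing requests per replica.
result = run_autoscaling_step(
    policy=lambda total: math.ceil(total / 2),
    total_requests=50,
    bounds=AutoscalingBounds(min_replicas=1, max_replicas=10),
    replica_report_ts={"replica-a": 99.5, "replica-b": 99.75},
    now=100.0,
)
```

Here the raw decision is 25 while the target is capped at 10. Under this assumed relationship, a persistent gap between the two gauges would suggest a deployment pinned at `max_replicas`. Growing delay values would point to stale inputs rather than a policy problem.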