You signed in with another tab or window. Reload to refresh your session.You signed out in another tab or window. Reload to refresh your session.You switched accounts on another tab or window. Reload to refresh your session.Dismiss alert
Deflake
`test_autoscaling_policy_with_metr_disab.py::TestAutoscalingMetrics::test_basic`
When `RAY_SERVE_COLLECT_AUTOSCALING_METRICS_ON_HANDLE=0`, we collect
ongoing request metrics at the replica and queued request metrics at the
handle -- but ongoing request metrics are updated very fast while queued
metrics are sent every 10s. Because of this delay the total number of
ongoing requests climbs to almost 100 because before the queued request
metrics are flushed, almost every request is double counted.
Example:
https://buildkite.com/ray-project/postmerge/builds/11322#0197eaca-62e1-457d-947b-a981210e98b9/177-852
Note that we are sending exactly 50 requests and expect the number of
replicas to scale to exactly 5. However the metrics grow above 50 here,
almost to 100, which causes the test to be flaky / fail.
This pr sets the env var `RAY_SERVE_HANDLE_METRIC_PUSH_INTERVAL_S=0.1`
and pairs with other stabilizing changes.
Signed-off-by: Cindy Zhang <cindyzyx9@gmail.com>
0 commit comments