@abrarsheikh
Contributor

@abrarsheikh abrarsheikh commented Oct 24, 2025

I suspect that when we deploy the app config, we don't wait long enough before sending traffic, so requests could go to the wrong version.
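The suspected race can be illustrated without a cluster. The sketch below is a hypothetical stand-in, not the test code from this PR: `wait_until` and `FakeRollout` are invented names, and the fake simply needs a few polls before it reports healthy and switches version. It shows why traffic sent immediately after a deploy may still hit the old version, while traffic sent after a polling wait does not.

```python
import time
from typing import Callable


def wait_until(
    predicate: Callable[[], bool], timeout_s: float = 10.0, interval_s: float = 0.1
) -> None:
    """Poll `predicate` until it returns True, or raise once `timeout_s` has elapsed."""
    deadline = time.monotonic() + timeout_s
    while True:
        if predicate():
            return
        if time.monotonic() >= deadline:
            raise TimeoutError("condition was not met before the timeout expired")
        time.sleep(interval_s)


class FakeRollout:
    """Hypothetical deployment that needs a few polls before the new version is live."""

    def __init__(self, polls_until_healthy: int) -> None:
        self._remaining = polls_until_healthy
        self.version = "v1"

    def is_healthy(self) -> bool:
        if self._remaining > 0:
            self._remaining -= 1
            return False
        # Only once the rollout has finished does the new version serve traffic.
        self.version = "v2"
        return True


rollout = FakeRollout(polls_until_healthy=3)
version_without_wait = rollout.version  # a request sent right away sees the old version
wait_until(rollout.is_healthy, timeout_s=2.0, interval_s=0.01)
version_after_wait = rollout.version  # a request sent after the wait sees the new version
```

In the real tests the predicate would be a status check against the running Serve application rather than a fake object.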

Signed-off-by: abrar <abrar@anyscale.com>
@abrarsheikh abrarsheikh requested a review from a team as a code owner October 24, 2025 16:12
@abrarsheikh abrarsheikh added the "go" label (add ONLY when ready to merge, run all tests) Oct 24, 2025
Contributor

@gemini-code-assist gemini-code-assist bot left a comment

Code Review

This pull request improves test stability by adding checks to wait for deployments to become healthy before proceeding with the tests. This is a good practice to avoid flakiness. My review includes suggestions to consolidate the health checks for multiple deployments into a single, more efficient call to improve code readability and performance.

Comment on lines 1716 to 1721
wait_for_condition(
    check_deployment_status, name="A", expected_status=DeploymentStatus.HEALTHY
)
wait_for_condition(
    check_deployment_status, name="B", expected_status=DeploymentStatus.HEALTHY
)
medium

For improved readability and efficiency, you can combine the health checks for multiple deployments into a single wait_for_condition call. This avoids the overhead of multiple wait_for_condition calls and can be more efficient by fetching the status of all deployments at once.

A helper function could be defined for this. For example:

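from typing import List
from ray import serve
from ray.serve._private.common import DeploymentStatus
from ray.serve._private.constants import SERVE_DEFAULT_APP_NAME

def check_deployments_healthy(names: List[str], app_name: str = SERVE_DEFAULT_APP_NAME) -> bool:
    # A single status fetch covers every deployment in the application.
    statuses = serve.status().applications[app_name].deployments
    return all(
        statuses.get(name) and statuses[name].status == DeploymentStatus.HEALTHY
        for name in names
    )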

And then call it like this:

wait_for_condition(check_deployments_healthy, names=["A", "B"])

This helper could be defined locally within the test method or, if it's generally useful, in ray.serve._private.test_utils.
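To make the combined check easy to exercise without a running cluster, the predicate can be separated from the status fetch. This is a minimal sketch under assumptions: `all_healthy` and `unhealthy` are hypothetical names, and the plain `dict` of status strings stands in for whatever a real status call returns. The second function shows one possible benefit of checking all deployments together, which is reporting exactly which ones are lagging.

```python
from typing import Dict, Iterable, Mapping


def all_healthy(
    statuses: Mapping[str, str], names: Iterable[str], healthy: str = "HEALTHY"
) -> bool:
    """Return True only if every named deployment is present and healthy."""
    return all(statuses.get(name) == healthy for name in names)


def unhealthy(
    statuses: Mapping[str, str], names: Iterable[str], healthy: str = "HEALTHY"
) -> Dict[str, str]:
    """Map each named deployment that is missing or not healthy to its observed status."""
    return {
        name: statuses.get(name, "MISSING")
        for name in names
        if statuses.get(name) != healthy
    }


# Hypothetical snapshot: "B" is still rolling out and "C" has not been created yet.
snapshot = {"A": "HEALTHY", "B": "UPDATING"}
ready = all_healthy(snapshot, ["A", "B"])
lagging = unhealthy(snapshot, ["A", "B", "C"])
```

A polling helper can then retry `all_healthy` on a fresh snapshot each time, and include `unhealthy(...)` in the failure message if the timeout expires.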

Comment on lines 1806 to 1811
wait_for_condition(
    check_deployment_status, name="A", expected_status=DeploymentStatus.HEALTHY
)
wait_for_condition(
    check_deployment_status, name="B", expected_status=DeploymentStatus.HEALTHY
)
medium

Similar to my other comment, these two wait_for_condition calls can be consolidated into a single call for better readability and efficiency. This can be done using a helper function that checks the status of multiple deployments at once.

@akyang-anyscale
Contributor

> I suspect that when we deploy the app config, we don't wait long enough before sending traffic, so requests could go to the wrong version

Isn't check_running supposed to do that waiting for us?

@abrarsheikh
Contributor Author

>> I suspect that when we deploy the app config, we don't wait long enough before sending traffic, so requests could go to the wrong version

> Isn't check_running supposed to do that waiting for us?

Does an app being RUNNING imply that every deployment is HEALTHY? I am 90% sure that it does.

Also, check_running was missing from test_autoscaling_policy_enable_disable.

@akyang-anyscale
Contributor

> Does an app being RUNNING imply that every deployment is HEALTHY? I am 90% sure that it does.

So is the extra wait redundant, then?
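One way to settle the open question empirically, rather than by assumption, is to record status snapshots during a test run and flag any in which the application reports RUNNING while a deployment is not HEALTHY. The sketch below assumes a simplified snapshot shape: `AppSnapshot` and `running_but_unhealthy` are hypothetical names, and the status strings are placeholders. It makes no claim about how Ray Serve actually orders these transitions.

```python
from dataclasses import dataclass, field
from typing import Dict, List


@dataclass
class AppSnapshot:
    """Hypothetical point-in-time view of one application's status."""

    status: str
    deployments: Dict[str, str] = field(default_factory=dict)


def running_but_unhealthy(snapshot: AppSnapshot) -> List[str]:
    """Names of deployments that contradict 'app RUNNING implies deployment HEALTHY'."""
    if snapshot.status != "RUNNING":
        return []  # the implication says nothing about apps that are not running
    return sorted(
        name for name, status in snapshot.deployments.items() if status != "HEALTHY"
    )


consistent = AppSnapshot("RUNNING", {"A": "HEALTHY", "B": "HEALTHY"})
counterexample = AppSnapshot("RUNNING", {"A": "HEALTHY", "B": "UPDATING"})
violations = running_but_unhealthy(counterexample)
```

If such a check never returns a non-empty list across many runs, the per-deployment waits are likely redundant with check_running; a single counterexample would show they are not.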

@ray-gardener ray-gardener bot added the "serve" label (Ray Serve Related Issue) Oct 24, 2025
Signed-off-by: abrar <abrar@anyscale.com>
@abrarsheikh abrarsheikh merged commit 226a414 into master Oct 24, 2025
6 checks passed
@abrarsheikh abrarsheikh deleted the SERVE-1274-abrar-flaky branch October 24, 2025 22:01
xinyuangui2 pushed a commit to xinyuangui2/ray that referenced this pull request Oct 27, 2025
I suspect that when we deploy the app config, we dont wait long enough
before sending traffic, so requests could go to the wrong version

---------

Signed-off-by: abrar <abrar@anyscale.com>
Signed-off-by: xgui <xgui@anyscale.com>
landscapepainter pushed a commit to landscapepainter/ray that referenced this pull request Nov 17, 2025
I suspect that when we deploy the app config, we dont wait long enough
before sending traffic, so requests could go to the wrong version

---------

Signed-off-by: abrar <abrar@anyscale.com>
Aydin-ab pushed a commit to Aydin-ab/ray-aydin that referenced this pull request Nov 19, 2025
I suspect that when we deploy the app config, we dont wait long enough
before sending traffic, so requests could go to the wrong version

---------

Signed-off-by: abrar <abrar@anyscale.com>
Signed-off-by: Aydin Abiar <aydin@anyscale.com>
Future-Outlier pushed a commit to Future-Outlier/ray that referenced this pull request Dec 7, 2025
I suspect that when we deploy the app config, we dont wait long enough
before sending traffic, so requests could go to the wrong version

---------

Signed-off-by: abrar <abrar@anyscale.com>
Signed-off-by: Future-Outlier <eric901201@gmail.com>