Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

[core][serve] 1MB latency performance regression #46428

Closed
zcin opened this issue Jul 3, 2024 · 3 comments · Fixed by #47209
Closed

[core][serve] 1MB latency performance regression #46428

zcin opened this issue Jul 3, 2024 · 3 comments · Fixed by #47209
Assignees
Labels
bug Something that is supposed to be working; but isn't P0 Issues that should be fixed in short order serve Ray Serve Related Issue

Comments

@zcin
Copy link
Contributor

zcin commented Jul 3, 2024

What happened + What you expected to happen

Around 6/14, latency for sending a request with 1MB payload through a serve DeploymentHandle increased from ~3.4s to ~4.6s.

Screenshot 2024-07-03 at 3 58 58 PM

From bisecting, d729815 seems to be the offending commit.

Versions / Dependencies

n/a

Reproduction script

Run python release/serve_tests/workloads/microbenchmarks.py.

Issue Severity

None

@zcin zcin added bug Something that is supposed to be working; but isn't P0 Issues that should be fixed in short order serve Ray Serve Related Issue core Issues that should be addressed in Ray Core labels Jul 3, 2024
@anyscalesam
Copy link
Contributor

anyscalesam commented Jul 15, 2024

Talked to @edoakes > no leads... @kevin85421 is going to go into Ray Serve source ... this is interrupt important ... and we don't want to revert the DAG PR for this either... so no choice have to investigate.

Suspicious PR is #45699

@jjyao jjyao assigned jjyao and unassigned kevin85421 Jul 18, 2024
@jjyao
Copy link
Collaborator

jjyao commented Jul 25, 2024

Found that the slowness is from Router._resolve_deployment_responses which is basically pickle.dump and pickle.load. It's unclear how #45699 affects it since if we just run handle 1mb, it's the same before and after that PR. It's only slower if we run handle noop before it.

Instead of figuring out why pickle.dump is slower, we decided to remove the call of Router._resolve_deployment_responses` all together since it turns out to contribute a large portion of the latency.

@jjyao jjyao assigned zcin and unassigned jjyao Jul 25, 2024
@jjyao
Copy link
Collaborator

jjyao commented Jul 25, 2024

Assigning back to @zcin for tracking the removal of Router._resolve_deployment_responses

@jjyao jjyao removed the core Issues that should be addressed in Ray Core label Jul 25, 2024
@zcin zcin linked a pull request Aug 20, 2024 that will close this issue
8 tasks
@zcin zcin closed this as completed in d37b18d Aug 21, 2024
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
bug Something that is supposed to be working; but isn't P0 Issues that should be fixed in short order serve Ray Serve Related Issue
Projects
None yet
Development

Successfully merging a pull request may close this issue.

4 participants