Implementation of simple load balance routing proxy server (#1953) #2124
…ect#1953) This PR introduces a new proxy server implementation for disaggregated PD. It supports a simple token- and kv_cache-aware load-balancing routing strategy, in contrast to the original toy_proxy, which uses round robin. Not tested on a real workload. --------- Signed-off-by: ganyi <pleaplusone.gy@gmail.com>
👋 Hi! Thank you for contributing to the vLLM Ascend project. The following points will speed up your PR merge:
If CI fails, you can run the linting and testing checks locally according to the Contributing and Testing guides.
Does this example rely on the DP patch?
No, this is a totally independent proxy.
OK, makes sense to me.
# Adapted from https://github.com/vllm-project/vllm/tests/v1/kv_connector/nixl_integration/toy_proxy_server.py
# SPDX-License-Identifier: Apache-2.0
It would be good to add a usage guide here so users can quickly see how to run this example.
Good advice, I'll add some examples in the comments.
Do we need both proxy server implementations in the example folder? We could keep either toy_proxy_server or the one from this PR.
Sounds fair, I'll remove the old one.
Signed-off-by: ganyi <pleaplusone.gy@gmail.com>
…to eplb_into_main * 'main' of https://github.com/vllm-project/vllm-ascend: Implementation of simple load balance routing proxy server (vllm-project#1953) (vllm-project#2124)
…ect#1953) (vllm-project#2124) ### What this PR does / why we need it? This PR is a cherry-pick of vllm-project#1953 from v0.9.1. It introduces a new load-balancing proxy server example for disaggregated PD, which supports a simple token- and kv_cache-aware load-balancing routing strategy, in contrast to the original round-robin toy_proxy. ### Does this PR introduce _any_ user-facing change? No ### How was this patch tested? Tested on a real workload and with unit tests. - vLLM version: v0.10.0 - vLLM main: vllm-project/vllm@ad57f23 --------- Signed-off-by: ganyi <pleaplusone.gy@gmail.com>
What this PR does / why we need it?
This PR is a cherry-pick of #1953 from v0.9.1.
It introduces a new load-balancing proxy server example for disaggregated PD, which supports a simple token- and kv_cache-aware load-balancing routing strategy, in contrast to the original round-robin toy_proxy.
Does this PR introduce any user-facing change?
No
How was this patch tested?
Tested on a real workload and with unit tests.
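
For readers unfamiliar with the idea, here is a minimal sketch of what token- and kv_cache-aware routing could look like next to plain round robin. It is not the code from this PR: the class `LoadAwareRouter`, its methods, and the load metrics (in-flight prompt tokens for prefill servers, held KV-cache tokens for decode servers) are all hypothetical names and assumptions chosen for illustration.

```python
import itertools


class LoadAwareRouter:
    """Hypothetical load-aware router for a disaggregated prefill/decode setup."""

    def __init__(self, prefillers, decoders):
        # Assumed metric: prompt tokens currently being prefilled per server.
        self.prefill_tokens = {name: 0 for name in prefillers}
        # Assumed metric: KV-cache tokens currently held per decode server.
        self.decode_kv = {name: 0 for name in decoders}

    def route(self, prompt_tokens):
        """Pick the least-loaded prefiller and decoder, then record the new load."""
        prefiller = min(self.prefill_tokens, key=self.prefill_tokens.get)
        decoder = min(self.decode_kv, key=self.decode_kv.get)
        self.prefill_tokens[prefiller] += prompt_tokens
        self.decode_kv[decoder] += prompt_tokens
        return prefiller, decoder

    def finish(self, prefiller, decoder, prompt_tokens):
        """Release the load that a completed request was holding."""
        self.prefill_tokens[prefiller] -= prompt_tokens
        self.decode_kv[decoder] -= prompt_tokens


def round_robin(servers):
    """Baseline: cycle through servers without looking at load."""
    return itertools.cycle(servers)


if __name__ == "__main__":
    router = LoadAwareRouter(["prefill-0", "prefill-1"], ["decode-0", "decode-1"])
    baseline = round_robin(["prefill-0", "prefill-1"])
    for tokens in (1000, 10, 10):
        print("load-aware:", router.route(tokens), "round-robin:", next(baseline))
```

With one long request followed by two short ones, the load-aware router keeps sending short requests to the server that is not busy with the long prompt, whereas round robin alternates regardless of load.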