Conversation

@ganyi1996ppo (Collaborator) commented Jul 31, 2025

What this PR does / why we need it?

This PR is a cherry-pick of #1953 from v0.9.1.

It introduces a new load-balancing proxy server example for disaggregated prefill/decode (PD). Compared with the original round-robin toy_proxy, it supports a simple token- and KV-cache-aware routing strategy for the disaggregated PD system.
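
To make the comparison concrete, here is a minimal sketch of what token- and KV-cache-aware routing means relative to round robin. This is an illustration only; the names (Backend, select_backend) and the weighting below are hypothetical and are not taken from the proxy in this PR.

```python
# Illustrative comparison of round-robin vs. token & KV-cache aware routing.
# All names and weights here are hypothetical placeholders.
from dataclasses import dataclass
from itertools import cycle


@dataclass
class Backend:
    url: str
    inflight_tokens: int = 0     # tokens of in-flight requests on this instance
    kv_cache_usage: float = 0.0  # fraction of KV-cache blocks in use (0.0 - 1.0)


def round_robin(backends: list[Backend]):
    """What the old toy_proxy did: rotate through backends, ignoring load."""
    yield from cycle(backends)


def select_backend(backends: list[Backend], prompt_tokens: int) -> Backend:
    """Pick the backend with the lowest estimated load after taking this request."""
    def score(b: Backend) -> float:
        # Weighted mix of pending token work and KV-cache pressure;
        # the weight is an arbitrary placeholder for illustration.
        return (b.inflight_tokens + prompt_tokens) + 1000.0 * b.kv_cache_usage

    chosen = min(backends, key=score)
    chosen.inflight_tokens += prompt_tokens  # decremented when the request completes
    return chosen
```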

Does this PR introduce any user-facing change?

No

How was this patch tested?

Tested on a real workload and with unit tests.

…ect#1953)

This PR introduces a new proxy server implementation for disaggregated
prefill/decode (PD). Compared with the original round-robin toy_proxy, it
supports a simple token- and KV-cache-aware load-balancing routing strategy
for the disaggregated PD system.

No

Tested on a real workload.

---------

Signed-off-by: ganyi <pleaplusone.gy@gmail.com>
@github-actions

👋 Hi! Thank you for contributing to the vLLM Ascend project. The following points will speed up your PR merge:

  • A PR should do only one thing; smaller PRs enable faster reviews.
  • Every PR should include unit tests and end-to-end tests to ensure it works and is not broken by future PRs.
  • Write the commit message by filling out the PR description to help reviewers and future developers understand.

If CI fails, you can run the linting and testing checks locally according to Contributing and Testing.

@wangxiyuan (Collaborator)

Does this example rely on the DP patch?

@ganyi1996ppo (Collaborator, Author)

> Does this example rely on the DP patch?

No, this is a totally independent proxy.

@wangxiyuan (Collaborator)

OK, makes sense to me.

# Adapted from https://github.com/vllm-project/vllm/tests/v1/kv_connector/nixl_integration/toy_proxy_server.py

# SPDX-License-Identifier: Apache-2.0

Collaborator commented on this snippet:

It would be good to add a usage guide here so users know how to run this example quickly.

Collaborator Author:

Good advice, I'll add some examples in the comments.
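
For instance, a minimal client-side quick start for the header comment could look like the sketch below. The proxy address, port, the OpenAI-compatible /v1/completions route, and the model name are assumed placeholders for illustration, not the script's actual defaults or CLI.

```python
# Hypothetical quick-start: send a request through the load-balance proxy.
# Address, route, and model name are placeholders; adjust to your deployment.
import requests

resp = requests.post(
    "http://localhost:8000/v1/completions",  # assumed proxy listen address
    json={
        "model": "Qwen/Qwen2.5-7B-Instruct",  # any model served by the P/D instances
        "prompt": "Hello, world",
        "max_tokens": 32,
    },
    timeout=60,
)
print(resp.json())
```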

@jianzs (Collaborator) commented Jul 31, 2025

Do we need both proxy server implementations in the example folder? Could we keep just one, either toy_proxy_server or the one from this PR?

@ganyi1996ppo (Collaborator, Author)

> Do we need both proxy server implementations in the example folder? Could we keep just one, either toy_proxy_server or the one from this PR?

Sounds fair, I'll remove the old one.

Signed-off-by: ganyi <pleaplusone.gy@gmail.com>
@wangxiyuan merged commit 4b3a210 into vllm-project:main Aug 4, 2025
11 checks passed
raindaywhu added a commit to raindaywhu/vllm-ascend that referenced this pull request Aug 4, 2025
zzhx1 pushed a commit to lidenghui1110/vllm-ascend that referenced this pull request Aug 11, 2025
…ect#1953) (vllm-project#2124)

### What this PR does / why we need it?
This PR is a cherry-pick of vllm-project#1953 from v0.9.1.

It introduces a new load-balancing proxy server example for disaggregated prefill/decode (PD). Compared with the original round-robin toy_proxy, it supports a simple token- and KV-cache-aware routing strategy for the disaggregated PD system.

### Does this PR introduce _any_ user-facing change?
No
### How was this patch tested?
Tested on a real workload and with unit tests.

- vLLM version: v0.10.0
- vLLM main: vllm-project/vllm@ad57f23

---------

Signed-off-by: ganyi <pleaplusone.gy@gmail.com>
zzhx1 pushed a commit to lidenghui1110/vllm-ascend that referenced this pull request Aug 11, 2025
chopper0126 pushed a commit to chopper0126/vllm-ascend that referenced this pull request Sep 26, 2025
Angazenn pushed a commit to Angazenn/vllm-ascend that referenced this pull request Oct 21, 2025