
@ganyi1996ppo (Collaborator)

### What this PR does / why we need it?

This PR introduces a new proxy server implementation for disaggregated prefill/decode (PD) serving. It supports a simple token- and KV-cache-aware load-balancing routing strategy, as an alternative to the round-robin routing used by the original toy_proxy.
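To make the routing idea concrete, here is a minimal sketch of token- and KV-cache-aware instance selection. It assumes the proxy tracks, for each instance, the tokens it is still computing and the tokens whose KV cache it still holds. `ServerState`, `LoadBalancer`, `kv_weight`, `acquire`, `release_tokens`, `release_kv`, and the example URLs are hypothetical, and the weighted-sum score is an assumption; the proxy in this PR may score instances differently. A round-robin proxy would ignore these counters and simply cycle through the instance list.

```python
from dataclasses import dataclass
from typing import List


@dataclass
class ServerState:
    """Hypothetical bookkeeping the proxy keeps for one prefill or decode instance."""
    url: str
    active_tokens: int = 0    # tokens of requests still being computed here
    kv_cache_tokens: int = 0  # tokens whose KV cache this instance still holds


class LoadBalancer:
    """Route each request to the least-loaded instance (sketch, not the PR's code)."""

    def __init__(self, urls: List[str], kv_weight: float = 0.5) -> None:
        if not urls:
            raise ValueError("at least one instance URL is required")
        self.servers = [ServerState(url) for url in urls]
        # Assumed weight for KV-cache pressure relative to in-flight tokens.
        self.kv_weight = kv_weight

    def score(self, server: ServerState) -> float:
        # Lower is better: in-flight tokens plus weighted KV-cache occupancy.
        return server.active_tokens + self.kv_weight * server.kv_cache_tokens

    def acquire(self, num_tokens: int) -> ServerState:
        # Pick the instance with the smallest score (ties go to the earliest
        # in the list), then charge this request's tokens to it.
        server = min(self.servers, key=self.score)
        server.active_tokens += num_tokens
        server.kv_cache_tokens += num_tokens
        return server

    def release_tokens(self, server: ServerState, num_tokens: int) -> None:
        # Computation for the request has finished on this instance.
        server.active_tokens -= num_tokens

    def release_kv(self, server: ServerState, num_tokens: int) -> None:
        # The request's KV cache has been freed on this instance.
        server.kv_cache_tokens -= num_tokens
```

For example, with two idle instances a 1000-token request lands on the first one, so the next few short requests go to the second until its score overtakes the first's. Because KV-cache occupancy is released separately from compute, an instance that has finished computing a request but still holds its KV cache keeps part of its load until `release_kv` is called.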

### Does this PR introduce any user-facing change?

No

### How was this patch tested?

Tested on a real workload.

Signed-off-by: ganyi <pleaplusone.gy@gmail.com>
wangxiyuan changed the title from "Implementation of simple load balance routing proxy server" to "[0.9.1]Implementation of simple load balance routing proxy server" Jul 29, 2025
ganyi1996ppo merged commit bd2f365 into vllm-project:v0.9.1-dev Jul 31, 2025
8 checks passed
ganyi1996ppo added a commit to ganyi1996ppo/vllm-ascend that referenced this pull request Jul 31, 2025
wangxiyuan pushed a commit that referenced this pull request Aug 4, 2025
…2124)

### What this PR does / why we need it?
This PR is a cherry-pick of #1953 from v0.9.1.

This PR introduces a new load-balancing proxy server example for
disaggregated prefill/decode (PD) serving. It supports a simple token-
and KV-cache-aware routing strategy, as an alternative to the
round-robin routing used by the original toy_proxy.

### Does this PR introduce _any_ user-facing change?
No
### How was this patch tested?
Tested on a real workload and with unit tests.

- vLLM version: v0.10.0
- vLLM main: vllm-project/vllm@ad57f23

---------

Signed-off-by: ganyi <pleaplusone.gy@gmail.com>
raindaywhu added a commit to raindaywhu/vllm-ascend that referenced this pull request Aug 4, 2025
zzhx1 pushed a commit to lidenghui1110/vllm-ascend that referenced this pull request Aug 11, 2025
chopper0126 pushed a commit to chopper0126/vllm-ascend that referenced this pull request Sep 26, 2025
Angazenn pushed a commit to Angazenn/vllm-ascend that referenced this pull request Oct 21, 2025