Disaggregate prefill for kv cache register style #950
Conversation
Force-pushed 6f37f50 → cf491c8

This pull request has conflicts, please resolve those before we can evaluate the pull request.
Force-pushed 96a3508 → cc4f9fb
Force-pushed c72fdad → 6b8ce80
Force-pushed 38ec528 → 69da5b3
Force-pushed 41507c7 → 303cacd
Force-pushed 70e22d8 → 1bd2222
linfeng-yuan left a comment:
I think we need to add compatibility here for $1 and $2 in different nodes.
Force-pushed 779a3a5 → 95082f3
Signed-off-by: ganyi <pleaplusone.gy@gmail.com>
Signed-off-by: underfituu <hzhucong@163.com>
Force-pushed 8708ee6 → 9c20fdd
wangxiyuan left a comment:
Let's merge this first to unblock others
            ACL_FORMAT_FRACTAL_ND),
        )
        dtype = kv_cache_spec.dtype
        if self.model_config.is_deepseek_mla:
Why was the check changed from `self.torchair_graph_enabled` to `self.model_config.is_deepseek_mla`?
It's effectively the same: at that time only DeepSeek supported torchair, but then Pangu came in using torchair combined with MHA, while this code path is written specifically for MLA. So we changed the condition to the more accurate one.
                kv_cache,
                alignment)[:cache_size].view(cache_shape)
            kv_cache_list.append(kv_cache)
        kv_caches[layer_name] = tuple(kv_cache_list)
Why was the type changed from tensor to tuple, and how does this affect D2H and H2D?
It has nothing to do with D2H or H2D; it was actually changed because of an alignment limitation on Ascend hardware, see https://github.com/vllm-project/vllm-ascend/pull/950/files/9c20fdda8111de05756ac4ed0a3c80cd776cfb34#diff-c49594855b615477bbc34f06d2d423a7dd84c021a7925cd1f61fdb79cb814c08R2064
Could you provide more details on the alignment limitations of Ascend hardware? We are currently implementing a KV Cache Connector, and these modifications will affect the offload/load operations of the cache in the connector.
We'd like to understand this modification in more detail to better adapt our implementation.
The memory needs to be 4 MB aligned, that's all.
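The per-layer carve-out pattern visible in the diff above can be sketched roughly as follows. This is an illustrative sketch, not the vllm-ascend implementation: `align_up`, `carve_kv_caches`, and the pooled-buffer layout are invented for the example, and only the 4 MB figure comes from the discussion above.

```python
import torch

ALIGN_BYTES = 4 * 1024 * 1024  # Ascend requires 4 MB aligned cache regions

def align_up(n: int, align: int = ALIGN_BYTES) -> int:
    """Round n up to the next multiple of align."""
    return (n + align - 1) // align * align

def carve_kv_caches(cache_shapes, dtype=torch.float16):
    """Carve one 4 MB aligned region per layer out of a single pooled buffer.

    Returns a dict of tensor views, analogous to the per-layer views the diff
    slices out of the raw allocation with [:cache_size].view(cache_shape).
    """
    elem = torch.tensor([], dtype=dtype).element_size()
    # Pad each layer's byte size so the *next* region starts on a 4 MB boundary.
    sizes = [align_up(torch.Size(s).numel() * elem) for s in cache_shapes]
    pool = torch.empty(sum(sizes), dtype=torch.int8)  # one raw byte buffer
    caches, offset = {}, 0
    for i, (shape, padded) in enumerate(zip(cache_shapes, sizes)):
        nbytes = torch.Size(shape).numel() * elem
        region = pool[offset:offset + nbytes]       # aligned start, exact length
        caches[i] = region.view(dtype).view(shape)  # reinterpret bytes, reshape
        offset += padded                            # next region stays aligned
    return caches

caches = carve_kv_caches([(2, 128, 16, 8), (2, 128, 16, 8)])
```

A connector that offloads or loads these caches would then see one aligned, contiguous region per layer rather than one monolithic tensor, which is why the per-layer type became a tuple of views.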
### What this PR does / why we need it?
This PR adopts Mooncake TransferEngine for kv cache register and `pull_blocks` style disaggregate prefill implementation.

### Does this PR introduce any user-facing change?
No

### Dependencies
1. CANN: Using Mooncake TransferEngine with Ascend Transport requires CANN version 8.2.RC1 or higher (see Mooncake [#502](kvcache-ai/Mooncake#502) for details).
2. vllm-ascend: This PR depends on changes introduced by #950 (modifications to `model_runner_v1`) and #1361 (updates to `schedule`), both of which have been merged into the `v0.9.1-dev` branch and are expected to land in `main` shortly.

### How was this patch tested?
- vLLM version: v0.10.0
- vLLM main: vllm-project/vllm@1c859a1

Signed-off-by: leichao.lc <leichao139636@163.com>
Co-authored-by: jianzs <zheng.shoujian@outlook.com>
Co-authored-by: zzy-ContiLearn <1831242919@qq.com>
Co-authored-by: fems14 <1804143737@qq.com>
Co-authored-by: Dreamerleader <2270923832@qq.com>
Co-authored-by: chris668899 <15105191595@126.com>
Co-authored-by: Pz1116 <zpbzpb123123@gmail.com>
### What this PR does / why we need it?
This PR adopts `LLMDataDist` for kv cache register and `pull_blocks` style disaggregate prefill implementation. The interface implementation mainly follows the design of the NIXL PR: https://github.com/vllm-project/vllm/pull/17751/files#diff-7eaad0b7dee0626bf29d10081b0f0c5e3ea15a4af97e7b182a4e0d35f8346953

This PR can be tested with the following steps:
- Generate the rank table for all machines.
- Execute `toy_proxy.py` to launch the disaggregate prefill proxy server, specifying the prefill ip/port and the decode ip/port.
- Run the prefill server and decode server.
- Send the request to the disaggregate prefill proxy.

### How was this patch tested?
- vLLM version: v0.9.2
- vLLM main: vllm-project/vllm@8d0a01a

Signed-off-by: ganyi <pleaplusone.gy@gmail.com>
Signed-off-by: machenglong <machenglong_yewu@cmss.chinamobile.com>
Signed-off-by: liziyu179 <3475441767@qq.com>
Signed-off-by: underfitc <hucong24@huawei.com>
Signed-off-by: zouyida2052 <zouyida@huawei.com>
Signed-off-by: liziyu <liziyu16@huawei.com>
Signed-off-by: underfituu <hzhucong@163.com>
Co-authored-by: machenglong <machenglong_yewu@cmss.chinamobile.com>
Co-authored-by: liziyu179 <3475441767@qq.com>
Co-authored-by: underfitc <hucong24@huawei.com>
Co-authored-by: zouyida2052 <zouyida@huawei.com>
Co-authored-by: liziyu <liziyu16@huawei.com>
Co-authored-by: underfituu <hzhucong@163.com>
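The last test step above (sending a request through the disaggregate prefill proxy) might look like the sketch below. The host/port, the OpenAI-style `/v1/completions` route, and the model name are assumptions for illustration only and are not taken from `toy_proxy.py`.

```python
import json
import urllib.request

# Assumed proxy endpoint; toy_proxy.py's actual host/port/route may differ.
PROXY_URL = "http://127.0.0.1:8000/v1/completions"

def build_payload(prompt: str, max_tokens: int = 64) -> bytes:
    """Build an OpenAI-style completion payload (model name is a placeholder)."""
    return json.dumps({
        "model": "placeholder-model",
        "prompt": prompt,
        "max_tokens": max_tokens,
    }).encode()

def send_request(prompt: str, max_tokens: int = 64) -> dict:
    """POST to the proxy: the prefill server computes the KV cache, the decode
    server pulls it via pull_blocks, then tokens come back through the proxy."""
    req = urllib.request.Request(
        PROXY_URL,
        data=build_payload(prompt, max_tokens),
        headers={"Content-Type": "application/json"},
    )
    with urllib.request.urlopen(req) as resp:
        return json.load(resp)
```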