Skip to content

Conversation

@ganyi1996ppo
Copy link
Collaborator

@ganyi1996ppo ganyi1996ppo commented May 26, 2025

What this PR does / why we need it?

This PR adopt LLMDataDist for kv cache register and pull_blocks style disaggregate prefill implementation. The interface implementation mainly follows the design of NIXL PR https://github.com/vllm-project/vllm/pull/17751/files#diff-7eaad0b7dee0626bf29d10081b0f0c5e3ea15a4af97e7b182a4e0d35f8346953 .

This PR can be test with the following step:

  • Generate the rank table for all machine.
  • executetoy_proxy.py to launch the disaggregate prefill proxy server, specify the prefill ip, port and the decode ip, port
  • Run the prefill server and decode server.
  • send the request to the disaggregate prefill proxy

Does this PR introduce any user-facing change?

How was this patch tested?

@github-actions
Copy link

github-actions bot commented Jun 3, 2025

This pull request has conflicts, please resolve those before we can evaluate the pull request.

@wangxiyuan wangxiyuan mentioned this pull request Jun 4, 2025
76 tasks
@ganyi1996ppo ganyi1996ppo force-pushed the ganyi/disaggregate_prefill branch from 96a3508 to cc4f9fb Compare June 7, 2025 03:53
@ganyi1996ppo ganyi1996ppo force-pushed the ganyi/disaggregate_prefill branch from c72fdad to 6b8ce80 Compare June 7, 2025 07:01
@github-actions
Copy link

github-actions bot commented Jun 7, 2025

This pull request has conflicts, please resolve those before we can evaluate the pull request.

@github-actions
Copy link

github-actions bot commented Jun 9, 2025

This pull request has conflicts, please resolve those before we can evaluate the pull request.

@github-actions
Copy link

This pull request has conflicts, please resolve those before we can evaluate the pull request.

@ganyi1996ppo ganyi1996ppo force-pushed the ganyi/disaggregate_prefill branch from 70e22d8 to 1bd2222 Compare June 11, 2025 02:50
@ganyi1996ppo ganyi1996ppo marked this pull request as ready for review June 11, 2025 02:52
@github-actions github-actions bot added documentation Improvements or additions to documentation merge-conflicts labels Jun 11, 2025
@github-actions
Copy link

This pull request has conflicts, please resolve those before we can evaluate the pull request.

@jianzs jianzs added the pd-test enable pd test for PR label Jun 15, 2025
Copy link
Collaborator

@linfeng-yuan linfeng-yuan left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think we need to add compatibility here for $1 and $2 in different nodes.

@machenglong2025 machenglong2025 force-pushed the ganyi/disaggregate_prefill branch from 779a3a5 to 95082f3 Compare June 16, 2025 06:01
ganyi1996ppo and others added 16 commits July 22, 2025 09:29
Signed-off-by: ganyi <pleaplusone.gy@gmail.com>
Signed-off-by: ganyi <pleaplusone.gy@gmail.com>
Signed-off-by: ganyi <pleaplusone.gy@gmail.com>
Signed-off-by: ganyi <pleaplusone.gy@gmail.com>
Signed-off-by: ganyi <pleaplusone.gy@gmail.com>
Signed-off-by: underfituu <hzhucong@163.com>
Signed-off-by: underfituu <hzhucong@163.com>
Signed-off-by: underfituu <hzhucong@163.com>
Signed-off-by: underfituu <hzhucong@163.com>
Signed-off-by: ganyi <pleaplusone.gy@gmail.com>
Signed-off-by: ganyi <pleaplusone.gy@gmail.com>
Signed-off-by: ganyi <pleaplusone.gy@gmail.com>
Signed-off-by: ganyi <pleaplusone.gy@gmail.com>
Signed-off-by: ganyi <pleaplusone.gy@gmail.com>
Signed-off-by: ganyi <pleaplusone.gy@gmail.com>
Signed-off-by: ganyi <pleaplusone.gy@gmail.com>
@ganyi1996ppo ganyi1996ppo force-pushed the ganyi/disaggregate_prefill branch from 8708ee6 to 9c20fdd Compare July 22, 2025 01:29
Copy link
Collaborator

@wangxiyuan wangxiyuan left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Let's merge this first to unblock others

@ganyi1996ppo ganyi1996ppo merged commit df0ec55 into vllm-project:main Jul 26, 2025
25 checks passed
ACL_FORMAT_FRACTAL_ND),
)
dtype = kv_cache_spec.dtype
if self.model_config.is_deepseek_mla:
Copy link

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Why was the judgment changed from "self.torchair_graph_enabled" to "self.model_config.is_deepseek_mla"?

Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Its actually the same, becuase over that time only deepseek support torchair, then pangu get in with torchair combined with mha, but the code is written specific for mla. So we change this condition which is more accurate.

kv_cache,
alignment)[:cache_size].view(cache_shape)
kv_cache_list.append(kv_cache)
kv_caches[layer_name] = tuple(kv_cache_list)
Copy link

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Why Why was the type changed from tensor to tuple, and how does this affect D2H and H2D?

Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Copy link

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Could you provide more details on the alignment limitations of Ascend hardware? We are currently implementing a KV Cache Connector, and these modifications will affect the offload/load operations of the cache in the connector.

Copy link

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

We'd like to understand this modification in more detail to better adapt our implementation.

Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The memory needs to be 4M aligned, that's all

wangxiyuan pushed a commit that referenced this pull request Aug 18, 2025
### What this PR does / why we need it?
This PR adopt Mooncake TransferEngine for kv cache register and
pull_blocks style disaggregate prefill implementation.

### Does this PR introduce any user-facing change?
No

### Dependencies
1. Cann Dependencies
Using Mooncake TransferEngine with Ascend Transport requires CANN
version 8.2.RC1 or higher.(see detail
Mooncake[#502](kvcache-ai/Mooncake#502))

2. vllm-ascend
This PR depends on changes introduced by #950 (modifications to
`model_runner_v1`) and #1361 (updates to `schedule`), both of which have
been merged into the `v0.9.1-dev` branch and are expected to land in
`main` shortly.

### How was this patch tested?


- vLLM version: v0.10.0
- vLLM main:
vllm-project/vllm@1c859a1

---------

Signed-off-by: leichao.lc <leichao139636@163.com>
Co-authored-by: jianzs <zheng.shoujian@outlook.com>
Co-authored-by: zzy-ContiLearn <1831242919@qq.com>
Co-authored-by: fems14 <1804143737@qq.com>
Co-authored-by: Dreamerleader <2270923832@qq.com>
Co-authored-by: chris668899 <15105191595@126.com>
Co-authored-by: Pz1116 <zpbzpb123123@gmail.com>
chopper0126 pushed a commit to chopper0126/vllm-ascend that referenced this pull request Sep 26, 2025
This PR adopt `LLMDataDist` for kv cache register and `pull_blocks`
style disaggregate prefill implementation. The interface implementation
mainly follows the design of NIXL PR
https://github.com/vllm-project/vllm/pull/17751/files#diff-7eaad0b7dee0626bf29d10081b0f0c5e3ea15a4af97e7b182a4e0d35f8346953
.

This PR can be test with the following step:
- Generate the rank table for all machine.
- execute`toy_proxy.py` to launch the disaggregate prefill proxy server,
specify the prefill ip, port and the decode ip, port
- Run the prefill server and decode server.
- send the request to the disaggregate prefill proxy

- vLLM version: v0.9.2
- vLLM main:
vllm-project/vllm@8d0a01a

---------

Signed-off-by: ganyi <pleaplusone.gy@gmail.com>
Signed-off-by: machenglong <machenglong_yewu@cmss.chinamobile.com>
Signed-off-by: liziyu179 <3475441767@qq.com>
Signed-off-by: underfitc <hucong24@huawei.com>
Signed-off-by: zouyida2052 <zouyida@huawei.com>
Signed-off-by: liziyu <liziyu16@huawei.com>
Signed-off-by: underfituu <hzhucong@163.com>
Co-authored-by: machenglong <machenglong_yewu@cmss.chinamobile.com>
Co-authored-by: liziyu179 <3475441767@qq.com>
Co-authored-by: underfitc <hucong24@huawei.com>
Co-authored-by: zouyida2052 <zouyida@huawei.com>
Co-authored-by: liziyu <liziyu16@huawei.com>
Co-authored-by: underfituu <hzhucong@163.com>
chopper0126 pushed a commit to chopper0126/vllm-ascend that referenced this pull request Sep 26, 2025
### What this PR does / why we need it?
This PR adopt Mooncake TransferEngine for kv cache register and
pull_blocks style disaggregate prefill implementation.

### Does this PR introduce any user-facing change?
No

### Dependencies
1. Cann Dependencies
Using Mooncake TransferEngine with Ascend Transport requires CANN
version 8.2.RC1 or higher.(see detail
Mooncake[vllm-project#502](kvcache-ai/Mooncake#502))

2. vllm-ascend
This PR depends on changes introduced by vllm-project#950 (modifications to
`model_runner_v1`) and vllm-project#1361 (updates to `schedule`), both of which have
been merged into the `v0.9.1-dev` branch and are expected to land in
`main` shortly.

### How was this patch tested?


- vLLM version: v0.10.0
- vLLM main:
vllm-project/vllm@1c859a1

---------

Signed-off-by: leichao.lc <leichao139636@163.com>
Co-authored-by: jianzs <zheng.shoujian@outlook.com>
Co-authored-by: zzy-ContiLearn <1831242919@qq.com>
Co-authored-by: fems14 <1804143737@qq.com>
Co-authored-by: Dreamerleader <2270923832@qq.com>
Co-authored-by: chris668899 <15105191595@126.com>
Co-authored-by: Pz1116 <zpbzpb123123@gmail.com>
@liziyu179 liziyu179 deleted the ganyi/disaggregate_prefill branch September 28, 2025 02:36
Angazenn pushed a commit to Angazenn/vllm-ascend that referenced this pull request Oct 21, 2025
### What this PR does / why we need it?
This PR adopt `LLMDataDist` for kv cache register and `pull_blocks`
style disaggregate prefill implementation. The interface implementation
mainly follows the design of NIXL PR
https://github.com/vllm-project/vllm/pull/17751/files#diff-7eaad0b7dee0626bf29d10081b0f0c5e3ea15a4af97e7b182a4e0d35f8346953
.

This PR can be test with the following step:
- Generate the rank table for all machine.
- execute`toy_proxy.py` to launch the disaggregate prefill proxy server,
specify the prefill ip, port and the decode ip, port
- Run the prefill server and decode server.
- send the request to the disaggregate prefill proxy

### Does this PR introduce _any_ user-facing change?

### How was this patch tested?


- vLLM version: v0.9.2
- vLLM main:
vllm-project/vllm@8d0a01a

---------

Signed-off-by: ganyi <pleaplusone.gy@gmail.com>
Signed-off-by: machenglong <machenglong_yewu@cmss.chinamobile.com>
Signed-off-by: liziyu179 <3475441767@qq.com>
Signed-off-by: underfitc <hucong24@huawei.com>
Signed-off-by: zouyida2052 <zouyida@huawei.com>
Signed-off-by: liziyu <liziyu16@huawei.com>
Signed-off-by: underfituu <hzhucong@163.com>
Co-authored-by: machenglong <machenglong_yewu@cmss.chinamobile.com>
Co-authored-by: liziyu179 <3475441767@qq.com>
Co-authored-by: underfitc <hucong24@huawei.com>
Co-authored-by: zouyida2052 <zouyida@huawei.com>
Co-authored-by: liziyu <liziyu16@huawei.com>
Co-authored-by: underfituu <hzhucong@163.com>
Angazenn pushed a commit to Angazenn/vllm-ascend that referenced this pull request Oct 21, 2025
### What this PR does / why we need it?
This PR adopt Mooncake TransferEngine for kv cache register and
pull_blocks style disaggregate prefill implementation.

### Does this PR introduce any user-facing change?
No

### Dependencies
1. Cann Dependencies
Using Mooncake TransferEngine with Ascend Transport requires CANN
version 8.2.RC1 or higher.(see detail
Mooncake[vllm-project#502](kvcache-ai/Mooncake#502))

2. vllm-ascend
This PR depends on changes introduced by vllm-project#950 (modifications to
`model_runner_v1`) and vllm-project#1361 (updates to `schedule`), both of which have
been merged into the `v0.9.1-dev` branch and are expected to land in
`main` shortly.

### How was this patch tested?


- vLLM version: v0.10.0
- vLLM main:
vllm-project/vllm@1c859a1

---------

Signed-off-by: leichao.lc <leichao139636@163.com>
Co-authored-by: jianzs <zheng.shoujian@outlook.com>
Co-authored-by: zzy-ContiLearn <1831242919@qq.com>
Co-authored-by: fems14 <1804143737@qq.com>
Co-authored-by: Dreamerleader <2270923832@qq.com>
Co-authored-by: chris668899 <15105191595@126.com>
Co-authored-by: Pz1116 <zpbzpb123123@gmail.com>
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

documentation Improvements or additions to documentation module:core module:ops module:quantization module:tests pd-test enable pd test for PR ready-for-test start test by label for PR

Projects

None yet

Development

Successfully merging this pull request may close these issues.