Closed
Labels: RFC (Request For Comments)
Motivation.
P/D Disaggregation plays a very important role in deploying vLLM inference services in large-scale clusters. vllm-ascend already has initial P/D Disaggregation support, and we'll continue to develop it with more parallel mechanisms, including tensor parallelism (tp), expert parallelism (ep), and data parallelism (dp), as well as graph mode integration.
The related CI for the 1p1d and xpyd scenarios will be integrated step by step, with and without these parallel mechanisms.
Proposed Change.
P/D Disaggregation
- P/D Disaggregation in v0
  - [Feature] Add PD separation feature #432
  - [Disaggregated Prefill] P2P Disaggregated Prefill based on llm_datadist #694
  - [P/D][DP] Upgrade pd proxy to support both prefill and decode instances in disaggregated-prefill. #794
- tutorials
  - 1p1d + offline + single machine [Guide]: How to use disaggregated_prefill #857
- P/D Disaggregation in v1
CI Machine Preparation
UT Integration
Feature coverage matrix
| P/D Disaggregation | tp | ep | dp |
|---|---|---|---|
| 1p1d/xpyd | | | |
| 1p1d/xpyd | √ | | |
| 1p1d/xpyd | √ | √ | |
| 1p1d/xpyd | √ | √ | √ |
| 1p1d/xpyd | √ | | |
| 1p1d/xpyd | √ | √ | |
| 1p1d/xpyd | √ | | |
- Basic P/D Disaggregation without parallel mechanisms.
- Add the above parallel mechanisms.
- Add graph mode.
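
As a rough illustration of how the coverage matrix and the staged rollout above might be enumerated in CI, here is a minimal Python sketch. The names `PDCase`, `build_matrix`, and `case_id` are hypothetical and not part of vllm-ascend. The sketch also assumes every on/off combination of tp, ep, and dp is of interest, which may be a superset of the rows in the table.

```python
from dataclasses import dataclass
from itertools import product

# Parallel mechanisms named in the coverage matrix.
MECHANISMS = ("tp", "ep", "dp")


@dataclass(frozen=True)
class PDCase:
    """One hypothetical CI configuration for P/D Disaggregation."""
    topology: str            # "1p1d" or "xpyd"
    tp: bool = False
    ep: bool = False
    dp: bool = False
    graph_mode: bool = False

    def enabled(self) -> list[str]:
        # Mechanisms switched on for this case, in matrix column order.
        return [m for m in MECHANISMS if getattr(self, m)]

    def case_id(self) -> str:
        # Readable identifier, e.g. "1p1d-tp+ep" or "xpyd-base-graph".
        parts = [self.topology, "+".join(self.enabled()) or "base"]
        if self.graph_mode:
            parts.append("graph")
        return "-".join(parts)


def build_matrix(topologies=("1p1d", "xpyd"), graph_modes=(False,)):
    """Enumerate all tp/ep/dp combinations for each topology.

    Assumption: all 2**3 combinations are generated; trim the list
    if only a subset should run.
    """
    cases = [
        PDCase(topo, tp, ep, dp, graph)
        for topo in topologies
        for graph in graph_modes
        for tp, ep, dp in product((False, True), repeat=3)
    ]
    # Staged order: non-graph cases first, and within each stage
    # the cases with fewer parallel mechanisms come first.
    return sorted(cases, key=lambda c: (c.graph_mode, len(c.enabled())))
```

Calling `build_matrix()` yields the basic cases first and the fully parallel ones last; passing `graph_modes=(False, True)` appends the graph mode variants after all non-graph cases, mirroring the three steps listed above. The resulting `case_id()` strings could serve as test IDs in whatever CI harness is eventually used.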