@ceci3 ceci3 commented Apr 18, 2025

No description provided.

@ceci3 ceci3 requested a review from zhaoyinglia as a code owner April 18, 2025 08:36
@ceci3 ceci3 closed this Apr 29, 2025
aoyulong pushed a commit that referenced this pull request Apr 29, 2025
### Description: Multi-node Prefill/Decode Disaggregated Deployment with FlagCX

This PR adds support for multi-node disaggregated deployment of the **prefill** and **decode** stages using `xPyD` disaggregation:
- PD instances currently support two scheduling strategies, `robin` and `random`; the default is `robin`.
- It introduces a new communication backend based on [FlagCX](https://github.com/FlagOpen/FlagCX) and merges the [FlagCX Adapter](#461).
- KV cache transfer is enabled via [p2pConnector](vllm-project/vllm#15806) in `vLLM`.
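
To illustrate the two scheduling strategies named above, here is a minimal sketch of how a router might pick one prefill and one decode instance per request. It is not the code from this PR: the `InstanceScheduler` class, its parameters, and the instance addresses are all hypothetical, and `robin` is assumed to mean plain round-robin.

```python
import itertools
import random


class InstanceScheduler:
    """Hypothetical helper: pick one instance per request.

    Strategy names mirror the PR description ("robin" and "random");
    the class itself is an illustration, not FlagScale's implementation.
    """

    def __init__(self, instances, strategy="robin", seed=None):
        if not instances:
            raise ValueError("at least one instance is required")
        if strategy not in ("robin", "random"):
            raise ValueError(f"unsupported strategy: {strategy}")
        self.instances = list(instances)
        self.strategy = strategy
        self._cycle = itertools.cycle(self.instances)  # state for round-robin
        self._rng = random.Random(seed)                # state for random picks

    def pick(self):
        if self.strategy == "robin":
            # Walk the instance list in order, wrapping around at the end.
            return next(self._cycle)
        # Choose uniformly at random among all instances.
        return self._rng.choice(self.instances)


# Hypothetical addresses for a 2P3D layout (2 prefill, 3 decode instances).
prefill = InstanceScheduler(["prefill-0:8100", "prefill-1:8100"])  # default: robin
decode = InstanceScheduler(
    ["decode-0:8200", "decode-1:8200", "decode-2:8200"],
    strategy="random",
    seed=0,
)

# Each request is routed to one (prefill, decode) pair.
pairs = [(prefill.pick(), decode.pick()) for _ in range(4)]
```

Round-robin spreads requests evenly without tracking load, which is probably why it is a reasonable default; random selection avoids shared ordering state at the cost of a less even short-term distribution.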


---

### How to Use

**Step 1**: Install
[FlagCX](https://github.com/FlagOpen/FlagCX?tab=readme-ov-file#quick-start)

**Step 2**: Install the `vLLM` version from
[FlagScale](https://github.com/FlagOpen/FlagScale?tab=readme-ov-file#setup)

**Step 3**: Define your config files under `./examples/qwen/conf`

**Step 4**: Launch the distributed deployment  
```bash
python run.py --config-path ./examples/qwen/conf --config-name config_qwen2.5_7b_disagg_xpyd action=run
```

**Step 5**: Send requests to the deployed service  
```bash
curl -X POST -s http://localhost:10001/v1/completions \
-H "Content-Type: application/json" \
-d '{
  "model": "/models/Qwen2.5-7B-Instruct",
  "prompt": "Introduce Bruce Lee in details",
  "max_tokens": 100,
  "temperature": 0,
  "stream": true
}'
```
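
The same request can be sent from Python. The sketch below uses only the standard library and assumes the service streams OpenAI-style server-sent events (`data: {...}` lines terminated by `data: [DONE]`), as vLLM's OpenAI-compatible server does. The helper names `build_request`, `iter_text`, and `main` are illustrative and not part of FlagScale.

```python
import json
import urllib.request


def build_request(url, model, prompt, max_tokens=100):
    """Build a POST request equivalent to the curl command above."""
    payload = {
        "model": model,
        "prompt": prompt,
        "max_tokens": max_tokens,
        "temperature": 0,
        "stream": True,
    }
    return urllib.request.Request(
        url,
        data=json.dumps(payload).encode("utf-8"),
        headers={"Content-Type": "application/json"},
        method="POST",
    )


def iter_text(lines):
    """Yield text fragments from an iterable of SSE lines (bytes or str).

    Assumes each event looks like `data: {"choices": [{"text": ...}]}`
    and that the stream ends with `data: [DONE]`.
    """
    for raw in lines:
        line = raw.decode("utf-8") if isinstance(raw, bytes) else raw
        line = line.strip()
        if not line.startswith("data:"):
            continue  # skip blank keep-alive lines and other fields
        data = line[len("data:"):].strip()
        if data == "[DONE]":
            break
        chunk = json.loads(data)
        for choice in chunk.get("choices", []):
            yield choice.get("text", "")


def main():
    """Stream a completion from the endpoint used in Step 5."""
    req = build_request(
        "http://localhost:10001/v1/completions",
        "/models/Qwen2.5-7B-Instruct",
        "Introduce Bruce Lee in details",
    )
    with urllib.request.urlopen(req) as resp:
        for piece in iter_text(resp):
            print(piece, end="", flush=True)
    print()
```

Call `main()` once the deployment from Step 4 is running; `iter_text` can also be tried offline against a list of sample lines.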