
Commit 809343c

lengrongfu authored and xuebwang-amd committed

[Docs] add eplb_config param use docs (vllm-project#24213)

Signed-off-by: rongfu.leng <rongfu.leng@daocloud.io>
Signed-off-by: xuebwang-amd <xuebwang@amd.com>
1 parent b2cc955 commit 809343c


docs/serving/expert_parallel_deployment.md

Lines changed: 27 additions & 8 deletions
````diff
@@ -123,12 +123,33 @@ When enabled, vLLM collects load statistics with every forward pass and periodic
 
 ### EPLB Parameters
 
+Configure EPLB with the `--eplb-config` argument, which accepts a JSON string. The available keys and their descriptions are:
+
 | Parameter | Description | Default |
 |-----------|-------------|---------|
-| `--eplb-window-size` | Number of engine steps to track for rebalancing decisions | - |
-| `--eplb-step-interval` | Frequency of rebalancing (every N engine steps) | - |
-| `--eplb-log-balancedness` | Log balancedness metrics (avg tokens per expert ÷ max tokens per expert) | `false` |
-| `--num-redundant-experts` | Additional global experts per EP rank beyond equal distribution | `0` |
+| `window_size` | Number of engine steps to track for rebalancing decisions | `1000` |
+| `step_interval` | Frequency of rebalancing (every N engine steps) | `3000` |
+| `log_balancedness` | Log balancedness metrics (avg tokens per expert ÷ max tokens per expert) | `false` |
+| `num_redundant_experts` | Additional global experts per EP rank beyond equal distribution | `0` |
+
+For example:
+
+```bash
+vllm serve Qwen/Qwen3-30B-A3B \
+    --enable-eplb \
+    --eplb-config '{"window_size":1000,"step_interval":3000,"num_redundant_experts":2,"log_balancedness":true}'
+```
+
+??? tip "Prefer individual arguments instead of JSON?"
+
+    ```bash
+    vllm serve Qwen/Qwen3-30B-A3B \
+        --enable-eplb \
+        --eplb-config.window_size 1000 \
+        --eplb-config.step_interval 3000 \
+        --eplb-config.num_redundant_experts 2 \
+        --eplb-config.log_balancedness true
+    ```
 
 ### Expert Distribution Formula

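The JSON quoting in the new `--eplb-config` examples is easy to get wrong in wrapper scripts. As a minimal sketch (not part of the diff), assuming `jq` is installed, the config string can be generated rather than hand-written:

```bash
# Generate the --eplb-config JSON with jq instead of hand-quoting it.
# Keys and values mirror the example added in the diff above.
EPLB_CONFIG=$(jq -cn '{window_size: 1000, step_interval: 3000,
                       num_redundant_experts: 2, log_balancedness: true}')

vllm serve Qwen/Qwen3-30B-A3B \
    --enable-eplb \
    --eplb-config "$EPLB_CONFIG"
```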
````diff
@@ -146,12 +167,10 @@ VLLM_ALL2ALL_BACKEND=pplx VLLM_USE_DEEP_GEMM=1 vllm serve deepseek-ai/DeepSeek-V
     --data-parallel-size 8 \     # Data parallelism
     --enable-expert-parallel \   # Enable EP
     --enable-eplb \              # Enable load balancer
-    --eplb-log-balancedness \    # Log balancing metrics
-    --eplb-window-size 1000 \    # Track last 1000 engine steps
-    --eplb-step-interval 3000    # Rebalance every 3000 steps
+    --eplb-config '{"window_size":1000,"step_interval":3000,"num_redundant_experts":2,"log_balancedness":true}'
 ```
 
-For multi-node deployment, add these EPLB flags to each node's command. We recommend setting `--num-redundant-experts` to 32 in large scale use cases so the most popular experts are always available.
+For multi-node deployment, add these EPLB flags to each node's command. In large-scale use cases, we recommend setting `num_redundant_experts` to 32 via `--eplb-config '{"num_redundant_experts":32}'` so the most popular experts are always available.
 
 ## Disaggregated Serving (Prefill/Decode Split)
 
````

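For intuition on the `log_balancedness` metric described in the table (avg tokens per expert ÷ max tokens per expert), here is a small illustrative calculation over made-up per-expert token counts; it only restates the formula and is not vLLM's implementation:

```bash
# Hypothetical token counts routed to 8 experts in one window.
# balancedness = (avg tokens per expert) / (max tokens per expert):
# 1.0 is a perfectly even load; small values mean a few hot experts.
echo "120 95 310 88 102 97 93 101" | tr ' ' '\n' | awk '
    { sum += $1; n++; if ($1 > max) max = $1 }
    END { printf "balancedness = %.3f\n", (sum / n) / max }'
```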
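A sketch of the multi-node recommendation at the end of the diff, with `num_redundant_experts` raised to 32; the model name is a placeholder and the node-to-node networking flags are deployment-specific and omitted, so neither is taken from the diff:

```bash
# Run on each node, alongside your usual multi-node networking flags.
# num_redundant_experts=32 keeps extra copies of hot experts available.
VLLM_ALL2ALL_BACKEND=pplx VLLM_USE_DEEP_GEMM=1 vllm serve <model> \
    --data-parallel-size 8 \
    --enable-expert-parallel \
    --enable-eplb \
    --eplb-config '{"window_size":1000,"step_interval":3000,"num_redundant_experts":32,"log_balancedness":true}'
```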