When enabled, vLLM collects load statistics with every forward pass and periodically rebalances expert placement.

### EPLB Parameters

Configure EPLB with the `--eplb-config` argument, which accepts a JSON string. The available keys and their descriptions are:

| Parameter | Description | Default |
|-----------|-------------|---------|
| `window_size` | Number of engine steps to track for rebalancing decisions | `1000` |
| `step_interval` | Frequency of rebalancing (every N engine steps) | `3000` |
| `log_balancedness` | Log balancedness metrics (average tokens per expert ÷ maximum tokens per expert) | `false` |
| `num_redundant_experts` | Additional global experts per EP rank beyond equal distribution | `0` |

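The balancedness metric that `log_balancedness` reports can be sketched in a few lines (the token counts below are hypothetical; in vLLM the statistics come from actual forward passes):

```python
def balancedness(tokens_per_expert: list[int]) -> float:
    """Balancedness = average tokens per expert / max tokens per expert.

    1.0 means perfectly even expert load; small values mean a few
    "hot" experts receive most of the tokens.
    """
    avg = sum(tokens_per_expert) / len(tokens_per_expert)
    return avg / max(tokens_per_expert)

# Hypothetical per-expert token counts on one EP rank
print(balancedness([100, 100, 100, 100]))  # evenly loaded -> 1.0
print(balancedness([400, 0, 0, 0]))        # one hot expert -> 0.25
```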
For example:

```bash
vllm serve Qwen/Qwen3-30B-A3B \
    --enable-eplb \
    --eplb-config '{"window_size":1000,"step_interval":3000,"num_redundant_experts":2,"log_balancedness":true}'
```

??? tip "Prefer individual arguments instead of JSON?"

    ```bash
    vllm serve Qwen/Qwen3-30B-A3B \
        --enable-eplb \
        --eplb-config.window_size 1000 \
        --eplb-config.step_interval 3000 \
        --eplb-config.num_redundant_experts 2 \
        --eplb-config.log_balancedness true
    ```
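
When launching the server from a script, building the `--eplb-config` value with `json.dumps` avoids shell-quoting mistakes. A minimal sketch, reusing the example values above:

```python
import json
import shlex

eplb_config = {
    "window_size": 1000,
    "step_interval": 3000,
    "num_redundant_experts": 2,
    "log_balancedness": True,  # serialized as JSON's lowercase `true`
}

# Assemble the serve command as an argument list, then render it
# as a copy-pasteable shell line with correct quoting.
cmd = [
    "vllm", "serve", "Qwen/Qwen3-30B-A3B",
    "--enable-eplb",
    "--eplb-config", json.dumps(eplb_config),
]
print(shlex.join(cmd))
```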

### Expert Distribution Formula

```bash
VLLM_ALL2ALL_BACKEND=pplx VLLM_USE_DEEP_GEMM=1 vllm serve deepseek-ai/DeepSeek-V… \
    --data-parallel-size 8 \ # Data parallelism
    --enable-expert-parallel \ # Enable EP
    --enable-eplb \ # Enable load balancer
    --eplb-config '{"window_size":1000,"step_interval":3000,"num_redundant_experts":2,"log_balancedness":true}'
```

For multi-node deployment, add these EPLB flags to each node's command. In large-scale deployments, we recommend setting `num_redundant_experts` to 32 (e.g. `--eplb-config '{"num_redundant_experts":32}'`) so the most popular experts are always available.

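As a back-of-the-envelope sketch of what redundant experts cost: assuming physical experts are the logical experts plus the redundant copies, spread evenly across EP ranks (an illustrative assumption; the Expert Distribution Formula above gives the exact rule), each rank hosts:

```python
def experts_per_rank(num_logical: int, num_redundant: int, ep_size: int) -> int:
    # Physical experts = logical experts + redundant copies,
    # divided evenly across the expert-parallel ranks (illustrative).
    total_physical = num_logical + num_redundant
    assert total_physical % ep_size == 0, "must divide evenly across ranks"
    return total_physical // ep_size

# Hypothetical: 256 logical experts, 32 redundant copies, 8 EP ranks
print(experts_per_rank(256, 32, 8))  # -> 36 experts per rank
```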
## Disaggregated Serving (Prefill/Decode Split)
