@@ -39,7 +39,7 @@ Express workloads concisely using query length and sequence length:
 
 ### Grammar Rule
 
-```
+```text
 Format: (<count>?) q<q_len>(k?) (s<seq_len>(k?))?
 
 - count: Number of identical requests (optional, default=1)
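The grammar in the hunk above can be exercised with a short parser sketch. This is illustrative only — `parse_workload` and its return keys are hypothetical helpers, not part of the benchmark's API; it accepts the concatenated form used in specs like `32q1s1k`, where a trailing `k` multiplies by 1024:

```python
import re

# Hypothetical parser for the workload grammar:
#   (<count>?) q<q_len>(k?) (s<seq_len>(k?))?
# The parentheses in the grammar are grouping metacharacters;
# actual specs are written without spaces, e.g. "32q1s1k".
_SPEC = re.compile(
    r"^(?P<count>\d+)?"                    # optional request count (default 1)
    r"q(?P<q_len>\d+)(?P<qk>k)?"           # query length, optional 'k' = x1024
    r"(?:s(?P<seq_len>\d+)(?P<sk>k)?)?$"   # optional sequence length
)

def parse_workload(spec: str) -> dict:
    """Parse a workload spec string into its components."""
    m = _SPEC.match(spec)
    if m is None:
        raise ValueError(f"invalid workload spec: {spec!r}")
    scale = lambda n, k: int(n) * (1024 if k else 1)
    return {
        "count": int(m.group("count") or 1),
        "q_len": scale(m.group("q_len"), m.group("qk")),
        "seq_len": (scale(m.group("seq_len"), m.group("sk"))
                    if m.group("seq_len") else None),
    }
```

For example, `32q1s1k` parses to 32 requests of query length 1 against a 1024-token sequence.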
@@ -99,15 +99,18 @@ Compares FlashInfer-MLA against CUTLASS MLA with optimized `num_kv_splits` value
 **Question:** At what query length does the prefill pipeline become faster than the decode pipeline?
 
 **Methodology:** Reproduces the original `benchmark_mla_threshold.py` study using the new interface:
+
 - For each query length (1-2048), test BOTH decode and prefill pipelines
 - Find the crossover point where prefill becomes faster
 - Analyze how this varies across batch sizes (1-256)
 
+
 ```bash
 python benchmark.py --config configs/reorder_threshold.yaml
 ```
 
 Tests query lengths from 1-2048 (fine-grained steps at low values, coarser at high values) across 9 batch sizes. For each query length, compares:
+
 - **Decode pipeline**: `threshold >= query_length`
 - **Prefill pipeline**: `threshold < query_length`
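The crossover search this hunk describes can be sketched as follows. The timing callables are stand-ins for the real per-pipeline GPU benchmarks, and `find_reorder_threshold` is a hypothetical name — treat this as the search logic only, not the repo's implementation:

```python
def find_reorder_threshold(query_lengths, time_decode, time_prefill):
    """Return the first query length at which the prefill pipeline
    (used when threshold < query_length) beats the decode pipeline
    (used when threshold >= query_length), or None if it never does."""
    for q in sorted(query_lengths):
        if time_prefill(q) < time_decode(q):
            return q
    return None

# Toy cost models: decode scales steeply with query length, prefill
# has a higher fixed cost but a flatter slope, so they cross over.
crossover = find_reorder_threshold(
    range(1, 2049),
    time_decode=lambda q: 1.0 + 0.01 * q,
    time_prefill=lambda q: 3.0 + 0.001 * q,
)
```

In the study itself this search is repeated for each batch size (1-256) to see how the crossover point moves.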
@@ -169,7 +172,7 @@ python benchmark.py \
 
 ### All Command-Line Options
 
-```
+```text
 --backends BACKEND [BACKEND ...]   # flash, triton, flashinfer, cutlass_mla,
                                    #   flashinfer_mla, flashattn_mla, flashmla
 --backend BACKEND                  # Single backend (alternative to --backends)
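The `--backends` / `--backend` pair above could be declared with argparse roughly as follows — a minimal sketch, assuming the listed choices; the real `benchmark.py` may add defaults and further validation:

```python
import argparse

# Illustrative declaration of the two backend flags listed above.
BACKENDS = ["flash", "triton", "flashinfer", "cutlass_mla",
            "flashinfer_mla", "flashattn_mla", "flashmla"]

parser = argparse.ArgumentParser()
parser.add_argument("--backends", nargs="+", choices=BACKENDS,
                    help="one or more backends to compare")
parser.add_argument("--backend", choices=BACKENDS,
                    help="single backend (alternative to --backends)")

# nargs="+" collects every following token into a list:
args = parser.parse_args(["--backends", "flash", "cutlass_mla"])
```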
@@ -272,7 +275,7 @@ formatter.save_json(results, "output.json")
 
 ## File Structure
 
-```
+```text
 attention_benchmarks/
 ├── README.md                     # This file
 │
@@ -308,15 +311,19 @@ attention_benchmarks/
 ## Troubleshooting
 
 **Import errors?**
+
 ```bash
 source /path/to/vllm/.venv/bin/activate
 ```
 
 **Backend not supported?**
+
 - Check hardware requirements above
 - Some backends need Hopper/Blackwell
 
+
 **OOM?**
+
 - Reduce batch size: `"32q1s1k"` → `"16q1s1k"`
 - Reduce sequence length: `"64q1s16k"` → `"64q1s4k"`
 