
Commit fdc1a59

fix pre-commit

Signed-off-by: Matthew Bonanni <mbonanni@redhat.com>

1 parent bcc63d0 commit fdc1a59

File tree: 1 file changed (+10 −3 lines)

benchmarks/attention_benchmarks/README.md

Lines changed: 10 additions & 3 deletions
````diff
@@ -39,7 +39,7 @@ Express workloads concisely using query length and sequence length:
 
 ### Grammar Rule
 
-```
+```text
 Format: (<count>?) q<q_len>(k?) (s<seq_len>(k?))?
 
 - count: Number of identical requests (optional, default=1)
````
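The workload grammar in the hunk above maps spec strings like `32q1s1k` onto a request count, a query length, and a sequence length. The following is a minimal sketch of such a parser, assuming the `k` suffix means ×1024; the function name is hypothetical and this is not the benchmark's actual implementation:

```python
import re

# Hypothetical parser for the workload spec grammar:
#   Format: (<count>?) q<q_len>(k?) (s<seq_len>(k?))?
# Assumption: a trailing "k" multiplies the number by 1024.
_SPEC = re.compile(
    r"^(?P<count>\d+)?"             # optional request count (default 1)
    r"q(?P<q>\d+)(?P<qk>k?)"        # query length, optional k suffix
    r"(?:s(?P<s>\d+)(?P<sk>k?))?$"  # optional sequence length, optional k suffix
)

def _scale(n: str, k: str) -> int:
    return int(n) * (1024 if k else 1)

def parse_workload(spec: str):
    """Return (count, q_len, seq_len); seq_len is None when omitted."""
    m = _SPEC.match(spec)
    if m is None:
        raise ValueError(f"bad workload spec: {spec!r}")
    count = int(m.group("count") or 1)
    q_len = _scale(m.group("q"), m.group("qk"))
    seq_len = _scale(m.group("s"), m.group("sk")) if m.group("s") else None
    return count, q_len, seq_len
```

For example, `parse_workload("32q1s1k")` yields 32 identical requests with query length 1 and sequence length 1024.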
````diff
@@ -99,15 +99,18 @@ Compares FlashInfer-MLA against CUTLASS MLA with optimized `num_kv_splits` value
 **Question:** At what query length does the prefill pipeline become faster than the decode pipeline?
 
 **Methodology:** Reproduces the original `benchmark_mla_threshold.py` study using the new interface:
+
 - For each query length (1-2048), test BOTH decode and prefill pipelines
 - Find the crossover point where prefill becomes faster
 - Analyze how this varies across batch sizes (1-256)
 
+
 ```bash
 python benchmark.py --config configs/reorder_threshold.yaml
 ```
 
 Tests query lengths from 1-2048 (fine-grained steps at low values, coarser at high values) across 9 batch sizes. For each query length, compares:
+
 - **Decode pipeline**: `threshold >= query_length`
 - **Prefill pipeline**: `threshold < query_length`
 
````
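The crossover methodology above (time both pipelines at each query length, find where prefill wins) can be sketched as follows; the function name and the toy timing data are hypothetical, not results from the benchmark:

```python
# Sketch of the crossover search: given per-query-length timings (ms) for
# the decode and prefill pipelines, find the first query length at which
# prefill is strictly faster. Data below is illustrative only.
def find_crossover(decode_ms: dict, prefill_ms: dict):
    for q in sorted(decode_ms):
        if prefill_ms[q] < decode_ms[q]:
            return q  # prefill becomes faster at this query length
    return None  # prefill never wins in the tested range

# Toy data: decode cost grows with query length, prefill is roughly flat.
decode = {q: 0.05 * q for q in (1, 2, 4, 8, 16, 32)}
prefill = {q: 0.40 for q in (1, 2, 4, 8, 16, 32)}
print(find_crossover(decode, prefill))  # -> 16
```

Repeating this search per batch size gives the threshold-vs-batch-size relationship the study analyzes.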
````diff
@@ -169,7 +172,7 @@ python benchmark.py \
 
 ### All Command-Line Options
 
-```
+```text
 --backends BACKEND [BACKEND ...] # flash, triton, flashinfer, cutlass_mla,
                                  # flashinfer_mla, flashattn_mla, flashmla
 --backend BACKEND                # Single backend (alternative to --backends)
````
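The option shapes above (`--backends` taking one or more names, `--backend` a single alternative) can be sketched with `argparse`. Treating the two flags as mutually exclusive is an assumption for illustration, not the benchmark's actual parser definition:

```python
import argparse

# Hypothetical sketch of the documented option shapes; the real benchmark's
# parser may differ (e.g. in defaults or whether the flags conflict).
parser = argparse.ArgumentParser()
group = parser.add_mutually_exclusive_group()
group.add_argument("--backends", nargs="+",
                   choices=["flash", "triton", "flashinfer", "cutlass_mla",
                            "flashinfer_mla", "flashattn_mla", "flashmla"],
                   help="one or more attention backends")
group.add_argument("--backend",
                   help="single backend (alternative to --backends)")

args = parser.parse_args(["--backends", "flash", "triton"])
print(args.backends)  # -> ['flash', 'triton']
```

Passing both `--backend` and `--backends` would then exit with a usage error, which matches the "alternative to" wording in the help text.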
````diff
@@ -272,7 +275,7 @@ formatter.save_json(results, "output.json")
 
 ## File Structure
 
-```
+```text
 attention_benchmarks/
 ├── README.md # This file
````
````diff
@@ -308,15 +311,19 @@ attention_benchmarks/
 ## Troubleshooting
 
 **Import errors?**
+
 ```bash
 source /path/to/vllm/.venv/bin/activate
 ```
 
 **Backend not supported?**
+
 - Check hardware requirements above
 - Some backends need Hopper/Blackwell
 
+
 **OOM?**
+
 - Reduce batch size: `"32q1s1k"` → `"16q1s1k"`
 - Reduce sequence length: `"64q1s16k"` → `"64q1s4k"`
 
````