@@ -4,7 +4,7 @@ This README guides you through running benchmark tests with the extensive
datasets supported on vLLM. It's a living document, updated as new features and datasets
become available.

- ## Dataset Overview
+ **Dataset Overview**

<table style="width:100%; border-collapse: collapse;">
 <thead>
@@ -82,7 +82,10 @@ become available.
**Note**: HuggingFace dataset's `dataset-name` should be set to `hf`

---
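For illustration, a minimal sketch of that note; the model, dataset id, and prompt count below are placeholders chosen for this example, not taken from this README:

```bash
# Benchmark a HuggingFace-hosted dataset: `hf` selects the HuggingFace loader,
# and the dataset path is the repository id (values below are illustrative).
python3 vllm/benchmarks/benchmark_serving.py \
  --model Qwen/Qwen2-VL-7B-Instruct \
  --dataset-name hf \
  --dataset-path lmms-lab/LLaVA-OneVision-Data \
  --num-prompts 10
```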
- ## Example - Online Benchmark
+ <details>
+ <summary><b>🚀 Example - Online Benchmark</b></summary>
+
+ <br />

First start serving your model

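For example, a minimal serving sketch; the model name and flag are taken from later examples in this README, and any model you can serve works here:

```bash
# Start an OpenAI-compatible vLLM server to benchmark against.
vllm serve NousResearch/Hermes-3-Llama-3.1-8B --disable-log-requests
```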
@@ -130,7 +133,8 @@ P99 ITL (ms): 8.39
==================================================
```

- ### Custom Dataset
+ **Custom Dataset**
+
If the dataset you want to benchmark is not yet supported in vLLM, you can still benchmark it using `CustomDataset`. Your data must be in `.jsonl` format, with a "prompt" field in each entry, e.g., data.jsonl

```
@@ -162,7 +166,7 @@ python3 benchmarks/benchmark_serving.py --port 9001 --save-result --save-detaile

You can skip applying the chat template if your data already has it by using `--custom-skip-chat-template`.

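As a hedged sketch of that flag in context (port, model, and dataset path are illustrative; `--dataset-name custom` is assumed here to select `CustomDataset`):

```bash
# Benchmark a custom .jsonl dataset whose prompts already include the chat template.
python3 benchmarks/benchmark_serving.py \
  --port 9001 \
  --model <your-model> \
  --dataset-name custom \
  --dataset-path data.jsonl \
  --custom-skip-chat-template \
  --num-prompts 80
```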
- ### VisionArena Benchmark for Vision Language Models
+ **VisionArena Benchmark for Vision Language Models**

```bash
# need a model with vision capability here
@@ -180,7 +184,7 @@ python3 vllm/benchmarks/benchmark_serving.py \
 --num-prompts 1000
```

- ### InstructCoder Benchmark with Speculative Decoding
+ **InstructCoder Benchmark with Speculative Decoding**

```bash
VLLM_USE_V1=1 vllm serve meta-llama/Meta-Llama-3-8B-Instruct \
@@ -197,7 +201,7 @@ python3 benchmarks/benchmark_serving.py \
 --num-prompts 2048
```

- ### Other HuggingFaceDataset Examples
+ **Other HuggingFaceDataset Examples**

```bash
vllm serve Qwen/Qwen2-VL-7B-Instruct --disable-log-requests
@@ -251,7 +255,7 @@ python3 vllm/benchmarks/benchmark_serving.py \
 --num-prompts 80
```

- ### Running With Sampling Parameters
+ **Running With Sampling Parameters**

When using OpenAI-compatible backends such as `vllm`, optional sampling
parameters can be specified. Example client command:
@@ -269,7 +273,7 @@ python3 vllm/benchmarks/benchmark_serving.py \
 --num-prompts 10
```

- ### Running With Ramp-Up Request Rate
+ **Running With Ramp-Up Request Rate**

The benchmark tool also supports ramping up the request rate over the
duration of the benchmark run. This can be useful for stress testing the
@@ -284,8 +288,12 @@ The following arguments can be used to control the ramp-up:
- `--ramp-up-start-rps`: The request rate at the beginning of the benchmark.
- `--ramp-up-end-rps`: The request rate at the end of the benchmark.

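For instance, a minimal sketch that ramps from 1 to 10 requests per second over the run; only the two ramp-up flags above come from this README, the remaining flags and values are illustrative:

```bash
# Ramp the request rate from 1 RPS up to 10 RPS over the benchmark run.
python3 vllm/benchmarks/benchmark_serving.py \
  --model NousResearch/Hermes-3-Llama-3.1-8B \
  --dataset-name random \
  --ramp-up-start-rps 1 \
  --ramp-up-end-rps 10 \
  --num-prompts 1000
```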
- ---
- ## Example - Offline Throughput Benchmark
+ </details>
+
+ <details>
+ <summary><b>📈 Example - Offline Throughput Benchmark</b></summary>
+
+ <br />

```bash
python3 vllm/benchmarks/benchmark_throughput.py \
@@ -303,7 +311,7 @@ Total num prompt tokens: 5014
Total num output tokens: 1500
```

- ### VisionArena Benchmark for Vision Language Models
+ **VisionArena Benchmark for Vision Language Models**

```bash
python3 vllm/benchmarks/benchmark_throughput.py \
@@ -323,7 +331,7 @@ Total num prompt tokens: 14527
Total num output tokens: 1280
```

- ### InstructCoder Benchmark with Speculative Decoding
+ **InstructCoder Benchmark with Speculative Decoding**

```bash
VLLM_WORKER_MULTIPROC_METHOD=spawn \
@@ -347,7 +355,7 @@ Total num prompt tokens: 261136
Total num output tokens: 204800
```

- ### Other HuggingFaceDataset Examples
+ **Other HuggingFaceDataset Examples**

**`lmms-lab/LLaVA-OneVision-Data`**

@@ -386,7 +394,7 @@ python3 benchmarks/benchmark_throughput.py \
 --num-prompts 10
```

- ### Benchmark with LoRA Adapters
+ **Benchmark with LoRA Adapters**

```bash
# download dataset
@@ -403,18 +411,22 @@ python3 vllm/benchmarks/benchmark_throughput.py \
 --lora-path yard1/llama-2-7b-sql-lora-test
 ```

- ---
- ## Example - Structured Output Benchmark
+ </details>
+
+ <details>
+ <summary><b>🛠️ Example - Structured Output Benchmark</b></summary>
+
+ <br />

Benchmark the performance of structured output generation (JSON, grammar, regex).

- ### Server Setup
+ **Server Setup**

```bash
vllm serve NousResearch/Hermes-3-Llama-3.1-8B --disable-log-requests
```

- ### JSON Schema Benchmark
+ **JSON Schema Benchmark**

```bash
python3 benchmarks/benchmark_serving_structured_output.py \
@@ -426,7 +438,7 @@ python3 benchmarks/benchmark_serving_structured_output.py \
 --num-prompts 1000
```

- ### Grammar-based Generation Benchmark
+ **Grammar-based Generation Benchmark**

```bash
python3 benchmarks/benchmark_serving_structured_output.py \
@@ -438,7 +450,7 @@ python3 benchmarks/benchmark_serving_structured_output.py \
 --num-prompts 1000
```

- ### Regex-based Generation Benchmark
+ **Regex-based Generation Benchmark**

```bash
python3 benchmarks/benchmark_serving_structured_output.py \
@@ -449,7 +461,7 @@ python3 benchmarks/benchmark_serving_structured_output.py \
 --num-prompts 1000
```

- ### Choice-based Generation Benchmark
+ **Choice-based Generation Benchmark**

```bash
python3 benchmarks/benchmark_serving_structured_output.py \
@@ -460,7 +472,7 @@ python3 benchmarks/benchmark_serving_structured_output.py \
 --num-prompts 1000
```

- ### XGrammar Benchmark Dataset
+ **XGrammar Benchmark Dataset**

```bash
python3 benchmarks/benchmark_serving_structured_output.py \
@@ -471,12 +483,16 @@ python3 benchmarks/benchmark_serving_structured_output.py \
 --num-prompts 1000
```

- ---
- ## Example - Long Document QA Throughput Benchmark
+ </details>
+
+ <details>
+ <summary><b>📚 Example - Long Document QA Benchmark</b></summary>
+
+ <br />

Benchmark the performance of long document question-answering with prefix caching.

- ### Basic Long Document QA Test
+ **Basic Long Document QA Test**

```bash
python3 benchmarks/benchmark_long_document_qa_throughput.py \
@@ -488,7 +504,7 @@ python3 benchmarks/benchmark_long_document_qa_throughput.py \
 --repeat-count 5
```

- ### Different Repeat Modes
+ **Different Repeat Modes**

```bash
# Random mode (default) - shuffle prompts randomly
@@ -519,12 +535,16 @@ python3 benchmarks/benchmark_long_document_qa_throughput.py \
 --repeat-mode interleave
```

- ---
- ## Example - Prefix Caching Benchmark
+ </details>
+
+ <details>
+ <summary><b>🗂️ Example - Prefix Caching Benchmark</b></summary>
+
+ <br />

Benchmark the efficiency of automatic prefix caching.

- ### Fixed Prompt with Prefix Caching
+ **Fixed Prompt with Prefix Caching**

```bash
python3 benchmarks/benchmark_prefix_caching.py \
@@ -535,7 +555,7 @@ python3 benchmarks/benchmark_prefix_caching.py \
 --input-length-range 128:256
```

- ### ShareGPT Dataset with Prefix Caching
+ **ShareGPT Dataset with Prefix Caching**

```bash
# download dataset
@@ -550,12 +570,16 @@ python3 benchmarks/benchmark_prefix_caching.py \
 --input-length-range 128:256
```

- ---
- ## Example - Request Prioritization Benchmark
+ </details>
+
+ <details>
+ <summary><b>⚡ Example - Request Prioritization Benchmark</b></summary>
+
+ <br />

Benchmark the performance of request prioritization in vLLM.

- ### Basic Prioritization Test
+ **Basic Prioritization Test**

```bash
python3 benchmarks/benchmark_prioritization.py \
@@ -566,7 +590,7 @@ python3 benchmarks/benchmark_prioritization.py \
 --scheduling-policy priority
```

- ### Multiple Sequences per Prompt
+ **Multiple Sequences per Prompt**

```bash
python3 benchmarks/benchmark_prioritization.py \
@@ -577,3 +601,5 @@ python3 benchmarks/benchmark_prioritization.py \
 --scheduling-policy priority \
 --n 2
```
+
+ </details>