@@ -93,8 +93,8 @@ vllm bench serve \
 --port 8000 \
 --model OpenGVLab/InternVL3-8B-hf \
 --dataset-name random \
- --random-input 2048 \
- --random-output 1024 \
+ --random-input-len 2048 \
+ --random-output-len 1024 \
 --max-concurrency 10 \
 --num-prompts 50 \
 --ignore-eos
@@ -103,24 +103,26 @@ If it works successfully, you will see the following output.
 
 ```
 ============ Serving Benchmark Result ============
- Successful requests: 497
- Benchmark duration (s): 229.42
- Total input tokens: 507680
- Total generated tokens: 62259
- Request throughput (req/s): 2.17
- Output token throughput (tok/s): 271.37
- Total Token throughput (tok/s): 2484.22
+ Successful requests: 50
+ Maximum request concurrency: 10
+ Benchmark duration (s): 247.46
+ Total input tokens: 101987
+ Total generated tokens: 51200
+ Request throughput (req/s): 0.20
+ Output token throughput (tok/s): 206.90
+ Total Token throughput (tok/s): 619.04
 ---------------Time to First Token----------------
- Mean TTFT (ms): 102429.40
- Median TTFT (ms): 99644.38
- P99 TTFT (ms): 213820.81
+ Mean TTFT (ms): 932.11
+ Median TTFT (ms): 854.60
+ P99 TTFT (ms): 1845.91
 -----Time per Output Token (excl. 1st token)------
- Mean TPOT (ms): 664.26
- Median TPOT (ms): 776.39
- P99 TPOT (ms): 848.52
+ Mean TPOT (ms): 47.44
+ Median TPOT (ms): 47.53
+ P99 TPOT (ms): 48.26
 ---------------Inter-token Latency----------------
- Mean ITL (ms): 661.73
- Median ITL (ms): 844.15
- P99 ITL (ms): 856.42
+ Mean ITL (ms): 47.44
+ Median ITL (ms): 46.14
+ P99 ITL (ms): 54.76
 ==================================================
+
 ```
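
A quick consistency check on the updated sample output: the sketch below recomputes the three throughput figures from the raw counts in the new block. It assumes each throughput is the corresponding count divided by the benchmark duration, which matches the printed numbers but is not taken from the vLLM source. The variable and function names are illustrative only.

```python
# Figures copied from the updated (+) lines of the benchmark output.
successful_requests = 50
duration_s = 247.46
total_input_tokens = 101987
total_generated_tokens = 51200

# Values passed on the command line in the first hunk.
num_prompts = 50
random_output_len = 1024


def per_second(count: float, seconds: float) -> float:
    """Assumed definition: a throughput is a count divided by wall-clock duration."""
    return round(count / seconds, 2)


request_throughput = per_second(successful_requests, duration_s)
output_token_throughput = per_second(total_generated_tokens, duration_s)
total_token_throughput = per_second(
    total_input_tokens + total_generated_tokens, duration_s
)

# With --ignore-eos, every request should generate exactly random_output_len tokens.
expected_generated_tokens = num_prompts * random_output_len

print(f"Request throughput (req/s):      {request_throughput:.2f}")
print(f"Output token throughput (tok/s): {output_token_throughput:.2f}")
print(f"Total token throughput (tok/s):  {total_token_throughput:.2f}")
print(f"Expected generated tokens:       {expected_generated_tokens}")
```

Under that assumption the new numbers are self-consistent: the generated-token total equals `--num-prompts` times `--random-output-len`, and the request count matches `--num-prompts 50`, which the old output (497 requests) did not.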