
[Bug]: inter-token latency is lower than TPOT in serving benchmark result #6531

@Jeffwan

Description

Your current environment

v0.5.2. The vLLM environment is not the issue, so I will skip the environment collection step.

🐛 Describe the bug

I am running benchmark tests and noticed a potential problem.

The inter-token latency (ITL) is lower than TPOT. Since inter-token latency takes TTFT into consideration, it should be higher than TPOT. However, the data shows the opposite. I have not looked at the code yet; I will try to figure this out.
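One way this inversion could arise (an assumption, not confirmed from vLLM's actual benchmark code) is a difference in aggregation: if mean TPOT averages one value per request while mean ITL averages one value per token gap, a few slow, short requests can pull mean TPOT up without contributing many gaps to the ITL average. A toy sketch under that assumption, with made-up request timings:

```python
# Toy sketch: mean TPOT averaged per request vs. mean ITL averaged per
# token gap. The field names and the aggregation scheme are assumptions
# for illustration, not vLLM's confirmed benchmark internals.

def mean(xs):
    return sum(xs) / len(xs)

# Request A: slow and short  (2 output tokens -> one 1000 ms gap).
# Request B: fast and long (101 output tokens -> one hundred 100 ms gaps).
requests = [
    {"ttft_s": 3.0, "e2e_s": 4.0,  "out_tokens": 2},    # decode time: 1 s
    {"ttft_s": 3.0, "e2e_s": 13.0, "out_tokens": 101},  # decode time: 10 s
]

# TPOT: one value per request, each weighted equally in the mean.
tpots = [
    (r["e2e_s"] - r["ttft_s"]) / (r["out_tokens"] - 1) * 1000
    for r in requests
]

# ITL: every inter-chunk gap from every request pooled into one list,
# so long requests contribute many more samples.
itls = []
for r in requests:
    n_gaps = r["out_tokens"] - 1
    gap_ms = (r["e2e_s"] - r["ttft_s"]) / n_gaps * 1000
    itls.extend([gap_ms] * n_gaps)

print(f"Mean TPOT: {mean(tpots):.1f} ms")  # 550.0 ms
print(f"Mean ITL:  {mean(itls):.1f} ms")   # ~108.9 ms, well below mean TPOT
```

With equal per-token speed within each request, pooling gaps makes the long fast request dominate mean ITL, reproducing the "ITL lower than TPOT" pattern from the table below.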

root@fb5250e2ae4c:/workspace# python3 vllm/benchmarks/benchmark_serving.py \
    --backend vllm \
    --dataset-name sharegpt \
    --dataset-path ./ShareGPT_V3_unfiltered_cleaned_split.json \
    --model meta-llama/Llama-2-7b-chat-hf \
    --num-prompts 200 \
    --endpoint /v1/completions \
    --tokenizer meta-llama/Llama-2-7b-chat-hf \
    --save-result \
    2>&1 | tee benchmark_serving.txt
Namespace(backend='vllm', base_url=None, host='localhost', port=8000, endpoint='/v1/completions', dataset=None, dataset_name='sharegpt', dataset_path='./ShareGPT_V3_unfiltered_cleaned_split.json', model='meta-llama/Llama-2-7b-chat-hf', tokenizer='meta-llama/Llama-2-7b-chat-hf', best_of=1, use_beam_search=False, num_prompts=200, sharegpt_output_len=None, sonnet_input_len=550, sonnet_output_len=150, sonnet_prefix_len=200, random_input_len=1024, random_output_len=128, random_range_ratio=1.0, request_rate=inf, seed=0, trust_remote_code=False, disable_tqdm=False, save_result=True, metadata=None, result_dir=None, result_filename=None)
Starting initial single prompt test run...
Initial test run completed. Starting main benchmark run...
Traffic request rate: inf
100%|███████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████| 200/200 [01:12<00:00,  2.74it/s]
============ Serving Benchmark Result ============
Successful requests:                     200       
Benchmark duration (s):                  72.96     
Total input tokens:                      49490     
Total generated tokens:                  41078     
Request throughput (req/s):              2.74      
Input token throughput (tok/s):          678.34    
Output token throughput (tok/s):         563.04    
---------------Time to First Token----------------
Mean TTFT (ms):                          3594.18   
Median TTFT (ms):                        3685.95   
P99 TTFT (ms):                           7361.98   
-----Time per Output Token (excl. 1st token)------
Mean TPOT (ms):                          186.90    
Median TPOT (ms):                        121.63    
P99 TPOT (ms):                           966.47    
---------------Inter-token Latency----------------
Mean ITL (ms):                           121.20    
Median ITL (ms):                         92.91     
P99 ITL (ms):                            310.89    
==================================================
