Description
Hi,
my benchmark results are significantly worse than the numbers in the repo's performance docs, especially the first-token latency. Could anyone help check this?
GPU: 1 x A100 80GB
CPU: 96 x Intel(R) Xeon(R) Gold 6271C CPU @ 2.60GHz
Binary: gptManagerBenchmark
Model build:
```bash
precision=float16
python build.py --model_dir /models/llama-2-7b-chat-hf/ \
--dtype ${precision} \
--use_gpt_attention_plugin ${precision} \
--use_gemm_plugin ${precision} \
--max_batch_size ${max_batch_size} \
--max_input_len ${max_input_len} \
--max_output_len ${max_output_len} \
--output_dir /models/llama2-7b-chat-hf/${precision}-${max_batch_size}-${max_input_len}-${max_output_len}/1-gpu/ \
--use_inflight_batching \
--paged_kv_cache \
--remove_input_padding \
--enable_context_fmha
```

Benchmark:
```bash
${proj_dir}/cpp/bbb/benchmarks/gptManagerBenchmark \
--model=llama \
--engine_dir=/models/llama2-7b-chat-hf/${precision}-${max_batch_size}-${max_input_len}-${max_output_len}/1-gpu/ \
--dataset=${proj_dir}/benchmarks/cpp/preprocessed_dataset.json \
--log_level=info
```

Throughput:
Requests have different input_ids and request_output_len lengths; the averages across the dataset are:
input_ids: 19
request_output_len: 299
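A minimal sketch of how these averages can be computed from the dataset file. The field names (`input_ids`, `output_len`) are an assumption about my preprocessed JSON, not necessarily the exact schema gptManagerBenchmark expects:

```python
# Rough sketch: average prompt/output lengths over the preprocessed dataset.
# Assumes each entry has "input_ids" (token id list) and "output_len" (int);
# these field names are an assumption about my own preprocessing output.
import json

with open("preprocessed_dataset.json") as f:
    samples = json.load(f)

avg_in = sum(len(s["input_ids"]) for s in samples) / len(samples)
avg_out = sum(s["output_len"] for s in samples) / len(samples)
print(f"avg input_ids: {avg_in:.0f}, avg request_output_len: {avg_out:.0f}")
```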
| Model | max_batch_size, max_input_len, max_output_len | total requests | output tokens/s |
|---|---|---|---|
| llama2-7b | 16, 2048, 2048 | 100 | 755.615 |
| llama2-7b | 32, 2048, 2048 | 100 | 906.030 |
| llama2-7b | 64, 2048, 2048 | 100 | 977.744 |
| llama2-7b | 128, 2048, 2048 | 100 | 986.230 |
| llama2-7b | 16, 2048, 2048 | 1000 | 1193.203 |
| llama2-7b | 32, 2048, 2048 | 1000 | 1978.126 |
| llama2-7b | 64, 2048, 2048 | 1000 | 2860.245 |
| llama2-7b | 128, 2048, 2048 | 1000 | 3227.109 |
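My reading of the output tokens/s column, as an assumption on my part: total generated tokens divided by end-to-end wall-clock time. The elapsed time below is an illustrative placeholder, not a measured value:

```python
# Sanity check of how I interpret "output tokens/s":
# total generated tokens / end-to-end wall-clock time.
total_requests = 1000
avg_output_len = 299      # average request_output_len from the dataset
elapsed_s = 92.6          # hypothetical wall-clock time, for illustration only
print(total_requests * avg_output_len / elapsed_s)  # ~3229 output tokens/s
```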
1st token latency:
| Model | max_batch_size, max_input_len, max_output_len | total requests | input tokens | first token latency (ms) |
|---|---|---|---|---|
| llama2-7b | 1, 128, 128 | 1 | 19 | 142 |
| llama2-7b | 1, 128, 128 | 1 | 128 | 163 |
| llama2-7b | 128, 2048, 2048 | 1 | 128 | 161 |
| llama2-7b | 128, 2048, 2048 | 1 | 2048 | 289 |
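As a rough sanity check (my own back-of-the-envelope, not from the docs), here is a compute-bound lower bound for the prefill, assuming ~2 FLOPs per parameter per prompt token and A100 dense FP16 peak of ~312 TFLOPS:

```python
# Back-of-the-envelope prefill lower bound for llama-2-7b (FP16, 1x A100).
# Assumes ~2 FLOPs per parameter per prompt token and ~312 TFLOPS peak;
# ignores tokenization, scheduling, sampling, and kernel-launch overhead.
params = 7e9
peak_flops = 312e12
for input_len in (19, 128, 2048):
    prefill_ms = 2 * params * input_len / peak_flops * 1e3
    print(f"input_len={input_len}: >= {prefill_ms:.1f} ms of pure compute")
```

For the 19- and 128-token prompts that bound is only a few milliseconds, so the ~140-160 ms I measure seems dominated by something other than GPU compute, which is the part I would like help understanding.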