Commit 7dfafbe

fix readme
Signed-off-by: wangli <wangli858794774@gmail.com>
1 parent 865dc08 commit 7dfafbe

2 files changed: +3 -5 lines changed

benchmarks/README.md

Lines changed: 3 additions & 3 deletions
@@ -7,21 +7,21 @@ This document outlines the benchmarking methodology for vllm-ascend, aimed at ev
   - Input length: 32 tokens.
   - Output length: 128 tokens.
   - Batch size: fixed (8).
-  - Models: Meta-Llama-3.1-8B-Instruct, Qwen2.5-7B-Instruct.
+  - Models: Qwen2.5-7B-Instruct, Qwen/Qwen2.5-VL-7B-Instruct.
   - Evaluation metrics: end-to-end latency (mean, median, p99).

 - Throughput tests
   - Input length: randomly sample 200 prompts from ShareGPT dataset (with fixed random seed).
   - Output length: the corresponding output length of these 200 prompts.
   - Batch size: dynamically determined by vllm to achieve maximum throughput.
-  - Models: Meta-Llama-3.1-8B-Instruct, Qwen2.5-7B-Instruct.
+  - Models: Qwen2.5-7B-Instruct, Qwen/Qwen2.5-VL-7B-Instruct.
   - Evaluation metrics: throughput.
 - Serving tests
   - Input length: randomly sample 200 prompts from ShareGPT dataset (with fixed random seed).
   - Output length: the corresponding output length of these 200 prompts.
   - Batch size: dynamically determined by vllm and the arrival pattern of the requests.
   - **Average QPS (query per second)**: 1, 4, 16 and inf. QPS = inf means all requests come at once. For other QPS values, the arrival time of each query is determined using a random Poisson process (with fixed random seed).
-  - Models: Meta-Llama-3.1-8B-Instruct, Qwen2.5-7B-Instruct.
+  - Models: Qwen2.5-7B-Instruct, Qwen/Qwen2.5-VL-7B-Instruct.
   - Evaluation metrics: throughput, TTFT (time to the first token, with mean, median and p99), ITL (inter-token latency, with mean, median and p99).

 **Benchmarking Duration**: about 800 seconds for a single model.
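
The serving-test description in this README (Poisson arrivals at a given average QPS, and mean/median/p99 summaries) can be illustrated with a minimal standard-library sketch. This is not the code the benchmark scripts actually run: the function names `poisson_arrival_times` and `summarize` are hypothetical, and the nearest-rank definition of p99 is an assumption, since the README does not say which percentile method is used.

```python
import math
import random
import statistics


def poisson_arrival_times(num_requests, qps, seed=0):
    """Return cumulative arrival times in seconds for num_requests queries.

    Hypothetical helper: qps = math.inf means all requests arrive at t = 0.
    """
    if math.isinf(qps):
        return [0.0] * num_requests
    rng = random.Random(seed)  # fixed seed keeps the schedule reproducible
    times, now = [], 0.0
    for _ in range(num_requests):
        # Gaps between events of a Poisson process are exponentially
        # distributed with rate qps (mean gap = 1 / qps seconds).
        now += rng.expovariate(qps)
        times.append(now)
    return times


def summarize(samples):
    """Mean, median and nearest-rank p99 of a non-empty list of samples."""
    ordered = sorted(samples)
    # Integer form of ceil(0.99 * n), avoiding floating-point rounding.
    rank = (99 * len(ordered) + 99) // 100
    return {
        "mean": statistics.fmean(ordered),
        "median": statistics.median(ordered),
        "p99": ordered[rank - 1],
    }
```

For example, `poisson_arrival_times(200, 4)` would give a schedule for the 200 sampled prompts at an average of 4 QPS, and `summarize` could be applied to per-request TTFT or ITL measurements collected from such a run.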

benchmarks/scripts/convert_json_to_markdown.py

Lines changed: 0 additions & 2 deletions
@@ -1,5 +1,3 @@
-# SPDX-License-Identifier: Apache-2.0
-
 import argparse
 import json
 import os
