Commit 186aeb0

Update latency test script due to deprecation in vllm (#2973)
Summary:

For evaluating latency we currently use `python benchmarks/benchmark_latency.py`, but it was recently deprecated:

```
DEPRECATED: This script has been moved to the vLLM CLI.

Please use the following command instead:
    vllm bench latency

For help with the new command, run:
    vllm bench latency --help

Alternatively, you can run the new command directly with:
    python -m vllm.entrypoints.cli.main bench latency --help
```

So we updated the script to use `vllm bench latency` instead.

Test Plan:

```
sh eval.sh --eval_type latency --model_ids Qwen/Qwen3-8B
```

Reviewers:

Subscribers:

Tasks:

Tags:
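For quick reference, a minimal before/after sketch of the migration (the input/output lengths and model below are illustrative placeholders, not values taken from this script; `vllm bench latency --help` lists the full flag set):

```sh
# Old, now-deprecated entry point (standalone benchmark script):
python benchmarks/benchmark_latency.py \
  --input-len 256 --output-len 256 --model Qwen/Qwen3-8B --batch-size 1

# New entry point via the vLLM CLI, same flags:
vllm bench latency \
  --input-len 256 --output-len 256 --model Qwen/Qwen3-8B --batch-size 1

# Equivalent direct module invocation, per the deprecation notice:
python -m vllm.entrypoints.cli.main bench latency --help
```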
1 parent 83e8e60 commit 186aeb0

File tree

1 file changed: +1 −1 lines changed

.github/scripts/torchao_model_releases/eval_latency.sh

Lines changed: 1 addition & 1 deletion
```diff
@@ -75,7 +75,7 @@ for MODEL_ID in "${MODEL_ID_ARRAY[@]}"; do
   for BATCH_SIZE in "${BATCH_SIZE_ARRAY[@]}"; do
     OUTPUT_FILE="$ORIG_DIR/${SAFE_MODEL_ID}_latency_batch${BATCH_SIZE}_in${INPUT_LEN}_out${OUTPUT_LEN}.log"
     echo "Running latency eval for model $MODEL_ID with batch size $BATCH_SIZE with input length: $INPUT_LEN and output length: $OUTPUT_LEN"
-    VLLM_DISABLE_COMPILE_CACHE=1 python benchmarks/benchmark_latency.py --input-len $INPUT_LEN --output-len $OUTPUT_LEN --model $MODEL_ID --batch-size $BATCH_SIZE > "$OUTPUT_FILE" 2>&1
+    VLLM_DISABLE_COMPILE_CACHE=1 vllm bench latency --input-len $INPUT_LEN --output-len $OUTPUT_LEN --model $MODEL_ID --batch-size $BATCH_SIZE > "$OUTPUT_FILE" 2>&1
     echo "Latency eval result saved to $OUTPUT_FILE"
   done
   echo "======================== Eval Latency $MODEL_ID End ========================="
```
