Commit c3af447

[Doc]Add documentation to benchmarking script when running TGI (vllm-project#4920)
1 parent 1937e29 commit c3af447

File tree

2 files changed: +5 additions, -1 deletion


benchmarks/benchmark_serving.py

Lines changed: 4 additions & 0 deletions
@@ -17,6 +17,10 @@
     --dataset-path <path to dataset> \
     --request-rate <request_rate> \  # By default <request_rate> is inf
     --num-prompts <num_prompts>  # By default <num_prompts> is 1000
+
+    when using tgi backend, add
+        --endpoint /generate_stream
+    to the end of the command above.
 """
 import argparse
 import asyncio
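Put together, a full benchmark invocation against a TGI server might look like the sketch below. The model name and dataset path are placeholders, and the `--backend tgi` flag is an assumption based on the script's backend selection; only `--dataset-path`, `--request-rate`, `--num-prompts`, and `--endpoint /generate_stream` come from the documented usage above.

```shell
# Hypothetical end-to-end example; model and dataset values are placeholders.
# TGI serves streaming generations at /generate_stream, so the benchmark's
# default endpoint must be overridden as described in the docstring.
python benchmarks/benchmark_serving.py \
    --backend tgi \
    --model meta-llama/Llama-2-7b-hf \
    --dataset-path ./ShareGPT_V3_unfiltered_cleaned_split.json \
    --request-rate 4 \
    --num-prompts 1000 \
    --endpoint /generate_stream
```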

benchmarks/launch_tgi_server.sh

Lines changed: 1 addition & 1 deletion
@@ -4,7 +4,7 @@ PORT=8000
 MODEL=$1
 TOKENS=$2
 
-docker run --gpus all --shm-size 1g -p $PORT:80 \
+docker run -e HF_TOKEN=$HF_TOKEN --gpus all --shm-size 1g -p $PORT:80 \
    -v $PWD/data:/data \
    ghcr.io/huggingface/text-generation-inference:1.4.0 \
    --model-id $MODEL \
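The new `-e HF_TOKEN=$HF_TOKEN` flag forwards the host's Hugging Face access token into the container, which TGI uses to authenticate downloads of gated models. A minimal launch sketch, assuming the script's positional arguments (`MODEL=$1`, `TOKENS=$2`) and a placeholder token and model name:

```shell
# Set a Hugging Face access token on the host before launching; the docker
# run command above passes it through to the container as HF_TOKEN.
export HF_TOKEN=hf_xxxxxxxxxxxxxxxx   # placeholder token

# Positional args: model id, then max tokens (values here are illustrative).
bash benchmarks/launch_tgi_server.sh meta-llama/Llama-2-7b-hf 4096
```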
