benchmark container on EC2 #2668
base: master
Conversation
docker_parameters: |
  ["--runtime", "nvidia", "--gpus", "all", "--shm-size", "20g", "-v", "/home/ubuntu/.cache/huggingface:/tmp/.cache/huggingface", "-p", "8080:8000"]
server_parameters: |
  ["--model", "meta-llama/Llama-3.1-8B-Instruct", "--disable-log-stats", "--disable-log-requests", "--gpu-memory-utilization=0.9", "-tp=2", "--num-scheduler-steps=10", "--max-num-seqs=512", "--swap-space=16", "--max-model-len=8192"]
minor: newline
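For context, one way these lists might end up on the command line, assuming the YAML arrays are parsed and space-joined before interpolation into the docker command quoted later in this review (the join step is an illustration, not code from this PR):

import json

# Hypothetical illustration: the YAML block above stores the docker/server
# parameters as JSON arrays; joining the parsed list with spaces yields the
# fragment interpolated into the docker_command f-string further down.
docker_parameters = " ".join(json.loads(
    '["--runtime", "nvidia", "--gpus", "all", "--shm-size", "20g"]'
))
print(docker_parameters)  # --runtime nvidia --gpus all --shm-size 20g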
bash_command = ['bash', '-c', script, 'my_script', model, str(timeout)]
if run_bash_command(bash_command) is None:
    logger.error(traceback.format_exc())
    raise Exception("Failed at starting server within 2 hours")
should we use timeout/60 in this message instead of 2 hours?
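Something along these lines, assuming timeout is in seconds (a sketch of the suggestion, not code from this PR):

# Derive the figure in the message from the actual timeout value
# (assumed to be in seconds) instead of hard-coding "2 hours".
raise Exception(f"Failed at starting server within {timeout // 60} minutes")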
#!/bin/bash
MODEL_ID=$1
TIMEOUT=$2
start_time=$(date +%s)
end_time=$((start_time + $TIMEOUT))
while [ $(date +%s) -lt $end_time ]; do
    filter="$(curl -s http://0.0.0.0:8080/v1/models | jq -e --arg expected "$MODEL_ID" '. != null and . != {} and has("data") and .data != null and (.data | length > 0) and (.data[].id | contains($expected))')"
    if [ -z "$filter" ]; then
        echo "Model $MODEL_ID is not available"
        sleep 1m
    else
        echo "Model $MODEL_ID is available"
        exit 0
    fi
done

echo "Model $MODEL_ID is not available within 2 hours"
exit 1
'''
Not a blocker, but I think this would be easier to maintain as a standalone bash script rather than inline in the Python script.
Actually, I see a few more inline bash scripts. I think some of them are pretty similar to other scripts we have, but I would need to look into it.
I'm fine with this for now; we can look into organizing some of these scripts after this PR.
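A minimal sketch of that refactor, assuming a hypothetical scripts/wait_for_model.sh containing the loop above; the Python side would then pass only the script path and its arguments:

# Hypothetical refactor: run a standalone script instead of embedding the
# bash source in a Python string. scripts/wait_for_model.sh is an assumed
# path, not a file added by this PR.
bash_command = ['bash', 'scripts/wait_for_model.sh', model, str(timeout)]
if run_bash_command(bash_command) is None:
    logger.error(traceback.format_exc())
    raise Exception(f"Failed at starting server within {timeout // 60} minutes")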
try:
    container_name = container["container"]
    docker_command = f"docker run --name {container_name} --rm -e HUGGING_FACE_HUB_TOKEN={hf_token} {docker_parameters} --init {image} {server_parameters}"
Is HUGGING_FACE_HUB_TOKEN valid for all containers? The HF docs show HF_TOKEN: https://huggingface.co/docs/huggingface_hub/en/package_reference/environment_variables#hftoken.
The containers may differ in how they expect this value to be passed; should we include HF_TOKEN as well?
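A minimal sketch of that option, reusing the variables from the diff above; passing the token under both names would cover containers that read either HF_TOKEN or the older HUGGING_FACE_HUB_TOKEN:

# Sketch: export the token under both environment variable names.
docker_command = (
    f"docker run --name {container_name} --rm "
    f"-e HF_TOKEN={hf_token} -e HUGGING_FACE_HUB_TOKEN={hf_token} "
    f"{docker_parameters} --init {image} {server_parameters}"
)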
Description
This PR provides a tool for benchmarking containers on EC2.
Checklist:
pytest tests.py -k "TestCorrectnessLmiDist" -m "lmi_dist"
Feature/Issue validation/testing
Please describe the unit or integration tests you ran to verify your changes, with a summary of relevant results and instructions to reproduce them.
Please also list any relevant details of your test configuration.
Test vllm container
[run_benchmark_vllm.log](https://github.com/user-attachments/files/18462683/run_benchmark_vllm.log)
meta-llama_Llama-3.1-8B-Instruct_tp2_500_250_vllm_2.log
meta-llama_Llama-3.1-8B-Instruct_tp2_500_250_vllm_1.log
Test TGI container
run_benchmark_tgi.log
meta-llama_Llama-3.1-8B-Instruct_tp2_500_250_tgi_2.log
meta-llama_Llama-3.1-8B-Instruct_tp2_500_250_tgi_1.log