benchmark container on EC2 #2668
base: master
Conversation
docker_parameters: |
  ["--runtime", "nvidia", "--gpus", "all", "--shm-size", "20g", "-v", "/home/ubuntu/.cache/huggingface:/tmp/.cache/huggingface", "-p", "8080:8000"]
server_parameters: |
  ["--model", "meta-llama/Llama-3.1-8B-Instruct", "--disable-log-stats", "--disable-log-requests", "--gpu-memory-utilization=0.9", "-tp=2", "--num-scheduler-steps=10", "--max-num-seqs=512", "--swap-space=16", "--max-model-len=8192"]
minor: newline
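For context, one way these lists might end up on the command line, assuming the YAML arrays are parsed and space-joined before interpolation into the docker command quoted later in this review (the join step is an illustration, not code from this PR):

import json

# Hypothetical illustration: the YAML block above stores the docker/server
# parameters as JSON arrays; joining the parsed list with spaces yields the
# fragment interpolated into the docker_command f-string further down.
docker_parameters = " ".join(json.loads(
    '["--runtime", "nvidia", "--gpus", "all", "--shm-size", "20g"]'
))
print(docker_parameters)  # --runtime nvidia --gpus all --shm-size 20g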
bash_command = ['bash', '-c', script, 'my_script', model, str(timeout)]
if run_bash_command(bash_command) is None:
    logger.error(traceback.format_exc())
    raise Exception("Failed at starting server within 2 hours")
should we use timeout/60 in this message instead of 2 hours?
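Something along these lines, assuming timeout is in seconds (a sketch of the suggestion, not code from this PR):

# Derive the figure in the message from the actual timeout value
# (assumed to be in seconds) instead of hard-coding "2 hours".
raise Exception(f"Failed at starting server within {timeout // 60} minutes")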
#!/bin/bash
MODEL_ID=$1
TIMEOUT=$2
start_time=$(date +%s)
end_time=$((start_time + $TIMEOUT))
while [ $(date +%s) -lt $end_time ]; do
    filter="$(curl -s http://0.0.0.0:8080/v1/models | jq -e --arg expected "$MODEL_ID" '. != null and . != {} and has("data") and .data != null and (.data | length > 0) and (.data[].id | contains($expected))')"
    if [ -z "$filter" ]; then
        echo "Model $MODEL_ID is not available"
        sleep 1m
    else
        echo "Model $MODEL_ID is available"
        exit 0
    fi
done

echo "Model $MODEL_ID is not available within 2 hours"
exit 1
'''
Not a blocker, but I think this would be easier to maintain as a standalone bash script rather than inline in the Python script.
Actually, I see a few more inline bash scripts. I think some of them are pretty similar to other scripts we have, but I would need to look into it.
I'm fine with this for now; we can look into organizing some of these scripts after this PR.
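A minimal sketch of that refactor, assuming a hypothetical scripts/wait_for_model.sh containing the loop above; the Python side would then pass only the script path and its arguments:

# Hypothetical refactor: run a standalone script instead of embedding the
# bash source in a Python string. scripts/wait_for_model.sh is an assumed
# path, not a file added by this PR.
bash_command = ['bash', 'scripts/wait_for_model.sh', model, str(timeout)]
if run_bash_command(bash_command) is None:
    logger.error(traceback.format_exc())
    raise Exception(f"Failed at starting server within {timeout // 60} minutes")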
try:
    container_name = container["container"]
    docker_command = f"docker run --name {container_name} --rm -e HUGGING_FACE_HUB_TOKEN={hf_token} {docker_parameters} --init {image} {server_parameters}"
Is HUGGING_FACE_HUB_TOKEN valid for all containers? The HF docs show HF_TOKEN: https://huggingface.co/docs/huggingface_hub/en/package_reference/environment_variables#hftoken.
The containers may differ in how they expect this value to be passed; should we include HF_TOKEN as well?
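A minimal sketch of that option, reusing the variables from the diff above; passing the token under both names would cover containers that read either HF_TOKEN or the older HUGGING_FACE_HUB_TOKEN:

# Sketch: export the token under both environment variable names.
docker_command = (
    f"docker run --name {container_name} --rm "
    f"-e HF_TOKEN={hf_token} -e HUGGING_FACE_HUB_TOKEN={hf_token} "
    f"{docker_parameters} --init {image} {server_parameters}"
)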
Description
This PR provides a tool for benchmarking containers on EC2.
Checklist:
pytest tests.py -k "TestCorrectnessLmiDist" -m "lmi_dist"
Feature/Issue validation/testing
Please describe the unit or integration tests you ran to verify your changes, with a summary of relevant results and instructions to reproduce them.
Please also list any relevant details of your test configuration.
Test vllm container
[run_benchmark_vllm.log](https://github.com/user-attachments/files/18462683/run_benchmark_vllm.log)
meta-llama_Llama-3.1-8B-Instruct_tp2_500_250_vllm_2.log
meta-llama_Llama-3.1-8B-Instruct_tp2_500_250_vllm_1.log
Test TGI container
run_benchmark_tgi.log
meta-llama_Llama-3.1-8B-Instruct_tp2_500_250_tgi_2.log
meta-llama_Llama-3.1-8B-Instruct_tp2_500_250_tgi_1.log