Conversation

@jakub-sochacki (Contributor) commented Oct 15, 2025

Purpose

Enable the Intel Gaudi 3 accelerator in the vLLM benchmark suite for performance benchmarking.

  • New Dockerfile.hpu with dynamic vllm-gaudi compatibility (fetches compatible vLLM commit from VLLM_STABLE_COMMIT)
  • Added latency, throughput, and serving test suites for Llama 3.1 (8B, 70B) and Mixtral 8x7B with HPU-specific optimizations
  • Enabled Intel Gaudi 3 runs in run-performance-benchmarks.sh, with HPU device detection via hl-smi and an -hpu architecture suffix (see the sketch after this list)
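
A minimal sketch of that detection logic, assuming the script follows its existing nvidia-smi/amd-smi probing pattern; ARCH_SUFFIX is a hypothetical name, and the real code in run-performance-benchmarks.sh may structure this differently:

    # Hedged sketch: choose an architecture suffix by probing for a device CLI.
    if command -v nvidia-smi >/dev/null 2>&1; then
      ARCH_SUFFIX=""        # NVIDIA GPUs: results keep the default naming
    elif command -v amd-smi >/dev/null 2>&1; then
      ARCH_SUFFIX=""        # AMD GPUs (assumption: also unsuffixed)
    elif command -v hl-smi >/dev/null 2>&1; then
      ARCH_SUFFIX="-hpu"    # Intel Gaudi detected via hl-smi
    fi
    echo "Tagging benchmark results with suffix: ${ARCH_SUFFIX}"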

Test Plan

Models tested: Llama 3.1-8B (TP1), Llama 3.1-70B (TP4), Mixtral 8x7B (TP2)
Scenarios: throughput, latency, and serving (hedged example invocations below)
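
Hedged examples of what these scenarios look like with vLLM's bench CLI; the exact flags for the HPU suite live in the JSON test configs added by this PR, and the model IDs below are assumptions:

    # Latency scenario, Llama 3.1-8B on a single device (TP1)
    vllm bench latency --model meta-llama/Llama-3.1-8B-Instruct --tensor-parallel-size 1

    # Throughput scenario, Llama 3.1-70B across four devices (TP4)
    vllm bench throughput --model meta-llama/Llama-3.1-70B-Instruct --tensor-parallel-size 4

    # Serving scenario, Mixtral 8x7B (TP2): start a server, then drive it
    vllm serve mistralai/Mixtral-8x7B-Instruct-v0.1 --tensor-parallel-size 2 &
    vllm bench serve --model mistralai/Mixtral-8x7B-Instruct-v0.1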

Test Result


Essential Elements of an Effective PR Description Checklist
  • The purpose of the PR, such as "Fix some issue (link existing issues this PR will resolve)".
  • The test plan, such as providing a test command.
  • The test results, such as pasting the results comparison before and after, or e2e results
  • (Optional) The necessary documentation update, such as updating supported_models.md and examples for a new model.
  • (Optional) Release notes update. If your change is user facing, please update the release notes draft in the Google Doc.

@github-actions bot

👋 Hi! Thank you for contributing to the vLLM project.

💬 Join our developer Slack at https://slack.vllm.ai to discuss your PR in #pr-reviews, coordinate on features in #feat- channels, or join special interest groups in #sig- channels.

Just a reminder: PRs do not trigger a full CI run by default. Instead, they only run fastcheck CI, which runs a small, essential subset of tests to catch errors quickly.

You can ask your reviewers to trigger select CI tests on top of fastcheck CI.

Once the PR is approved and ready to go, your PR reviewer(s) can run CI to test the changes comprehensively before merging.

To run full CI, PR reviewers can either add the ready label to the PR or enable auto-merge.

If you have any questions, please reach out to us on Slack at https://slack.vllm.ai.

🚀

@mergify bot added the ci/build and performance (Performance-related issues) labels Oct 15, 2025
@gemini-code-assist bot left a comment


Code Review

This pull request enables performance benchmarks for Intel Gaudi 3 by adding a new Dockerfile, updating the benchmark script to detect Gaudi devices, and including new test configurations. The changes are well-structured and mostly look good. I've found one potential bug in the benchmark script where a command to check memory usage might fail due to incorrect parsing of the command output. This could prevent the script from correctly waiting for resources to be freed. My review includes a specific suggestion to fix this issue.

@chatgpt-codex-connector

💡 Codex Review

# Clone the vllm repository and install inside the container
# Dynamically fetch the compatible vLLM commit from vllm-gaudi repo
RUN VLLM_STABLE_COMMIT=$(curl -s https://raw.githubusercontent.com/vllm-project/vllm-gaudi/vllm/last-good-commit-for-vllm-gaudi/VLLM_STABLE_COMMIT | tr -d '\n') && \
    echo "Fetched stable commit: $VLLM_STABLE_COMMIT" && \
    VLLM_COMMIT_TO_USE=${VLLM_COMMIT:-$VLLM_STABLE_COMMIT} && \
    echo "Using vLLM commit: $VLLM_COMMIT_TO_USE" && \
    mkdir -p $VLLM_PATH && \
    git clone https://github.com/vllm-project/vllm.git $VLLM_PATH && \
    cd $VLLM_PATH && \
    git remote add upstream https://github.com/vllm-project/vllm.git && \
    git fetch upstream --tags || true && \
    git checkout $VLLM_COMMIT_TO_USE && \
    bash -c "pip install -r <(sed '/^[torch]/d' requirements/build.txt)" && \

P1: Only filter torch requirements in HPU Dockerfile

The sed '/^[torch]/d' used when installing build requirements deletes any line whose first character is t, o, r, c or h, not just the torch entries. This means packages such as cmake, regex, and other dependencies are skipped during the image build, making the subsequent pip install of vLLM likely to fail due to missing build tools. The expression should target the whole word (e.g. sed '/^torch/d') or explicitly list the packages to omit so other requirements remain installed.
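
A minimal sketch of the fix this suggests, anchoring the pattern so only torch-prefixed entries are dropped (same requirements path as in the snippet above):

    # Drop only lines starting with "torch" (torch, torchvision, torchaudio),
    # keeping cmake, regex, and the other build requirements intact.
    bash -c "pip install -r <(sed '/^torch/d' requirements/build.txt)"

If only the exact torch package should be skipped, a stricter filter such as grep -vE '^torch([=<>~ ]|$)' narrows it further.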


# wait until GPU memory usage smaller than 1GB
if command -v nvidia-smi; then
    while [ "$(nvidia-smi --query-gpu=memory.used --format=csv,noheader,nounits | head -n 1)" -ge 1000 ]; do
        sleep 1
    done
elif command -v amd-smi; then
    while [ "$(amd-smi metric -g 0 | grep 'USED_VRAM' | awk '{print $2}')" -ge 1000 ]; do
        sleep 1
    done
elif command -v hl-smi; then
    while [ "$(hl-smi -q | grep "Used" | head -n 1 | awk -F ' ' '{print $3}')" -ge 1000 ]; do
        sleep 1

P1: HPU memory drain loop parses non-numeric output

In the new HL‑SMI branch of kill_gpu_processes, the expression hl-smi -q | grep "Used" | head -n 1 | awk -F ' ' '{print $3}' typically returns a colon or unit (e.g. the third field of "Used Memory : 1234 MB" is :), so the numeric comparison -ge 1000 raises [: integer expression expected and the loop exits immediately. As a result the script can proceed before HPU memory has been released, causing flakiness in subsequent benchmarks. The parsing should extract the numeric column (e.g. split on ':' and then take the numeric token) before comparing.
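
A hedged sketch of that parsing fix, assuming hl-smi -q prints lines like "Used Memory : 1234 MB" (the exact output format may vary across driver versions):

    # Split on ':' first, then take the numeric token, so the -ge comparison
    # always sees an integer instead of the ':' field.
    get_hpu_used_mb() {
        hl-smi -q | grep -m1 "Used" | awk -F':' '{print $2}' | awk '{print $1}'
    }
    while [ "$(get_hpu_used_mb)" -ge 1000 ]; do
        sleep 1
    done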


@mergify bot commented Oct 30, 2025

This pull request has merge conflicts that must be resolved before it can be
merged. Please rebase the PR, @jakub-sochacki.

https://docs.github.com/en/pull-requests/collaborating-with-pull-requests/working-with-forks/syncing-a-fork

Signed-off-by: jakub-sochacki <jakub.sochacki@wp.pl>
@mergify bot removed the needs-rebase label Oct 30, 2025
@xuechendi (Contributor) commented Oct 30, 2025

@DarkLight1337 @khluu @jikunshang, could you help review and merge? Meanwhile, there are two other PRs related to this one:

  • CI infra: vllm-project/ci-infra#191
  • PyTorch integration: pytorch/pytorch-integration-testing#94

@jikunshang added the ready label (ONLY add when PR is ready to merge/full CI is needed) Oct 30, 2025
@jikunshang merged commit 697f507 into vllm-project:main Oct 30, 2025
20 checks passed
ZhengHongming888 pushed a commit to ZhengHongming888/vllm that referenced this pull request Nov 8, 2025
rtourgeman pushed a commit to rtourgeman/vllm that referenced this pull request Nov 10, 2025