Conversation

@jakub-sochacki (Contributor) commented Oct 15, 2025

Purpose

Enable the Intel Gaudi 3 accelerator in the vLLM benchmark suite for performance benchmarking.

  • New Dockerfile.hpu with dynamic vllm-gaudi compatibility (fetches compatible vLLM commit from VLLM_STABLE_COMMIT)
  • Added latency, throughput, and serving test suites for Llama 3.1 (8B, 70B) and Mixtral 8x7B with HPU-specific optimizations
  • Enabled Intel Gaudi 3 runs in run-performance-benchmarks.sh, with HPU device detection via hl-smi and an -hpu architecture suffix (see the sketch after this list)
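
A minimal sketch of that detection logic, assuming the script follows its existing nvidia-smi/amd-smi probing pattern; ARCH_SUFFIX is a hypothetical name, and the real code in run-performance-benchmarks.sh may structure this differently:

    # Hedged sketch: choose an architecture suffix by probing for a device CLI.
    if command -v nvidia-smi >/dev/null 2>&1; then
      ARCH_SUFFIX=""        # NVIDIA GPUs: results keep the default naming
    elif command -v amd-smi >/dev/null 2>&1; then
      ARCH_SUFFIX=""        # AMD GPUs (assumption: also unsuffixed)
    elif command -v hl-smi >/dev/null 2>&1; then
      ARCH_SUFFIX="-hpu"    # Intel Gaudi detected via hl-smi
    fi
    echo "Tagging benchmark results with suffix: ${ARCH_SUFFIX}"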

Test Plan

Models tested: Llama 3.1-8B (TP1), Llama 3.1-70B (TP4), Mixtral 8x7B (TP2)
Scenarios: throughput, latency, and serving (hedged example invocations below)
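
Hedged examples of what these scenarios look like with vLLM's bench CLI; the exact flags for the HPU suite live in the JSON test configs added by this PR, and the model IDs below are assumptions:

    # Latency scenario, Llama 3.1-8B on a single device (TP1)
    vllm bench latency --model meta-llama/Llama-3.1-8B-Instruct --tensor-parallel-size 1

    # Throughput scenario, Llama 3.1-70B across four devices (TP4)
    vllm bench throughput --model meta-llama/Llama-3.1-70B-Instruct --tensor-parallel-size 4

    # Serving scenario, Mixtral 8x7B (TP2): start a server, then drive it
    vllm serve mistralai/Mixtral-8x7B-Instruct-v0.1 --tensor-parallel-size 2 &
    vllm bench serve --model mistralai/Mixtral-8x7B-Instruct-v0.1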

Test Result


Essential Elements of an Effective PR Description Checklist
  • The purpose of the PR, such as "Fix some issue (link existing issues this PR will resolve)".
  • The test plan, such as providing a test command.
  • The test results, such as pasting the results comparison before and after, or e2e results
  • (Optional) The necessary documentation update, such as updating supported_models.md and examples for a new model.
  • (Optional) Release notes update. If your change is user facing, please update the release notes draft in the Google Doc.

@github-actions bot

👋 Hi! Thank you for contributing to the vLLM project.

💬 Join our developer Slack at https://slack.vllm.ai to discuss your PR in #pr-reviews, coordinate on features in #feat- channels, or join special interest groups in #sig- channels.

Just a reminder: PRs do not trigger a full CI run by default. Instead, they only run fastcheck CI, which runs a small, essential subset of tests to catch errors quickly.

You can ask your reviewers to trigger select CI tests on top of fastcheck CI.

Once the PR is approved and ready to go, your PR reviewer(s) can run CI to test the changes comprehensively before merging.

To run full CI, PR reviewers can either add the ready label to the PR or enable auto-merge.

If you have any questions, please reach out to us on Slack at https://slack.vllm.ai.

🚀

@mergify bot added the ci/build and performance (Performance-related issues) labels Oct 15, 2025
@gemini-code-assist bot left a comment


Code Review

This pull request enables performance benchmarks for Intel Gaudi 3 by adding a new Dockerfile, updating the benchmark script to detect Gaudi devices, and including new test configurations. The changes are well-structured and mostly look good. I've found one potential bug in the benchmark script where a command to check memory usage might fail due to incorrect parsing of the command output. This could prevent the script from correctly waiting for resources to be freed. My review includes a specific suggestion to fix this issue.

@chatgpt-codex-connector

💡 Codex Review

# Clone the vllm repository and install inside the container
# Dynamically fetch the compatible vLLM commit from vllm-gaudi repo
RUN VLLM_STABLE_COMMIT=$(curl -s https://raw.githubusercontent.com/vllm-project/vllm-gaudi/vllm/last-good-commit-for-vllm-gaudi/VLLM_STABLE_COMMIT | tr -d '\n') && \
    echo "Fetched stable commit: $VLLM_STABLE_COMMIT" && \
    VLLM_COMMIT_TO_USE=${VLLM_COMMIT:-$VLLM_STABLE_COMMIT} && \
    echo "Using vLLM commit: $VLLM_COMMIT_TO_USE" && \
    mkdir -p $VLLM_PATH && \
    git clone https://github.com/vllm-project/vllm.git $VLLM_PATH && \
    cd $VLLM_PATH && \
    git remote add upstream https://github.com/vllm-project/vllm.git && \
    git fetch upstream --tags || true && \
    git checkout $VLLM_COMMIT_TO_USE && \
    bash -c "pip install -r <(sed '/^[torch]/d' requirements/build.txt)" && \

P1: Only filter torch requirements in HPU Dockerfile

The sed '/^[torch]/d' used when installing build requirements deletes any line whose first character is t, o, r, c or h, not just the torch entries. This means packages such as cmake, regex, and other dependencies are skipped during the image build, making the subsequent pip install of vLLM likely to fail due to missing build tools. The expression should target the whole word (e.g. sed '/^torch/d') or explicitly list the packages to omit so other requirements remain installed.
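
A minimal sketch of the fix this suggests, anchoring the pattern so only torch-prefixed entries are dropped (same requirements path as in the snippet above):

    # Drop only lines starting with "torch" (torch, torchvision, torchaudio),
    # keeping cmake, regex, and the other build requirements intact.
    bash -c "pip install -r <(sed '/^torch/d' requirements/build.txt)"

If only the exact torch package should be skipped, a stricter filter such as grep -vE '^torch([=<>~ ]|$)' narrows it further.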


# wait until GPU memory usage smaller than 1GB
if command -v nvidia-smi; then
    while [ "$(nvidia-smi --query-gpu=memory.used --format=csv,noheader,nounits | head -n 1)" -ge 1000 ]; do
        sleep 1
    done
elif command -v amd-smi; then
    while [ "$(amd-smi metric -g 0 | grep 'USED_VRAM' | awk '{print $2}')" -ge 1000 ]; do
        sleep 1
    done
elif command -v hl-smi; then
    while [ "$(hl-smi -q | grep "Used" | head -n 1 | awk -F ' ' '{print $3}')" -ge 1000 ]; do
        sleep 1

P1: HPU memory drain loop parses non-numeric output

In the new HL‑SMI branch of kill_gpu_processes, the expression hl-smi -q | grep "Used" | head -n 1 | awk -F ' ' '{print $3}' typically returns a colon or unit (e.g. the third field of "Used Memory : 1234 MB" is :), so the numeric comparison -ge 1000 raises [: integer expression expected and the loop exits immediately. As a result the script can proceed before HPU memory has been released, causing flakiness in subsequent benchmarks. The parsing should extract the numeric column (e.g. split on ':' and then take the numeric token) before comparing.
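
A hedged sketch of that parsing fix, assuming hl-smi -q prints lines like "Used Memory : 1234 MB" (the exact output format may vary across driver versions):

    # Split on ':' first, then take the numeric token, so the -ge comparison
    # always sees an integer instead of the ':' field.
    get_hpu_used_mb() {
        hl-smi -q | grep -m1 "Used" | awk -F':' '{print $2}' | awk '{print $1}'
    }
    while [ "$(get_hpu_used_mb)" -ge 1000 ]; do
        sleep 1
    done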


@mergify bot commented Oct 30, 2025

This pull request has merge conflicts that must be resolved before it can be
merged. Please rebase the PR, @jakub-sochacki.

https://docs.github.com/en/pull-requests/collaborating-with-pull-requests/working-with-forks/syncing-a-fork

Signed-off-by: jakub-sochacki <jakub.sochacki@wp.pl>
@mergify bot removed the needs-rebase label Oct 30, 2025
@xuechendi (Contributor) commented Oct 30, 2025

@DarkLight1337 @khluu @jikunshang, could you help review and merge? Meanwhile, there are two other PRs related to this one:

  • CI infra: vllm-project/ci-infra#191
  • PyTorch integration: pytorch/pytorch-integration-testing#94

@jikunshang added the ready label (ONLY add when PR is ready to merge/full CI is needed) Oct 30, 2025
@jikunshang merged commit 697f507 into vllm-project:main Oct 30, 2025
20 checks passed
ZhengHongming888 pushed a commit to ZhengHongming888/vllm that referenced this pull request Nov 8, 2025
rtourgeman pushed a commit to rtourgeman/vllm that referenced this pull request Nov 10, 2025