-
Notifications
You must be signed in to change notification settings - Fork 45
[CI/Build][Intel] Add HPU image build with vllm-gaudi compatibility #191
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Conversation
68db113 to
2846d06
Compare
louie-tsai
left a comment
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
looks good to me. thanks
buildkite/test-template-ci.j2
Outdated
| - | | ||
| #!/bin/bash | ||
| # Fetch the compatible vLLM commit for vllm-gaudi | ||
| VLLM_STABLE_COMMIT=$(curl -s https://raw.githubusercontent.com/vllm-project/vllm-gaudi/main/last-good-commit-for-vllm-gaudi/VLLM_STABLE_COMMIT | tr -d '\n') |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I just used the wrong branch (main/last-good-commit-for-vllm-gaudi) while it should be vllm/last-good-commit-for-vllm-gaudi. Is since I prefer not to clone the repo here i will update the command (URL) to:
VLLM_STABLE_COMMIT=$(curl -s https://raw.githubusercontent.com/vllm-project/vllm-gaudi/vllm/last-good-commit-for-vllm-gaudi/VLLM_STABLE_COMMIT | tr -d '\n')
2846d06 to
deb205f
Compare
Signed-off-by: jakub-sochacki <jakub.sochacki@intel.com>
Signed-off-by: jakub-sochacki <jakub.sochacki@intel.com>
Signed-off-by: jakub-sochacki <jakub.sochacki@intel.com>
Signed-off-by: jakub-sochacki <jakub.sochacki@intel.com>
Signed-off-by: jakub-sochacki <jakub.sochacki@intel.com>
8268810 to
bb802f2
Compare
|
@khluu , may you help with this PR, thanks so much |
| queue: cpu_queue_postmerge_us_east_1 | ||
| commands: | ||
| - "aws ecr-public get-login-password --region us-east-1 | docker login --username AWS --password-stdin public.ecr.aws/q9t5s3a7" | ||
| - | |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Can we add this script into vllm repo and call it here instead of sending the whole script as part of commands?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Yes, PR is submitted to vllm repo as well:
vllm-project/vllm#26919
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I don't see the script in vllm-project/vllm#26919
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Oh, which part you want us move to vllm-project?
Because hpu is a plugin, so we use below line to build our docker, is this part you suggested to be part of vllm?
# Fetch the compatible vLLM commit for vllm-gaudi
VLLM_STABLE_COMMIT=$(curl -s https://raw.githubusercontent.com/vllm-project/vllm-gaudi/vllm/last-good-commit-for-vllm-gaudi/VLLM_STABLE_COMMIT | tr -d '\n')
git clone https://github.com/vllm-project/vllm-gaudi.git /tmp/vllm-gaudi
docker build \
--file /tmp/vllm-gaudi/tests/pytorch_ci_hud_benchmark/Dockerfile.hpu \
--build-arg max_jobs=16 \
--build-arg VLLM_COMMIT=$VLLM_STABLE_COMMIT \
--build-arg VLLM_GAUDI_COMMIT=main \
--tag "$HPU_IMAGE_TAG" \
--progress plain .
The vllm_stable_commit is necessary because some latest commit might failed vllm-gaudi,. So we are tracking the last good vllm commit sha
and rest part is to use Dockerfile.hpu exiting in vllm-gaudi instead of vllm to build docker
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Oh what I mean is can we store the bash script in the command into a file on vllm-project/vllm repo, then just call that script here, like https://github.com/vllm-project/ci-infra/blob/main/buildkite/test-template-ci.j2#L722
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
@khluu I understand that scripts like https://github.com/vllm-project/vllm/blob/main/.buildkite/scripts/hardware_ci/run-xpu-test.sh are to running in the CI. Our goal is to build images in the CI but run performance benchmarks in nightly / 12h-cadence. Do you suggest moving docker building to separate script in vllm?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
@khluu , I discussed with Jakub.
Can we merge as it is now for this PR, the Pytorch-integration PR is merged. so we want to see how it goes, in case we might need more fix to clear the path.
=> The reason we have to do it now, is in the Pytorch-integration PR, we also use VLLM_STABLE_COMMIT as part of image name to index, that is why in this PR, we need to use the commit_id to tag image.
In next PR,
we will remove the whole VLLM_STABLE_COMMIT thing, and directly using BUILDKITE_COMMIT, and also submit another pytorch-integration PR to use BUILDKITE_COMMIT to index hpu_docker_image there.
So the HPU docker build can be simplified with single step which is to build docker from vllm-gaudi / Dockerfile.hpu
|
@jakub-sochacki , please help to rebase |
| docker build \ | ||
| --file /tmp/vllm-gaudi/tests/pytorch_ci_hud_benchmark/Dockerfile.hpu \ | ||
| --build-arg max_jobs=16 \ | ||
| --build-arg VLLM_COMMIT=$VLLM_STABLE_COMMIT \ |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
@jakub-sochacki, does VLLM_COMMIT needed here, I realized you have add same step in the Dockerfile, right?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This ensures that the same vllm commit will be used in the dockerfile and in the docker image tag HPU_IMAGE_TAG="${REGISTRY}:${VLLM_STABLE_COMMIT}-hpu"
Its already rebased and up-to-date. |

This enables nightly HPU benchmarks by ensuring Docker images are built with vLLM versions compatible with the vllm-gaudi plugin, addressing version synchronization issues between the two repositories.