-
-
Notifications
You must be signed in to change notification settings - Fork 11.1k
AArch64 CPU Docker pipeline #26931
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
AArch64 CPU Docker pipeline #26931
Conversation
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Code Review
This pull request adds a new Buildkite pipeline step to build and publish an AArch64 CPU Docker image. The change correctly uses a multi-platform Dockerfile and a dedicated arm64 agent. However, there is a critical issue with the Docker image tagging strategy. The new arm64 build uses the same tags as the existing x86 build, which will cause a race condition and result in one image overwriting the other. My review includes a suggestion to use architecture-specific tags to resolve this, which is a standard practice for multi-architecture image releases.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
💡 Codex Review
Here are some automated review suggestions for this pull request.
ℹ️ About Codex in GitHub
Your team has set up Codex to review pull requests in this repo. Reviews are triggered when you
- Open a pull request for review
- Mark a draft as ready
- Comment "@codex review".
If Codex has suggestions, it will comment; otherwise it will react with 👍.
bd22e4e to
355b559
Compare
45d1999 to
6bcf6c2
Compare
| - "DOCKER_BUILDKIT=1 docker build --build-arg max_jobs=16 --build-arg USE_SCCACHE=1 --build-arg GIT_REPO_CHECK=1 --tag vllm-ci:build-image --target vllm-openai --progress plain -f docker/Dockerfile.cpu ." | ||
| - "mkdir artifacts" | ||
| - "docker run --rm -v $(pwd)/artifacts:/artifacts_host vllm-ci:build-image bash -c 'cp -r dist /artifacts_host && chmod -R a+rw /artifacts_host'" | ||
| - "bash .buildkite/scripts/upload-wheels.sh" |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Do you know what the final wheel name will be and if it might clobber the existing arm CUDA wheel? I'm not sure what we do for x86 cpu
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
hi! the CUDA wheel should mention "+cu..." in its name, while the CPU only builds leave that out. I've tested the workflow locally and it omits the CUDA identifier and uses ".cpu-" instead.
| DOCKER_BUILDKIT: "1" | ||
|
|
||
| # aarch64 build. | ||
| - label: "Build arm64 CPU wheel" |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
How are we differentiating arm64 CUDA wheel from non-CUDA wheel?
What naming convention are you using?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I think on pypi I see
vllm-0.11.0-cp38-abi3-manylinux2014_aarch64.whl. for the one that has CUDA dependency
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
you should see a cpu somewhere in the wheel name if it got built on CPU.
locally, I see: vllm-0.11.0rc2.dev481+g74704d455.cpu-cp310-cp310-linux_aarch64.whl
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
by the latest release, having "+cu..." part of the wheel name should indicate CUDA dependencies. I've tested the patched workflow locally, dropping sccache mentions, since that only seems to be supported in CI
, and got something similar to what Fadi mentioned.
Signed-off-by: Ioana Ghiban <ioana.ghiban@arm.com>
6bcf6c2 to
947c644
Compare
| ENV MAX_JOBS=${max_jobs} | ||
|
|
||
| ARG USE_SCCACHE | ||
| ARG SCCACHE_DOWNLOAD_URL=https://github.com/mozilla/sccache/releases/download/v0.8.1/sccache-v0.8.1-x86_64-unknown-linux-musl.tar.gz |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
for CUDA dependent AArch64 wheel build we're using this cache as well
|
@mgoin do you have any other feedback w.r.t this PR? Are we happy with it as is? |
|
I'm just a little hesitant with respect to not messing up the arm CUDA wheel, but if cpu shows up in the wheel name it should be fine. I think we can merge and validate. Thanks for the reminder ping |
Signed-off-by: Alberto Perdomo <aperdomo@redhat.com>
Signed-off-by: 0xrushi <6279035+0xrushi@users.noreply.github.com>
Signed-off-by: 0xrushi <6279035+0xrushi@users.noreply.github.com>
Purpose
Adding workflow for building CPU AArch64 wheels.
Aims to fix #26017.
Test Plan
The same workflow was gracefully ran locally, on an AArch64 machine.
Test Result
Essential Elements of an Effective PR Description Checklist
supported_models.mdandexamplesfor a new model.