fix ci issue distributed 4 gpu test #20204

yewentao256 · 2025-06-28T00:21:57Z

Purpose

Fixes #20138

Test

(profile, create kv cache, warmup model) took 44.33 seconds
Adding requests: 100%|█| 200/200 [00:00<00:00, 5003.64i
Adding requests: 100%|█| 200/200 [00:00<00:00, 5111.95i
Processed prompts: 100%|█| 200/200 [00:03<00:00, 64.25i
Processed prompts: 100%|█| 200/200 [00:02<00:00, 66.82i
DP rank 1, Prompt: 'Hello, my name is', Generated text: ' _____. I\'m 26 years old and I love art. I want to do art."'
DP rank 0, Prompt: 'Hello, my name is', Generated text: " _____. I'm 26 years old and I love art. I want"
DP rank 1, Prompt: 'The president of the United States is', Generated text: ' not a dictator. That is not a part of his job description.\n\nThis is the'
DP rank 0, Prompt: 'The president of the United States is', Generated text: ' not a dictator. That is not a part of his job description.\n'
DP rank 1, Prompt: 'The capital of France is', Generated text: ' a major international city and an important financial centre. A favourite with tourists and business travell'
DP rank 0, Prompt: 'The capital of France is', Generated text: ' a major international city and an important financial centre. A favourite with tourists'
DP rank 1, Prompt: 'The future of AI is', Generated text: ' closely tied to the broader field of robotics, with the potential for increasingly sophistic'
DP rank 0, Prompt: 'The future of AI is', Generated text: ' closely tied to the broader field of robotics, with the potential for increasing'
DP rank 1, Prompt: 'Hello, my name is', Generated text: ' Mark. I am 40 years old. I am an artist. I am a musician'
DP rank 0, Prompt: 'Hello, my name is', Generated text: ' Mark. I am 40 years old. I am an artist. I'

Signed-off-by: yewentao256 <zhyanwentao@126.com>

github-actions · 2025-06-28T00:22:03Z

👋 Hi! Thank you for contributing to the vLLM project.

💬 Join our developer Slack at https://slack.vllm.ai to discuss your PR in #pr-reviews, coordinate on features in #feat- channels, or join special interest groups in #sig- channels.

Just a reminder: PRs do not trigger a full CI run by default. Instead, only the fastcheck CI runs, which covers a small and essential subset of CI tests to quickly catch errors. You can run other CI tests on top of those by going to your fastcheck build on the Buildkite UI (linked in the PR checks section) and unblocking them. If you do not have permission to unblock, ping simon-mo or khluu to add you to our Buildkite org.

Once the PR is approved and ready to go, your PR reviewer(s) can run CI to test the changes comprehensively before merging.

To run CI, PR reviewers can either add the ready label to the PR or enable auto-merge.

🚀

gemini-code-assist

Summary of Changes

Hello @yewentao256, I'm Gemini Code Assist¹! I'm currently reviewing this pull request and will post my feedback shortly. In the meantime, here's a summary to help you and other reviewers quickly get up to speed!

This pull request addresses a CI issue encountered during distributed 4-GPU testing by enhancing the data_parallel.py example with new configurable parameters. These parameters allow for more precise control over GPU memory allocation and sequence limits during model warm-up, mitigating potential out-of-memory errors and improving the stability of distributed inference tests.

Highlights

CI Issue Resolution: This pull request aims to fix a continuous integration (CI) issue related to distributed 4-GPU testing, specifically addressing potential memory constraints that might cause test failures.
New Configuration Arguments: The PR introduces two new command-line arguments to examples/offline_inference/data_parallel.py: --max-num-seqs and --gpu-memory-utilization. These provide finer control over memory usage during distributed inference.
Memory Management: The --max-num-seqs argument (default 64) allows users to limit the maximum number of sequences used during engine warm-up, which can significantly reduce peak memory consumption. The --gpu-memory-utilization argument (default 0.8) enables setting the fraction of GPU memory vLLM is allowed to allocate, providing more headroom to avoid CUDA OOM errors.
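The two arguments described above could be wired up roughly as follows. This is a minimal sketch, not the actual diff: the `--max-num-seqs` definition follows the snippet quoted in the review thread, while the `--gpu-memory-utilization` type and help text, and the helper names `build_parser` and `engine_kwargs`, are assumptions made for illustration. The default of 0.8 is taken from the summary above.

```python
import argparse


def build_parser() -> argparse.ArgumentParser:
    """Build a parser with the two memory-related options (sketch)."""
    parser = argparse.ArgumentParser(description="Data-parallel example (sketch)")
    parser.add_argument(
        "--max-num-seqs",
        type=int,
        default=64,
        help="Maximum number of sequences to be processed in a single iteration.",
    )
    # Assumed definition: only the name and default appear in the PR summary.
    parser.add_argument(
        "--gpu-memory-utilization",
        type=float,
        default=0.8,
        help="Fraction of GPU memory vLLM is allowed to allocate.",
    )
    return parser


def engine_kwargs(args: argparse.Namespace) -> dict:
    """Collect the parsed values under the keyword names the engine expects.

    Hypothetical helper: the real script may pass these values directly.
    """
    return {
        "max_num_seqs": args.max_num_seqs,
        "gpu_memory_utilization": args.gpu_memory_utilization,
    }


# Parse an explicit argument list so the sketch does not depend on sys.argv.
example_args = build_parser().parse_args(["--gpu-memory-utilization", "0.7"])
print(engine_kwargs(example_args))
```

In a real script the resulting dictionary would typically be forwarded to the engine constructor, for example `LLM(model=..., **engine_kwargs(args))`; the model name is left elided here because the PR text does not state it.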

Review the Privacy Notices, Generative AI Prohibited Use Policy, Terms of Service, and learn how to configure Gemini Code Assist in GitHub here. Gemini can make mistakes, so double check it and use code with caution. ↩

gemini-code-assist

Code Review

This pull request introduces two new command-line arguments, --max-num-seqs and --gpu-memory-utilization, to the data parallel example script. This provides more control over memory usage, which is intended to fix a CI test failure.

examples/offline_inference/data_parallel.py

Signed-off-by: yewentao256 <zhyanwentao@126.com>

njhill

Thanks @yewentao256!

njhill · 2025-06-28T00:48:19Z

examples/offline_inference/data_parallel.py

+    parser.add_argument(
+        "--max-num-seqs",
+        type=int,
+        default=64,
+        help=("Maximum number of sequences to be processed in a single iteration."),
+    )


njhill

Are both of these args required to avoid the OOM? 64 is quite small for batch mode; it would be good if we could fix this with just the gpu_memory_utilization reduction...

houseroad

Yeah, it also seems that we need much more memory during initialization than before. I was about to investigate this further but didn't get time to do so. Wondering if @yewentao256 could dig further into this?

yewentao256

Yeah, I am happy to dig further, but what is the expected result? To reduce the memory usage? I am afraid it is kind of a tradeoff between speed and efficiency.
Basically, the original cause of this OOM issue is #18724, which I think is reasonable to adopt. @houseroad
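To make the headroom argument in this thread concrete, here is a small arithmetic sketch. It assumes a hypothetical 24 GiB GPU (the CI hardware is not stated in this PR) and compares vLLM's usual default of 0.9 with the 0.8 used here. The function name `budget_and_headroom` is made up for illustration.

```python
from typing import Tuple


def budget_and_headroom(total_gib: float, utilization: float) -> Tuple[float, float]:
    """Return (memory budget, memory left outside the budget) in GiB.

    `utilization` is the fraction of GPU memory the engine may allocate.
    """
    if not 0.0 < utilization <= 1.0:
        raise ValueError("utilization must be in (0, 1]")
    budget = total_gib * utilization
    # Round to two decimals to hide floating-point noise in the report.
    return round(budget, 2), round(total_gib - budget, 2)


TOTAL_GIB = 24.0  # hypothetical device size, not taken from the PR

for util in (0.9, 0.8):
    budget, headroom = budget_and_headroom(TOTAL_GIB, util)
    print(f"utilization={util}: budget={budget} GiB, headroom={headroom} GiB")
```

Under these assumptions, lowering the fraction from 0.9 to 0.8 doubles the memory left outside the engine's budget, which is the kind of slack that can absorb a larger allocation during initialization. It does not reduce what initialization itself needs, which is the underlying issue the reviewers defer to a later PR.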

njhill · 2025-06-28T00:49:14Z

I unblocked the 4-GPUs test so that we can verify it passes.

DarkLight1337

The test passes so I'm merging this to unblock CI first. Let's fix the underlying issue in another PR.

fix ci issue distributed 4 gpu test

5abe793

Signed-off-by: yewentao256 <zhyanwentao@126.com>

gemini-code-assist bot reviewed Jun 28, 2025

mergify bot added the documentation Improvements or additions to documentation label Jun 28, 2025

gemini-code-assist bot reviewed Jun 28, 2025

fix doc

0bc44e7

Signed-off-by: yewentao256 <zhyanwentao@126.com>

njhill approved these changes Jun 28, 2025

DarkLight1337 approved these changes Jun 28, 2025

vllm-bot merged commit d45417b into vllm-project:main Jun 28, 2025
14 checks passed

DarkLight1337 mentioned this pull request Jun 28, 2025

[CI] Temporally Remove DP test for Distributed Tests (4 GPUs) #20153

Closed

yewentao256 deleted the wye-fix-ci-issue-distributed-gpu-test branch June 30, 2025 16:12
