Feature/benchmark/random mm data/images #23119
Conversation
Code Review

This pull request introduces a new synthetic multimodal benchmark dataset, `RandomMultiModalDataset`, and refactors the existing `RandomDataset` to improve modularity and code reuse. The changes are well organized, with new functionality encapsulated in separate methods and corresponding command-line arguments added for configuration. My review identifies a potential runtime error where invalid user-provided arguments for image sampling could cause a crash. I've suggested replacing an assertion with a more informative `ValueError` to handle this case gracefully, aligning with the error-handling practices elsewhere in the file.
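For illustration, a minimal sketch of the kind of change the review suggests. The helper name and message are hypothetical; the actual validation lives in vLLM's benchmark dataset code.

```python
def validate_items_range_ratio(ratio: float) -> None:
    """Hypothetical helper: reject an invalid sampling ratio."""
    # A bare `assert` raises a terse AssertionError and is skipped
    # entirely under `python -O`; a ValueError explains the problem
    # to the user and matches the error handling elsewhere.
    if not 0.0 <= ratio <= 1.0:
        raise ValueError(
            "--random-mm-num-mm-items-range-ratio must be in [0, 1], "
            f"got {ratio}")
```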
Signed-off-by: breno.skuk <breno.skuk@hcompany.ai>
Thanks, I think the code looks good. Can you also perform a benchmark run with real-sized images (around the range of 1024x1024) to see the difference? I usually see much worse throughput and TTFT when using VisionArena dataset, so I am interested in what settings I should use to get similar results using a random dataset.
Overall LGTM - but can you fix the formatting issue on the README?
LGTM 🚢
@h-brenoskuk @DarkLight1337 May I ask which vLLM Docker release has the feature? Thank you!
Purpose
Generate synthetic image inputs alongside random text prompts to stress-test vision models without external datasets.
Notes:
- Requires a chat backend (`--backend openai-chat`) and the `/v1/chat/completions` endpoint.
- Set `--limit-mm-per-prompt` on the server to match your model config.

Vary the number of items per request and use multiple image buckets:

Flags specific to `random-mm`:
- `--random-mm-base-items-per-request`: base number of multimodal items per request.
- `--random-mm-num-mm-items-range-ratio`: vary the item count uniformly in the closed integer range [floor(n·(1−r)), ceil(n·(1+r))]. Set r=0 to keep it fixed; r=1 allows 0 items.
- `--random-mm-limit-mm-per-prompt`: per-modality hard caps, e.g. `'{"image": 3, "video": 0}'`.
- `--random-mm-bucket-config`: dict mapping (H, W, T) → probability. Entries with probability 0 are removed; remaining probabilities are renormalized to sum to 1. Use T=1 for images. Set any T>1 (video) probability to 0 (video sampling is not yet supported).
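A small sketch of the arithmetic these flags imply (helper names are hypothetical, not the PR's actual code):

```python
import math

def item_count_range(n: int, r: float) -> tuple[int, int]:
    # Closed integer range [floor(n*(1-r)), ceil(n*(1+r))] built from
    # --random-mm-base-items-per-request (n) and
    # --random-mm-num-mm-items-range-ratio (r).
    return math.floor(n * (1 - r)), math.ceil(n * (1 + r))

def normalize_buckets(buckets: dict) -> dict:
    # Drop zero-probability entries, then renormalize so the remaining
    # probabilities sum to 1, mirroring --random-mm-bucket-config.
    kept = {shape: p for shape, p in buckets.items() if p > 0}
    total = sum(kept.values())
    return {shape: p / total for shape, p in kept.items()}

# r=0 keeps the count fixed; r=1 allows 0 items.
fixed = item_count_range(2, 0.0)     # (2, 2)
spread = item_count_range(2, 1.0)    # (0, 4)
buckets = normalize_buckets({(256, 256, 1): 0.3,
                             (720, 1280, 1): 0.3,
                             (64, 64, 8): 0.0})  # video bucket dropped
```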
How sampling works:
- Sample k from `--random-mm-base-items-per-request` and `--random-mm-num-mm-items-range-ratio`, then clamp k to at most the sum of the per-modality limits.
- Sample buckets from `--random-mm-bucket-config`, while tracking how many items of each modality have been added.
- If a modality reaches its cap from `--random-mm-limit-mm-per-prompt`, all buckets of that modality are excluded and the remaining bucket probabilities are renormalized before continuing. This should be seen as an edge case; the behavior can be avoided by setting `--random-mm-limit-mm-per-prompt` to a large number. Note that this might result in errors due to the engine config `--limit-mm-per-prompt`.
- MM content is attached via `multi_modal_data` (OpenAI Chat format). When `random-mm` is used with the OpenAI Chat backend, prompts remain text and MM content is attached via `multi_modal_data`.
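The steps above can be sketched as follows. This is an illustrative reimplementation under stated assumptions (function and variable names are hypothetical; the real logic lives in `RandomMultiModalDataset`):

```python
import math
import random

def sample_mm_items(base_items: int, range_ratio: float,
                    limits: dict, buckets: dict,
                    rng: random.Random) -> list:
    """Sketch of the sampling procedure described above."""
    # 1. Sample k in [floor(n*(1-r)), ceil(n*(1+r))], then clamp it
    #    to the sum of the per-modality limits.
    lo = math.floor(base_items * (1 - range_ratio))
    hi = math.ceil(base_items * (1 + range_ratio))
    k = min(rng.randint(lo, hi), sum(limits.values()))

    def modality(shape):
        # Buckets are (H, W, T): T == 1 is an image, T > 1 a video.
        return "image" if shape[2] == 1 else "video"

    counts = {m: 0 for m in limits}
    chosen = []
    active = {s: p for s, p in buckets.items() if p > 0}
    for _ in range(k):
        # 2./3. Exclude buckets whose modality hit its cap; weighted
        # choice over the rest renormalizes implicitly.
        active = {s: p for s, p in active.items()
                  if counts[modality(s)] < limits[modality(s)]}
        if not active:
            break
        shapes = list(active)
        (shape,) = rng.choices(shapes, weights=[active[s] for s in shapes], k=1)
        chosen.append(shape)
        counts[modality(shape)] += 1
    return chosen
```

With `limits={"image": 3, "video": 0}` and only image buckets, the sketch never emits more than 3 items and never a video, matching the clamping described above.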
.Test Plan
Start server.

Test the `RandomDataset` refactor and compare with the previous implementation. For the experiments here we fix:

Test `RandomMultiModalDataset`. We use the args above with the addition of the multimodal args:

On the benchmark front we test the three cases below:
Test Results (run on an H100)

`RandomDataset` refactor, compared with the previous implementation:

Before refactor:

After refactor:

`RandomMultiModalDataset`:

(a) No images:
(b) With a fixed number of images and dimensions
(c) With a variable number of image dimensions
(d) With a variable number of image dimensions and images per request
RNG isolation test

We also introduce a small test to verify the robustness of the RNG and the reproducibility of the new `RandomDataset` implementation. The test asserts that the global RNG does not interfere with the RNG of the dataset. The previous implementation of `RandomDataset` fails this test, while the new one passes.
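The property being tested can be sketched as below. The class is a hypothetical stand-in for the dataset's seeded sampler, not the PR's actual test code:

```python
import random

class DatasetRNGSketch:
    """Minimal stand-in for a dataset with a seeded, isolated RNG."""

    def __init__(self, seed: int):
        # Own random.Random instance: state is isolated from the
        # module-level global RNG used by random.seed()/random.random().
        self._rng = random.Random(seed)

    def sample(self, n: int) -> list:
        return [self._rng.randint(0, 10_000) for _ in range(n)]

# Reproducibility: two datasets with the same seed must agree even if
# the global RNG is perturbed between constructing them.
first = DatasetRNGSketch(seed=42).sample(5)
random.seed(7)      # perturb the global RNG
random.random()
second = DatasetRNGSketch(seed=42).sample(5)
assert first == second
```

An implementation that draws from the module-level `random` functions instead of its own `random.Random` instance would fail this check, which is the failure mode described for the previous implementation.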
To do next:
- `vllm/benchmarks/throughput.py`