[Frontend] Use engine argument to control MM cache size #22441
Conversation
Signed-off-by: DarkLight1337 <tlleungac@connect.ust.hk>
👋 Hi! Thank you for contributing to the vLLM project. 💬 Join our developer Slack at https://slack.vllm.ai to discuss your PR in #pr-reviews, coordinate on features in #feat- channels, or join special interest groups in #sig- channels. Just a reminder: PRs do not trigger a full CI run by default; only a limited subset of checks runs automatically. Once the PR is approved and ready to go, your PR reviewer(s) can run CI to test the changes comprehensively before merging. To run CI, PR reviewers can either: Add 🚀
Code Review
This pull request effectively consolidates the multi-modal cache size configuration into a single mm_processor_cache_gb engine argument, deprecating the older disable_mm_preprocessor_cache and VLLM_MM_INPUT_CACHE_GIB settings. The changes are well-propagated through documentation, examples, and tests. I've identified one correctness issue in the implementation that should be addressed.
Would it be possible to deduplicate this config so it doesn't have to live in ModelConfig and MultiModalConfig in the same way that disable_mm_preprocessor_cache did?
Which config are you referring to here? Since disable-mm-preprocessor-cache will be removed anyway, there's no point in refactoring that.
mm_processor_cache_gb is a field of both ModelConfig and MultiModalConfig.
Can it just be a field of MultiModalConfig and anywhere we have model_config.mm_processor_cache_gb changes to model_config.multimodal_config.mm_processor_cache_gb?
We currently initialize MultiModalConfig inside ModelConfig, so ModelConfig needs to accept all of the arguments that are forwarded to MultiModalConfig
We should use https://docs.python.org/3/library/dataclasses.html#dataclasses.InitVar so that mm_processor_cache_gb can be accessed in __post_init__ without making it a field of ModelConfig
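A minimal sketch of the InitVar pattern being suggested, using simplified stand-in classes rather than vLLM's actual ModelConfig and MultiModalConfig (field names and defaults here are illustrative assumptions):

```python
from dataclasses import InitVar, dataclass, field, fields

# Simplified stand-ins for vLLM's MultiModalConfig / ModelConfig,
# illustrating the suggested InitVar pattern (not the actual vLLM code).

@dataclass
class MultiModalConfig:
    mm_processor_cache_gb: float = 4.0

@dataclass
class ModelConfig:
    # An InitVar is accepted by __init__ and passed to __post_init__,
    # but is NOT stored as a field of the dataclass itself.
    mm_processor_cache_gb: InitVar[float] = 4.0
    multimodal_config: MultiModalConfig = field(default=None)

    def __post_init__(self, mm_processor_cache_gb: float) -> None:
        # Forward the init-only value into the nested config.
        self.multimodal_config = MultiModalConfig(
            mm_processor_cache_gb=mm_processor_cache_gb)

cfg = ModelConfig(mm_processor_cache_gb=2.0)
print(cfg.multimodal_config.mm_processor_cache_gb)  # 2.0
print([f.name for f in fields(cfg)])                # ['multimodal_config']
```

Note that `dataclasses.fields()` excludes InitVar pseudo-fields, so `mm_processor_cache_gb` never appears as a field of `ModelConfig`.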
Hmm let's do this in a separate PR since it would also apply to other fields of MultiModalConfig
good idea
"--mm-processor-cache-gb",
**multimodal_kwargs["mm_processor_cache_gb"])
multimodal_group.add_argument("--disable-mm-preprocessor-cache",
                              type=bool,
This way it will default to None and you can check if self.disable_mm_preprocessor_cache is not None below, allowing you to warn if the user manually sets it to False too
- type=bool,
+ type=argparse.BooleanOptionalAction,
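A hedged sketch of how the suggestion behaves with stdlib argparse (note that in plain argparse, `BooleanOptionalAction` is passed via `action=`, not `type=`); the flag name mirrors the PR, while the warning text is illustrative:

```python
import argparse
import warnings

parser = argparse.ArgumentParser()
# BooleanOptionalAction (Python 3.9+) auto-generates a --no-... variant
# and, with default=None, distinguishes "not passed" from explicit True/False.
parser.add_argument("--disable-mm-preprocessor-cache",
                    action=argparse.BooleanOptionalAction,
                    default=None)

args = parser.parse_args(["--no-disable-mm-preprocessor-cache"])
if args.disable_mm_preprocessor_cache is not None:
    # Fires even when the user explicitly passes the flag as False.
    warnings.warn("--disable-mm-preprocessor-cache is deprecated; "
                  "use --mm-processor-cache-gb instead")
print(args.disable_mm_preprocessor_cache)  # False
```

This is exactly the property the reviewer wants: a `None` default means the deprecation warning can fire for both the `--disable-...` and `--no-disable-...` spellings.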
good idea
Essential Elements of an Effective PR Description Checklist
Update supported_models.md and examples for a new model.

Purpose
Currently we have two separate options to control MM processing cache usage:
- disable_mm_preprocessor_cache engine argument
- VLLM_MM_INPUT_CACHE_GIB environment variable

This PR consolidates these into a single engine argument, mm_processor_cache_gb, to simplify the user setup. Accordingly, the following options have been deprecated and will be removed in v0.13:

- disable_mm_preprocessor_cache=True is the same as mm_processor_cache_gb=0
- VLLM_MM_INPUT_CACHE_GIB=XXX is the same as mm_processor_cache_gb=XXX

Test Plan
Test Result
(Optional) Documentation Update
Updated docs accordingly.
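The deprecation mapping described in the Purpose section can be sketched as a small resolver; the function name, default cache size, and warning messages below are hypothetical illustrations, not vLLM's actual implementation:

```python
import os
import warnings
from typing import Optional

# Hypothetical resolver illustrating the documented equivalences:
#   disable_mm_preprocessor_cache=True  ->  mm_processor_cache_gb=0
#   VLLM_MM_INPUT_CACHE_GIB=XXX         ->  mm_processor_cache_gb=XXX
def resolve_mm_processor_cache_gb(
    mm_processor_cache_gb: Optional[float] = None,
    disable_mm_preprocessor_cache: Optional[bool] = None,
) -> float:
    if mm_processor_cache_gb is not None:
        return mm_processor_cache_gb  # the new argument always wins
    if disable_mm_preprocessor_cache:
        warnings.warn("disable_mm_preprocessor_cache is deprecated; "
                      "use mm_processor_cache_gb=0 instead")
        return 0.0
    env = os.environ.get("VLLM_MM_INPUT_CACHE_GIB")
    if env is not None:
        warnings.warn("VLLM_MM_INPUT_CACHE_GIB is deprecated; "
                      "use mm_processor_cache_gb instead")
        return float(env)
    return 4.0  # assumed default cache size, for illustration only

print(resolve_mm_processor_cache_gb(disable_mm_preprocessor_cache=True))  # 0.0
```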