
Conversation

@DarkLight1337
Member

DarkLight1337 commented Sep 19, 2025

Purpose

When running multi-modal pooling models, I found that almost 3% of the total time is spent in the get_model_architecture call inside create_processor. Upon inspection, get_model_architecture converts the model class into a pooling model each time it is called, which is quite expensive since it happens for every single request, so I have decided to cache its output.


cc @maxdebayser @noooop

Test Plan

Test Result


Essential Elements of an Effective PR Description Checklist
  • The purpose of the PR, such as "Fix some issue (link existing issues this PR will resolve)".
  • The test plan, such as providing test command.
  • The test results, such as pasting the results comparison before and after, or e2e results
  • (Optional) The necessary documentation update, such as updating supported_models.md and examples for a new model.
  • (Optional) Release notes update. If your change is user facing, please update the release notes draft in the Google Doc.

…ng models

Signed-off-by: DarkLight1337 <tlleungac@connect.ust.hk>
@DarkLight1337 DarkLight1337 added the ready ONLY add when PR is ready to merge/full CI is needed label Sep 19, 2025
Contributor

@gemini-code-assist gemini-code-assist bot left a comment


Code Review

This pull request introduces a cache for get_model_architecture to avoid repeated expensive conversions, which is a good optimization. My main feedback is regarding thread safety. The global cache is modified without a lock, which can lead to race conditions and redundant computations in a multi-threaded environment. I've suggested adding a lock using a double-checked locking pattern to make the caching mechanism thread-safe and robust.
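The double-checked locking pattern the bot suggests can be sketched as follows. This is a minimal illustration with hypothetical names (`_architecture_cache`, `get_model_architecture_cached`), not the actual vLLM implementation; the real cache key and value types differ.

```python
import threading

# Hypothetical module-level cache; the real cache in vLLM may be keyed
# differently. This only illustrates the double-checked locking pattern.
_architecture_cache: dict = {}
_cache_lock = threading.Lock()


def get_model_architecture_cached(key, compute):
    # First check without the lock (fast path for the common cache-hit case).
    result = _architecture_cache.get(key)
    if result is None:
        with _cache_lock:
            # Second check inside the lock: another thread may have
            # populated the cache while we were waiting to acquire it.
            result = _architecture_cache.get(key)
            if result is None:
                result = compute(key)
                _architecture_cache[key] = result
    return result
```

The second lookup inside the lock is what prevents two threads from both running the expensive conversion when they race on a cold cache.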

Contributor

@maxdebayser maxdebayser left a comment


Yeah, I noticed too that it was being called repeatedly. LGTM. Just a quick question though: would this work with functools.cache?

@DarkLight1337
Member Author

No, since ModelConfig itself is not hashable directly
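To see why functools.cache doesn't apply here, consider this sketch with a hypothetical stand-in for ModelConfig. A mutable dataclass is unhashable by default (with eq=True, Python sets __hash__ to None), so passing it to a functools.cache-decorated function raises TypeError.

```python
from dataclasses import dataclass, field
from functools import cache


# Hypothetical stand-in for vLLM's ModelConfig: a plain dataclass with
# eq=True (the default) is unhashable, just like the real config object.
@dataclass
class ModelConfig:
    model: str = "dummy"
    extra: dict = field(default_factory=dict)


@cache
def get_model_architecture(config: ModelConfig) -> str:
    return config.model


try:
    get_model_architecture(ModelConfig())
except TypeError as e:
    print(f"functools.cache fails: {e}")  # unhashable type: 'ModelConfig'
```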

@maxdebayser
Contributor

But if ModelConfig already has a compute_hash() function, it should be possible to implement __hash__ using it.

Signed-off-by: DarkLight1337 <tlleungac@connect.ust.hk>
@DarkLight1337
Member Author

DarkLight1337 commented Sep 19, 2025

But if ModelConfig already has a compute_hash() function it should be possible to implement __hash__ using it.

That is true, but compute_hash only takes into account the fields that affect CUDA graph compilation. I don't want to implement __hash__ using it, as that would violate the hash-equals contract.
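A sketch of the practical danger, with hypothetical field names: if a cache treated a hash covering only a subset of fields as the config's identity, two distinct configs could collide and the cache would return a stale entry.

```python
# Hypothetical sketch: compute_hash() covers only the fields that affect
# CUDA graph compilation (here just `model`), so a cache keyed on that
# hash alone cannot distinguish configs that differ elsewhere.
class Config:
    def __init__(self, model, pooler_type):
        self.model = model
        self.pooler_type = pooler_type  # assumed ignored by compute_hash

    def compute_hash(self):
        return hash(self.model)  # subset of fields only


cache = {}
a = Config("llama", "mean")
b = Config("llama", "cls")
cache[a.compute_hash()] = "arch-for-a"
# b is a different config, yet looks identical to a hash-keyed cache:
print(cache.get(b.compute_hash()))  # stale hit: "arch-for-a"
```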

@DarkLight1337
Member Author

DarkLight1337 commented Sep 19, 2025

Hmm... looks like this PR uncovered some hidden problems with EAGLE config validity

@DarkLight1337
Member Author

Any idea? @wwl2755 @WoosukKwon

@wwl2755
Contributor

wwl2755 commented Sep 19, 2025

I think it is because EAGLEConfig() doesn't expect model=None in its use case. The use case is something like:

eagle_config = EAGLEConfig(
    self.draft_model_config.hf_config,  # model
    method=self.method,
    model_type="eagle")

So it causes a problem when compute_hash() tries to use the default initialization via EAGLEConfig().

An easy fix is to delete the assertion and pass a dummy kwargs["architectures"] = [] to cooperate with compute_hash(). IMO it won't affect normal behavior, since something is always passed in the model parameter. But I'm not 100% sure whether it would break any assumptions elsewhere. I'll leave the decision call to @WoosukKwon.
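The proposed fix could look roughly like this. This is a self-contained sketch based only on the discussion above (the real EAGLEConfig subclasses PretrainedConfig and has more fields); the point is replacing the assertion with a dummy default so that a no-argument EAGLEConfig() succeeds.

```python
# Sketch of the suggested change: tolerate model=None instead of asserting,
# so compute_hash()-style code can build a default instance. Field names
# beyond `model`, `method`, and `model_type` are assumptions.
class EAGLEConfig:
    def __init__(self, model=None, method=None, model_type="eagle", **kwargs):
        if model is None:
            # Previously: assert model is not None
            # Dummy value so default initialization still works.
            kwargs.setdefault("architectures", [])
        self.model = model
        self.method = method
        self.model_type = model_type
        self.architectures = kwargs.get("architectures", [])


default_cfg = EAGLEConfig()  # no longer raises
print(default_cfg.architectures)  # []
```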

@DarkLight1337
Member Author

DarkLight1337 commented Sep 20, 2025

Hmm, actually the test runs successfully on its own. Maybe the config is somehow modified in place by other tests; let me take a look. Nvm, I was running the test on the wrong branch 😅

@DarkLight1337
Member Author

Ok, I just realized that to_json_string only returns diffs from the default config by default, which causes EAGLEConfig to be instantiated without model. Let's see if use_diff=False can solve the problem
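The diff-serialization behavior can be illustrated with a minimal stand-in (the real method is Hugging Face's PretrainedConfig.to_json_string, whose use_diff parameter defaults to True): with use_diff=True, attributes equal to their defaults are dropped from the JSON, so round-tripping the config loses them.

```python
import json


# Simplified mock of PretrainedConfig.to_json_string(use_diff=True):
# with use_diff=True, only attributes that differ from a freshly
# default-constructed config are serialized.
class TinyConfig:
    def __init__(self, model="base", method="eagle"):
        self.model = model
        self.method = method

    def to_json_string(self, use_diff=True):
        attrs = vars(self)
        if use_diff:
            defaults = vars(type(self)())
            attrs = {k: v for k, v in attrs.items() if v != defaults.get(k)}
        return json.dumps(attrs)


cfg = TinyConfig(model="base", method="custom")
print(cfg.to_json_string())                # {"method": "custom"} -- model dropped
print(cfg.to_json_string(use_diff=False))  # full config, model included
```

Reconstructing a config from the diffed JSON is exactly how EAGLEConfig ended up instantiated without model; use_diff=False keeps every field in the serialized form.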

Signed-off-by: DarkLight1337 <tlleungac@connect.ust.hk>
@DarkLight1337 DarkLight1337 merged commit c60e613 into vllm-project:main Sep 20, 2025
42 checks passed
@DarkLight1337 DarkLight1337 deleted the avoid-mm-pooling-convert branch September 20, 2025 05:30
@wwl2755
Contributor

wwl2755 commented Sep 20, 2025

Let's see if use_diff=False can solve the problem

Great catch! That solves the problem once and for all.

FeiDaLI pushed a commit to FeiDaLI/vllm that referenced this pull request Sep 25, 2025
…ng models (vllm-project#25261)

Signed-off-by: DarkLight1337 <tlleungac@connect.ust.hk>
charlifu pushed a commit to ROCm/vllm that referenced this pull request Sep 25, 2025
…ng models (vllm-project#25261)

Signed-off-by: DarkLight1337 <tlleungac@connect.ust.hk>
Signed-off-by: charlifu <charlifu@amd.com>
yewentao256 pushed a commit that referenced this pull request Oct 3, 2025
…ng models (#25261)

Signed-off-by: DarkLight1337 <tlleungac@connect.ust.hk>
Signed-off-by: yewentao256 <zhyanwentao@126.com>
xuebwang-amd pushed a commit to xuebwang-amd/vllm that referenced this pull request Oct 10, 2025
…ng models (vllm-project#25261)

Signed-off-by: DarkLight1337 <tlleungac@connect.ust.hk>
Signed-off-by: xuebwang-amd <xuebwang@amd.com>
choprahetarth pushed a commit to Tandemn-Labs/vllm that referenced this pull request Oct 11, 2025
…ng models (vllm-project#25261)

Signed-off-by: DarkLight1337 <tlleungac@connect.ust.hk>
lywa1998 pushed a commit to lywa1998/vllm that referenced this pull request Oct 20, 2025
…ng models (vllm-project#25261)

Signed-off-by: DarkLight1337 <tlleungac@connect.ust.hk>
xuebwang-amd pushed a commit to xuebwang-amd/vllm that referenced this pull request Oct 24, 2025
…ng models (vllm-project#25261)

Signed-off-by: DarkLight1337 <tlleungac@connect.ust.hk>
Signed-off-by: xuebwang-amd <xuebwang@amd.com>

Labels

ready ONLY add when PR is ready to merge/full CI is needed


4 participants