[Optimization] Avoid repeated model architecture conversion for pooling models #25261
Conversation
…ng models Signed-off-by: DarkLight1337 <tlleungac@connect.ust.hk>
Code Review
This pull request introduces a cache for get_model_architecture to avoid repeated expensive conversions, which is a good optimization. My main feedback is regarding thread safety. The global cache is modified without a lock, which can lead to race conditions and redundant computations in a multi-threaded environment. I've suggested adding a lock using a double-checked locking pattern to make the caching mechanism thread-safe and robust.
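The double-checked locking pattern suggested by the review can be sketched as follows. This is a simplified illustration, not the actual vLLM code; the cache, lock, and function names are hypothetical.

```python
import threading
from typing import Any, Callable

# Hypothetical module-level cache and lock; the names are
# illustrative, not the identifiers used in the PR.
_model_arch_cache: dict[Any, Any] = {}
_model_arch_lock = threading.Lock()

def cached_get_model_architecture(key: Any, compute: Callable[[Any], Any]) -> Any:
    # First check without the lock: fast path for the common case
    # where the value is already cached.
    result = _model_arch_cache.get(key)
    if result is not None:
        return result
    with _model_arch_lock:
        # Second check under the lock: another thread may have
        # populated the cache while we were waiting for the lock.
        result = _model_arch_cache.get(key)
        if result is None:
            result = compute(key)
            _model_arch_cache[key] = result
    return result
```

The first unlocked check keeps the hot path cheap; the second check under the lock prevents two threads from both paying for the expensive conversion.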
Yeah, I noticed too that it was being called repeatedly. LGTM, just a quick question though: would this work with `functools.cache`?
No, since …
But if ModelConfig already has a …
That is true, but …
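The specific objection to `functools.cache` is truncated above, but one general reason it can fail for a call like this is that `functools.cache` requires all arguments to be hashable, and config objects often are not. A small illustration (the `DummyConfig` class is a stand-in, not vLLM's `ModelConfig`):

```python
from dataclasses import dataclass
from functools import cache

# Illustrative stand-in for a config object; not vLLM's ModelConfig.
# Dataclasses with the default eq=True are unhashable unless
# frozen=True or eq=False is set.
@dataclass
class DummyConfig:
    model: str

@cache
def get_arch(config: DummyConfig) -> str:
    return config.model.upper()

try:
    get_arch(DummyConfig("llama"))
except TypeError as exc:
    print(f"functools.cache rejects unhashable args: {exc}")
```

A hand-rolled cache keyed on a hashable subset of the config's fields sidesteps this limitation.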
Hmm... looks like this PR uncovered some hidden problems about EAGLE config validity.
Any idea? @wwl2755 @WoosukKwon
I think it is because … So it makes a problem when … An easy fix is to delete the assertion and make a dummy …
Ok, I just realized that …
Signed-off-by: DarkLight1337 <tlleungac@connect.ust.hk>
Great catch! Solves the problem once and for all.
…ng models (vllm-project#25261) Signed-off-by: DarkLight1337 <tlleungac@connect.ust.hk>
Purpose
When running multi-modal pooling models, I found that almost 3% of the time was taken by the `get_model_architecture` call inside `create_processor`. Upon inspection, `get_model_architecture` converts the model class into a pooling model each time it is called, which is quite expensive (it occurs for every single request), so I have decided to cache its output.

cc @maxdebayser @noooop
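The caching described above might look roughly like the following. This is a simplified sketch, not the actual vLLM implementation; the cache key fields and helper names are assumptions.

```python
# Simplified illustration of caching an expensive per-request
# model-class conversion; names and key fields are assumptions,
# not the actual vLLM code.
_arch_cache: dict[tuple, tuple[type, str]] = {}

def expensive_convert(architectures: tuple[str, ...],
                      runner_type: str) -> tuple[type, str]:
    # Stand-in for resolving the model class and converting it
    # into a pooling model (the expensive step in the PR).
    class Model:
        pass
    return Model, architectures[0]

def get_model_architecture_cached(architectures: tuple[str, ...],
                                  runner_type: str) -> tuple[type, str]:
    # Key on the hashable config fields that determine the result,
    # so repeated requests reuse the converted class.
    key = (architectures, runner_type)
    if key not in _arch_cache:
        _arch_cache[key] = expensive_convert(architectures, runner_type)
    return _arch_cache[key]
```

With this in place, the conversion cost is paid once per distinct configuration rather than once per request.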
Test Plan
Test Result
Essential Elements of an Effective PR Description Checklist
`supported_models.md` and `examples` for a new model.