[RFC]: vLLM x torch.compile caching should be opt-out by default #16501

@zou3519

Motivation.

How vLLM decides to cache torch.compile compilations is brittle. There is a list of configs that it takes into account and hashes; if any of these configs changes, vLLM decides that it needs to do a fresh torch.compile run.

As we saw in #16491, it's very easy to add a new feature to one of the configs and forget to update the hash function. In that PR, the problem was that ModelConfig's hash function did not take into account everything that could affect compilation.
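For context, the current opt-in pattern looks roughly like the sketch below. This is simplified and the listed fields are illustrative, not the exact set vLLM hashes today; the point is that only fields someone remembered to list feed into the cache key.

import hashlib

def compute_hash(self):
    # Opt-in style (status quo, simplified): only explicitly listed
    # fields contribute to the cache key. A newly added field that
    # affects compilation is silently ignored until someone remembers
    # to append it here.
    factors = [self.model, self.dtype, self.quantization]
    return hashlib.sha256(str(factors).encode()).hexdigest()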

Proposed Change.

The hash functions are currently opt-in: when someone adds a new feature or does a refactor, they may need to remember to add the new field to the hash functions. After discussion with the PyTorch Compiler team (cc @oulgen), we instead propose changing the hash functions to be opt-out.

What that means is that ModelConfig's compute_hash function would instead contain a list of fields that it should not include in the hash:

def compute_hash(self):
    # Hash every field by default; explicitly opt out the fields
    # that cannot affect compilation.
    factors = dict(self.__dict__)
    factors.pop("enforce_eager")
    factors.pop("tokenizer_config")
    ...
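A more complete sketch of the opt-out direction might look like the following (the _HASH_EXCLUDED_FIELDS name is hypothetical, not an existing vLLM identifier): every field participates in the hash unless explicitly excluded, so a forgotten update fails in the safe direction with a spurious recompile rather than a stale cache hit.

import hashlib

# Hypothetical constant: fields known not to affect torch.compile output.
_HASH_EXCLUDED_FIELDS = {"enforce_eager", "tokenizer_config"}

def compute_hash(self):
    # Opt-out style: include every field by default and drop only the
    # ones known to be irrelevant to compilation.
    factors = {
        name: repr(value)
        for name, value in sorted(self.__dict__.items())
        if name not in _HASH_EXCLUDED_FIELDS
    }
    return hashlib.sha256(str(factors).encode()).hexdigest()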

Opt-out seems safer. The risks of incorrect caching are (1) the model erroring unexpectedly and (2) silent incorrectness, so we think it's better to be more conservative.

Feedback Period.

EOD Friday, April 18th, 2025.

CC List.

@youkaichao @tlrmchlsmth @mgoin @ProExpertProg @houseroad

Any Other Things.

thank you

Before submitting a new issue...

  • Make sure you already searched for relevant issues, and asked the chatbot living at the bottom right corner of the documentation page, which can answer lots of frequently asked questions.
