Motivation.
How vLLM decides whether to reuse cached torch.compile compilations is brittle. There is a list of configs that it takes into account and hashes; if any of these configs change, vLLM decides that it needs to do a fresh torch.compile run.
As we saw in #16491, it's very easy to add a new feature to one of the configs and forget to update the hash function. In that PR, the problem was that ModelConfig's hash function did not take into account everything that could affect the compiled output.
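To make the failure mode concrete, here is a minimal sketch of the opt-in style (the class and field names are illustrative assumptions, not vLLM's actual code): each config enumerates the fields it believes are compilation-relevant and hashes only those.

import hashlib

# Minimal sketch of the current opt-in style (illustrative, not vLLM's code).
class ModelConfig:
    def __init__(self, model: str, dtype: str, new_feature_flag: bool = False):
        self.model = model
        self.dtype = dtype
        # Suppose this new field changes what torch.compile produces...
        self.new_feature_flag = new_feature_flag

    def compute_hash(self) -> str:
        # ...but the author forgot to list it here, so two configs that differ
        # only in new_feature_flag hash identically and share a stale cache entry.
        factors = [self.model, self.dtype]
        return hashlib.sha256(repr(factors).encode()).hexdigest()

# Demonstrates the silent stale-cache reuse: both hashes are equal.
assert ModelConfig("llama", "bf16", True).compute_hash() == \
       ModelConfig("llama", "bf16", False).compute_hash()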
Proposed Change.
The hash functions are currently opt-in: when someone adds a new feature or does a refactor, they have to remember to add any new compilation-relevant field to the hash functions. After discussion with the PyTorch Compiler team (cc @oulgen), we instead propose changing the hash functions to be opt-out.
What that means is that ModelConfig's compute_hash function would instead contain a list of fields that it should not include in the hash:
def compute_hash(self):
    factors = dict(self.__dict__)
    # Opt-out: drop only the fields known not to affect compilation.
    factors.pop("enforce_eager", None)
    factors.pop("tokenizer_config", None)
    ...

Opt-out seems safer. The risk of incorrect caching is (1) the model errors unexpectedly and (2) silent incorrectness, so we think it's better to be more conservative.
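A slightly fuller sketch of what this could look like, assuming the hash is taken over the remaining fields (the exclude list and field names below are illustrative, not vLLM's actual API):

import hashlib

# Hypothetical opt-out body for ModelConfig.compute_hash: start from *all*
# fields and explicitly exclude the ones known to be irrelevant to compilation.
_COMPILE_HASH_EXCLUDE = {"enforce_eager", "tokenizer_config"}

def compute_hash(self) -> str:
    factors = {
        key: value
        for key, value in self.__dict__.items()
        if key not in _COMPILE_HASH_EXCLUDE
    }
    # A newly added field is hashed by default, so forgetting to update the
    # exclude set can only cause an unnecessary recompile, never a stale hit.
    return hashlib.sha256(repr(sorted(factors.items())).encode()).hexdigest()

Compared to the opt-in version, the failure mode here is benign: a forgotten entry in the exclude set costs an extra compilation rather than silently reusing a stale artifact.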
Feedback Period.
EOD Friday, April 18th, 2025.
CC List.
@youkaichao @tlrmchlsmth @mgoin @ProExpertProg @houseroad
Any Other Things.
Thank you!