Fix vLLM x torch.compile config caching #16491
Conversation
Force-pushed from e6b77a4 to 355bcbd
Looks good, thanks for the fix!
vllm/config.py (Outdated)
Actually, do we need to put enforce_eager here?
Good point, I will remove it. (And let me know if I should remove more things; I don't know what a lot of these flags do.)
Force-pushed from 355bcbd to 5e12012
vllm/config.py (Outdated)
Will it potentially contain an object id? It's safer if we can convert to some serialization format like JSON; then we can safely take the whole config into consideration.
Yes, it looks like this can contain object ids, so we'll need to figure out how to serialize it...
For now, I'm going to make a pass through this PR and remove the items that need object ids. We only needed one of the config flags to fix the issue I was looking at.
Nvm, hf_config is guaranteed to be serializable to JSON: https://github.com/huggingface/transformers/blob/main/src/transformers/configuration_utils.py#L919, and it is common for people to save/load it from JSON.
So what we're doing here is fine.
I will push to_json_string() into the factors list.
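For illustration, a minimal standalone sketch of what folding to_json_string() into the factors list can look like (this is not the PR's actual compute_hash; the gpt2 config is just an arbitrary example):

```python
import hashlib

from transformers import AutoConfig

# Any HF config works for illustration; PretrainedConfig.to_json_string()
# returns a sorted-keys JSON dump, so it contains no Python object ids.
hf_config = AutoConfig.from_pretrained("gpt2")

factors = []
# Fold the full JSON form of hf_config into the cache key instead of a
# repr() that could embed "<... object at 0x...>" strings.
factors.append(hf_config.to_json_string())

print(hashlib.sha256(str(factors).encode()).hexdigest())
```

Because the JSON dump is stable across processes, two runs with the same hf_config produce the same digest.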
Good change overall, and I agree with the related RFC. Better to be safe than sorry here, so I'm not overly scrutinizing the additions.
Fixes vllm-project#16150

Based on the ModelConfig, we decide if we can reuse an existing torch.compile'd artifact or if we need to recompile. Unfortunately we were not checking enough flags on the config. The problem in vllm-project#16150 was specifically that if the override_generation_config flag changed, then we need to recompile.

I went through ModelConfig and added some more things to be checked when deciding if a model needs to recompile. Disclaimer: I do not know what a lot of these things do, but I figure that it is better to add things than not (we risk silent incorrectness if the caching is wrong). We can remove things if we end up compiling too much.

This is also one of the reasons the PyTorch team recommends that vLLM use torch.compile's built-in caching (when we improve it), because torch.compile programmatically decides what needs to be cached, and we test that really well.

Test Plan:
- tested locally

Signed-off-by: rzou <zou3519@gmail.com>
Force-pushed from 5e12012 to db35f17
Should be safe to land.
Could you paste the test plan into the PR description?
Try to re-trigger the pre-commit CI.
        getattr(self.hf_config, "max_position_embeddings", "None"))
    # hf_config can control how the model looks!
    factors.append(self.hf_config.to_json_string())
    return hashlib.sha256(str(factors).encode()).hexdigest()
Sorry for the late feedback. Can we add one more assert to make sure str(factors) does not contain object ids, e.g. by searching for "object at 0x"? I'm not sure how stable Python object representations are.
I can do that. @youkaichao Were you thinking an assert (hard error) or a warning?
I think it should be an assert (hard error). We should make sure object ids do not occur, and when one does occur, we should be aware of it and let users report it so that we can investigate.
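One possible shape for that hard check, as a standalone sketch: the "object at 0x" substring is the heuristic discussed above, the hashing mirrors the snippet in this PR, but the helper name and its placement are assumptions.

```python
import hashlib


def hash_factors(factors: list) -> str:
    """Hash compile-cache factors, refusing reprs that embed object ids."""
    factors_str = str(factors)
    # A default repr() such as "<MyConfig object at 0x7f...>" changes every
    # run, which would silently make the cache key useless, so fail loudly.
    assert "object at 0x" not in factors_str, (
        "compilation cache factors contain a Python object id; please report "
        "this so the offending config field can be serialized properly")
    return hashlib.sha256(factors_str.encode()).hexdigest()
```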
Signed-off-by: rzou <zou3519@gmail.com> Signed-off-by: Yang Wang <elainewy@meta.com>
Signed-off-by: rzou <zou3519@gmail.com>
Signed-off-by: rzou <zou3519@gmail.com>
Signed-off-by: rzou <zou3519@gmail.com> Signed-off-by: Mu Huai <tianbowen.tbw@antgroup.com>
Fixes #16150
Based on the ModelConfig, we decide if we can reuse an existing torch.compile'd artifact or if we need to recompile. Unfortunately we were not checking enough flags on the config, so if one runs vllm serve enough times with different arguments, it leads to weird errors (like in the issue). The problem in #16150 was specifically that if the override_generation_config flag changed, then we need to recompile.
I went through ModelConfig and added some more things to be checked when deciding if a model needs to recompile. Disclaimer: I do not know what a lot of these things do, but I figure that it is better to add things than not (we risk silent incorrectness if the caching is wrong). We can remove things if we end up compiling too much.
This is also one of the reasons the PyTorch Team recommended that vLLM use torch.compile's built-in caching (we want to try to migrate vLLM to using this in the long term), because torch.compile programmatically decides what needs to be cached and that is tested really well.
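To make the failure mode concrete, here is a toy sketch (not vLLM code; all names are hypothetical): if a field like override_generation_config is left out of the hashed factors, two different configurations collapse to the same cache key and a stale artifact gets reused.

```python
import hashlib
import json


def cache_key(config: dict, tracked_fields: list[str]) -> str:
    # Only the tracked fields participate in the cache key.
    factors = [json.dumps({k: config.get(k) for k in tracked_fields},
                          sort_keys=True)]
    return hashlib.sha256(str(factors).encode()).hexdigest()


cfg_a = {"model": "llama", "override_generation_config": {}}
cfg_b = {"model": "llama",
         "override_generation_config": {"attn_temperature_tuning": True}}

# Field not tracked: both configs get the same key -> stale artifact reused.
assert cache_key(cfg_a, ["model"]) == cache_key(cfg_b, ["model"])

# Field tracked: keys differ -> a fresh compilation is triggered.
assert cache_key(cfg_a, ["model", "override_generation_config"]) != \
    cache_key(cfg_b, ["model", "override_generation_config"])
```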
Test Plan:
- rm -rf ~/.cache/vllm
- vllm serve "meta-llama/Llama-4-Scout-17B-16E-Instruct" -tp 8 --max_model_len 1000
- vllm serve "meta-llama/Llama-4-Scout-17B-16E-Instruct" -tp 8 --max_model_len 1000 --override-generation-config='{"attn_temperature_tuning": true}'; note that there is no error and that vLLM decided to do a new compilation rather than reuse an existing artifact.