Fix vLLM x torch.compile config caching #16491
Conversation
Force-pushed from e6b77a4 to 355bcbd
Looks good, thanks for the fix!
vllm/config.py (Outdated)
Actually, do we need to put enforce_eager here?
Good point, I will remove it. (And let me know if I should remove more things; I don't know what a lot of these flags do.)
Force-pushed from 355bcbd to 5e12012
vllm/config.py (Outdated)
Will it potentially contain an object id? It's safer if we can convert to some serialization format like JSON; then we can safely take the whole config into consideration.
Yes, it looks like this can contain object ids, so we'll need to figure out how to serialize it...
For now, I'm going to make a pass through this PR and remove the items that need object ids. We only needed one of the config flags to fix the issue I was looking at.
Nvm, hf_config is guaranteed to be serializable to JSON: https://github.com/huggingface/transformers/blob/main/src/transformers/configuration_utils.py#L919, and it is common for people to save/load it from JSON.
So what we're doing here is fine.
I will push to_json_string() into the factors list.
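For illustration, a minimal standalone sketch of what folding to_json_string() into the factors list can look like (this is not the PR's actual compute_hash; the gpt2 config is just an arbitrary example):

```python
import hashlib

from transformers import AutoConfig

# Any HF config works for illustration; PretrainedConfig.to_json_string()
# returns a sorted-keys JSON dump, so it contains no Python object ids.
hf_config = AutoConfig.from_pretrained("gpt2")

factors = []
# Fold the full JSON form of hf_config into the cache key instead of a
# repr() that could embed "<... object at 0x...>" strings.
factors.append(hf_config.to_json_string())

print(hashlib.sha256(str(factors).encode()).hexdigest())
```

Because the JSON dump is stable across processes, two runs with the same hf_config produce the same digest.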
Good change overall, and I agree with the related RFC. Better to be safe than sorry here, so I'm not overly scrutinizing the additions.
Fixes vllm-project#16150

Based on the ModelConfig, we decide if we can reuse an existing torch.compile'd artifact or if we need to recompile. Unfortunately we were not checking enough flags on the config. The problem in vllm-project#16150 was specifically that if the override_generation_config flag changed, then we need to recompile.

I went through ModelConfig and added some more things to be checked when deciding if a model needs to recompile. Disclaimer: I do not know what a lot of these things do, but I figure that it is better to add things than not (we risk silent incorrectness if the caching is wrong). We can remove things if we end up compiling too much.

This is also one of the reasons the PyTorch team recommends that vLLM use torch.compile's built-in caching (when we improve it), because torch.compile programmatically decides what needs to be cached, and we test that really well.

Test Plan:
- tested locally

Signed-off-by: rzou <zou3519@gmail.com>
Force-pushed from 5e12012 to db35f17
Should be safe to land.
Could you paste the test plan into the PR description?
Try to re-trigger the pre-commit CI.
        getattr(self.hf_config, "max_position_embeddings", "None"))
    # hf_config can control how the model looks!
    factors.append(self.hf_config.to_json_string())
    return hashlib.sha256(str(factors).encode()).hexdigest()
Sorry for the late feedback. Can we add one more assert to make sure str(factors) does not contain object ids, e.g. by searching for "object at 0x"? I'm not sure how stable Python object representations are.
I can do that. @youkaichao Were you thinking an assert (hard error) or a warning?
I think it should be an assert (hard error). We should make sure object ids do not occur, and when one does occur, we should be aware of it and let users report it so that we can investigate.
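One possible shape for that hard check, as a standalone sketch: the "object at 0x" substring is the heuristic discussed above, the hashing mirrors the snippet in this PR, but the helper name and its placement are assumptions.

```python
import hashlib


def hash_factors(factors: list) -> str:
    """Hash compile-cache factors, refusing reprs that embed object ids."""
    factors_str = str(factors)
    # A default repr() such as "<MyConfig object at 0x7f...>" changes every
    # run, which would silently make the cache key useless, so fail loudly.
    assert "object at 0x" not in factors_str, (
        "compilation cache factors contain a Python object id; please report "
        "this so the offending config field can be serialized properly")
    return hashlib.sha256(factors_str.encode()).hexdigest()
```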
Signed-off-by: rzou <zou3519@gmail.com> Signed-off-by: Yang Wang <elainewy@meta.com>
Signed-off-by: rzou <zou3519@gmail.com>
Signed-off-by: rzou <zou3519@gmail.com>
Signed-off-by: rzou <zou3519@gmail.com> Signed-off-by: Mu Huai <tianbowen.tbw@antgroup.com>
Fixes #16150
Based on the ModelConfig, we decide if we can reuse an existing torch.compile'd artifact or if we need to recompile. Unfortunately we were not checking enough flags on the config, so if one runs vllm serve enough times with different arguments, it leads to weird errors (like in the issue). The problem in #16150 was specifically that if the override_generation_config flag changed, then we need to recompile.
I went through ModelConfig and added some more things to be checked when deciding if a model needs to recompile. Disclaimer: I do not know what a lot of these things do, but I figure that it is better to add things than not (we risk silent incorrectness if the caching is wrong). We can remove things if we end up compiling too much.
This is also one of the reasons the PyTorch Team recommended that vLLM use torch.compile's built-in caching (we want to try to migrate vLLM to using this in the long term), because torch.compile programmatically decides what needs to be cached and that is tested really well.
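To make the failure mode concrete, here is a toy sketch (not vLLM code; all names are hypothetical): if a field like override_generation_config is left out of the hashed factors, two different configurations collapse to the same cache key and a stale artifact gets reused.

```python
import hashlib
import json


def cache_key(config: dict, tracked_fields: list[str]) -> str:
    # Only the tracked fields participate in the cache key.
    factors = [json.dumps({k: config.get(k) for k in tracked_fields},
                          sort_keys=True)]
    return hashlib.sha256(str(factors).encode()).hexdigest()


cfg_a = {"model": "llama", "override_generation_config": {}}
cfg_b = {"model": "llama",
         "override_generation_config": {"attn_temperature_tuning": True}}

# Field not tracked: both configs get the same key -> stale artifact reused.
assert cache_key(cfg_a, ["model"]) == cache_key(cfg_b, ["model"])

# Field tracked: keys differ -> a fresh compilation is triggered.
assert cache_key(cfg_a, ["model", "override_generation_config"]) != \
    cache_key(cfg_b, ["model", "override_generation_config"])
```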
Test Plan:
- rm -rf ~/.cache/vllm
- vllm serve "meta-llama/Llama-4-Scout-17B-16E-Instruct" -tp 8 --max_model_len 1000
- vllm serve "meta-llama/Llama-4-Scout-17B-16E-Instruct" -tp 8 --max_model_len 1000 --override-generation-config='{"attn_temperature_tuning": true}'; note that there is no error and that vLLM decided to do a new compilation rather than reuse an existing artifact.