
Conversation

Collaborator

@zou3519 zou3519 commented Apr 11, 2025

Fixes #16150

Based on the ModelConfig, we decide whether we can reuse an existing torch.compile'd artifact or need to recompile. Unfortunately, we were not checking enough of the config's fields, so running vllm serve multiple times with different arguments could lead to weird errors (like the one in the issue).

The problem in #16150 was specifically that a change to override_generation_config requires a recompile, but we were not detecting that change.

I went through ModelConfig and added more fields to the set that is checked when deciding whether a model needs to recompile. Disclaimer: I do not know what a lot of these fields do, but I figure it is better to include too many than too few (we risk silent incorrectness if the caching is wrong). We can remove fields later if we end up recompiling too often.
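
For reference, here is a minimal sketch of the factor-based hashing this PR extends. The field names other than override_generation_config and hf_config are illustrative placeholders, not the exact list in vllm/config.py:

```python
import hashlib


def compute_model_config_hash(model_config) -> str:
    """Sketch: derive the compile-cache key from ModelConfig fields."""
    factors = []
    # Every field that can change the compiled graph must be included;
    # otherwise a stale torch.compile artifact may be silently reused.
    factors.append(model_config.model)
    factors.append(model_config.dtype)
    factors.append(model_config.override_generation_config)
    # hf_config can control how the model looks, so include its JSON form.
    factors.append(model_config.hf_config.to_json_string())
    return hashlib.sha256(str(factors).encode()).hexdigest()
```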

This is also one of the reasons the PyTorch team recommended that vLLM use torch.compile's built-in caching (we want to migrate vLLM to it in the long term): torch.compile programmatically decides what needs to be cached, and that logic is very well tested.

Test Plan:

  • rm -rf ~/.cache/vllm
  • Run vllm serve "meta-llama/Llama-4-Scout-17B-16E-Instruct" -tp 8 --max_model_len 1000
  • Run vllm serve "meta-llama/Llama-4-Scout-17B-16E-Instruct" -tp 8 --max_model_len 1000 --override-generation-config='{"attn_temperature_tuning": true}' and confirm that there is no error and that vLLM performs a new compilation rather than reusing the existing artifact.

@github-actions

👋 Hi! Thank you for contributing to the vLLM project.

💬 Join our developer Slack at https://slack.vllm.ai to discuss your PR in #pr-reviews, coordinate on features in #feat- channels, or join special interest groups in #sig- channels.

Just a reminder: PRs do not trigger a full CI run by default. Instead, they only run fastcheck CI, which covers a small and essential subset of CI tests to quickly catch errors. You can run other CI tests on top of those by going to your fastcheck build on the Buildkite UI (linked in the PR checks section) and unblocking them. If you do not have permission to unblock, ping simon-mo or khluu to add you to our Buildkite org.

Once the PR is approved and ready to go, your PR reviewer(s) can run CI to test the changes comprehensively before merging.

To run CI, PR reviewers can either add the ready label to the PR or enable auto-merge.

🚀

@zou3519 zou3519 force-pushed the fix_config_hash branch 3 times, most recently from e6b77a4 to 355bcbd on April 11, 2025 at 15:04
Collaborator

@houseroad houseroad left a comment

Looks good, thanks for the fix!

vllm/config.py Outdated
Collaborator

Actually do we need to put force eager here?

Collaborator Author

@zou3519 zou3519 Apr 11, 2025

Good point, I will remove it. (Let me know if I should remove anything else; I don't know what a lot of these flags do.)

vllm/config.py Outdated
Member

Will it potentially contain an object id? It would be safer to convert it to a serialization format like JSON; then we can safely take the whole config into consideration.

Collaborator Author

Yes, it looks like this can contain object ids, so we'll need to figure out how to serialize it...

For now, I'm going to make a pass through this PR and remove the items whose values would pull in object ids. Only one of the added config fields was needed to fix the issue I was looking at.

Collaborator Author

Never mind: hf_config is guaranteed to be serializable to JSON (https://github.com/huggingface/transformers/blob/main/src/transformers/configuration_utils.py#L919), and it is common for people to save/load it from JSON.

So what we're doing here is fine.

I will push to_json_string() into the factors list.
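
As a quick illustration that to_json_string() gives a stable, address-free payload to hash (GPT2Config here is just a stand-in for whatever hf_config vLLM has loaded):

```python
import hashlib

from transformers import GPT2Config  # stand-in for the model's hf_config

cfg = GPT2Config()
payload = cfg.to_json_string()  # deterministic JSON, no memory addresses
digest = hashlib.sha256(payload.encode()).hexdigest()
print(digest[:16])  # identical across processes for an identical config
```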

Member

@tlrmchlsmth tlrmchlsmth left a comment

Good change overall, and I agree with the related RFC. Better to be safe than sorry here, so I'm not scrutinizing the additions too closely.

@zou3519 zou3519 requested a review from youkaichao April 14, 2025 15:16
Collaborator

@houseroad houseroad left a comment

Should be safe to land.

Could you paste the test plan into the PR description?

@zou3519 zou3519 added the ready ONLY add when PR is ready to merge/full CI is needed label Apr 14, 2025
@houseroad houseroad closed this Apr 15, 2025
@houseroad houseroad reopened this Apr 15, 2025
@houseroad
Collaborator

Trying to re-trigger the pre-commit CI.

@vllm-bot vllm-bot merged commit b590adf into vllm-project:main Apr 15, 2025
57 of 59 checks passed
factors.append(
    getattr(self.hf_config, "max_position_embeddings", "None"))
# hf_config can control how the model looks!
factors.append(self.hf_config.to_json_string())
return hashlib.sha256(str(factors).encode()).hexdigest()
Member

Sorry for the late feedback. Can we add one more assert to make sure str(factors) does not contain object ids, e.g. by searching for "object at 0x"? I'm not sure how stable Python object representations are.

Collaborator Author

I can do that. @youkaichao, were you thinking of an assert (hard error) or a warning?

Member

I think it should be an assert (hard error). We should make sure object ids never occur, and if one ever does, we should be aware of it and have users report it so that we can investigate.
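
A minimal sketch of that guard, assuming it is a plain substring search over str(factors) as suggested above (not the exact vLLM code):

```python
import hashlib


def hash_factors(factors: list) -> str:
    s = str(factors)
    # A default object repr looks like "<Foo object at 0x7f9c...>". If that
    # shows up here, the hash depends on a memory address and would silently
    # differ between runs, so fail loudly and ask the user to report it.
    assert "object at 0x" not in s, (
        "compile-cache factors contain a raw object repr; please report this")
    return hashlib.sha256(s.encode()).hexdigest()
```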

yangw-dev pushed a commit to yangw-dev/vllm that referenced this pull request Apr 21, 2025
Signed-off-by: rzou <zou3519@gmail.com>
Signed-off-by: Yang Wang <elainewy@meta.com>
jikunshang pushed a commit to jikunshang/vllm that referenced this pull request Apr 29, 2025
lk-chen pushed a commit to lk-chen/vllm that referenced this pull request Apr 29, 2025
RichardoMrMu pushed a commit to RichardoMrMu/vllm that referenced this pull request May 12, 2025
Signed-off-by: rzou <zou3519@gmail.com>
Signed-off-by: Mu Huai <tianbowen.tbw@antgroup.com>
