[Bugfix] fix pp for llama4 #16746
Conversation
  self.language_model = _initialize_model(
-     vllm_config=vllm_config.with_hf_config(config.text_config),
+     vllm_config=vllm_config.with_hf_config(config.text_config,
+                                            ["LlamaForCausalLM"]),
Should we use Llama4ForCausalLM?
Llama4ForCausalLM is not a registered architecture; we should avoid using it, since that would require adding a lot of hacks, as in the initial PR.
Probably a dumb question, but since _initialize_model is already pointing to model_class=Llama4ForCausalLM, why do we need to override the architectures here to LlamaForCausalLM?
vllm/vllm/model_executor/model_loader/loader.py
Lines 114 to 123 in 8cac35b
def _initialize_model(
    vllm_config: VllmConfig,
    *,
    prefix: str = "",
    model_class: Optional[type[nn.Module]] = None,
) -> nn.Module:
    """Initialize a model with the given configurations."""
    model_config = vllm_config.model_config
    if model_class is None:
        model_class, _ = get_model_architecture(model_config)
During __post_init__ of the config created by replace inside the with_hf_config function, verify_with_parallel_config is called, and that check resolves the model through the registry, which only keeps registered architectures:
Line 3781 in a018e55:
    return replace(self, model_config=model_config)

Line 3790 in a018e55:
    self.model_config.verify_with_parallel_config(self.parallel_config)

vllm/vllm/model_executor/models/registry.py, Lines 441 to 442 in a018e55:
    normalized_arch = list(
        filter(lambda model: model in self.models, architectures))
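To make that chain concrete, here is a minimal sketch (the ToyRegistry class, is_pp_supported name, and the toy model table are illustrative assumptions, not vLLM's actual registry code) of why an unregistered architecture such as Llama4ForCausalLM cannot pass the PP-support check, while LlamaForCausalLM can:

```python
# Minimal sketch with hypothetical names: the PP check only considers
# architectures that are present in the registry's model table.
class ToyRegistry:
    def __init__(self) -> None:
        # Toy model table: architecture name -> "supports pipeline parallel".
        self.models = {"LlamaForCausalLM": True}

    def is_pp_supported(self, architectures: list[str]) -> bool:
        # Mirrors the normalization above: keep only registered architectures.
        normalized_arch = list(
            filter(lambda model: model in self.models, architectures))
        if not normalized_arch:
            # Nothing registered matched, so verification cannot succeed.
            raise ValueError(f"No registered architecture in {architectures}")
        return all(self.models[arch] for arch in normalized_arch)


registry = ToyRegistry()
print(registry.is_pp_supported(["LlamaForCausalLM"]))   # True
# registry.is_pp_supported(["Llama4ForCausalLM"])       # would fail: unregistered
```

This is why the diff passes ["LlamaForCausalLM"] explicitly instead of relying on the text config's own architecture.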
  # using llama4's load_weights routine.
  language_model_weights, other_weights = self.separate_weights(
-     weights, prefix="language_model.model.")
+     weights, prefix="language_model.")
Wondering why this issue was not triggered before?
language_model.lm_head can also be loaded by the parent model's weight loader when PP is not enabled. With PP, however, we split the weights into two parts, and lm_head is missing on PP rank 0, so a "weight not found" error is raised. The model loading in llama4.py has is_pp_missing_parameter handling to avoid that exception, so the fix routes the lm_head weights through it.
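To illustrate the prefix change, here is a minimal sketch (separate_weights and the weight names below are illustrative, not the exact checkpoint layout): with the old prefix "language_model.model.", lm_head lands in other_weights and is handled by the parent loader, which has no PP-missing-parameter handling; with "language_model.", it is routed to llama4's load_weights.

```python
# Hypothetical sketch of splitting checkpoint weights by name prefix.
from typing import Iterable, Tuple

import torch


def separate_weights(weights: Iterable[Tuple[str, torch.Tensor]], prefix: str):
    """Split (name, tensor) pairs into (matching the prefix, everything else)."""
    weights = list(weights)
    matching = [(n, w) for n, w in weights if n.startswith(prefix)]
    other = [(n, w) for n, w in weights if not n.startswith(prefix)]
    return matching, other


weights = [
    ("language_model.model.layers.0.self_attn.qkv_proj.weight", torch.empty(1)),
    ("language_model.lm_head.weight", torch.empty(1)),
    ("vision_model.patch_embed.weight", torch.empty(1)),
]

# Old prefix: lm_head falls into "other" and hits the parent loader,
# which raises "weight not found" on PP ranks that do not own lm_head.
lm, other = separate_weights(weights, prefix="language_model.model.")
assert any(n == "language_model.lm_head.weight" for n, _ in other)

# Fixed prefix: lm_head goes through llama4's load_weights, which can skip
# parameters missing on the current PP rank (is_pp_missing_parameter).
lm, other = separate_weights(weights, prefix="language_model.")
assert any(n == "language_model.lm_head.weight" for n, _ in lm)
```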
Thanks for the fix.
This PR fixes PP for llama4 (#16385)
It loads the language model with a registered architecture (LlamaForCausalLM) so that PP support verification passes, and corrects the prefix used to separate the language-model weights from the rest of the model's weights, so that loading works on both PP rank 0 and PP rank 1.
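As a usage sketch of what this PR enables (the model id, parallel sizes, and keyword spellings below are assumptions for illustration, not a verified configuration):

```python
# Hedged sketch: running a Llama 4 checkpoint with pipeline parallelism.
# The model id and parallel sizes are placeholders, not a recommendation.
from vllm import LLM, SamplingParams

llm = LLM(
    model="meta-llama/Llama-4-Scout-17B-16E-Instruct",  # assumed HF model id
    tensor_parallel_size=4,
    pipeline_parallel_size=2,   # the configuration this PR fixes
    max_model_len=8192,
)

outputs = llm.generate(
    ["Explain pipeline parallelism in one sentence."],
    SamplingParams(max_tokens=64),
)
print(outputs[0].outputs[0].text)
```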