[Bugfix] fix pp for llama4 #16746
```diff
@@ -672,9 +672,9 @@ def __init__(self, *, vllm_config: VllmConfig, prefix: str = ""):
             self.config,
             None,
             prefix=maybe_prefix(prefix, "multi_modal_projector"))
 
         self.language_model = _initialize_model(
-            vllm_config=vllm_config.with_hf_config(config.text_config),
+            vllm_config=vllm_config.with_hf_config(config.text_config,
+                                                   ["LlamaForCausalLM"]),
             prefix=maybe_prefix(prefix, "language_model"),
             model_class=Llama4ForCausalLM,
         )
@@ -824,7 +824,7 @@ def load_weights(self, weights: Iterable[Tuple[str,
         # language_model is a Llama4ForCausalLM instance. We load its
         # weights using llama4's load_weights routine.
         language_model_weights, other_weights = self.separate_weights(
-            weights, prefix="language_model.model.")
+            weights, prefix="language_model.")
         loader = AutoWeightsLoader(self)
         loaded_language_model_params = loader.load_weights(
             language_model_weights)
```

Review comment: Wondering why this issue was not triggered before?

Reply: `language_model.lm_head` can also be loaded by the parent model's weight loader when PP is not enabled. With PP, we split the weights into two parts, so `lm_head` is missing on PP rank 0 and loading raises a "weight not found" error. The model-loading logic in llama4.py, by contrast, handles `is_pp_missing_parameter` to avoid that exception.
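To see why the prefix change matters, here is a minimal sketch of a prefix-based split in the spirit of `separate_weights` (the helper below, `split_by_prefix`, is illustrative only, not vLLM's actual implementation):

```python
from typing import Iterable, List, Tuple

import torch


def split_by_prefix(
    weights: Iterable[Tuple[str, torch.Tensor]],
    prefix: str,
) -> Tuple[List[Tuple[str, torch.Tensor]], List[Tuple[str, torch.Tensor]]]:
    """Split checkpoint entries into (matching, other) by name prefix."""
    matching: List[Tuple[str, torch.Tensor]] = []
    other: List[Tuple[str, torch.Tensor]] = []
    for name, tensor in weights:
        (matching if name.startswith(prefix) else other).append((name, tensor))
    return matching, other
```

With the old prefix `"language_model.model."`, a checkpoint entry like `"language_model.lm_head.weight"` falls into `other` and goes through the parent loader; with `"language_model."` it is routed to the llama4 loader together with the rest of the language model.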
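Routing `lm_head` to the llama4 loader helps because of the `is_pp_missing_parameter` handling mentioned in the reply. A hedged sketch of that style of loading loop (`is_pp_missing_parameter` does exist in vllm/model_executor/models/utils.py, but the surrounding loop here is a simplified approximation, not the exact vLLM code):

```python
from typing import Iterable, Set, Tuple

import torch
from torch import nn

from vllm.model_executor.models.utils import is_pp_missing_parameter


def load_weights_pp_aware(
        model: nn.Module,
        weights: Iterable[Tuple[str, torch.Tensor]]) -> Set[str]:
    """Simplified llama4-style loop that tolerates PP-sharded params."""
    params_dict = dict(model.named_parameters())
    loaded: Set[str] = set()
    for name, loaded_weight in weights:
        # Parameters such as lm_head exist only on the last PP rank;
        # skip them elsewhere instead of failing with "weight not found".
        if is_pp_missing_parameter(name, model):
            continue
        param = params_dict[name]
        weight_loader = getattr(param, "weight_loader",
                                lambda p, w: p.data.copy_(w))
        weight_loader(param, loaded_weight)
        loaded.add(name)
    return loaded
```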
Review comment: Should we use `Llama4ForCausalLM`?
Reply: `Llama4ForCausalLM` is not a registered architecture; we should avoid relying on it here, since that would require adding a lot of hacks, as in the initial PR.
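One way to check which names are registered (assuming vLLM's `ModelRegistry.get_supported_archs()` API; hedged sketch, the exact import path may vary by version):

```python
from vllm import ModelRegistry

archs = ModelRegistry.get_supported_archs()
print("LlamaForCausalLM" in archs)    # expected True: registered architecture
print("Llama4ForCausalLM" in archs)   # expected False: hence the override
```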
Review comment: Probably a dumb question, but since `_initialize_model` is already pointing to `model_class=Llama4ForCausalLM`, why do we need to override the `architectures` here to `LlamaForCausalLM`? (See vllm/model_executor/model_loader/loader.py, lines 114 to 123 at 8cac35b.)
Reply: It happens during `__post_init__` of the hf config, when we call `replace` inside the `with_hf_config` function: the architectures on the new hf config are resolved against the model registry there, so they must name a registered architecture. (See vllm/config.py, line 3781 at a018e55; vllm/config.py, line 3790 at a018e55; vllm/model_executor/models/registry.py, lines 441 to 442 at a018e55.)
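As a toy reconstruction of the mechanism (not vLLM's actual classes): `dataclasses.replace` builds a new instance and re-runs `__post_init__`, so the architecture lookup there sees whatever `architectures` list `with_hf_config` passed in.

```python
from dataclasses import dataclass, field, replace

# Stand-in for the model registry referenced above; only names present
# here would resolve successfully.
_REGISTRY = {"LlamaForCausalLM"}


@dataclass
class ToyModelConfig:
    architectures: list = field(default_factory=lambda: ["LlamaForCausalLM"])

    def __post_init__(self):
        # Runs on construction AND again whenever replace() copies the
        # config, which is why with_hf_config must pass a registered name.
        for arch in self.architectures:
            if arch not in _REGISTRY:
                raise ValueError(f"{arch!r} is not a registered architecture")


cfg = ToyModelConfig()
cfg2 = replace(cfg, architectures=["LlamaForCausalLM"])   # fine
# replace(cfg, architectures=["Llama4ForCausalLM"])       # would raise
```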