[Feature] Enhance EAGLE Architecture with Proper RMS Norms #14990
Conversation
Signed-off-by: Bryan Lu <yuzhelu@amazon.com>
👋 Hi! Thank you for contributing to the vLLM project. 💬 Join our developer Slack at https://slack.vllm.ai to discuss your PR in #pr-reviews, coordinate on features in #feat- channels, or join special interest groups in #sig- channels. Just a reminder: PRs do not trigger a full CI run by default. Once the PR is approved and ready to go, your PR reviewer(s) can run CI to test the changes comprehensively before merging.
Signed-off-by: Bryan Lu <yuzhelu@amazon.com>
Signed-off-by: Bryan Lu <yuzhelu@amazon.com>
@WoosukKwon would appreciate your review as well!
vllm/config.py
```python
    ('deepseek_v2', 'deepseek_v3', 'deepseek_mtp')) \
        and (self.hf_text_config.kv_lora_rank is not None) or \
    (hasattr(self.hf_text_config, "model_type") \
        and self.hf_text_config.model_type == 'eagle' \
        and self.hf_text_config.model.model_type in \
            ('deepseek_v2', 'deepseek_v3') \
        and self.hf_text_config.kv_lora_rank is not None)
```
nit: Can we clean this up?
- It'd be nice to have some comments here, since the code is difficult to follow.
- Maybe we can restructure it with if statements like this?

```python
if ...:
    return True
elif ...:
    return True
return False
```
sure! just committed a simplified version!
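For reference, the simplified check might look roughly like the sketch below (a sketch only; the helper name `is_deepseek_mla` and the exact attribute access are assumptions here, not necessarily what was committed):

```python
def is_deepseek_mla(self) -> bool:
    # Sketch of the cleaned-up check: return True only for DeepSeek-style
    # MLA models, including an EAGLE draft wrapping a DeepSeek target.
    cfg = self.hf_text_config
    if not hasattr(cfg, "model_type"):
        return False
    # Plain DeepSeek (or DeepSeek MTP) model with MLA enabled.
    if cfg.model_type in ('deepseek_v2', 'deepseek_v3', 'deepseek_mtp') \
            and getattr(cfg, "kv_lora_rank", None) is not None:
        return True
    # EAGLE draft whose wrapped target model is a DeepSeek model.
    if cfg.model_type == 'eagle' \
            and cfg.model.model_type in ('deepseek_v2', 'deepseek_v3') \
            and getattr(cfg, "kv_lora_rank", None) is not None:
        return True
    return False
```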
Signed-off-by: Bryan Lu <yuzhelu@amazon.com>
Just a nit, otherwise LGTM
Thank you @DarkLight1337!
Co-authored-by: Cyrus Leung <cyrus.tl.leung@gmail.com>
Took a pass. But I do have several concerns about this PR:
- This PR's EAGLE implementation differs from the standard EAGLE architecture.
- Users need to provide the RMS norm weights and also change the model config. I am concerned that very few users will actually use this.
Let me know your thoughts on this @luyuzhe111, thanks!
```python
        self.add_para_norm = False
        if hasattr(self.config.model,
                   "add_para_norm") and self.config.model.add_para_norm:
```
Are you saying add_para_norm will be added by the user in the model config file? And users need to provide the weights as well?
Hi Lily @LiuXiaoxuanPKU , thanks for reviewing and raising these great questions!
- It indeed adds the option to load additional normalization layers, but it does not alter the default behavior. Thus, I think it should be fine?
- Yes, the users need to provide trained weights and the corresponding model config. One immediate use case is actually DeepSeek. With the few lines of change in this PR, one can load MTP weights after some conversion. I thought this would be helpful since EAGLE will be added in V1 first and then MTP. Thus, with this PR, users can immediately begin using MTP even with only EAGLE support in V1. Finally, as mentioned in this PR, adding these norms actually improves EAGLE training (DeepSeek added these norms for good reasons). I expect people to realize this soon, so it would be great if vLLM can anticipate and cater to user needs in this fast-paced field.
Yes, the user needs to specify add_para_norm (as well as skip_output_norm, and skip_prenorm) if they want to change from the original EAGLE model architecture. One can reference the conversion script linked above. Again, I want to emphasize that this is backward compatible with the original EAGLE architecture & configs.
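To make this concrete, a hypothetical draft-model config might carry fields like the ones below (illustrative only; the field names come from the discussion above, the values are placeholders, and the linked conversion script is the authoritative reference for the actual format):

```python
# Illustrative draft config dict; defaults preserve the original EAGLE behavior.
draft_config = {
    "model_type": "eagle",
    "model": {
        # ... the usual target-model fields (hidden_size, num_hidden_layers, ...) ...
        "add_para_norm": True,      # add MTP-style RMS norms before combining inputs
        "skip_prenorm": False,      # whether to drop the first decoder layer's input norm
        "skip_output_norm": False,  # whether to drop the final output RMS norm
    },
}
```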
```python
from transformers import AutoConfig, PretrainedConfig

from vllm.transformers_utils.configs.deepseek_vl2 import DeepseekV2Config
```
From the code style's perspective, it's confusing to import deepseek config in the eagle file...
It is very awkward indeed... but I did not find a better solution, since AutoConfig does not support the DeepSeek config. Open to any suggestions.
```python
        target_archs = ["DeepseekV2ForCausalLM", "DeepseekV3ForCausalLM"]
        if any(target_arch in archs for target_arch in target_archs):
            # AutoConfig does not support DeepSeek MoE models yet
            model_config = DeepseekV2Config(**model)
```
Why do we need the change here?
If you were referring to why we need to single out DeepSeek config, it's due to the fact that AutoConfig does not support DeepSeek config. Were you referring to something else?
Thanks again for your time to review this PR!
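For context, the selection logic roughly follows the pattern below (a simplified sketch; the non-DeepSeek branch is an assumption about the surrounding code, not a quote from this PR):

```python
target_archs = ["DeepseekV2ForCausalLM", "DeepseekV3ForCausalLM"]
if any(target_arch in archs for target_arch in target_archs):
    # AutoConfig cannot build DeepSeek MoE configs yet, so construct the
    # config class directly from the raw dict.
    model_config = DeepseekV2Config(**model)
else:
    # Other model types can go through AutoConfig as before.
    model_config = AutoConfig.for_model(**model)
```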
In this PR, we allow the EAGLE architecture to optionally use proper RMS norms, similar to the DeepSeek MTP module. It has two main benefits.
With these normalizations, the performance degradation due to the approximated KV cache is now 0~9%, compared to the 8%~15% drop without them. However, I want to emphasize that I do not intend this PR to be a fix for #14649, but rather to show the community how to alleviate that bug temporarily while it is being fixed.
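For readers unfamiliar with the MTP-style layout, the placement of the extra norms is roughly as follows (a minimal sketch assuming DeepSeek-MTP-style per-stream norms before the combining projection, and PyTorch >= 2.4 for nn.RMSNorm; names are illustrative, not the exact vLLM implementation):

```python
import torch
import torch.nn as nn


class EagleCombiner(nn.Module):
    """Sketch: combine token embeddings with the target model's hidden states,
    optionally RMS-normalizing each stream first (DeepSeek-MTP style)."""

    def __init__(self, hidden_size: int, add_para_norm: bool, eps: float = 1e-6):
        super().__init__()
        self.add_para_norm = add_para_norm
        if add_para_norm:
            self.enorm = nn.RMSNorm(hidden_size, eps=eps)  # norm over embeddings
            self.hnorm = nn.RMSNorm(hidden_size, eps=eps)  # norm over hidden states
        self.fc = nn.Linear(2 * hidden_size, hidden_size, bias=False)

    def forward(self, embeds: torch.Tensor, hidden: torch.Tensor) -> torch.Tensor:
        if self.add_para_norm:
            embeds, hidden = self.enorm(embeds), self.hnorm(hidden)
        # Original EAGLE: concatenate the raw streams; with add_para_norm,
        # each stream is normalized first, as in the DeepSeek MTP module.
        return self.fc(torch.cat([embeds, hidden], dim=-1))
```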
cc @LiuXiaoxuanPKU Would appreciate your review. Thanks!