Conversation

@luyuzhe111 (Contributor) commented Mar 18, 2025

In this PR, we allow the EAGLE architecture to optionally use proper RMS normalizations, similar to the DeepSeek MTP module. This has two main benefits:

  1. It improves acceptance length by ~5% when the pre-attention norm, the output norm, and the parallel norms (highlighted in red in the diagram below, taken from the DeepSeek-V3 technical report) are used during training. Through ablation studies, we found that all of these RMS norms contribute to the improvement.

[Diagram: DeepSeek-V3 MTP module, with the added RMS norms highlighted in red]

  2. It alleviates the approximated KV cache bug currently in the vLLM EAGLE implementation, reducing the resulting performance degradation from 15% to 9% ([Bug]: EAGLE / MTP Doesn't Overwrite Approximated Hidden States / KV Cache, 8%-15% Acceptance Length Degradation #14649). Essentially, these additional RMS norms ensure the approximated KV cache is not too far off, since the hidden states from which that KV cache is computed are normalized. The following table shows the acceptance lengths for Llama 3 8B on GSM8K, measured using this example script.
| Number of Speculated Tokens | 1 | 2 | 3 | 4 | 5 |
|---|---|---|---|---|---|
| EAGLE Repo | 1.84 | 2.45 | 2.83 | 3.07 | 3.17 |
| vLLM | 1.84 | 2.37 | 2.65 | 2.84 | 2.89 |
| Acceptance Length Drop | 0% | 3% | 6% | 7% | 9% |

With these normalizations, the performance degradation due to the approximated KV cache is now 0~9%, compared to the 8%~15% drop without them. However, I want to emphasize that I do not intend this PR as a fix for #14649, but rather as a way to show the community how to alleviate the bug temporarily while it is being fixed.
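To make the description concrete, below is a minimal sketch of where these norms sit in the draft model's input path, in the spirit of the DeepSeek MTP module. All class and attribute names here (`EagleInputFusion`, `enorm`, `hnorm`, `fc`) are illustrative assumptions, not the PR's actual code:

```python
import torch
import torch.nn as nn


class RMSNorm(nn.Module):
    """Llama-style RMS normalization."""

    def __init__(self, hidden_size: int, eps: float = 1e-6):
        super().__init__()
        self.weight = nn.Parameter(torch.ones(hidden_size))
        self.eps = eps

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        variance = x.pow(2).mean(-1, keepdim=True)
        return self.weight * x * torch.rsqrt(variance + self.eps)


class EagleInputFusion(nn.Module):
    """Fuses the token embedding with the target model's hidden state.
    With add_para_norm=True, each input is RMS-normalized first (the
    "parallel norms" from the DeepSeek-V3 diagram above)."""

    def __init__(self, hidden_size: int, add_para_norm: bool):
        super().__init__()
        self.add_para_norm = add_para_norm
        if add_para_norm:
            self.enorm = RMSNorm(hidden_size)  # norm over the embedding
            self.hnorm = RMSNorm(hidden_size)  # norm over the hidden state
        self.fc = nn.Linear(2 * hidden_size, hidden_size, bias=False)

    def forward(self, emb: torch.Tensor, hidden: torch.Tensor) -> torch.Tensor:
        if self.add_para_norm:
            emb, hidden = self.enorm(emb), self.hnorm(hidden)
        return self.fc(torch.cat([emb, hidden], dim=-1))
```

Normalizing both inputs keeps the draft layer's hidden states, and hence the KV cache computed from them, on a consistent scale, which is why the same norms also bound the approximation error described above.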

cc @LiuXiaoxuanPKU Would appreciate your review. Thanks!

Signed-off-by: Bryan Lu <yuzhelu@amazon.com>
@github-actions (bot) commented

👋 Hi! Thank you for contributing to the vLLM project.

💬 Join our developer Slack at https://slack.vllm.ai to discuss your PR in #pr-reviews, coordinate on features in #feat- channels, or join special interest groups in #sig- channels.

Just a reminder: PRs do not trigger a full CI run by default. Instead, they only run the fastcheck CI, which covers a small and essential subset of CI tests to quickly catch errors. You can run other CI tests on top of those by going to your fastcheck build on the Buildkite UI (linked in the PR checks section) and unblocking them. If you do not have permission to unblock, ping simon-mo or khluu to add you to our Buildkite org.

Once the PR is approved and ready to go, your PR reviewer(s) can run CI to test the changes comprehensively before merging.

To run full CI, PR reviewers can either add the ready label to the PR or enable auto-merge.

🚀

@luyuzhe111 luyuzhe111 changed the title [Feature] Enhance EAGLE architecture [Feature] Enhance EAGLE Architecture with Proper RMS Norms Mar 18, 2025
Signed-off-by: Bryan Lu <yuzhelu@amazon.com>
@luyuzhe111 (Contributor, Author) commented

@WoosukKwon would appreciate your review as well!

vllm/config.py (Outdated), comment on lines 808 to 814 (the excerpt starts mid-expression):

```python
    ('deepseek_v2', 'deepseek_v3', 'deepseek_mtp')) \
    and (self.hf_text_config.kv_lora_rank is not None) or \
    (hasattr(self.hf_text_config, "model_type") \
    and self.hf_text_config.model_type == 'eagle' \
    and self.hf_text_config.model.model_type in \
    ('deepseek_v2', 'deepseek_v3') \
    and self.hf_text_config.kv_lora_rank is not None)
```
Collaborator:

nit: Can we actually clean this up?

  1. It'd be nice to have some comments here, since the code is difficult to follow.
  2. Maybe we can have some if statements like this?

```python
if ...:
    return True
elif ...:
    return True

return False
```

Contributor (Author):

sure! just committed a simplified version!
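
For readers following the thread, here is a hypothetical sketch of what the restructured check could look like under this suggestion. It mirrors the snippet above but is not necessarily the committed version:

```python
def uses_mla(cfg) -> bool:
    """Hypothetical restructuring of the MLA-detection check above;
    illustrative only, not the actual committed code."""
    # Native DeepSeek models (including the MTP module) use MLA when
    # kv_lora_rank is configured.
    if (getattr(cfg, "model_type", None) in
            ('deepseek_v2', 'deepseek_v3', 'deepseek_mtp')
            and cfg.kv_lora_rank is not None):
        return True

    # An EAGLE draft head wrapping a DeepSeek target model also uses MLA.
    if (getattr(cfg, "model_type", None) == 'eagle'
            and cfg.model.model_type in ('deepseek_v2', 'deepseek_v3')
            and cfg.kv_lora_rank is not None):
        return True

    return False
```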

Signed-off-by: Bryan Lu <yuzhelu@amazon.com>
@DarkLight1337 (Member) left a comment:

Just a nit, otherwise LGTM

@WoosukKwon (Collaborator) commented

Thank you @DarkLight1337!

Co-authored-by: Cyrus Leung <cyrus.tl.leung@gmail.com>
@DarkLight1337 DarkLight1337 enabled auto-merge (squash) March 26, 2025 05:20
@github-actions github-actions bot added the ready ONLY add when PR is ready to merge/full CI is needed label Mar 26, 2025
@LiuXiaoxuanPKU (Collaborator) left a comment:

Took a pass. But I do have several concerns about this PR:

  1. This PR's EAGLE implementation is different from standard EAGLE.
  2. Users need to provide the RMS norm weights and also change the model config. I am concerned that very few users might use this.

Let me know your thoughts on this @luyuzhe111, thanks!


```python
self.add_para_norm = False
if hasattr(self.config.model,
           "add_para_norm") and self.config.model.add_para_norm:
```
Collaborator:

Are you saying add_para_norm will be added by the user in the model config file? And users need to provide the weights as well?

Contributor (Author):

Hi Lily @LiuXiaoxuanPKU, thanks for reviewing and raising these great questions!

  1. It indeed adds the option to load additional normalization layers, but it does not alter the default behavior. Thus, I think it should be fine?
  2. Yes, users need to provide trained weights and a corresponding model config. One immediate use case is actually DeepSeek: with the few lines of change in this PR, one can load MTP weights after some conversion (a rough sketch of such a conversion follows this list). I thought this would be helpful since EAGLE will be added to V1 first and MTP later; thus, with this PR, users can begin using MTP immediately, even with only EAGLE support in V1. Finally, as mentioned in this PR, adding these norms actually improves EAGLE training (DeepSeek added them for good reasons). I expect people to realize this soon, so it would be great if vLLM can anticipate and cater to user needs in this fast-paced field.
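
For illustration, here is a rough sketch of the kind of key remapping such a conversion might involve. The checkpoint key prefix and layer index below are assumptions for illustration only; the conversion script referenced in this thread is the authoritative reference:

```python
import torch


def convert_mtp_to_eagle(mtp_ckpt_path: str, out_path: str) -> None:
    """Hypothetical sketch: remap DeepSeek MTP checkpoint keys into an
    EAGLE-style draft-model layout. All key names are assumptions."""
    state_dict = torch.load(mtp_ckpt_path, map_location="cpu")
    prefix = "model.layers.61."  # hypothetical prefix of the MTP module
    converted = {
        name[len(prefix):]: tensor
        for name, tensor in state_dict.items()
        if name.startswith(prefix)
    }
    torch.save(converted, out_path)
```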

Contributor (Author):

Yes, the user needs to specify add_para_norm (as well as skip_output_norm and skip_prenorm) if they want to change from the original EAGLE model architecture; a hypothetical sketch of the relevant config fields follows below. One can reference the conversion script linked above. Again, I want to emphasize that this is backward compatible with the original EAGLE architecture & configs.
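
For reference, a hypothetical sketch of the relevant draft-model config fields, shown as a Python dict. The flag names and the nesting under "model" come from this thread; the surrounding fields and the idea that omitting the flags preserves the original EAGLE behavior are inferred from the backward-compatibility remark, not from the committed code:

```python
# Hypothetical EAGLE draft-model config enabling DeepSeek-MTP-style norms.
draft_model_config = {
    "model_type": "eagle",
    "model": {
        "model_type": "llama",        # the wrapped draft backbone
        "add_para_norm": True,        # add the parallel input norms
        "skip_prenorm": False,        # keep the pre-attention RMS norm
        "skip_output_norm": False,    # keep the output RMS norm
    },
}
```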


```python
from transformers import AutoConfig, PretrainedConfig

from vllm.transformers_utils.configs.deepseek_vl2 import DeepseekV2Config
```
Collaborator:

From a code-style perspective, it's confusing to import the DeepSeek config in the EAGLE file...

Contributor (Author):

It is very awkward indeed... but I did not find a better solution, since AutoConfig does not support the DeepSeek config. Open to any suggestions.

```python
target_archs = ["DeepseekV2ForCausalLM", "DeepseekV3ForCausalLM"]
if any(target_arch in archs for target_arch in target_archs):
    # AutoConfig does not support DeepSeek MoE models yet
    model_config = DeepseekV2Config(**model)
```
Collaborator:

Why do we need the change here?

Contributor (Author):

If you were referring to why we need to single out the DeepSeek config, it's because AutoConfig does not support it. Were you referring to something else?

Thanks again for taking the time to review this PR!

@DarkLight1337 DarkLight1337 merged commit 781d056 into vllm-project:main Mar 26, 2025
43 of 44 checks passed
lulmer pushed a commit to lulmer/vllm that referenced this pull request Apr 7, 2025
…ect#14990)

Signed-off-by: Bryan Lu <yuzhelu@amazon.com>
Co-authored-by: Cyrus Leung <cyrus.tl.leung@gmail.com>
Signed-off-by: Louis Ulmer <ulmerlouis@gmail.com>
lk-chen pushed a commit to lk-chen/vllm that referenced this pull request Apr 29, 2025
…ect#14990)

Signed-off-by: Bryan Lu <yuzhelu@amazon.com>
Co-authored-by: Cyrus Leung <cyrus.tl.leung@gmail.com>
shreyankg pushed a commit to shreyankg/vllm that referenced this pull request May 3, 2025
…ect#14990)

Signed-off-by: Bryan Lu <yuzhelu@amazon.com>
Co-authored-by: Cyrus Leung <cyrus.tl.leung@gmail.com>
RichardoMrMu pushed a commit to RichardoMrMu/vllm that referenced this pull request May 12, 2025
…ect#14990)

Signed-off-by: Bryan Lu <yuzhelu@amazon.com>
Co-authored-by: Cyrus Leung <cyrus.tl.leung@gmail.com>
Signed-off-by: Mu Huai <tianbowen.tbw@antgroup.com>