[V1][eagle3] Support eagle3 proposer for v1 #1032
Conversation
Thanks for contributions, you can

Thanks for your contributions.

This pull request has conflicts, please resolve those before we can evaluate the pull request.
Force-pushed edf2424 to 1c6ed70
This pull request has conflicts, please resolve those before we can evaluate the pull request.
Force-pushed df84c06 to 9a4371e
@@ -0,0 +1,429 @@
# SPDX-License-Identifier: Apache-2.0
Please revise the file header.
I think this file should be placed in the worker directory instead of creating a new spec_decode directory. @wangxiyuan
Make sure this UT can run on the v1 engine when CI runs.
The UTs in the current long_term directory are run by the v0 engine by default.
You can refer to test_v1_spec_decode.py or test_v1_mtp_correctness.py.
You may also have to modify .github/workflows/vllm_ascend_test_long_term.yaml.
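For reference, a minimal sketch of forcing the v1 engine for a long_term UT via the VLLM_USE_V1 environment variable (assuming the same pattern as test_v1_spec_decode.py; the fixture and test names below are illustrative, not the actual test layout):

```python
# Sketch only: the long_term UTs run on the v0 engine by default, so set
# VLLM_USE_V1=1 before the engine is constructed. Names are illustrative.
import pytest


@pytest.fixture(autouse=True)
def run_on_v1_engine(monkeypatch: pytest.MonkeyPatch):
    # Applied to every test in this module; v0 remains the default elsewhere.
    monkeypatch.setenv("VLLM_USE_V1", "1")


def test_eagle3_proposer_smoke():
    # Build the LLM with the eagle3 speculative config here and assert that
    # generation produces non-empty output (omitted in this sketch).
    assert True
```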
This pull request has conflicts, please resolve those before we can evaluate the pull request.
Force-pushed da8ae03 to 33e2db5
@jianzs @ganyi1996ppo @Yikun please help review.
This pull request has conflicts, please resolve those before we can evaluate the pull request.
Force-pushed d250d7d to b4b74c9
    Signed-off-by: yuancaoyaoHW <a2749322671@gmail.com>
Force-pushed b4b74c9 to d0aa341
I'm OK with this PR, I intend to merge this today.
Signed-off-by: yuancaoyaoHW <a2749322671@gmail.com>
Force-pushed ccd73f2 to be0dcb2
I would like to nominate Mengqing Cao (@MengqingCao, https://github.com/MengqingCao) as a maintainer, starting with my +1.

## Reason

- ✅ **Review Quality:** She has completed [120+ reviews](https://github.com/vllm-project/vllm-ascend/pulls?q=is%3Apr+is%3Aclosed+commenter%3Amengqingcao+-author%3Amengqingcao) since Feb. 2025, including high-quality reviews such as #2088 (review), #1446 (comment), #1032 (comment), and #1013 (comment).
- ✅ **Sustained Contributions:** [99+ PRs merged](https://github.com/vllm-project/vllm-ascend/pulls?q=is%3Apr+author%3AMengqingCao+is%3Amerged) in vllm-project/vllm-ascend.
- ✅ **Quality Contributions:** She is one of the important co-authors of vllm-project/vllm#8054 and the hardware plugin RFC, which made the vllm-ascend plugin possible. She has also completed [28+ PRs](https://github.com/vllm-project/vllm/pulls?q=is%3Apr+author%3AMengqingCao+is%3Amerged+) in vllm-project/vllm, especially for the vLLM platform module to improve multi-hardware support. In 2025 Q2 she led the [[RFC]: E2E CI test for key features](#413) and [[RFC]: Unit test coverage improvement](#1298) to help vllm-ascend improve test coverage, and her main contributions focus on the adaptation of parallel strategies and the communicator (e.g. #1800, #1856). These contributions show a deep understanding of the vLLM and vLLM Ascend codebases.
- ✅ **Community Involvement:** Active in [60+ issues](https://github.com/vllm-project/vllm-ascend/issues?q=is%3Aissue%20state%3Aclosed%20-author%3AMengqingCao%20commenter%3AMengqingCao) as a commenter, and she led the v0.10.1 release as release manager.

So I think she's a great addition to the vLLM Ascend Maintainer team.

- vLLM version: v0.10.0
- vLLM main: vllm-project/vllm@78dba40

Signed-off-by: Jade Zheng <zheng.shoujian@outlook.com>
I noticed that this PR was only submitted to version 0.9.1rc1 and is missing in versions 0.9.1rc2 and 0.9.1rc3. Is there an issue with this? Will this content be included in the subsequent official release of version 0.9.1? @wangxiyuan
### What this PR does / why we need it?
This PR implements the Eagle3 Proposer feature for vLLM v1, which enables more efficient speculative decoding by using a draft model to predict potential future tokens.
- The implementation integrates the core Eagle algorithm with vLLM's existing architecture, allowing faster inference while maintaining output quality.
- This is needed to significantly improve the generation speed of large language models without compromising the quality of generated text.

### Does this PR introduce any user-facing change?
Yes, this PR introduces a new speculative decoding mode that can be enabled via configuration.
- Users can now choose to use the Eagle3 Proposer by setting the appropriate flags in the inference configuration.
- The API remains backward compatible, with the new functionality being opt-in.

### How was this patch tested?
CI passed with new unit tests added for the Eagle3 Proposer functionality.
- Benchmark tests were conducted comparing generation speed and quality with and without the Eagle3 Proposer.
- Integration tests were performed with various model architectures to ensure compatibility.
- Manual testing was done using different prompt scenarios to verify that output quality remains consistent.
- We tested the acceptance rate on one Ascend 910B NPU; the results are basically consistent with those shown in vllm-project/vllm#16937.
- Currently, we support scenarios where num_spec_tokens <= 2. When num_spec_tokens > 2, issues such as insufficient device memory and operator computation errors may occur. We will address this in subsequent updates.
- We will add support for Eagle v1 in future updates.

### Acceptance Test Script
```bash
SCRIPT="/offline/eagle.py"
DATASET="ShareGpt"
MODEL=Meta-Llama-3.1-8B-Instruct
DRAFT=EAGLE3-LLaMA3.1-Instruct-8B

CUDA_VISIBLE_DEVICES="0" VLLM_USE_V1=1 $PYTHON $SCRIPT \
    --dataset $DATASET \
    --num_spec_tokens 2 \
    --max_num_seqs 1 \
    --model_dir $MODEL \
    --eagle_dir $DRAFT \
    --tp 1 \
    --num_prompts 80
```

### Acceptance Test Results
```bash
██████████████████████████████████████████████████████████████████████████████████████████████████████████| 80/80 [21:22<00:00, 16.03s/it, est. speed input: 4.72 toks/s, output: 13.56 toks/s]
-------------------------------------------------------------------------------------
mean acceptance length: 1.63
-------------------------------------------------------------------------------------
total_counts: 8062
acceptance at token 0: 1.00 (8062 times)
acceptance at token 1: 0.70 (5612 times)
acceptance at token 2: 0.47 (3765 times)
```

Closes: #1004

---------

Signed-off-by: yuancaoyaoHW <a2749322671@gmail.com>
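As a rough illustration of the opt-in configuration described above, the sketch below enables the eagle3 proposer for offline inference. It assumes vLLM's `speculative_config` dict interface and reuses the model and draft paths from the acceptance test script; the exact key names may differ in the released API.

```python
# Sketch: opt in to the eagle3 proposer on the v1 engine (assumed interface).
import os

from vllm import LLM, SamplingParams

os.environ["VLLM_USE_V1"] = "1"  # run on the v1 engine, as in the test script

llm = LLM(
    model="Meta-Llama-3.1-8B-Instruct",            # target model
    speculative_config={
        "method": "eagle3",                        # enable the eagle3 proposer
        "model": "EAGLE3-LLaMA3.1-Instruct-8B",    # draft model
        "num_speculative_tokens": 2,               # currently num_spec_tokens <= 2
    },
    tensor_parallel_size=1,
    max_num_seqs=1,
)

outputs = llm.generate(
    ["Explain speculative decoding in one paragraph."],
    SamplingParams(temperature=0.0, max_tokens=128),
)
print(outputs[0].outputs[0].text)
```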