[spec decode] Consolidate speculative decode method name for MTP #25232

zixi-qi · 2025-09-19T06:38:41Z

Purpose

Stacked on top of #25221.

Currently there are several duplicate speculative decode method names for MTP: "deepseek_mtp", "ernie_mtp", "qwen3_next_mtp", and "mimo_mtp". This PR consolidates them into a single "mtp" method, so the list no longer has to grow each time a new model with MTP is added.
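For illustration only, the consolidation can be pictured as a small normalization step. This is a minimal sketch, not the code in this PR: the helper name `normalize_spec_method`, the constants, and the warning text are hypothetical. Only the four legacy names and the unified "mtp" name come from the description above.

```python
import warnings

# Legacy names as listed in the PR description. The mapping logic is an
# illustrative assumption, not the actual vLLM implementation.
LEGACY_MTP_METHODS = frozenset(
    {"deepseek_mtp", "ernie_mtp", "qwen3_next_mtp", "mimo_mtp"}
)
UNIFIED_MTP_METHOD = "mtp"


def normalize_spec_method(method: str) -> str:
    """Hypothetical helper: map a legacy MTP method name to "mtp"."""
    if method in LEGACY_MTP_METHODS:
        # Tell the caller that the per-model name is no longer needed.
        warnings.warn(
            f"Speculative method {method!r} is treated as "
            f"{UNIFIED_MTP_METHOD!r}; prefer the unified name.",
            DeprecationWarning,
            stacklevel=2,
        )
        return UNIFIED_MTP_METHOD
    # Any other method name passes through unchanged.
    return method
```

With a scheme like this, a new MTP model would not need its own entry: callers pass "mtp" and the model-specific handling happens elsewhere.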

Test Plan

Ran unit tests and e2e tests

Test Result

unit tests

(vllm) ubuntu@209-20-159-113:~/vllm$ pytest tests/v1/e2e/test_spec_decode.py::test_mtp_correctness -v
==================================================================================================== test session starts =====================================================================================================
platform linux -- Python 3.12.11, pytest-8.4.2, pluggy-1.6.0 -- /home/ubuntu/vllm/.venv/bin/python
cachedir: .pytest_cache
rootdir: /home/ubuntu/vllm
configfile: pyproject.toml
plugins: anyio-4.10.0
collected 3 items                                                                                                                                                                                                            

tests/v1/e2e/test_spec_decode.py::test_mtp_correctness[FLASH_ATTN_VLLM_V1-mtp] PASSED                                                                                                                                  [ 33%]
tests/v1/e2e/test_spec_decode.py::test_mtp_correctness[TRITON_ATTN_VLLM_V1-mtp] SKIPPED (TRITON_ATTN_VLLM_V1 does not support multi-token MTP spec decode on current platform)                                         [ 66%]
tests/v1/e2e/test_spec_decode.py::test_mtp_correctness[TREE_ATTN-mtp] SKIPPED (MTP does not support tree-based speculative decoding)                                                                                   [100%]

====================================================================================================== warnings summary ======================================================================================================
<frozen importlib._bootstrap>:488
  <frozen importlib._bootstrap>:488: DeprecationWarning: builtin type SwigPyPacked has no __module__ attribute

<frozen importlib._bootstrap>:488
  <frozen importlib._bootstrap>:488: DeprecationWarning: builtin type SwigPyObject has no __module__ attribute

tests/v1/e2e/test_spec_decode.py::test_mtp_correctness[FLASH_ATTN_VLLM_V1-mtp]
tests/v1/e2e/test_spec_decode.py::test_mtp_correctness[FLASH_ATTN_VLLM_V1-mtp]
  /home/ubuntu/.local/share/uv/python/cpython-3.12.11-linux-x86_64-gnu/lib/python3.12/multiprocessing/popen_fork.py:66: DeprecationWarning: This process (pid=137287) is multi-threaded, use of fork() may lead to deadlocks in the child.
    self.pid = os.fork()

-- Docs: https://docs.pytest.org/en/stable/how-to/capture-warnings.html
==================================================================================== 1 passed, 2 skipped, 4 warnings in 65.78s (0:01:05) =====================================================================================
sys:1: DeprecationWarning: builtin type swigvarlink has no __module__ attribute
(vllm) ubuntu@209-20-159-113:~/vllm$ pytest tests/v1/spec_decode/test_mtp.py -v
==================================================================================================== test session starts =====================================================================================================
platform linux -- Python 3.12.11, pytest-8.4.2, pluggy-1.6.0 -- /home/ubuntu/vllm/.venv/bin/python
cachedir: .pytest_cache
rootdir: /home/ubuntu/vllm
configfile: pyproject.toml
plugins: anyio-4.10.0
collected 4 items                                                                                                                                                                                                            

tests/v1/spec_decode/test_mtp.py::test_mtp_load_model_unified PASSED                                                                                                                                                   [ 25%]
tests/v1/spec_decode/test_mtp.py::test_mtp_propose_returns_hidden_states[1-FLASH_ATTN_VLLM_V1] PASSED                                                                                                                  [ 50%]
tests/v1/spec_decode/test_mtp.py::test_mtp_propose_returns_hidden_states[1-TRITON_ATTN_VLLM_V1] SKIPPED (TRITON_ATTN_VLLM_V1 does not support multi-token spec decode on current platform)                             [ 75%]
tests/v1/spec_decode/test_mtp.py::test_mtp_propose_returns_hidden_states[1-TREE_ATTN] SKIPPED (MTP does not support tree-based speculative decoding)                                                                   [100%]

====================================================================================================== warnings summary ======================================================================================================
<frozen importlib._bootstrap>:488
  <frozen importlib._bootstrap>:488: DeprecationWarning: builtin type SwigPyPacked has no __module__ attribute

<frozen importlib._bootstrap>:488
  <frozen importlib._bootstrap>:488: DeprecationWarning: builtin type SwigPyObject has no __module__ attribute

-- Docs: https://docs.pytest.org/en/stable/how-to/capture-warnings.html
========================================================================================= 2 passed, 2 skipped, 2 warnings in 15.35s ==========================================================================================

e2e test

VLLM_USE_V1=1 python examples/offline_inference/spec_decode.py --num_spec_tokens 5 --num_prompts 80 --dataset-name hf --dataset-path philschmid/mt-bench --method mtp --model-dir XiaomiMiMo/MiMo-7B-Base --num-spec-tokens 2 --tp 1 --enforce-eager

--------------------------------------------------
total_num_output_tokens: 19758
num_drafts: 9626
num_draft_tokens: 19252
num_accepted_tokens: 10091
mean acceptance length: 2.05
--------------------------------------------------
acceptance at token 0: 0.87
acceptance at token 1: 0.18
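As a sanity check, the summary numbers above are mutually consistent. The sketch below assumes the mean acceptance length is computed as 1 + accepted tokens per draft (one token from the target model plus the accepted draft tokens). That formula is an assumption about how the example script reports it, and the variable names are illustrative.

```python
# Counters copied from the e2e run above.
num_drafts = 9626
num_draft_tokens = 19252
num_accepted_tokens = 10091

# Draft tokens proposed per step; matches the effective 2 speculative tokens.
tokens_per_draft = num_draft_tokens / num_drafts

# Assumed definition: each step yields one target token plus accepted drafts.
mean_acceptance_length = 1 + num_accepted_tokens / num_drafts

# Fraction of all proposed draft tokens that were accepted.
draft_acceptance_rate = num_accepted_tokens / num_draft_tokens

print(f"tokens per draft:       {tokens_per_draft:.2f}")
print(f"mean acceptance length: {mean_acceptance_length:.2f}")
print(f"draft acceptance rate:  {draft_acceptance_rate:.3f}")
```

Under that assumption the script prints a mean acceptance length of 2.05, matching the reported value. The per-position rates (0.87 + 0.18 = 1.05) also agree with the roughly 1.05 accepted tokens per draft.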

Essential Elements of an Effective PR Description Checklist

The purpose of the PR, such as "Fix some issue (link existing issues this PR will resolve)".
The test plan, such as providing test command.
The test results, such as pasting the results comparison before and after, or e2e results
(Optional) The necessary documentation update, such as updating supported_models.md and examples for a new model.
(Optional) Release notes update. If your change is user facing, please update the release notes draft in the Google Doc.

gemini-code-assist

Code Review

This pull request consolidates various MTP speculative decoding method names into a unified "mtp" method. The implementation of this consolidation in the configuration and core logic appears correct. However, I've identified a few areas for improvement, primarily in the tests, which are still using deprecated method names, and a misleading warning message. My review includes suggestions to align the tests with the new consolidated method and to clarify the warning message for better user experience.

tests/v1/e2e/test_spec_decode.py

tests/v1/spec_decode/test_mtp.py

vllm/config/speculative.py

luccafong

LGTM! thanks for consolidating the codes.

zixi-qi requested review from ProExpertProg, WoosukKwon, benchislett, hmellor, houseroad, luccafong, mgoin, robertgshaw2-redhat, simon-mo, tlrmchlsmth, yewentao256 and youkaichao as code owners September 19, 2025 06:38

zixi-qi requested a review from zhuohan123 September 19, 2025 06:39

mergify bot added speculative-decoding v1 labels Sep 19, 2025

gemini-code-assist bot reviewed Sep 19, 2025

tests/v1/e2e/test_spec_decode.py (outdated, resolved)

tests/v1/spec_decode/test_mtp.py (outdated, resolved)

zixi-qi force-pushed the consolidate-mtp-method-names branch from 2de5a65 to 7e9f9da Compare September 19, 2025 07:12

mergify bot added the documentation label Sep 19, 2025

benchislett reviewed Sep 19, 2025

vllm/config/speculative.py (outdated, resolved)

benchislett reviewed Sep 19, 2025

tests/v1/spec_decode/test_mtp.py (outdated, resolved)

zixi-qi force-pushed the consolidate-mtp-method-names branch from 601f399 to b649014 Compare September 20, 2025 04:29

zixi-qi mentioned this pull request Sep 20, 2025

[spec decode] Add unit tests and e2e test for MTP inference #25221 (closed)

luccafong reviewed Sep 22, 2025

tests/v1/e2e/test_spec_decode.py (outdated, resolved)

vllm/config/speculative.py (outdated, resolved; three threads)

zixi-qi force-pushed the consolidate-mtp-method-names branch from b649014 to b306c93 Compare September 24, 2025 00:47

luccafong approved these changes Sep 25, 2025

zixi-qi force-pushed the consolidate-mtp-method-names branch from 4090c55 to d9325f6 Compare September 25, 2025 17:49

zixi-qi added 4 commits September 26, 2025 11:59

Add unit tests and e2e test for MTP inference

11f6636

Signed-off-by: zixi-qi <qizixi@meta.com>

rename to deepseek_mtp

fb95d63

Signed-off-by: zixi-qi <qizixi@meta.com>

address comments

b92a28b

Signed-off-by: zixi-qi <qizixi@meta.com>

address comments

f0cb51f

Signed-off-by: zixi-qi <qizixi@meta.com>

zixi-qi added 2 commits September 26, 2025 12:01

fix unit tests and add deepseek test case

0a920c3

Signed-off-by: zixi-qi <qizixi@meta.com>

rebase on main

94b8a63

Signed-off-by: zixi-qi <qizixi@meta.com>

zixi-qi force-pushed the consolidate-mtp-method-names branch from d9325f6 to 94b8a63 Compare September 26, 2025 20:41

luccafong added the ready label Sep 26, 2025

luccafong enabled auto-merge (squash) September 26, 2025 20:43

luccafong merged commit c70ac4b into vllm-project:main Sep 26, 2025
46 checks passed

simon-mo pushed a commit that referenced this pull request Oct 1, 2025

[spec decode] Consolidate speculative decode method name for MTP (#25232)

c214d69

Signed-off-by: zixi-qi <qizixi@meta.com>

pdasigi pushed a commit to pdasigi/vllm that referenced this pull request Oct 2, 2025

[spec decode] Consolidate speculative decode method name for MTP (vllm-project#25232)

55b306b

Signed-off-by: zixi-qi <qizixi@meta.com>

yewentao256 pushed a commit that referenced this pull request Oct 3, 2025

[spec decode] Consolidate speculative decode method name for MTP (#25232)

1356ae0

Signed-off-by: zixi-qi <qizixi@meta.com>
Signed-off-by: yewentao256 <zhyanwentao@126.com>

xuebwang-amd pushed a commit to xuebwang-amd/vllm that referenced this pull request Oct 10, 2025

[spec decode] Consolidate speculative decode method name for MTP (vllm-project#25232)

ded05c1

Signed-off-by: zixi-qi <qizixi@meta.com>
Signed-off-by: xuebwang-amd <xuebwang@amd.com>

choprahetarth pushed a commit to Tandemn-Labs/vllm that referenced this pull request Oct 11, 2025

[spec decode] Consolidate speculative decode method name for MTP (vllm-project#25232)

1fa4e37

Signed-off-by: zixi-qi <qizixi@meta.com>

shyeh25 pushed a commit to shyeh25/vllm that referenced this pull request Oct 14, 2025

Consolidate speculative decode method name for MTP (vllm-project#25232)

05ae204

Signed-off-by: zixi-qi <qizixi@meta.com>

lywa1998 pushed a commit to lywa1998/vllm that referenced this pull request Oct 20, 2025

[spec decode] Consolidate speculative decode method name for MTP (vllm-project#25232)

c230e26

Signed-off-by: zixi-qi <qizixi@meta.com>

alhridoy pushed a commit to alhridoy/vllm that referenced this pull request Oct 24, 2025

[spec decode] Consolidate speculative decode method name for MTP (vllm-project#25232)

06dce4b

Signed-off-by: zixi-qi <qizixi@meta.com>

xuebwang-amd pushed a commit to xuebwang-amd/vllm that referenced this pull request Oct 24, 2025

[spec decode] Consolidate speculative decode method name for MTP (vllm-project#25232)

6b2038f

Signed-off-by: zixi-qi <qizixi@meta.com>
Signed-off-by: xuebwang-amd <xuebwang@amd.com>
