Collaborator

@zixi-qi zixi-qi commented Sep 19, 2025

Purpose

Stacked on top of #25221.

There are currently several duplicated speculative decoding method names for MTP: "deepseek_mtp", "ernie_mtp", "qwen3_next_mtp", and "mimo_mtp". This PR consolidates them into a single "mtp" method, so that we no longer need to append to this list every time a new model with MTP is added.
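The consolidation described above can be pictured with a minimal sketch. This is an illustration only, not the code in this PR: the helper name `normalize_spec_method`, the constant `DEPRECATED_MTP_METHODS`, and the warning text are all hypothetical, and it assumes the legacy names keep working as deprecated aliases of "mtp".

```python
import warnings

# Legacy per-model method names listed in the PR description.
DEPRECATED_MTP_METHODS = frozenset(
    {"deepseek_mtp", "ernie_mtp", "qwen3_next_mtp", "mimo_mtp"}
)


def normalize_spec_method(method: str) -> str:
    """Map a legacy MTP method name to the unified "mtp" name.

    Hypothetical helper for illustration; not vLLM's actual implementation.
    """
    if method in DEPRECATED_MTP_METHODS:
        # Assumption: old names are still accepted, with a deprecation warning.
        warnings.warn(
            f"Speculative method {method!r} is deprecated; use 'mtp' instead.",
            DeprecationWarning,
            stacklevel=2,
        )
        return "mtp"
    # Any other method name passes through unchanged.
    return method
```

With a mapping like this, supporting a new MTP model would not require adding another method name; callers would simply pass "mtp".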

Test Plan

Ran the unit tests and an e2e test.

Test Result

  • unit tests
(vllm) ubuntu@209-20-159-113:~/vllm$ pytest tests/v1/e2e/test_spec_decode.py::test_mtp_correctness -v
==================================================================================================== test session starts =====================================================================================================
platform linux -- Python 3.12.11, pytest-8.4.2, pluggy-1.6.0 -- /home/ubuntu/vllm/.venv/bin/python
cachedir: .pytest_cache
rootdir: /home/ubuntu/vllm
configfile: pyproject.toml
plugins: anyio-4.10.0
collected 3 items                                                                                                                                                                                                            

tests/v1/e2e/test_spec_decode.py::test_mtp_correctness[FLASH_ATTN_VLLM_V1-mtp] PASSED                                                                                                                                  [ 33%]
tests/v1/e2e/test_spec_decode.py::test_mtp_correctness[TRITON_ATTN_VLLM_V1-mtp] SKIPPED (TRITON_ATTN_VLLM_V1 does not support multi-token MTP spec decode on current platform)                                         [ 66%]
tests/v1/e2e/test_spec_decode.py::test_mtp_correctness[TREE_ATTN-mtp] SKIPPED (MTP does not support tree-based speculative decoding)                                                                                   [100%]

====================================================================================================== warnings summary ======================================================================================================
<frozen importlib._bootstrap>:488
  <frozen importlib._bootstrap>:488: DeprecationWarning: builtin type SwigPyPacked has no __module__ attribute

<frozen importlib._bootstrap>:488
  <frozen importlib._bootstrap>:488: DeprecationWarning: builtin type SwigPyObject has no __module__ attribute

tests/v1/e2e/test_spec_decode.py::test_mtp_correctness[FLASH_ATTN_VLLM_V1-mtp]
tests/v1/e2e/test_spec_decode.py::test_mtp_correctness[FLASH_ATTN_VLLM_V1-mtp]
  /home/ubuntu/.local/share/uv/python/cpython-3.12.11-linux-x86_64-gnu/lib/python3.12/multiprocessing/popen_fork.py:66: DeprecationWarning: This process (pid=137287) is multi-threaded, use of fork() may lead to deadlocks in the child.
    self.pid = os.fork()

-- Docs: https://docs.pytest.org/en/stable/how-to/capture-warnings.html
==================================================================================== 1 passed, 2 skipped, 4 warnings in 65.78s (0:01:05) =====================================================================================
sys:1: DeprecationWarning: builtin type swigvarlink has no __module__ attribute
(vllm) ubuntu@209-20-159-113:~/vllm$ pytest tests/v1/spec_decode/test_mtp.py -v
==================================================================================================== test session starts =====================================================================================================
platform linux -- Python 3.12.11, pytest-8.4.2, pluggy-1.6.0 -- /home/ubuntu/vllm/.venv/bin/python
cachedir: .pytest_cache
rootdir: /home/ubuntu/vllm
configfile: pyproject.toml
plugins: anyio-4.10.0
collected 4 items                                                                                                                                                                                                            

tests/v1/spec_decode/test_mtp.py::test_mtp_load_model_unified PASSED                                                                                                                                                   [ 25%]
tests/v1/spec_decode/test_mtp.py::test_mtp_propose_returns_hidden_states[1-FLASH_ATTN_VLLM_V1] PASSED                                                                                                                  [ 50%]
tests/v1/spec_decode/test_mtp.py::test_mtp_propose_returns_hidden_states[1-TRITON_ATTN_VLLM_V1] SKIPPED (TRITON_ATTN_VLLM_V1 does not support multi-token spec decode on current platform)                             [ 75%]
tests/v1/spec_decode/test_mtp.py::test_mtp_propose_returns_hidden_states[1-TREE_ATTN] SKIPPED (MTP does not support tree-based speculative decoding)                                                                   [100%]

====================================================================================================== warnings summary ======================================================================================================
<frozen importlib._bootstrap>:488
  <frozen importlib._bootstrap>:488: DeprecationWarning: builtin type SwigPyPacked has no __module__ attribute

<frozen importlib._bootstrap>:488
  <frozen importlib._bootstrap>:488: DeprecationWarning: builtin type SwigPyObject has no __module__ attribute

-- Docs: https://docs.pytest.org/en/stable/how-to/capture-warnings.html
========================================================================================= 2 passed, 2 skipped, 2 warnings in 15.35s ==========================================================================================
  • e2e test
VLLM_USE_V1=1 python examples/offline_inference/spec_decode.py --num_spec_tokens 5 --num_prompts 80 --dataset-name hf --dataset-path philschmid/mt-bench --method mtp --model-dir XiaomiMiMo/MiMo-7B-Base --num-spec-tokens 2 --tp 1 --enforce-eager

--------------------------------------------------
total_num_output_tokens: 19758
num_drafts: 9626
num_draft_tokens: 19252
num_accepted_tokens: 10091
mean acceptance length: 2.05
--------------------------------------------------
acceptance at token 0: 0.87
acceptance at token 1: 0.18
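As a sanity check on the numbers above, the reported mean acceptance length can be recomputed from the raw counters. The sketch below assumes the usual convention that each verification step yields one token from the target model plus however many draft tokens were accepted; the variable names mirror the printed counters, and the formula is an assumption about how the example script derives its summary.

```python
# Counters copied from the e2e output above.
num_drafts = 9626
num_draft_tokens = 19252
num_accepted_tokens = 10091

# Draft tokens proposed per verification step.
tokens_per_draft = num_draft_tokens / num_drafts

# Assumed formula: one token from the target model per step,
# plus the average number of accepted draft tokens per step.
mean_acceptance_length = 1 + num_accepted_tokens / num_drafts

# Fraction of all proposed draft tokens that were accepted.
draft_acceptance_rate = num_accepted_tokens / num_draft_tokens

print(f"tokens per draft: {tokens_per_draft:.0f}")
print(f"mean acceptance length: {mean_acceptance_length:.2f}")
print(f"draft acceptance rate: {draft_acceptance_rate:.3f}")
```

Under that assumption the result rounds to the reported 2.05, and the two per-position acceptance rates (0.87 + 0.18 = 1.05) agree with the average number of accepted draft tokens per step.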


Contributor

@gemini-code-assist gemini-code-assist bot left a comment

Code Review

This pull request consolidates various MTP speculative decoding method names into a unified "mtp" method. The implementation of this consolidation in the configuration and core logic appears correct. However, I've identified a few areas for improvement, primarily in the tests, which are still using deprecated method names, and a misleading warning message. My review includes suggestions to align the tests with the new consolidated method and to clarify the warning message for better user experience.

@zixi-qi zixi-qi force-pushed the consolidate-mtp-method-names branch from 2de5a65 to 7e9f9da Compare September 19, 2025 07:12
@mergify mergify bot added the documentation Improvements or additions to documentation label Sep 19, 2025
@zixi-qi zixi-qi force-pushed the consolidate-mtp-method-names branch from b649014 to b306c93 Compare September 24, 2025 00:47
Collaborator

@luccafong luccafong left a comment

LGTM! Thanks for consolidating the code.

@zixi-qi zixi-qi force-pushed the consolidate-mtp-method-names branch from 4090c55 to d9325f6 Compare September 25, 2025 17:49
Signed-off-by: zixi-qi <qizixi@meta.com>
@zixi-qi zixi-qi force-pushed the consolidate-mtp-method-names branch from d9325f6 to 94b8a63 Compare September 26, 2025 20:41
@luccafong luccafong added the ready ONLY add when PR is ready to merge/full CI is needed label Sep 26, 2025
@luccafong luccafong enabled auto-merge (squash) September 26, 2025 20:43
@luccafong luccafong merged commit c70ac4b into vllm-project:main Sep 26, 2025
46 checks passed
simon-mo pushed a commit that referenced this pull request Oct 1, 2025
pdasigi pushed a commit to pdasigi/vllm that referenced this pull request Oct 2, 2025
yewentao256 pushed a commit that referenced this pull request Oct 3, 2025
)

Signed-off-by: zixi-qi <qizixi@meta.com>
Signed-off-by: yewentao256 <zhyanwentao@126.com>
xuebwang-amd pushed a commit to xuebwang-amd/vllm that referenced this pull request Oct 10, 2025
…m-project#25232)

Signed-off-by: zixi-qi <qizixi@meta.com>
Signed-off-by: xuebwang-amd <xuebwang@amd.com>
choprahetarth pushed a commit to Tandemn-Labs/vllm that referenced this pull request Oct 11, 2025
shyeh25 pushed a commit to shyeh25/vllm that referenced this pull request Oct 14, 2025
lywa1998 pushed a commit to lywa1998/vllm that referenced this pull request Oct 20, 2025
alhridoy pushed a commit to alhridoy/vllm that referenced this pull request Oct 24, 2025
xuebwang-amd pushed a commit to xuebwang-amd/vllm that referenced this pull request Oct 24, 2025
…m-project#25232)

Signed-off-by: zixi-qi <qizixi@meta.com>
Signed-off-by: xuebwang-amd <xuebwang@amd.com>
Labels

  • documentation: Improvements or additions to documentation
  • ready: ONLY add when PR is ready to merge/full CI is needed
  • speculative-decoding
  • v1

3 participants