zixi-qi (Collaborator) commented Sep 19, 2025

Purpose

Add test coverage for MTP (Multi-Token Prediction) speculative-decoding inference.

Test Plan

Ran the newly added tests

Test Result

pytest tests/v1/e2e/test_spec_decode.py::test_mtp_correctness -v

tests/v1/e2e/test_spec_decode.py::test_mtp_correctness[FLASH_ATTN_VLLM_V1-mimo_mtp] PASSED [ 33%]
tests/v1/e2e/test_spec_decode.py::test_mtp_correctness[TRITON_ATTN_VLLM_V1-mimo_mtp] SKIPPED (TRITON_ATTN_VLLM_V1 does not support multi-token MTP spec decode on current platform) [ 66%]
tests/v1/e2e/test_spec_decode.py::test_mtp_correctness[TREE_ATTN-mimo_mtp] SKIPPED (MTP does not support tree-based speculative decoding) [100%]
==== 1 passed, 2 skipped, 4 warnings in 67.84s (0:01:07) ====
sys:1: DeprecationWarning: builtin type swigvarlink has no __module__ attribute

pytest tests/v1/spec_decode/test_mtp.py -v

tests/v1/spec_decode/test_mtp.py::test_mtp_load_model_unified PASSED [ 14%]
tests/v1/spec_decode/test_mtp.py::test_mtp_propose_returns_hidden_states[1-FLASH_ATTN_VLLM_V1] PASSED [ 28%]
tests/v1/spec_decode/test_mtp.py::test_mtp_propose_returns_hidden_states[1-TRITON_ATTN_VLLM_V1] SKIPPED (TRITON_ATTN_VLLM_V1 does not support multi-token spec decode on current platform) [ 42%]
tests/v1/spec_decode/test_mtp.py::test_mtp_propose_returns_hidden_states[1-TREE_ATTN] SKIPPED (MTP does not support tree-based speculative decoding) [ 57%]
tests/v1/spec_decode/test_mtp.py::test_mtp_propose_returns_hidden_states[4-FLASH_ATTN_VLLM_V1] PASSED [ 71%]
tests/v1/spec_decode/test_mtp.py::test_mtp_propose_returns_hidden_states[4-TRITON_ATTN_VLLM_V1] SKIPPED (TRITON_ATTN_VLLM_V1 does not support multi-token spec decode on current platform) [ 85%]
tests/v1/spec_decode/test_mtp.py::test_mtp_propose_returns_hidden_states[4-TREE_ATTN] SKIPPED (MTP does not support tree-based speculative decoding) [100%]
==== warnings summary ====
<frozen importlib._bootstrap>:488
  <frozen importlib._bootstrap>:488: DeprecationWarning: builtin type SwigPyPacked has no __module__ attribute

<frozen importlib._bootstrap>:488
  <frozen importlib._bootstrap>:488: DeprecationWarning: builtin type SwigPyObject has no __module__ attribute

-- Docs: https://docs.pytest.org/en/stable/how-to/capture-warnings.html
==== 3 passed, 4 skipped, 2 warnings in 19.53s ====
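
For readers unfamiliar with how a parametrized matrix produces the pass/skip pattern in the second run, here is a minimal sketch. It is not the code from this PR: `skip_reason` and `case_matrix` are hypothetical helpers, the backend names and skip messages are copied from the log, and treating the leading `1`/`4` in the test IDs as the number of speculative tokens is an assumption.

```python
from itertools import product
from typing import List, Optional, Tuple

# Backend names copied from the test IDs in the pytest output.
ATTN_BACKENDS = ["FLASH_ATTN_VLLM_V1", "TRITON_ATTN_VLLM_V1", "TREE_ATTN"]
# Assumption: the leading 1/4 in the test IDs is the number of speculative tokens.
NUM_SPECULATIVE_TOKENS = [1, 4]


def skip_reason(backend: str, triton_multi_token_ok: bool) -> Optional[str]:
    """Hypothetical helper: return a skip message, or None if the case should run."""
    if backend == "TREE_ATTN":
        return "MTP does not support tree-based speculative decoding"
    if backend == "TRITON_ATTN_VLLM_V1" and not triton_multi_token_ok:
        return (f"{backend} does not support multi-token spec decode "
                "on current platform")
    return None


def case_matrix(triton_multi_token_ok: bool = False) -> List[Tuple[str, Optional[str]]]:
    """Enumerate (test id, skip reason) pairs in the order the log lists them."""
    return [
        (f"{n}-{backend}", skip_reason(backend, triton_multi_token_ok))
        for n, backend in product(NUM_SPECULATIVE_TOKENS, ATTN_BACKENDS)
    ]


if __name__ == "__main__":
    for test_id, reason in case_matrix():
        status = "RUN" if reason is None else f"SKIPPED ({reason})"
        print(f"[{test_id}] {status}")
```

In a real pytest test, the same check would typically sit at the top of the test body and call `pytest.skip(reason)` when a reason is returned, which is what yields the `SKIPPED (...)` lines in the report.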


gemini-code-assist bot (Contributor) left a comment

Code Review

This pull request adds unit and end-to-end tests for MTP (Multi-Token Prediction) speculative decoding, specifically for the MiMo model. The new tests are well-structured and follow existing patterns. However, I've identified a critical contradiction between the test configurations and a comment in the model's implementation regarding the number of supported speculative tokens. This needs to be resolved to ensure the tests are valid and correctly verify the intended functionality.

Signed-off-by: zixi-qi <qizixi@meta.com>

zixi-qi (Collaborator, Author) commented Sep 20, 2025

Changes from this PR have been moved to #25232.

zixi-qi closed this Sep 20, 2025