
Conversation

@OftenDream
Contributor

@OftenDream OftenDream commented Aug 30, 2025

This PR implements support for the newly released LongCat-Flash model by Meituan.
The core implementation includes:

  • Model architecture in longcat_flash.py
  • MTP model architecture in longcat_flash_mtp.py

@github-actions

👋 Hi! Thank you for contributing to the vLLM project.

💬 Join our developer Slack at https://slack.vllm.ai to discuss your PR in #pr-reviews, coordinate on features in #feat- channels, or join special interest groups in #sig- channels.

Just a reminder: PRs do not trigger a full CI run by default. Instead, only fastcheck CI runs, covering a small and essential subset of CI tests to quickly catch errors.

You can ask your reviewers to trigger select CI tests on top of fastcheck CI.

Once the PR is approved and ready to go, your PR reviewer(s) can run CI to test the changes comprehensively before merging.

To run full CI, PR reviewers can either add the ready label to the PR or enable auto-merge.

If you have any questions, please reach out to us on Slack at https://slack.vllm.ai.

🚀

@mergify mergify bot added the documentation, deepseek, new-model, speculative-decoding, and v1 labels Aug 30, 2025
Contributor

@gemini-code-assist gemini-code-assist bot left a comment


Code Review

This pull request introduces support for the LongCat-Flash model, including its architecture and a multi-token prediction (MTP) variant for speculative decoding. The changes are extensive, touching model implementation, configuration, and fused MoE kernels. The core logic for the new model seems well-integrated, reusing components from existing models like DeepseekV2 where appropriate. My main feedback concerns a function in the Fused MoE implementation that modifies its inputs in-place, which could be a source of bugs. Overall, the PR is a significant contribution, adding a complex new model to vLLM.

Comment on lines +715 to +714
expert_indices[normal_expert_mask] = 0
expert_scales[normal_expert_mask] = 0.0
Contributor


Severity: high

The function zero_experts_compute_triton modifies its input tensors expert_indices and expert_scales in-place. This side effect can be unexpected and lead to bugs if the caller reuses these tensors assuming they are unchanged. While this might be an intentional optimization to avoid extra memory allocations, it makes the code harder to reason about and maintain. To improve clarity and safety, consider returning the modified tensors instead of modifying them in-place. This would make the data flow explicit.
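The reviewer's suggested alternative can be sketched as below. This is a minimal illustration of the copy-and-return pattern, not vLLM's actual code: the function names are hypothetical, and NumPy arrays stand in for the Triton-backed tensors.

```python
import numpy as np

def zero_experts_inplace(expert_indices, expert_scales, normal_expert_mask):
    # Mutates the caller's arrays: surprising if the caller reuses them later.
    expert_indices[normal_expert_mask] = 0
    expert_scales[normal_expert_mask] = 0.0

def zero_experts_pure(expert_indices, expert_scales, normal_expert_mask):
    # Copies first, then masks, then returns: the data flow is explicit
    # and the caller's inputs stay unchanged.
    indices = expert_indices.copy()
    scales = expert_scales.copy()
    indices[normal_expert_mask] = 0
    scales[normal_expert_mask] = 0.0
    return indices, scales

idx = np.array([3, 1, 2])
scl = np.array([0.5, 0.3, 0.2])
mask = np.array([True, False, True])

new_idx, new_scl = zero_experts_pure(idx, scl, mask)
print(new_idx.tolist())  # [0, 1, 0]
print(idx.tolist())      # [3, 1, 2]  (inputs untouched)
```

The trade-off the review acknowledges is real: the pure variant costs two extra allocations per call, which is why kernels often mutate in place; documenting the side effect clearly is the minimum alternative.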

@OftenDream OftenDream force-pushed the LongCatFlash branch 3 times, most recently from 9bd1622 to 5af8af3 Compare September 1, 2025 01:05
@Xu-Wenqing
Contributor

Created a PR to support tool calls for the LongCat-Flash-Chat model: #24083

@OftenDream
Contributor Author

@youkaichao @ywang96 @simon-mo
Hi team 👋,

Just a friendly ping on this PR when you have a moment.
I'd greatly appreciate any feedback or suggestions to move this forward.
Please let me know if there's anything I can clarify or improve! 🙏

@ghost

ghost commented Sep 3, 2025

Can we proceed with merging this feature?

@mergify mergify bot removed the needs-rebase label Sep 24, 2025
@OftenDream OftenDream force-pushed the LongCatFlash branch 7 times, most recently from 657153c to b95456d Compare September 24, 2025 11:27
@simon-mo simon-mo enabled auto-merge (squash) September 24, 2025 13:52
@simon-mo
Collaborator

Please fix the test failure

[2025-09-24T14:25:50Z] FAILED models/test_initialization.py::test_can_initialize_large_subset[LongCatFlashMTPModel] - AttributeError: property 'num_hidden_layers' of 'LongcatFlashConfig' object has no setter

https://buildkite.com/vllm/ci/builds/32283/steps/canvas
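The failure pattern behind this error can be sketched as follows. This is a hypothetical reduction, not the real LongcatFlashConfig (whose fields differ): a read-only @property raises exactly this AttributeError when the initialization test assigns to it, and adding a setter resolves it.

```python
class ConfigSketch:
    """Hypothetical stand-in for a HF-style config with a derived attribute."""

    def __init__(self, num_layers=2):
        self.num_layers = num_layers

    @property
    def num_hidden_layers(self):
        # Expose the layer count under the name generic utilities expect.
        return self.num_layers

    @num_hidden_layers.setter
    def num_hidden_layers(self, value):
        # Without this setter, `cfg.num_hidden_layers = 4` raises
        # AttributeError: property 'num_hidden_layers' ... has no setter.
        self.num_layers = value

cfg = ConfigSketch()
cfg.num_hidden_layers = 4
print(cfg.num_layers)  # 4
```

Test harnesses that shrink models before instantiating them commonly write to `num_hidden_layers`, which is why a getter-only property breaks initialization tests even though normal inference never assigns to it.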

auto-merge was automatically disabled September 25, 2025 02:15

Head branch was pushed to by a user without write access

fix
Signed-off-by: yangxurui <yangxurui@meituan.com>
@OftenDream
Contributor Author

Please fix the test failure

[2025-09-24T14:25:50Z] FAILED models/test_initialization.py::test_can_initialize_large_subset[LongCatFlashMTPModel] - AttributeError: property 'num_hidden_layers' of 'LongcatFlashConfig' object has no setter

https://buildkite.com/vllm/ci/builds/32283/steps/canvas

done

@simon-mo simon-mo merged commit 845adb3 into vllm-project:main Sep 25, 2025
78 checks passed
@simon-mo
Collaborator

Great work!

wdhongtw added a commit to wdhongtw/vllm that referenced this pull request Sep 25, 2025
PR vllm-project#23991 uses another attribute from triton.language, which causes an import error in the TPU setup.

Enhance the placeholder for TPU environment.

Signed-off-by: Weida Hong <wdhongtw@google.com>
xuechendi added a commit to vllm-project/vllm-gaudi that referenced this pull request Sep 25, 2025
iboiko-habana pushed a commit to iboiko-habana/vllm-gaudi that referenced this pull request Oct 2, 2025
vllm-project/vllm#23991
vllm-project/vllm#25613

---------

Signed-off-by: Chendi Xue <Chendi.Xue@intel.com>
Signed-off-by: Iryna Boiko <iboiko@habana.ai>
yewentao256 pushed a commit that referenced this pull request Oct 3, 2025
Signed-off-by: yangxurui <yangxurui@meituan.com>
Co-authored-by: yangxurui <yangxurui@meituan.com>
Signed-off-by: yewentao256 <zhyanwentao@126.com>
xuebwang-amd pushed a commit to xuebwang-amd/vllm that referenced this pull request Oct 10, 2025
Signed-off-by: yangxurui <yangxurui@meituan.com>
Co-authored-by: yangxurui <yangxurui@meituan.com>
Signed-off-by: xuebwang-amd <xuebwang@amd.com>
choprahetarth pushed a commit to Tandemn-Labs/vllm that referenced this pull request Oct 11, 2025
Signed-off-by: yangxurui <yangxurui@meituan.com>
Co-authored-by: yangxurui <yangxurui@meituan.com>
lywa1998 pushed a commit to lywa1998/vllm that referenced this pull request Oct 20, 2025
Signed-off-by: yangxurui <yangxurui@meituan.com>
Co-authored-by: yangxurui <yangxurui@meituan.com>
xuebwang-amd pushed a commit to xuebwang-amd/vllm that referenced this pull request Oct 24, 2025
Signed-off-by: yangxurui <yangxurui@meituan.com>
Co-authored-by: yangxurui <yangxurui@meituan.com>
Signed-off-by: xuebwang-amd <xuebwang@amd.com>

Labels

deepseek (Related to DeepSeek models), documentation (Improvements or additions to documentation), new-model (Requests to new models), ready (ONLY add when PR is ready to merge/full CI is needed), speculative-decoding, v1


6 participants