[Model] Add LongCat-Flash #23991
Conversation
👋 Hi! Thank you for contributing to the vLLM project. 💬 Join our developer Slack at https://slack.vllm.ai to discuss your PR in #pr-reviews, coordinate on features in #feat- channels, or join special interest groups in #sig- channels. Just a reminder: PRs do not trigger a full CI run by default; only a small, essential subset of tests runs automatically. You can ask your reviewers to trigger select CI tests on top of that. Once the PR is approved and ready to go, your PR reviewer(s) can run CI to test the changes comprehensively before merging. To run full CI, reviewers can add the `ready` label to the PR. If you have any questions, please reach out to us on Slack at https://slack.vllm.ai. 🚀
Code Review
This pull request introduces support for the LongCat-Flash model, including its architecture and a multi-token prediction (MTP) variant for speculative decoding. The changes are extensive, touching model implementation, configuration, and fused MoE kernels. The core logic for the new model seems well-integrated, reusing components from existing models like DeepseekV2 where appropriate. My main feedback concerns a function in the Fused MoE implementation that modifies its inputs in-place, which could be a source of bugs. Overall, the PR is a significant contribution, adding a complex new model to vLLM.
    expert_indices[normal_expert_mask] = 0
    expert_scales[normal_expert_mask] = 0.0
The function zero_experts_compute_triton modifies its input tensors expert_indices and expert_scales in-place. This side effect can be unexpected and lead to bugs if the caller reuses these tensors assuming they are unchanged. While this might be an intentional optimization to avoid extra memory allocations, it makes the code harder to reason about and maintain. To improve clarity and safety, consider returning the modified tensors instead of modifying them in-place. This would make the data flow explicit.
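For illustration, a minimal NumPy sketch of the out-of-place pattern the reviewer suggests (the function and variable names mirror the PR, but the real implementation is a Triton kernel operating on torch tensors; this is only a shape of the data flow, not the actual kernel):

```python
import numpy as np

def zero_experts_out_of_place(expert_indices, expert_scales, normal_expert_mask):
    """Return NEW arrays with masked entries zeroed, leaving inputs untouched."""
    new_indices = np.where(normal_expert_mask, 0, expert_indices)
    new_scales = np.where(normal_expert_mask, 0.0, expert_scales)
    return new_indices, new_scales

indices = np.array([3, 1, 2, 0])
scales = np.array([0.5, 0.2, 0.2, 0.1])
mask = np.array([True, False, True, False])

new_indices, new_scales = zero_experts_out_of_place(indices, scales, mask)
# The caller's original `indices` and `scales` are unchanged; the zeroed
# values must be adopted explicitly, making the side effect impossible.
```

In-place mutation may still be the right trade-off for a hot Triton kernel, but if it is kept, a docstring note stating that the inputs are modified would make the contract explicit.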
Force-pushed from 9bd1622 to 5af8af3.
Created a PR to support tool calls for the LongCat-Flash-Chat model: #24083

@youkaichao @ywang96 @simon-mo Just a friendly ping on this PR when you have a moment.

Can we proceed with merging this feature?
Force-pushed from 657153c to b95456d.
Please fix the test failure:

    [2025-09-24T14:25:50Z] FAILED models/test_initialization.py::test_can_initialize_large_subset[LongCatFlashMTPModel] - AttributeError: property 'num_hidden_layers' of 'LongcatFlashConfig' object has no setter
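This kind of failure arises when a config exposes `num_hidden_layers` as a read-only property and a test harness tries to assign to it (e.g. to shrink the model for initialization tests). A hedged sketch of the usual fix, using a hypothetical config class (how `LongcatFlashConfig` actually derives the value is not shown in this thread):

```python
class LongcatFlashConfigSketch:
    """Hypothetical config: num_hidden_layers is a view over another
    attribute, so a setter is needed to route writes back to it."""

    def __init__(self, num_layers=28):
        self.num_layers = num_layers

    @property
    def num_hidden_layers(self):
        # Derived, read-only view; without a setter below, any
        # assignment raises "property ... has no setter".
        return self.num_layers

    @num_hidden_layers.setter
    def num_hidden_layers(self, value):
        # Forward writes to the underlying attribute so harnesses
        # that patch the layer count keep working.
        self.num_layers = value

cfg = LongcatFlashConfigSketch()
cfg.num_hidden_layers = 1  # raises AttributeError without the setter
```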
Head branch was pushed to by a user without write access
Force-pushed from 30fac96 to 870a2b8.
done

Great work!
PR vllm-project#23991 uses another attribute from triton.language, which causes an import error in the TPU setup. Enhance the placeholder for the TPU environment. Signed-off-by: Weida Hong <wdhongtw@google.com>
vllm-project/vllm#23991 vllm-project/vllm#25613 --------- Signed-off-by: Chendi Xue <Chendi.Xue@intel.com>
vllm-project/vllm#23991 vllm-project/vllm#25613 --------- Signed-off-by: Chendi Xue <Chendi.Xue@intel.com> Signed-off-by: Iryna Boiko <iboiko@habana.ai>
Signed-off-by: yangxurui <yangxurui@meituan.com> Co-authored-by: yangxurui <yangxurui@meituan.com> Signed-off-by: yewentao256 <zhyanwentao@126.com>
Signed-off-by: yangxurui <yangxurui@meituan.com> Co-authored-by: yangxurui <yangxurui@meituan.com> Signed-off-by: xuebwang-amd <xuebwang@amd.com>
Signed-off-by: yangxurui <yangxurui@meituan.com> Co-authored-by: yangxurui <yangxurui@meituan.com>
This PR implements support for the newly released LongCat-Flash model by Meituan.
The core implementation includes: