[WIP][FSDP] Support FSDP for Qwen3Next #1116

Merged
zhuzilin merged 7 commits into THUDM:main from rucnyz:qwen3next
Dec 18, 2025

rucnyz (Contributor) commented Dec 15, 2025

This PR adds FSDP training support for the Qwen/Qwen3-Next-80B-A3B model.

Closes #1061

rucnyz marked this pull request as ready for review December 18, 2025 11:26

rucnyz (Contributor, Author) commented Dec 18, 2025

For the fix to the fla issue, see PR fla-org/flash-linear-attention#687.

rucnyz changed the title from "[WIP][FSDP][SGLANG] Support FSDP for Qwen3Next" to "[WIP][FSDP] Support FSDP for Qwen3Next" on Dec 18, 2025
zhuzilin merged commit b23fcd1 into THUDM:main on Dec 18, 2025
PopSoda2002 (Collaborator) commented

Great work!

Fengzdadi pushed a commit to Fengzdadi/slime that referenced this pull request Dec 19, 2025

Development

Successfully merging this pull request may close these issues.

[FSDP] Model supporting, Qwen3-next