Conversation

@3outeille (Member) commented:

  • Enable MoE-like HF models to work with 2D parallelism (FSDP, CP, and their combinations), without build_optimizers_with_moe_load_balancing; a minimal setup sketch follows the list. The following models were tested:
    • deepseek-ai/DeepSeek-V3
    • moonshotai/Moonlight-16B-A3B
    • moonshotai/Kimi-K2-Instruct
    • zai-org/GLM-4.5
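
As a rough illustration of the setup described above (not the PR's actual test harness), the sketch below loads one of the listed MoE checkpoints with `transformers` and shards it with FSDP over a 2D device mesh, using a plain optimizer instead of build_optimizers_with_moe_load_balancing. The mesh dimension names, sizes, and the choice of checkpoint are illustrative assumptions.

```python
# Minimal sketch, assuming an 8-rank job split into a 4-way FSDP dim and a
# 2-way context-parallel dim. Names ("dp", "cp") and sizes are illustrative.
import torch
import torch.distributed as dist
from torch.distributed.device_mesh import init_device_mesh
from torch.distributed.fsdp import FullyShardedDataParallel as FSDP
from transformers import AutoModelForCausalLM


def main():
    dist.init_process_group("nccl")

    # 2D mesh: data-parallel (FSDP) x context-parallel.
    mesh = init_device_mesh("cuda", (4, 2), mesh_dim_names=("dp", "cp"))

    model = AutoModelForCausalLM.from_pretrained(
        "moonshotai/Moonlight-16B-A3B",  # any of the models listed above
        torch_dtype=torch.bfloat16,
        trust_remote_code=True,
    )

    # Shard parameters over the "dp" sub-mesh; the "cp" dimension would be
    # used to split the sequence dimension of attention inputs.
    model = FSDP(model, device_mesh=mesh["dp"], use_orig_params=True)

    # Plain optimizer, i.e. without build_optimizers_with_moe_load_balancing.
    optimizer = torch.optim.AdamW(model.parameters(), lr=1e-5)
    return model, optimizer


if __name__ == "__main__":
    main()
```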

@3outeille changed the title from "begining moe load balancing" to "Add transformer backend (MoE) clean" on Nov 3, 2025
@3outeille force-pushed the 3outeille/transformers_backend branch from a241ae7 to a70c4c4 on November 4, 2025 at 10:09