minor refactor over EP #1854
Conversation
wwwjn left a comment
Nice refactor!
    1 and 2 are needed only when expert_parallel_degree > 1.
    3 is needed even for single-device computation.
    2 can be moved to ExpertParallel _token_dispatch if not coupled with 3.
    In order to use torch._grouped_mm, we need to make sure the number of …
nit: This description only talks about padding; it doesn't mention the generate_permute_indices kernel that permutes the inputs to be ordered by local experts.
This wrapper is now only responsible for padding, when EP is not used. I renamed it to make that clearer.
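For readers following along, here is a minimal sketch of what a padding-only step could look like. It is not the PR's actual implementation (torchtitan uses the generate_permute_indices kernel), and `ALIGN_SIZE_M` and `pad_tokens_per_expert` are assumed names for illustration only:

```python
import torch

ALIGN_SIZE_M = 16  # assumed alignment required by the grouped GEMM; not taken from this PR


def pad_tokens_per_expert(
    x: torch.Tensor,                      # (num_tokens, dim), already ordered by expert
    num_tokens_per_expert: torch.Tensor,  # (num_experts,)
) -> tuple[torch.Tensor, torch.Tensor]:
    """Pad each expert's token group up to a multiple of ALIGN_SIZE_M."""
    padded_sizes = (
        (num_tokens_per_expert + ALIGN_SIZE_M - 1) // ALIGN_SIZE_M * ALIGN_SIZE_M
    )
    out = x.new_zeros(int(padded_sizes.sum().item()), x.shape[-1])
    src, dst = 0, 0
    for n, n_pad in zip(num_tokens_per_expert.tolist(), padded_sizes.tolist()):
        out[dst : dst + n] = x[src : src + n]  # copy real tokens, leave the rest as zero padding
        src += n
        dst += n_pad
    return out, padded_sizes
```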
| ) | ||
|
|
||
|
|
||
| class ExpertParallel(ParallelStyle): |
I have a question about when we apply _permute() and _unpermute(); they are now applied in 2 places:
1. In ExpertParallel(), which is applied on transformer_block.moe.experts, so the input of the MoE module will be reordered by local experts.
2. When use_grouped_mm is enabled, in indices_permutation_wrapper, which will also try to permute the inputs of GroupedExperts by the order of local experts.
Why do we need to apply it twice?
They won't be applied together.
When EP is used, EP will do _permute and _unpermute.
When EP is not used, indices_padding_wrapper will do them.
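A rough sketch of that mutual exclusion is below. The function and the callable parameters are hypothetical, standing in for ExpertParallel and the renamed wrapper discussed in this PR; it only illustrates that exactly one path applies the permute/unpermute:

```python
from typing import Callable

import torch.nn as nn


def apply_expert_permutation(
    experts: nn.Module,                                   # e.g. transformer_block.moe.experts
    apply_expert_parallel: Callable[[nn.Module], None],   # applies ExpertParallel's token dispatch
    permutation_wrapper: Callable[[Callable], Callable],  # the renamed wrapper from this PR
    expert_parallel_degree: int,
    use_grouped_mm: bool,
) -> None:
    """Hypothetical wiring: pick exactly one permute/unpermute path, never both."""
    if expert_parallel_degree > 1:
        # EP path: ExpertParallel's _token_dispatch permutes the inputs to be
        # ordered by local experts and un-permutes the outputs afterwards.
        apply_expert_parallel(experts)
    elif use_grouped_mm:
        # Non-EP path: only the wrapper reorders/pads the inputs so that
        # torch._grouped_mm sees contiguous per-expert token groups.
        experts.forward = permutation_wrapper(experts.forward)
```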
This PR:
- lets `ExpertParallel` handle indices permute / unpermute when EP is used
- moves `to_local` to model code to be more explicit
- renames the `expert_parallel` wrapper which does permute / unpermute to `indices_permutation_wrapper` to be more accurate
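For context on the permute / unpermute the description refers to, here is a minimal, self-contained sketch of the idea using plain torch.argsort. The function names are illustrative; torchtitan's actual code uses the generate_permute_indices kernel and also handles padding:

```python
import torch


def permute_by_expert(x: torch.Tensor, expert_idx: torch.Tensor):
    """Sort tokens so each expert's tokens are contiguous (what a grouped GEMM needs)."""
    order = torch.argsort(expert_idx, stable=True)  # token order, grouped by expert id
    return x[order], order


def unpermute(y: torch.Tensor, order: torch.Tensor) -> torch.Tensor:
    """Restore the original token order after the expert computation."""
    out = torch.empty_like(y)
    out[order] = y
    return out


# Usage: 6 tokens routed to 3 experts
x = torch.randn(6, 4)
expert_idx = torch.tensor([2, 0, 1, 0, 2, 1])
xp, order = permute_by_expert(x, expert_idx)
num_tokens_per_expert = torch.bincount(expert_idx, minlength=3)  # group sizes for the grouped GEMM
# ... run the grouped experts on xp ...
y = unpermute(xp, order)
assert torch.equal(y, x)  # round-trips back to the original token order
```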