Commit c48a430

tianyu-l authored and githubsgi committed
minor refactor over EP (pytorch#1854)
This PR: - let `ExpertParallel` handles indices permute / unpermute when EP is used - move `to_local` to model code to be more explicit - rename the `expert_parallel` wrapper which does permute / unpermute to `indices_permutation_wrapper` to be more accurate
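To make the "indices permute / unpermute" wording concrete: in an expert-parallel MoE layer, tokens are reordered so that tokens routed to the same expert are contiguous before the grouped expert computation, then restored to their original order afterwards. The sketch below is a minimal pure-Python illustration of that idea; the function names are hypothetical and are not torchtitan's API.

```python
# Hypothetical sketch of the permute / unpermute step around a grouped
# expert computation. Names (permute_by_expert, unpermute) are illustrative.

def permute_by_expert(tokens, expert_indices):
    """Sort tokens so tokens routed to the same expert are contiguous.

    Returns the permuted tokens and the permutation used, so the
    original order can be restored afterwards (the "unpermute" step).
    """
    # Stable sort keeps the relative order of tokens within each expert.
    perm = sorted(range(len(tokens)), key=lambda i: expert_indices[i])
    return [tokens[i] for i in perm], perm

def unpermute(tokens, perm):
    """Invert the permutation produced by permute_by_expert."""
    out = [None] * len(tokens)
    for dst, src in zip(perm, tokens):
        out[dst] = src
    return out

tokens = ["t0", "t1", "t2", "t3"]
expert_indices = [1, 0, 1, 0]  # expert id each token is routed to
permuted, perm = permute_by_expert(tokens, expert_indices)
# permuted: ["t1", "t3", "t0", "t2"], perm: [1, 3, 0, 2]
assert unpermute(permuted, perm) == tokens
```

In the refactored code this bookkeeping lives inside `ExpertParallel` rather than in a standalone wrapper, which is what the rename to `indices_permutation_wrapper` reflects.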
1 parent 18185d7 commit c48a430

File tree

1 file changed (+0, -1 lines)


torchtitan/distributed/expert_parallel.py

Lines changed: 0 additions & 1 deletion

```diff
@@ -227,7 +227,6 @@ def __init__(self):
     def _prepare_inputput_fn(self, mod, inputs, device_mesh):
         # shape (batch_size*seq_len, top_k)
         top_scores, selected_experts_indices = inputs
-        num_tokens, _ = top_scores.shape

         # NOTE: If needed, we can pad tokens in case bs*slen is not divisible by TP degree
         # if top_scores.shape[0] % device_mesh.size() != 0:
```
