I noticed the release note "Tutel v0.3: Add Megablocks solution to improve decoder inference on single-GPU with num_local_expert >= 2", but when I use Megablocks in MoE training (dropless MoE), the following error occurs:
I found that the likely cause is that `torch.ops.tutel_ops.sparse_bmm_infer` doesn't support the backward operation.
Megablocks is disabled in training mode because the optimization isn't useful for models with a single expert per GPU, which is the typical setup for huge-scale training. So in training mode, please set `megablocks_size=0` whenever `self.training` is true.
Megablocks relies on two assumptions: (1) there must be more than one local expert per GPU; (2) the load across local experts must be imbalanced. Unless you intend to train an imbalanced model on purpose by disabling the balance loss, Megablocks won't help training performance. A sketch of how to follow this advice is shown below.
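A minimal sketch of the advice above, assuming the Tutel `moe_layer` forward call accepts a `megablocks_size` argument as suggested by the maintainer's reply; the wrapper module, gate settings, and dimensions here are illustrative and not part of the issue:

```python
import torch
from tutel import moe as tutel_moe

class MoEBlock(torch.nn.Module):
    """Illustrative wrapper: enable Megablocks only outside training."""

    def __init__(self, model_dim=1024, num_local_experts=2, hidden_size=4096):
        super().__init__()
        self._moe = tutel_moe.moe_layer(
            gate_type={'type': 'top', 'k': 2},
            model_dim=model_dim,
            experts={
                'type': 'ffn',
                'count_per_node': num_local_experts,
                'hidden_size_per_expert': hidden_size,
            },
        )

    def forward(self, x):
        # torch.ops.tutel_ops.sparse_bmm_infer has no backward, so Megablocks
        # must stay off while training; it only helps single-GPU decoding
        # with num_local_experts >= 2 and an imbalanced expert load.
        megablocks_size = 0 if self.training else 1
        return self._moe(x, megablocks_size=megablocks_size)
```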