How to use Megablocks in MoE training #236

Open
CSCYQJ opened this issue Jun 5, 2024 · 1 comment


CSCYQJ commented Jun 5, 2024

I noticed the release note "Tutel v0.3: Add Megablocks solution to improve decoder inference on single-GPU with num_local_expert >= 2", but when I use Megablocks in MoE training (dropless MoE), the following error occurs:
[screenshot of the error traceback]
I believe the cause is that torch.ops.tutel_ops.sparse_bmm_infer does not support the backward operation.

ghostplant (Contributor) commented Jun 6, 2024

Megablocks is disabled in training mode because the optimization isn't useful for models with a single expert per GPU, which is the typical setup in large-scale training. So in training mode, please set megablocks_size=0 if self.training.

Megablocks makes two assumptions: (1) there is more than one local expert per GPU; (2) the load across local experts is imbalanced. Unless you want to train an imbalanced model on purpose by disabling the balance loss, Megablocks won't help training performance.
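
A minimal sketch of how this advice could be applied, assuming (as the reply suggests) that `megablocks_size` is accepted as a keyword argument of the MoE layer's forward call in Tutel v0.3+; the surrounding module, gate settings, and dimensions are illustrative, so check them against your Tutel version:

```python
import torch
from tutel import moe as tutel_moe

class MoEBlock(torch.nn.Module):
    def __init__(self, model_dim=1024, num_local_experts=2):
        super().__init__()
        # Tutel MoE layer with >= 2 local experts (the case Megablocks targets).
        self.moe = tutel_moe.moe_layer(
            gate_type={'type': 'top', 'k': 2},
            model_dim=model_dim,
            experts={
                'type': 'ffn',
                'count_per_node': num_local_experts,
                'hidden_size_per_expert': 4 * model_dim,
            },
        )

    def forward(self, x):
        # Megablocks only helps single-GPU inference with >= 2 local experts;
        # torch.ops.tutel_ops.sparse_bmm_infer has no backward, so the path
        # must stay disabled while training.
        megablocks_size = 0 if self.training else 1
        return self.moe(x, megablocks_size=megablocks_size)
```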
