Adaptation for MoE models #2101

Open · wants to merge 1 commit into main

Conversation

dhrhank187

Dear huggingface peft community,

We have adapted MoE models based on Megatron's RowParallelLinear and ColumnParallelLinear by modifying LoraParallelLinear, and we have validated the change with the Mixtral model. We would greatly appreciate your review and feedback to further improve and refine this work. Looking forward to your suggestions and comments!

Thank you for your support and collaboration!

@dhrhank187
Author

@BenjaminBossan @pacman100

@BenjaminBossan
Member

Could you please give more context? What are you referring to exactly, and where is this new parameter being used?

Also, as is, this PR assumes that the base layer always has the is_expert attribute and that RowParallelLinear and ColumnParallelLinear always accept it as an argument. I don't think we can make these assumptions.
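
(For illustration only, not something this PR does: one way to avoid hard-coding these assumptions would be to forward the flag conditionally. The helper below and its inspect-based signature check are hypothetical.)

```python
import inspect


def maybe_forward_is_expert(linear_cls, base_layer, kwargs):
    """Hypothetical helper: add `is_expert` to kwargs only when the wrapped
    base layer defines it AND the parallel linear class being constructed
    accepts it as an argument."""
    is_expert = getattr(base_layer, "is_expert", None)
    if is_expert is not None:
        accepted = inspect.signature(linear_cls.__init__).parameters
        if "is_expert" in accepted:
            kwargs["is_expert"] = is_expert
    return kwargs
```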

@dhrhank187
Author

https://github.com/NVIDIA/Megatron-LM/blob/main/megatron/core/tensor_parallel/layers.py

Thanks for your comment.

  1. is_expert is a standard parameter of ColumnParallelLinear and RowParallelLinear (see the Megatron-LM layers.py linked above).
  2. The base_layer is created from ColumnParallelLinear or RowParallelLinear, so base_layer also has the is_expert attribute.
  3. When using an MoE model, if this parameter is not forwarded, the shape of x no longer matches the shape of result in the forward function (see the sketch below).
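
A minimal sketch of the change the list above describes, assuming a Megatron RowParallelLinear or ColumnParallelLinear base layer. This is a simplified illustration, not PEFT's actual LoraParallelLinear code; the function name build_lora_ab and the xavier_normal_ init_method are placeholders.

```python
import torch
from megatron.core.tensor_parallel import ColumnParallelLinear, RowParallelLinear


def build_lora_ab(base_layer, r, megatron_config):
    """Create LoRA A/B projections for a Megatron parallel base layer,
    forwarding its `is_expert` flag (simplified sketch, not PEFT's code)."""
    # The wrapped Megatron layer carries is_expert, as noted in point 2 above.
    is_expert = base_layer.is_expert
    init_method = torch.nn.init.xavier_normal_  # placeholder init method

    if isinstance(base_layer, RowParallelLinear):
        # The base layer consumes tensor-parallel input, so LoRA A must too.
        lora_A = RowParallelLinear(
            base_layer.input_size,
            r,
            config=megatron_config,
            init_method=init_method,
            bias=False,
            input_is_parallel=True,
            skip_bias_add=True,
            is_expert=is_expert,
        )
        lora_B = torch.nn.Linear(r, base_layer.output_size, bias=False)
    else:
        # ColumnParallelLinear base: LoRA B produces the column-parallel output.
        lora_A = torch.nn.Linear(base_layer.input_size, r, bias=False)
        lora_B = ColumnParallelLinear(
            r,
            base_layer.output_size,
            config=megatron_config,
            init_method=init_method,
            bias=False,
            gather_output=False,
            is_expert=is_expert,
        )
    return lora_A, lora_B
```

Forwarding the flag keeps the LoRA projections on the same expert-parallel communication path as the wrapped layer, which is what keeps the shapes of x and result consistent in forward().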


@BenjaminBossan
Member

Ah, I see, thanks for the pointers. This was added to Megatron more than a year ago, so I guess it should be fine, but I'm not sure whether users might want to use other backends that don't have that parameter. Hopefully @zhangsheng377 can comment on this.

@zhangsheng377
Contributor

The is_expert parameter should be newly added to Megatron this year, right? I think that although we support custom backends, the default format should still be based on Megatron; that is, the user's own backend should be compatible with the Megatron interface. So I agree with adding the is_expert parameter, but it would be better to elaborate on the LoRA parameter.
