Adaptation for MoE models #2101

Open · wants to merge 1 commit into main

Conversation

dhrhank187

Dear huggingface peft community,

We have adapted MoE models based on Megatron's RowParallelLinear and ColumnParallelLinear by modifying LoraParallelLinear, and we have validated the change with the Mixtral model. We would greatly appreciate your review and feedback to further improve and refine this work. Looking forward to your suggestions and comments!

Thank you for your support and collaboration!

@dhrhank187
Author

@BenjaminBossan @pacman100

@BenjaminBossan
Member

Could you please give more context? What are you referring to exactly, and where is this new parameter being used?

Also, as is, this PR assumes that the base layer always has the is_expert attribute and that RowParallelLinear and ColumnParallelLinear always accept it as an argument. I don't think we can make these assumptions.
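
(For illustration only, not something this PR does: one way to avoid hard-coding these assumptions would be to forward the flag conditionally. The helper below and its inspect-based signature check are hypothetical.)

```python
import inspect


def maybe_forward_is_expert(linear_cls, base_layer, kwargs):
    """Hypothetical helper: add `is_expert` to kwargs only when the wrapped
    base layer defines it AND the parallel linear class being constructed
    accepts it as an argument."""
    is_expert = getattr(base_layer, "is_expert", None)
    if is_expert is not None:
        accepted = inspect.signature(linear_cls.__init__).parameters
        if "is_expert" in accepted:
            kwargs["is_expert"] = is_expert
    return kwargs
```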

@dhrhank187
Author

https://github.com/NVIDIA/Megatron-LM/blob/main/megatron/core/tensor_parallel/layers.py

Thanks for your comment.

  1. is_expert is a standard parameter of ColumnParallelLinear and RowParallelLinear (see the Megatron-LM layers.py linked above).
  2. The base_layer is created from ColumnParallelLinear or RowParallelLinear, so base_layer also has the is_expert attribute.
  3. When using an MoE model, if this parameter is not forwarded, the shape of x no longer matches the shape of result in the forward function (see the sketch below).
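
A minimal sketch of the change the list above describes, assuming a Megatron RowParallelLinear or ColumnParallelLinear base layer. This is a simplified illustration, not PEFT's actual LoraParallelLinear code; the function name build_lora_ab and the xavier_normal_ init_method are placeholders.

```python
import torch
from megatron.core.tensor_parallel import ColumnParallelLinear, RowParallelLinear


def build_lora_ab(base_layer, r, megatron_config):
    """Create LoRA A/B projections for a Megatron parallel base layer,
    forwarding its `is_expert` flag (simplified sketch, not PEFT's code)."""
    # The wrapped Megatron layer carries is_expert, as noted in point 2 above.
    is_expert = base_layer.is_expert
    init_method = torch.nn.init.xavier_normal_  # placeholder init method

    if isinstance(base_layer, RowParallelLinear):
        # The base layer consumes tensor-parallel input, so LoRA A must too.
        lora_A = RowParallelLinear(
            base_layer.input_size,
            r,
            config=megatron_config,
            init_method=init_method,
            bias=False,
            input_is_parallel=True,
            skip_bias_add=True,
            is_expert=is_expert,
        )
        lora_B = torch.nn.Linear(r, base_layer.output_size, bias=False)
    else:
        # ColumnParallelLinear base: LoRA B produces the column-parallel output.
        lora_A = torch.nn.Linear(base_layer.input_size, r, bias=False)
        lora_B = ColumnParallelLinear(
            r,
            base_layer.output_size,
            config=megatron_config,
            init_method=init_method,
            bias=False,
            gather_output=False,
            is_expert=is_expert,
        )
    return lora_A, lora_B
```

Forwarding the flag keeps the LoRA projections on the same expert-parallel communication path as the wrapped layer, which is what keeps the shapes of x and result consistent in forward().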


@BenjaminBossan
Member

Ah, I see, thanks for the pointers. This was added to Megatron more than a year ago, so I guess it should be fine, but I'm not sure whether users might want to use other backends that don't have that parameter. Hopefully @zhangsheng377 can comment on this.

@zhangsheng377
Contributor

The is_expert parameter should be newly added to Megatron this year, right? I think that although we support custom backends, the default format should still be based on Megatron; that is, the user's own backend should be compatible with the Megatron interface. So I agree with adding the is_expert parameter, but it would be better to elaborate on the LoRA parameter.
