Is llava_llama MoE supported? #9
Great choice. Work in progress!
Can I use the latest code to test, simply by changing the version to 'v1' with a trained LLaVA model?
I think this should work.
Is there any way to insert MoE layers into only some of the LLM layers? I found that converting all layers of the 13B model does not fit into a 40 GB A100.
For example, if you want to insert MoE layers into only the first and third layers, you can pass the corresponding indices via `--moe_layers_idx` (e.g. `--moe_layers_idx 0 2`, assuming zero-based indexing); see the sketch below.
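For what it's worth, here is a conceptual sketch of what selective insertion looks like, assuming a LLaMA-style decoder and DeepSpeed's `MoE` wrapper; the attribute path and indices are illustrative, not taken from the MoE-LLaVA code.

```python
# Conceptual sketch (not the MoE-LLaVA implementation) of what
# --moe_layers_idx does: wrap only the selected decoder MLP blocks in a
# DeepSpeed MoE layer.  Indices assume zero-based counting, and the
# `model.model.layers` path assumes a LLaMA-style decoder.
import copy
from deepspeed.moe.layer import MoE

def insert_moe_layers(model, hidden_size, moe_layers_idx=(0, 2), num_experts=4, top_k=2):
    for idx in moe_layers_idx:
        block = model.model.layers[idx]
        block.mlp = MoE(
            hidden_size=hidden_size,
            expert=copy.deepcopy(block.mlp),  # each expert starts as a copy of the dense MLP
            num_experts=num_experts,
            k=top_k,
        )
        # Note: DeepSpeed's MoE forward returns (hidden_states, aux_loss, exp_counts),
        # so the decoder block's forward pass must be adapted to unpack this tuple.
    return model
```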
I used a pretrained LLaVA to initialize MoE-LLaVA and passed in the moe_layers_idx parameter, but encountered the following error:
`AssertionError: The model has moe layers, but None of the param groups are marked as MoE. Create a param group with 'moe' key set to True before creating optimizer`
Are any additional modifications needed to solve this?
Here is the solution.
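For anyone hitting the same assertion, here is a minimal sketch of what DeepSpeed asks for: put the expert weights into their own optimizer parameter group whose dict carries `'moe': True`. This is an illustration based on the error message and DeepSpeed's `is_moe_param` helper, not necessarily the exact fix linked above.

```python
# Sketch: mark MoE (expert) parameters in a dedicated optimizer param group
# with 'moe': True, as the DeepSpeed assertion requests.  `model` is assumed
# to be the MoE-wrapped model built elsewhere.
import torch
from deepspeed.moe.utils import is_moe_param

def build_param_groups(model, lr=2e-5):
    dense_params = [p for p in model.parameters() if p.requires_grad and not is_moe_param(p)]
    moe_params = [p for p in model.parameters() if p.requires_grad and is_moe_param(p)]
    return [
        {"params": dense_params, "lr": lr},
        {"params": moe_params, "lr": lr, "moe": True, "name": "moe_params"},
    ]

# Usage (sketch): build the groups before creating the optimizer, then
# hand the optimizer to deepspeed.initialize(...).
# optimizer = torch.optim.AdamW(build_param_groups(model))
```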
I find it really weird that even when I set a minimal number of experts and MoE layers, MoE-LLaMA still cannot fit into a 40 GB A100. Here are the trainable modules I modified according to LLaMA: `--train_modules mlp.gate_proj mlp.up_proj mlp.down_proj wg \`
You can enable flash_attn2 and try again. Refer to this issue. By the way, how many GPUs are you using?
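If it helps, FlashAttention-2 can typically be enabled when loading a Hugging Face model via `attn_implementation="flash_attention_2"`; whether MoE-LLaVA exposes this through its own training flag is worth checking in the repo, and the checkpoint path below is a placeholder.

```python
# Sketch: loading a HF causal LM with FlashAttention-2 enabled
# (requires the flash-attn package and fp16/bf16 weights).
import torch
from transformers import AutoModelForCausalLM

model = AutoModelForCausalLM.from_pretrained(
    "path/to/your-pretrained-llava-checkpoint",  # placeholder, not from this thread
    torch_dtype=torch.bfloat16,
    attn_implementation="flash_attention_2",
)
```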
modified |
Could you post your command?
`moe_mode="sparse" num_experts=1 top_k_experts=1 use_residual=False router_aux_loss_coef=0.01 JSON_FOLDER="ft_json" IMAGE_FOLDER="train_image_video" HF_DATASETS_OFFLINE=1 TRANSFORMERS_OFFLINE=1 deepspeed moellava/train/train_mem.py --moe_enable False --num_experts ${num_experts} --top_k_experts ${top_k_experts} --capacity_factor 1.5 --moe_layers_idx 0 5 10 --moe_mode ${moe_mode} --use_residual ${use_residual} --router_aux_loss_coef ${router_aux_loss_coef} --train_modules mlp.gate_proj mlp.up_proj mlp.down_proj wg --deepspeed ./scripts/zero2.json --model_name_or_path $(pretrained llava weight) --version v1 --per_device_train_batch_size 1 --per_device_eval_batch_size 16 --gradient_accumulation_steps 16`
The rest remains consistent with LLaVA.
We will check it later. Could you try another model, such as phi or stablelm?
OK, but the point for me is to test the result of an extra MoE stage on an already trained model, so I am currently working with my trained LLaVA.
Hi, have you tested the results for the llava_llama version? Would an extra MoE stage improve the original LLaVA results?