
Is llavallama moe supported? #9

Open
DietDietDiet opened this issue Feb 1, 2024 · 14 comments

Comments

@DietDietDiet

Hi, have you tested the results for the llava_llama version? Would an extra MoE stage improve the original LLaVA results?

@LinB203
Member

LinB203 commented Feb 1, 2024

Great choice. Work in progress!

@DietDietDiet
Author

DietDietDiet commented Feb 1, 2024 via email

@LinB203
Member

LinB203 commented Feb 1, 2024

I think this should work.

@DietDietDiet
Author

Is there any way to insert MoE layers into only part of the LLM layers? I found that converting all layers of the 13B model does not fit on a 40GB A100.

@LinB203
Member

LinB203 commented Feb 2, 2024

Is there any way to insert MoE layers into only part of the LLM layers? I found that converting all layers of the 13B model does not fit on a 40GB A100.

For example, if you want to insert MoE layers in the first and third layers, you can pass --moe_layers_idx 0 2 in your command.
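As a rough, hedged illustration of what selective conversion implies (the MoEMLP class, attribute paths, and defaults below are placeholders, not MoE-LLaVA's actual code): only the decoder layers whose indices appear in --moe_layers_idx get their feed-forward block swapped for an MoE block, while the rest stay dense.

```python
# Hedged sketch: convert only the decoder layers listed in moe_layers_idx to MoE.
# MoEMLP is a placeholder for an experts+router block; this only illustrates the
# effect of a flag like --moe_layers_idx 0 2.
import copy
import torch.nn as nn

class MoEMLP(nn.Module):
    """Placeholder MoE feed-forward block: a router ("wg") plus expert copies."""
    def __init__(self, dense_mlp: nn.Module, num_experts: int, hidden_size: int):
        super().__init__()
        self.wg = nn.Linear(hidden_size, num_experts, bias=False)  # gating network
        self.experts = nn.ModuleList(
            copy.deepcopy(dense_mlp) for _ in range(num_experts)
        )

def insert_moe_layers(model, moe_layers_idx=(0, 2), num_experts=4, hidden_size=4096):
    # LLaMA-style decoders expose their blocks as model.model.layers[i].mlp.
    for idx in moe_layers_idx:
        layer = model.model.layers[idx]
        layer.mlp = MoEMLP(layer.mlp, num_experts, hidden_size)
    return model  # all other layers keep their dense MLPs
```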

LinB203 closed this as completed Feb 3, 2024
@DietDietDiet
Author

DietDietDiet commented Feb 4, 2024 via email

@LinB203
Member

LinB203 commented Feb 4, 2024

I used the pretrained LLaVA weights to initialize MoE-LLaVA and passed in the moe_layers_idx params, but encountered the following error: AssertionError: The model has moe layers, but None of the param groups are marked as MoE. Create a param group with 'moe' key set to True before creating optimizer. Are any additional modifications needed to solve this?


Here is the solution.
#17
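For context, that DeepSpeed assertion means the optimizer was built without any parameter group tagged as MoE. A minimal sketch of the idea (the name-based split below is an illustrative assumption, not the exact fix applied in #17):

```python
# Hedged sketch: make sure the optimizer sees at least one param group with
# 'moe': True before DeepSpeed initializes. The ".experts." name check is a
# heuristic for illustration; adjust it to your model's parameter names.
import torch

def build_param_groups(model, lr=2e-5):
    moe_params, dense_params = [], []
    for name, param in model.named_parameters():
        if not param.requires_grad:
            continue
        if ".experts." in name:          # treat expert weights as MoE params
            moe_params.append(param)
        else:
            dense_params.append(param)
    return [
        {"params": dense_params, "lr": lr},
        {"params": moe_params, "lr": lr, "moe": True, "name": "moe_params"},
    ]

# optimizer = torch.optim.AdamW(build_param_groups(model))
```

If your DeepSpeed version includes it, deepspeed.moe.utils.split_params_into_different_moe_groups_for_optimizer can perform a similar split automatically based on DeepSpeed's own MoE parameter markers.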

@DietDietDiet
Author

I find it really strange that even with a minimal number of experts and MoE layers, MoE-LLaMA still cannot fit on a 40GB A100. Here are the trainable modules I set, following the LLaMA layer names: --train_modules mlp.gate_proj mlp.up_proj mlp.down_proj wg
Could you provide a sample script for the final MoE stage of LLaVA-1.5?

@LinB203
Member

LinB203 commented Feb 5, 2024

I find it really strange that even with a minimal number of experts and MoE layers, MoE-LLaMA still cannot fit on a 40GB A100. Here are the trainable modules I set, following the LLaMA layer names: --train_modules mlp.gate_proj mlp.up_proj mlp.down_proj wg. Could you provide a sample script for the final MoE stage of LLaVA-1.5?

You can enable FlashAttention-2 (flash_attn2) and try again. Refer to this issue:
#25 (comment)
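For reference, a minimal sketch of loading with FlashAttention-2 via Hugging Face transformers (the checkpoint path is a placeholder; it needs the flash-attn package and a half-precision dtype):

```python
# Hedged sketch: enable FlashAttention-2 at load time to cut attention memory.
import torch
from transformers import AutoModelForCausalLM

model = AutoModelForCausalLM.from_pretrained(
    "path/to/pretrained-llava",              # placeholder checkpoint path
    torch_dtype=torch.bfloat16,              # FlashAttention-2 requires fp16/bf16
    attn_implementation="flash_attention_2",
    low_cpu_mem_usage=True,
)
```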

By the way, how many GPUs are you using?

LinB203 reopened this Feb 5, 2024
@DietDietDiet
Author

I modified builder.py to load the model with FlashAttention-2:
model = LlavaLlamaForCausalLM.from_pretrained(model_base, low_cpu_mem_usage=True, config=lora_cfg_pretrained, attn_implementation="flash_attention_2", **kwargs)
It still OOMs. I'm using 8×40GB A100s.

@LinB203
Member

LinB203 commented Feb 6, 2024

Could you post your command?

I modified builder.py to load the model with FlashAttention-2: model = LlavaLlamaForCausalLM.from_pretrained(model_base, low_cpu_mem_usage=True, config=lora_cfg_pretrained, attn_implementation="flash_attention_2", **kwargs). It still OOMs. I'm using 8×40GB A100s.

@DietDietDiet
Author

```
moe_mode="sparse"
num_experts=1
top_k_experts=1
use_residual=False
router_aux_loss_coef=0.01
JSON_FOLDER="ft_json"
IMAGE_FOLDER="train_image_video"

HF_DATASETS_OFFLINE=1 TRANSFORMERS_OFFLINE=1 deepspeed moellava/train/train_mem.py \
    --moe_enable False --num_experts ${num_experts} --top_k_experts ${top_k_experts} --capacity_factor 1.5 \
    --moe_layers_idx 0 5 10 \
    --moe_mode ${moe_mode} --use_residual ${use_residual} --router_aux_loss_coef ${router_aux_loss_coef} \
    --train_modules mlp.gate_proj mlp.up_proj mlp.down_proj wg \
    --deepspeed ./scripts/zero2.json \
    --model_name_or_path $(pretrained llava weight) \
    --version v1 \
    --per_device_train_batch_size 1 \
    --per_device_eval_batch_size 16 \
    --gradient_accumulation_steps 16
```

The rest remains consistent with LLaVA.

@LinB203
Member

LinB203 commented Feb 6, 2024

`moe_mode="sparse" num_experts=1 top_k_experts=1 use_residual=False router_aux_loss_coef=0.01 JSON_FOLDER="ft_json" IMAGE_FOLDER="train_image_video"

HF_DATASETS_OFFLINE=1 TRANSFORMERS_OFFLINE=1 deepspeed moellava/train/train_mem.py --moe_enable False --num_experts ${num_experts} --top_k_experts ${top_k_experts} --capacity_factor 1.5 --moe_layers_idx 0 5 10 --moe_mode ${moe_mode} --use_residual ${use_residual} --router_aux_loss_coef r o u t e r a u x l o s s c o e f t r a i n m o d u l e s m l p . g a t e p r o j m l p . u p p r o j m l p . d o w n p r o j w g d e e p s p e e d . / s c r i p t s / z e r o 2. j s o n m o d e l n a m e o r p a t h (pretrained llava weight) --version v1 --per_device_train_batch_size 1 --per_device_eval_batch_size 16 --gradient_accumulation_steps 16 `

The rest remains consistent with llava

We will check it later. Could you try another model, such as Phi or StableLM?

@DietDietDiet
Author

DietDietDiet commented Feb 7, 2024 via email
