MoE for Vicuna #9

Open
mzamini92 opened this issue Oct 21, 2024 · 1 comment

Comments

@mzamini92

Thanks for the great work. I was wondering about your use of MoE with the Mixtral model: did you actually use it for your model, or was it implemented only for testing? I see you have scripts for LLaMA and Vicuna. Can we use MoE for Vicuna as well?

        elif "mixtral" in model_args.model_name_or_path.lower():
            model = LlavaMixtralForCausalLM.from_pretrained(
                model_args.model_name_or_path,
                cache_dir=training_args.cache_dir,
                attn_implementation=attn_implementation,
                torch_dtype=(torch.bfloat16 if training_args.bf16 else None),
                **bnb_model_from_pretrained_args
            )
            from deepspeed.utils import set_z3_leaf_modules
            set_z3_leaf_modules(model, [MixtralSparseMoeBlock])

Thanks.

@yfzhang114
Owner

Sorry for the delayed reply. It was just for testing; we didn't incorporate Mixtral into training. However, in our opinion, the MoE adapter should be combinable with any LLM.
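As a rough illustration, a minimal sketch of what a Vicuna branch might look like, assuming the codebase exposes a LlavaLlamaForCausalLM class for the LLaMA/Vicuna scripts (the class name and checkpoint in the comment are illustrative, not taken from the repo):

        elif "vicuna" in model_args.model_name_or_path.lower():
            # Hypothetical: load a dense Vicuna backbone (e.g. "lmsys/vicuna-7b-v1.5")
            # the same way the Mixtral branch does.
            model = LlavaLlamaForCausalLM.from_pretrained(
                model_args.model_name_or_path,
                cache_dir=training_args.cache_dir,
                attn_implementation=attn_implementation,
                torch_dtype=(torch.bfloat16 if training_args.bf16 else None),
                **bnb_model_from_pretrained_args
            )
            # Vicuna/LLaMA are dense models, so there is no sparse-MoE block to
            # register with set_z3_leaf_modules; the MoE adapter layers would sit
            # on top of this backbone instead.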
