MoE for Vicuna #9

Open
mzamini92 opened this issue Oct 21, 2024 · 1 comment

Comments

@mzamini92

Thanks for the great work. I was wondering about your use of MoE with the Mixtral model: did you actually use it for your model, or was it implemented only for testing? I see you have scripts for LLaMA and Vicuna. Can we use MoE for Vicuna as well?

        elif "mixtral" in model_args.model_name_or_path.lower():
            model = LlavaMixtralForCausalLM.from_pretrained(
                model_args.model_name_or_path,
                cache_dir=training_args.cache_dir,
                attn_implementation=attn_implementation,
                torch_dtype=(torch.bfloat16 if training_args.bf16 else None),
                **bnb_model_from_pretrained_args
            )
            from deepspeed.utils import set_z3_leaf_modules
            set_z3_leaf_modules(model, [MixtralSparseMoeBlock])

Thanks.

@yfzhang114
Owner

Sorry for the delayed reply. It was just for testing; we didn't incorporate Mixtral into training. However, in our opinion, the MoE adapter should be combinable with any LLM.
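As a rough illustration, a minimal sketch of what a Vicuna branch might look like, assuming the codebase exposes a LlavaLlamaForCausalLM class for the LLaMA/Vicuna scripts (the class name and checkpoint in the comment are illustrative, not taken from the repo):

        elif "vicuna" in model_args.model_name_or_path.lower():
            # Hypothetical: load a dense Vicuna backbone (e.g. "lmsys/vicuna-7b-v1.5")
            # the same way the Mixtral branch does.
            model = LlavaLlamaForCausalLM.from_pretrained(
                model_args.model_name_or_path,
                cache_dir=training_args.cache_dir,
                attn_implementation=attn_implementation,
                torch_dtype=(torch.bfloat16 if training_args.bf16 else None),
                **bnb_model_from_pretrained_args
            )
            # Vicuna/LLaMA are dense models, so there is no sparse-MoE block to
            # register with set_z3_leaf_modules; the MoE adapter layers would sit
            # on top of this backbone instead.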
