Thanks for the great work. I was wondering about your use of MoE with the Mixtral model: have you actually used it for your model, or was it only implemented for testing? I see you have scripts for LLaMA and Vicuna. Can we use MoE with Vicuna as well?
elif "mixtral" in model_args.model_name_or_path.lower():
model = LlavaMixtralForCausalLM.from_pretrained(
model_args.model_name_or_path,
cache_dir=training_args.cache_dir,
attn_implementation=attn_implementation,
torch_dtype=(torch.bfloat16 if training_args.bf16 else None),
**bnb_model_from_pretrained_args
)
from deepspeed.utils import set_z3_leaf_modules
set_z3_leaf_modules(model, [MixtralSparseMoeBlock])
thanks.
Sorry for the delayed reply. It is just for testing; we didn't incorporate Mixtral for training. However, in our opinion, the MoE adapter should be able to be combined with any LLM.
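For anyone who wants to try this with Vicuna, here is a minimal sketch of how the Mixtral branch above could be mirrored for a dense Vicuna/LLaMA checkpoint. It is an assumption based on LLaVA-style training scripts, not code from this repo: the class name LlavaLlamaForCausalLM and the surrounding variables (model_args, training_args, attn_implementation, bnb_model_from_pretrained_args) are taken from the snippet above, and the adapter block name at the end is hypothetical.

elif "vicuna" in model_args.model_name_or_path.lower() or "llama" in model_args.model_name_or_path.lower():
    # Hypothetical branch (not from the repo): load the dense Vicuna/LLaMA
    # backbone with the same precision and quantization options as Mixtral.
    model = LlavaLlamaForCausalLM.from_pretrained(
        model_args.model_name_or_path,
        cache_dir=training_args.cache_dir,
        attn_implementation=attn_implementation,
        torch_dtype=(torch.bfloat16 if training_args.bf16 else None),
        **bnb_model_from_pretrained_args
    )
    # No set_z3_leaf_modules call here because the backbone itself is dense.
    # If the MoE adapter adds its own sparse-routing module and training runs
    # under DeepSpeed ZeRO-3, that module class may need to be registered the
    # same way, e.g.:
    # set_z3_leaf_modules(model, [YourMoEAdapterBlock])  # hypothetical class name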