Why did you refactor so that the model is of type 'LanguageModel' rather than 'LanguageModelForCausalLM'? And why did you move 'get_vision_tower' etc. from 'LlavaMetaForCausalLM' to 'LlavaMetaModel'?
Best,
Orr
There were a lot of technical debates behind the refactor. The main reason we did it is that we want to support different numerical precisions within VILA (e.g., BF16 for the LLM and FP16 for the ViT). The old implementation (based on Llava) put all the weights in the same place and therefore could not achieve that.
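To illustrate the idea, here is a minimal sketch (not VILA's actual code) of keeping the vision tower and the LLM in separate precisions, which is much simpler once they live in separate submodules. The checkpoint names and the `encode_images` helper below are placeholders for illustration only:

```python
import torch
from transformers import AutoModelForCausalLM, CLIPVisionModel

# Placeholder checkpoints, not VILA's actual weights.
# Vision tower kept in FP16, LLM kept in BF16.
vision_tower = CLIPVisionModel.from_pretrained(
    "openai/clip-vit-large-patch14", torch_dtype=torch.float16
)
llm = AutoModelForCausalLM.from_pretrained(
    "meta-llama/Llama-2-7b-hf", torch_dtype=torch.bfloat16
)

def encode_images(pixel_values: torch.Tensor) -> torch.Tensor:
    # Run the ViT in FP16, then cast the features to BF16 before
    # they are fed into the BF16 language model.
    feats = vision_tower(pixel_values.to(torch.float16)).last_hidden_state
    return feats.to(torch.bfloat16)
```

With a single monolithic module, `model.half()` or `model.bfloat16()` would force one dtype onto every weight, which is what the refactor avoids.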
@Lyken17,
Why did you want the encoders to be FP16 rather than BF16? Did you run any experiments and see a gap in performance? And didn't fine-tuning the encoders affect this?
Best,
Orr