Why did you refactor so that the model is of type 'LanguageModel' rather than 'LanguageModelForCausalLM'? And why did you move 'get_vision_tower' etc. from 'LlavaMetaForCausalLM' to 'LlavaMetaModel'?
Best,
Orr
There were a lot of technical debates behind the refactor. The main reason we did it is that we want to support different numerical precisions within VILA (e.g., BF16 for the LLM and FP16 for the ViT). The old implementation (based on Llava) put all the weights in the same place and therefore could not achieve that.
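To illustrate the idea, here is a minimal sketch (not VILA's actual code) of keeping the vision tower and the LLM in separate precisions, which is much simpler once they live in separate submodules. The checkpoint names and the `encode_images` helper below are placeholders for illustration only:

```python
import torch
from transformers import AutoModelForCausalLM, CLIPVisionModel

# Placeholder checkpoints, not VILA's actual weights.
# Vision tower kept in FP16, LLM kept in BF16.
vision_tower = CLIPVisionModel.from_pretrained(
    "openai/clip-vit-large-patch14", torch_dtype=torch.float16
)
llm = AutoModelForCausalLM.from_pretrained(
    "meta-llama/Llama-2-7b-hf", torch_dtype=torch.bfloat16
)

def encode_images(pixel_values: torch.Tensor) -> torch.Tensor:
    # Run the ViT in FP16, then cast the features to BF16 before
    # they are fed into the BF16 language model.
    feats = vision_tower(pixel_values.to(torch.float16)).last_hidden_state
    return feats.to(torch.bfloat16)
```

With a single monolithic module, `model.half()` or `model.bfloat16()` would force one dtype onto every weight, which is what the refactor avoids.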
@Lyken17,
Why did you want the encoders to be FP16 rather than BF16? Did you run any experiments and see a gap in performance? And didn't fine-tuning the encoders affect this?
Best,
Orr