
Question re. LanguageModel vs LanguageModelForCausalLM functionalities #101

Open
orrzohar opened this issue Jul 24, 2024 · 2 comments
@orrzohar

Hi,

Why did you refactor such that the model is of type 'LanguageModel' and not 'LanguageModelForCausalLM'? And why did you move 'get_vision_tower' etc. from 'LlavaMetaForCausalLM' to 'LlavaMetaModel'?

Best,
Orr

@Lyken17
Collaborator

Lyken17 commented Jul 30, 2024

There was a lot of technical debate behind the refactor. Basically, the main reason we did it is that we want to support different numerical precisions in VILA (e.g., BF16 for the LLM and FP16 for the ViT). The old implementation (based on LLaVA) put all weights in the same place and thus could not achieve that.

@orrzohar
Author

orrzohar commented Aug 3, 2024

@Lyken17,
Why did you want the encoders in FP16 rather than BF16? Did you run any experiments and see a gap in performance? And didn't fine-tuning the encoders affect this?
Best,
Orr

gheinrich pushed a commit to gheinrich/VILA that referenced this issue Dec 16, 2024
* fix deepspeed unit test

* fix deepspeed

---------

Co-authored-by: Yao Lu <91149044+yaolug@users.noreply.github.com>