LLaVA-OneVision for single-image, multi-image, and video #25

KevinH48264 · 2024-08-19T20:59:15Z

Thank you for the great work here!

LLaVA-OneVision appears to be the latest model in the LLaVA family, and it performs well among open-source LMMs. It's still early, but I'm raising it here since others in the community may start looking at it.

https://llava-vl.github.io/blog/2024-08-05-llava-onevision/

zjysteven · 2024-08-19T21:01:41Z

Thanks for raising this. We've noticed it too, and adding it to lmms-finetune is part of our near-term plan.

zjysteven · 2024-08-20T14:31:54Z

We will wait a bit until the model implementation is merged into the main branch of transformers.
https://huggingface.co/llava-hf/llava-onevision-qwen2-0.5b-ov-hf/discussions/1
huggingface/transformers#32673

btekin · 2024-09-16T18:58:42Z

Are there any updates on this? It seems LLaVA-OneVision has now been merged into the main branch of transformers: huggingface/transformers#32673

zjysteven · 2024-09-16T20:18:45Z

We will include it once we are free, probably in 2-3 weeks.

KevinH48264 closed this as completed Aug 20, 2024
