Use vila-infer to reason among multiple images #168

Hetznero · 2024-12-20T11:55:07Z

I saw in a previous issue that the model was able to reason among multiple images (see: #20).

I wanted to try this with vila-infer as well. However, if I use the following input:
--text " Is a image of a man with tattoos, Is a image of a landscape, Is"

I get the following warning, and the output is "1":
Media token '' found in text: ' Is a image of a man with tattoos, Is a image of a landscape, Is'. Removed.

So I was wondering whether vila-infer is able to reason among multiple images and, if so, how I need to change the text.
