I saw in a previous issue that it can reason across multiple images (see #20).
I wanted to try this with vila-infer as well. However, if I use the following input:
--text " Is a image of a man with tattoos, Is a image of a landscape, Is"
I get the following warning, and the output is "1":
Media token '' found in text: ' Is a image of a man with tattoos, Is a image of a landscape, Is'. Removed.
So I was wondering whether vila-infer can reason across multiple images and, if so, how I need to change the text.