Virtualexistence changed the title from "Seems video token isn't used in the model" to "Seems video token isn't used in the model during video inference" on Jul 9, 2024.
Great work! I just have a question about using the <video> token in video inference.

The constants defined don't seem to include any video token, only image tokens:

VILA/llava/constants.py, line 27 at 0085724

Am I missing something, or does the model treat <video> as plain text context while the image token is automatically added N times (once per sampled frame) to the query?

VILA/llava/eval/run_vila.py, line 66 at 0085724
python -W ignore llava/eval/run_vila.py \
    --model-path Efficient-Large-Model/VILA1.5-3b \
    --conv-mode vicuna_v1 \
    --query "<video>\n Please describe this video." \
    --video-file "demo.mp4"
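To make concrete what I mean by "added automatically N times", here is a minimal sketch of the behaviour I am guessing at. This is not VILA's actual code; expand_video_token and VIDEO_TOKEN are hypothetical names used only for illustration.

```python
# Hypothetical sketch (not VILA's implementation): the single <video>
# placeholder in the query is replaced by one <image> token per sampled
# frame, so the rest of the pipeline only ever sees image tokens.
DEFAULT_IMAGE_TOKEN = "<image>"   # the image-token string referenced above
VIDEO_TOKEN = "<video>"           # assumed to appear only in the query text

def expand_video_token(query: str, num_frames: int) -> str:
    """Replace the <video> placeholder with num_frames <image> tokens."""
    return query.replace(VIDEO_TOKEN, DEFAULT_IMAGE_TOKEN * num_frames)

# Example with a 6-frame video:
print(expand_video_token("<video>\n Please describe this video.", num_frames=6))
# -> "<image><image><image><image><image><image>\n Please describe this video."
```

Is this roughly what run_vila.py does internally, or is the <video> token handled differently?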