What is the conv_mode for VILA1.5-40b in video inference? #145

stdKonjac · 2024-11-03T14:58:03Z

Hi, what is the correct conv_mode for VILA1.5-40b in video inference?
Additionally, I noticed that the <video> token seems to be ignored in video inference. The eval code automatically prepends several <image> tokens while leaving the <video> token untouched. For example:

<image>
<image>
<image>
<video>
Please describe the video

Is this behavior normal?
I would appreciate a timely response :) @Lyken17
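
For reference, the expansion described above can be reproduced with a minimal sketch. This is not the VILA eval code: `expand_video_prompt`, `IMAGE_TOKEN`, and `num_frames` are hypothetical names used only to illustrate the observed behavior, assuming one <image> placeholder is prepended per sampled frame and the original query (including <video>) is left as-is.

```python
# Illustrative sketch only: these names are assumptions, not VILA's API.
IMAGE_TOKEN = "<image>"


def expand_video_prompt(query: str, num_frames: int) -> str:
    """Prepend one <image> placeholder per sampled frame.

    The query itself, including any <video> token, is not modified.
    """
    if num_frames <= 0:
        return query
    frame_tokens = "\n".join([IMAGE_TOKEN] * num_frames)
    return f"{frame_tokens}\n{query}"


prompt = expand_video_prompt("<video>\nPlease describe the video", num_frames=3)
print(prompt)
```

With three frames, this prints the same five-line prompt shown in the example above, with the <video> token still present after the <image> tokens.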


Lyken17 · 2024-11-19T14:04:02Z

hermes-2

danigarciaoca · 2024-12-12T18:42:14Z

Hi @stdKonjac! Similar question in #87

gheinrich pushed a commit to gheinrich/VILA that referenced this issue Dec 16, 2024

Support W&B resume (NVlabs#145)

d21e90a
