InternVideo2 for long video #197

pritamqu · 2024-10-11T05:22:29Z

Could you please confirm which video LLM in this collection is best suited to long videos: https://huggingface.co/collections/OpenGVLab/internvideo2-6618ccb574bd2f91410df5cd

My guess is InternVideo2-Chat-8B-InternLM, since its LLM has a longer context window.
Also, what is the maximum supported number of frames?

leexinhao · 2024-10-11T05:26:38Z

In theory, we can input a fairly long video (an hour or longer), because we compress the video tokens to 96 before they enter the LLM. In practice, I recommend modifying the code to split a long video into multiple short clips and process each one separately, which gives better results. For example, we would divide a 64 s video into 8 segments and feed the LLM 8x96 tokens. We will release our long-video version, VideoChat-NeXT, within a month, so stay tuned.
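
The arithmetic in the 64 s example can be sketched as follows. This is only a minimal illustration of the splitting and the token budget, not the InternVideo2 code: `Segment`, `split_video`, and `total_video_tokens` are hypothetical helper names, and equal-length segments are an assumption. Only the 96-tokens-per-clip figure comes from the comment above.

```python
from dataclasses import dataclass
from typing import List

# From the thread: each clip is compressed to 96 tokens before the LLM.
TOKENS_PER_SEGMENT = 96


@dataclass
class Segment:
    """A time window of the source video, in seconds (hypothetical helper)."""
    start_s: float
    end_s: float


def split_video(duration_s: float, num_segments: int) -> List[Segment]:
    """Split [0, duration_s] into equal-length, contiguous segments.

    Equal-length splitting is an assumption; the thread does not say
    how segment boundaries should be chosen.
    """
    if duration_s <= 0 or num_segments <= 0:
        raise ValueError("duration_s and num_segments must be positive")
    step = duration_s / num_segments
    return [Segment(i * step, (i + 1) * step) for i in range(num_segments)]


def total_video_tokens(num_segments: int) -> int:
    """Number of video tokens the LLM receives when every segment is
    compressed to TOKENS_PER_SEGMENT tokens and the results are concatenated."""
    return num_segments * TOKENS_PER_SEGMENT


# The example from the thread: a 64 s video divided into 8 segments.
segments = split_video(64.0, 8)
print(len(segments), "segments,", total_video_tokens(len(segments)), "video tokens")
```

Each segment would then be encoded on its own and the resulting token blocks concatenated in temporal order before the text prompt. How that is wired into the model is left out here, since it depends on the repository code.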

pritamqu · 2024-10-11T05:31:47Z

Do you suggest segmenting the video and concatenating the embeddings like this at inference time only, even if the model was not trained that way?

choyakawa · 2024-11-13T14:10:34Z

> We will update our long video version (VideoChat-NeXT) within a month, so stay tuned

Any news?
