Cannot reproduce the results on MSVD-QA and TGIF-QA #197

Open

Jingchensun opened this issue Nov 7, 2024 · 0 comments

Jingchensun commented Nov 7, 2024


First, thank you for the amazing work.

I am using the checkpoint LanguageBind/Video-LLaVA-7B with do_sample=False and temperature=0.0 set in run_inference_video_qa.py. Inference on the MSVD-QA dataset ran on 4× A6000 GPUs and took about one hour. However, when I evaluated the predictions with GPT-3.5 (default setting), I only reached an accuracy of 36.27% and a score of 2.87, significantly lower than the results reported in the paper. Similarly, on the TGIF-QA dataset I got an accuracy of only 19.6% and a score of 2.4. For other evaluation tasks, such as VQA (e.g., VQAv2, GQA), my results matched those reported in the paper exactly.

Could the authors comment on the evaluation setup for the video-QA task? And is there an alternative to GPT-based evaluation, e.g. something like the exact-match sketch after the generation snippet below?

import torch

# Greedy decoding: with do_sample=False, generation is deterministic,
# so temperature=0.0 has no effect on the output.
with torch.inference_mode():
    output_ids = model.generate(
        input_ids,
        images=[video_tensor],
        do_sample=False,           # greedy search, no sampling
        temperature=0.0,           # ignored when do_sample=False
        max_new_tokens=1024,
        use_cache=True,
        stopping_criteria=[stopping_criteria],
    )
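For reference, here is the kind of GPT-free check I have in mind: a minimal exact-match accuracy sketch. The file path and the "pred"/"answer" field names are assumptions for illustration, not the actual output schema of run_inference_video_qa.py.

import json

# Minimal sketch of a GPT-free metric: count a prediction as correct if
# the ground-truth answer appears as a substring of the model output.
# NOTE: the JSONL path and the "pred"/"answer" keys are hypothetical.
def exact_match_accuracy(pred_path):
    with open(pred_path) as f:
        records = [json.loads(line) for line in f]  # one JSON object per line
    correct = sum(
        r["answer"].strip().lower() in r["pred"].strip().lower()
        for r in records
    )
    return correct / len(records)

print(f"Accuracy: {exact_match_accuracy('msvd_qa_preds.jsonl'):.2%}")

A string match like this is stricter and noisier than GPT scoring (it misses paraphrases), so I would expect it to underestimate accuracy, but it is deterministic and free.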