This repository provides tools for evaluating image and video captioning using G-VEval. It also calculates how well G-VEval correlates with human scores on the Flickr-8k-expert, Flickr-8k-CF, and MSVD-Eval datasets.
- demo.py: Demonstrates a sample run of G-VEval for image and video captioning evaluation.
- correlation.py: Calculates the correlation with human scores for the Flickr-8k-expert, Flickr-8k-CF, and MSVD-Eval datasets.
- dataset_check.py: Checks if the datasets are correctly installed.
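To make the idea of "correlation with human scores" concrete, here is a minimal sketch of a rank correlation between metric scores and human ratings. It assumes Kendall's tau-b as the coefficient; this README does not state which coefficient correlation.py actually uses, and the function name `kendall_tau_b` and the sample scores are illustrative only.

```python
from itertools import combinations
from math import sqrt


def kendall_tau_b(xs, ys):
    """Kendall tau-b rank correlation between two equal-length score lists."""
    if len(xs) != len(ys):
        raise ValueError("score lists must have the same length")
    concordant = discordant = ties_x = ties_y = 0
    for (x1, y1), (x2, y2) in combinations(zip(xs, ys), 2):
        dx, dy = x1 - x2, y1 - y2
        if dx == 0 and dy == 0:
            continue  # tied in both lists: excluded from both denominator factors
        if dx == 0:
            ties_x += 1  # tied only in xs
        elif dy == 0:
            ties_y += 1  # tied only in ys
        elif dx * dy > 0:
            concordant += 1  # both lists order this pair the same way
        else:
            discordant += 1  # the lists disagree on this pair
    untied = concordant + discordant
    denom = sqrt((untied + ties_x) * (untied + ties_y))
    return (concordant - discordant) / denom if denom else 0.0


# Made-up scores for four captions (not taken from any dataset).
metric_scores = [0.9, 0.4, 0.7, 0.2]
human_scores = [4, 2, 3, 1]
print(kendall_tau_b(metric_scores, human_scores))
```

A value near 1 means the metric ranks captions in the same order as the human annotators; a value near -1 means the order is reversed.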
- Create a Data Directory: Create a folder named /data in the root directory of the project.
- Download and Extract Datasets:
  - For MSVD original videos, download and extract the dataset from YouTubeClips.tar into the /data directory.
  - For Flickr8k datasets, download the dataset from this link and place it in the /data directory.
- Add OpenAI API Key: Add your OpenAI API key in the .env file located in the root directory of the project: OPENAI_API_KEY='your-api-key-here'
- Human ACCR Scores: The human ACCR scores for MSVD-Eval are already provided in the MSVD-Eval.json file.
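For the API key step above, the following sketch shows one way a script could read OPENAI_API_KEY from the .env file using only the standard library. The helper `read_env_file` is hypothetical and is not part of this repository; the scripts here may load the key differently, for example through a dotenv package.

```python
from pathlib import Path


def read_env_file(path):
    """Parse KEY=value lines from a .env file into a dict (hypothetical helper)."""
    values = {}
    for raw in Path(path).read_text(encoding="utf-8").splitlines():
        line = raw.strip()
        if not line or line.startswith("#") or "=" not in line:
            continue  # skip blank lines, comments, and lines without "="
        key, _, value = line.partition("=")
        # Drop surrounding whitespace and optional single or double quotes.
        values[key.strip()] = value.strip().strip("'\"")
    return values
```

Calling `read_env_file(".env").get("OPENAI_API_KEY")` from the project root would then return the key with its quotes removed, or `None` if the entry is missing.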
The demo.py file demonstrates a sample run of G-VEval for image and video captioning evaluation.
Use the dataset_check.py file to verify if the datasets are correctly installed.
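As a rough illustration of what such a check involves, the sketch below reports which expected entries are missing from a data directory. It is not the code in dataset_check.py; the function `find_missing` and the entry names in `EXPECTED` are assumptions, since this README does not specify the folder names the datasets extract to.

```python
from pathlib import Path

# Hypothetical entry names; adjust to whatever the downloaded archives produce.
EXPECTED = ["YouTubeClips", "flickr8k"]


def find_missing(data_dir, expected=EXPECTED):
    """Return the entries of `expected` that do not exist under `data_dir`."""
    root = Path(data_dir)
    return [name for name in expected if not (root / name).exists()]
```

An empty list from `find_missing("data")` would indicate that every expected entry is present; otherwise the returned names show what still needs to be downloaded or extracted.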