FLEUR: An Explainable Reference-Free Evaluation Metric for Image Captioning Using a Large Multimodal Model
FLEUR uses the LLaVA model to evaluate image captions (other vision-language models can be substituted if desired). Follow the setup instructions in the LLaVA GitHub README; no additional training is required.
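For a rough sense of the idea (this is not the repository's fleur.py), the sketch below prompts a LLaVA-style model to grade a single image-caption pair through the Hugging Face transformers interface. The checkpoint name, prompt wording, and score parsing are assumptions made for this illustration; the actual scripts may differ in prompt design and in how the final score is computed.

```python
# Minimal illustrative sketch, not the repository's exact code.
# Assumptions: the llava-hf/llava-1.5-7b-hf checkpoint, a simple
# "rate from 0.0 to 1.0" prompt, and naive parsing of the reply.
import re
import torch
from PIL import Image
from transformers import AutoProcessor, LlavaForConditionalGeneration

model_id = "llava-hf/llava-1.5-7b-hf"  # assumed checkpoint name
processor = AutoProcessor.from_pretrained(model_id)
model = LlavaForConditionalGeneration.from_pretrained(
    model_id, torch_dtype=torch.float16, device_map="auto"
)

def score_caption(image_path: str, caption: str) -> float:
    """Ask the model how well the caption describes the image (0.0-1.0)."""
    image = Image.open(image_path).convert("RGB")
    prompt = (
        "USER: <image>\n"
        "Rate how well the caption describes the image on a scale from 0.0 "
        f"to 1.0, and answer with the number only.\nCaption: {caption}\n"
        "ASSISTANT:"
    )
    inputs = processor(images=image, text=prompt, return_tensors="pt").to(
        model.device, torch.float16
    )
    output = model.generate(**inputs, max_new_tokens=10, do_sample=False)
    answer = processor.decode(output[0], skip_special_tokens=True)
    # Take the number at the end of the reply; fall back to 0.0 if none found.
    match = re.search(r"([01](?:\.\d+)?)\s*$", answer)
    return float(match.group(1)) if match else 0.0
```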
- Running code for FLEUR:
CUDA_VISIBLE_DEVICES=0,1 python fleur.py
- Running code for RefFLEUR:
CUDA_VISIBLE_DEVICES=0,1 python reffleur.py
- Running code for score explanation:
CUDA_VISIBLE_DEVICES=0,1 python fleur_exp.py
The evaluation results will be saved as txt files in the results folder.
- Computing correlations:
Update the file names of the annotation file and the evaluation result file in compute_correlation.py, then run:
python compute_correlation.py
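As a minimal sketch of this step (assuming the result file holds one score per line and a matching file of human ratings exists; the actual compute_correlation.py may use different file names and formats), the correlation can be computed with scipy:

```python
# Illustrative sketch with assumed file layouts, not the repository's script.
from scipy.stats import kendalltau, spearmanr

# Assumed: one metric score per line and one matching human rating per line.
with open("results/fleur_scores.txt") as f:
    metric_scores = [float(line) for line in f if line.strip()]
with open("annotations/human_scores.txt") as f:
    human_scores = [float(line) for line in f if line.strip()]

tau, tau_p = kendalltau(metric_scores, human_scores)
rho, rho_p = spearmanr(metric_scores, human_scores)
print(f"Kendall tau: {tau:.4f} (p={tau_p:.3g})")
print(f"Spearman rho: {rho:.4f} (p={rho_p:.3g})")
```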