Mementos: A Comprehensive Benchmark for Multimodal Large Language Model Reasoning over Image Sequences

Xiyao Wang · Yuhang Zhou · Xiaoyu Liu · Hongjin Lu · Yuancheng Xu · Feihong He · Jaehong Yoon · Taixi Lu · Gedas Bertasius · Mohit Bansal · Huaxiu Yao* · Furong Huang*

📍 Dataset

This repository contains the dataset and code for the paper 'Mementos: A Comprehensive Benchmark for Multimodal Large Language Model Reasoning over Image Sequences'.

All data are available at this Google Drive link: Mementos Dataset

📄 Synonym graphs

We provide all object and behavior synonym files in the 'sym_graphs' folder. They can be loaded and used directly with the 'load_graph' function in build_action_graph.ipynb.

📊 Evaluation

To evaluate your own model, use the GPT-4-assisted evaluation procedure provided in GPT-4-assisted_evaluation.ipynb. First, extract the object and behavior keyword lists using GPT-4; then compute Recall, Precision, and F1 for objects and behaviors.
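
The scoring step can be illustrated with a minimal sketch. It is not the code from the notebook: the function name `keyword_prf`, the set-based matching, and the flat `synonyms` dictionary (a stand-in for the synonym graphs) are all assumptions made for illustration, and the keyword lists are taken as already extracted.

```python
def keyword_prf(predicted, reference, synonyms=None):
    """Return (precision, recall, f1) for two keyword lists.

    `synonyms` is a hypothetical flat mapping from a word to its
    canonical form; the repository's synonym graphs may be structured
    differently.
    """
    synonyms = synonyms or {}

    def canon(word):
        # Normalize case and whitespace, then map to a canonical synonym.
        w = word.strip().lower()
        return synonyms.get(w, w)

    pred = {canon(w) for w in predicted}
    ref = {canon(w) for w in reference}
    matched = len(pred & ref)

    # Guard against empty lists to avoid division by zero.
    precision = matched / len(pred) if pred else 0.0
    recall = matched / len(ref) if ref else 0.0
    total = precision + recall
    f1 = 2 * precision * recall / total if total else 0.0
    return precision, recall, f1


# Example: object keywords from a model description vs. the reference.
p, r, f = keyword_prf(
    ["man", "dog", "ball"],
    ["person", "dog", "frisbee", "park"],
    synonyms={"man": "person"},
)
print(f"precision={p:.3f} recall={r:.3f} f1={f:.3f}")
```

The same function would be called once for the object keywords and once for the behavior keywords, which gives the two sets of scores described above.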

📝 Citation

If you find our work useful, please consider citing:

@article{wang2024mementos,
  title={Mementos: A Comprehensive Benchmark for Multimodal Large Language Model Reasoning over Image Sequences},
  author={Wang, Xiyao and Zhou, Yuhang and Liu, Xiaoyu and Lu, Hongjin and Xu, Yuancheng and He, Feihong and Yoon, Jaehong and Lu, Taixi and Bertasius, Gedas and Bansal, Mohit and others},
  journal={arXiv preprint arXiv:2401.10529},
  year={2024}
}

