An Empirical Analysis on Spatial Reasoning Capabilities of Large Multimodal Models (EMNLP 2024)

Datasets for "An Empirical Analysis on Spatial Reasoning Capabilities of Large Multimodal Models". Our camera-ready copy is here.

Note: Please feel free to leave a GitHub issue or shoot me an email at fatimaa.shirii@gmail.com if you have any questions about the data, or if you'd just like to chat about this (or related) work!

Here's more information about our benchmark, followed by instructions on how to use this repository to reproduce the results in the paper.

Benchmark

We introduce a novel benchmark, Spatial-MM, which includes two subsets: Spatial-Obj and Spatial-CoT. Spatial-Obj features multiple-choice questions that focus on the spatial relationships between one or two objects in an image, while Spatial-CoT offers open-ended multi-hop questions. Each subset is stored in a separate JSON file, where each entry contains the image ID and the list of answer options.
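As a rough sketch, loading a subset might look like the snippet below. Note that the record shown here is hypothetical: the exact field names (e.g. `image_id`, `options`) are assumptions for illustration, so check the actual JSON files in the repository for the real schema.

```python
import json

# Hypothetical example record -- field names are illustrative only;
# inspect the Spatial-Obj / Spatial-CoT JSON files for the actual keys.
sample = """
[
  {
    "image_id": "example_001",
    "question": "Is the cup to the left of the plate?",
    "options": ["left", "right", "above", "below"]
  }
]
"""

records = json.loads(sample)
for rec in records:
    # Each entry pairs an image ID with its list of answer options.
    print(rec["image_id"], rec["options"])
```

In practice you would replace the inline string with `json.load(open("spatial_mm/data/<subset>.json"))` once the data is downloaded.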

Setting Up

Clone the repository and create the data directory within it, where your data and models will live.

Downloading the data

All of the data lives in spatial_mm/data.

You can also download the data directly from this Google Drive link.

Citation

If you use this code or data, please consider citing our paper:

@inproceedings{shiri2024empirical,
  title={An Empirical Analysis on Spatial Reasoning Capabilities of Large Multimodal Models},
  author={Shiri, Fatemeh and Guo, Xiao-Yu and Far, Mona and Yu, Xin and Haf, Reza and Li, Yuan-Fang},
  booktitle={Proceedings of the 2024 Conference on Empirical Methods in Natural Language Processing},
  pages={21440--21455},
  year={2024}
}
