# An Empirical Analysis on Spatial Reasoning Capabilities of Large Multimodal Models (EMNLP 2024)

Datasets for "An Empirical Analysis on Spatial Reasoning Capabilities of Large Multimodal Models". Our camera-ready copy is here.

Note: Please feel free to leave a GitHub issue or shoot me an email at fatimaa.shirii@gmail.com if you have any questions about the data, or if you'd just like to chat about this (or related) work!

Here's more information about our benchmark, followed by instructions on how to use this repository to reproduce the results in our paper.

## Benchmark

We introduce a novel benchmark, Spatial-MM, which includes two subsets: Spatial-Obj and Spatial-CoT. Spatial-Obj features multiple-choice questions that focus on the spatial relationships between one or two objects in an image, while Spatial-CoT offers open-ended multi-hop questions. Each subset is stored in a separate JSON file whose entries include the image ID and the list of answer options.
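As a quick illustration, here is a minimal sketch of how a subset could be loaded and inspected with Python's standard library. The file name `spatial_obj.json` and the assumption that the top-level JSON value is a list of entries are illustrative only; check the actual files in this repository for the real names and fields.

```python
import json

# File name is an assumption for illustration; use whichever Spatial-Obj JSON
# file ships with this repository.
with open("spatial_obj.json", encoding="utf-8") as f:
    spatial_obj = json.load(f)

# Assuming the top-level JSON value is a list of question entries.
first = spatial_obj[0]
print("Fields per entry:", sorted(first.keys()))
print("Example entry:", first)
```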

## Setting Up

Clone the repository and create the `data` directory within it, where your data and models will live.

## Downloading the data

All of the data lives in `spatial_mm/data`.

You can also download the data directly from this Google Drive link.
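After placing the files (from the repository or the Drive folder), a small sanity check like the one below can confirm that both subsets are where you expect them. The subset file names are assumptions for illustration and may differ from the actual release.

```python
import json
from pathlib import Path

DATA_DIR = Path("spatial_mm/data")

# Subset file names below are assumed for illustration; list the directory to
# find the actual names shipped with the repository or the Drive folder.
for name in ["spatial_obj.json", "spatial_cot.json"]:
    path = DATA_DIR / name
    if not path.exists():
        available = [p.name for p in DATA_DIR.glob("*.json")]
        print(f"{name} not found; JSON files present: {available}")
        continue
    with path.open(encoding="utf-8") as f:
        entries = json.load(f)
    print(f"{name}: {len(entries)} entries")
```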

## Citation

If you use this code or data, please consider citing our paper:

```bibtex
@inproceedings{shiri2024empirical,
  title={An Empirical Analysis on Spatial Reasoning Capabilities of Large Multimodal Models},
  author={Shiri, Fatemeh and Guo, Xiao-Yu and Far, Mona and Yu, Xin and Haf, Reza and Li, Yuan-Fang},
  booktitle={Proceedings of the 2024 Conference on Empirical Methods in Natural Language Processing},
  pages={21440--21455},
  year={2024}
}
```