Visualization-of-Thought Elicits Spatial Reasoning in Large Language Models, Wenshan Wu+, N/A, arXiv'24
Large language models (LLMs) have exhibited impressive performance in language comprehension and various reasoning tasks. However, their abilities in spatial reasoning, a crucial aspect of human cognition, remain relatively unexplored. Humans possess a remarkable ability to create mental images of unseen objects and actions through a process known as **the Mind's Eye**, enabling the imagination of the unseen world. Inspired by this cognitive capacity, we propose Visualization-of-Thought (**VoT**) prompting. VoT aims to elicit the spatial reasoning of LLMs by visualizing their reasoning traces, thereby guiding subsequent reasoning steps. We employed VoT for multi-hop spatial reasoning tasks, including natural language navigation, visual navigation, and visual tiling in 2D grid worlds. Experimental results demonstrated that VoT significantly enhances the spatial reasoning abilities of LLMs. Notably, VoT outperformed existing multimodal large language models (MLLMs) in these tasks. While VoT works surprisingly well on LLMs, the ability to generate *mental images* to facilitate spatial reasoning resembles the Mind's Eye process, suggesting its potential viability in MLLMs.
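The abstract describes VoT only at a high level, so here is a minimal sketch of what a VoT-style prompt for a 2D grid navigation task might look like. The grid encoding, the `S`/`G`/`#`/`.` legend, and the exact "visualize after each step" wording are illustrative assumptions, not the paper's verbatim prompts.

```python
# Minimal sketch of Visualization-of-Thought (VoT) style prompting for a
# 2D grid-world navigation task. The grid encoding and instruction wording
# below are assumptions for illustration; the paper's prompts may differ.

VOT_INSTRUCTION = (
    "Visualize the state of the grid after each reasoning step, "
    "then decide the next move."
)

def build_vot_prompt(grid: list[str], start: tuple, goal: tuple) -> str:
    """Compose a VoT-style prompt: task description + ASCII grid + instruction."""
    grid_text = "\n".join(grid)
    return (
        "You are navigating a 2D grid. 'S' is the start, 'G' is the goal, "
        "'#' is a wall, '.' is open space.\n\n"
        f"{grid_text}\n\n"
        f"Start: {start}, Goal: {goal}.\n"
        f"{VOT_INSTRUCTION}\n"
        "After every move, redraw the grid with your current position marked 'X'."
    )

if __name__ == "__main__":
    demo_grid = [
        "S..#",
        ".#.#",
        "...G",
    ]
    # Build and print the prompt; it could be sent to any chat-completion LLM.
    # The model's intermediate ASCII redraws act as its "mental images".
    print(build_vot_prompt(demo_grid, start=(0, 0), goal=(2, 3)))
```

The key design point is that the prompt asks the model to interleave a textual visualization of the world state with its reasoning, rather than producing a chain of thought in prose alone.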
URL
Affiliations
Abstract
Translation (by gpt-3.5-turbo)
Summary (by gpt-3.5-turbo)