This repository has been archived by the owner on Feb 1, 2025. It is now read-only.

facebookresearch/dual-system-for-visual-language-reasoning


DOMINO: A Dual-System for Multi-step Visual Language Reasoning

This is a PyTorch implementation of DOMINO: A Dual-System for Multi-step Visual Language Reasoning.

TL;DR: We propose DOMINO, a dual-system for multi-step visual language reasoning, which outperforms existing models on challenging chart question answering datasets.

DOMINO alternates between System-2 (a prompted LLM) and System-1 (a visual encoder-text decoder) to answer complex questions over charts. The text in blue callouts is generated by System-2. The text in green callouts is generated by System-1 and appended directly to the generation sequence of System-2. The chart and the question are from ChartQA (Masry et al., 2022).
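The alternation described above can be illustrated with a small, self-contained sketch. It is not the repository's code: the function name `dual_system_answer`, the two callables, and the `Ask System-1:` / `Answer:` markers are all hypothetical, chosen only to show how text from System-1 could be appended to the sequence that System-2 keeps extending. The toy stand-ins replace the real LLM and vision module.

```python
from typing import Callable


def dual_system_answer(
    question: str,
    chart: object,
    system2_generate: Callable[[str], str],
    system1_extract: Callable[[object, str], str],
    max_steps: int = 8,
    query_prefix: str = "Ask System-1:",  # hypothetical marker, not from the repo
    answer_prefix: str = "Answer:",       # hypothetical marker, not from the repo
) -> str:
    """Alternate between a prompted LM (System-2) and a vision module (System-1)."""
    sequence = f"Question: {question}\n"
    for _ in range(max_steps):
        # System-2 continues the sequence with one reasoning step.
        step = system2_generate(sequence).strip()
        sequence += step + "\n"
        if step.startswith(answer_prefix):
            return step[len(answer_prefix):].strip()
        if step.startswith(query_prefix):
            # System-1 reads the chart; its output is appended to the same sequence.
            query = step[len(query_prefix):].strip()
            sequence += system1_extract(chart, query).strip() + "\n"
    raise RuntimeError("System-2 produced no final answer within max_steps")


# Toy stand-ins: a scripted "LM" and a dictionary lookup acting as the "vision module".
scripted_steps = iter([
    "Ask System-1: value of bar A",
    "Ask System-1: value of bar B",
    "Answer: 42",
])


def toy_system2(sequence: str) -> str:
    return next(scripted_steps)


def toy_system1(chart: dict, query: str) -> str:
    return chart[query]


toy_chart = {"value of bar A": "30", "value of bar B": "12"}
result = dual_system_answer(
    "What is the sum of A and B?", toy_chart, toy_system2, toy_system1
)
print(result)
```

In a real setup the two callables would wrap the prompted LM and the fine-tuned vision checkpoint; how the actual code decides when to hand control to System-1 is not specified here.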

Code folders

(1) system1-vision: Fine-tuning and inference with the vision module.

(2) system2-lm: Prompting LM for solving downstream tasks.

Dependencies

  • Python >= 3.6
  • PyTorch == 1.12.1
  • transformers == 4.29.2
  • fairscale == 0.4.6
  • sentencepiece == 0.1.99
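As a convenience, the pinned versions above can be checked against the current environment with a short script. This is a sketch, not part of the repository: the helper `check_pins` is hypothetical, it assumes Python 3.8 or newer (for `importlib.metadata`, which is newer than the minimum Python version listed), and it assumes PyTorch is installed under its PyPI distribution name `torch`.

```python
from importlib import metadata

# Pins copied from the dependency list; "torch" is PyTorch's distribution name.
PINNED = {
    "torch": "1.12.1",
    "transformers": "4.29.2",
    "fairscale": "0.4.6",
    "sentencepiece": "0.1.99",
}


def check_pins(pinned, get_version=metadata.version):
    """Return {package: message} for every package that is missing or mismatched."""
    problems = {}
    for name, wanted in pinned.items():
        try:
            found = get_version(name)
        except metadata.PackageNotFoundError:
            problems[name] = f"not installed (want {wanted})"
            continue
        # Ignore local build labels such as "+cu113" when comparing.
        if found.split("+")[0] != wanted:
            problems[name] = f"found {found} (want {wanted})"
    return problems


if __name__ == "__main__":
    for package, message in check_pins(PINNED).items():
        print(f"{package}: {message}")
```

An empty result means every pinned package is present at the listed version.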

Data

We used the following datasets:

Fine-tuning a vision module for visual information extraction

cd system1-vision
sbatch ./scripts/finetune_deplot.sh <HOME_DIR>

After training, the checkpoint of the vision module is saved to <HOME_DIR>/outputs/checkpoint. We refer to this path as $VISION_CHECKPOINT in later steps.

Prompting LM for downstream tasks

The scripts for different tasks are stored in system2-lm/scripts. For example, to run the ChartQA script:

cd system2-lm
./scripts/run_dualsys_chartQA.sh <HOME_DIR>

License

The code is CC-BY-NC 4.0 licensed, as found in the LICENSE file.

Citation

Please cite our paper if DOMINO is used in your work:

@misc{wang2023domino,
      title={DOMINO: A Dual-System for Multi-step Visual Language Reasoning}, 
      author={Peifeng Wang and Olga Golovneva and Armen Aghajanyan and Xiang Ren and Muhao Chen and Asli Celikyilmaz and Maryam Fazel-Zarandi},
      year={2023},
      eprint={},
      archivePrefix={arXiv},
      primaryClass={cs.CL}
}

About

Github repo for Peifeng's internship project
