TroVE: Inducing Verifiable and Efficient Toolboxes for Solving Programmatic Tasks 🛠️

Setup

Install the required packages:

pip install -r requirements.txt

Tasks and datasets are organized as follows:

├── MATH
│   ├── algebra
│   ├── counting_and_probability
│   ├── geometry
│   ├── intermediate_algebra
│   ├── number_theory
│   ├── prealgebra
│   └── precalculus
├── TableQA
│   ├── TabMWP
│   ├── WTQ
│   └── HiTab
└── VQA
    └── GQA

Running Experiments

Our Method: TroVE

python run_trove.py --task_name "math/algebra"
  • For MATH tasks, specify the task name as math/${dataset_name}, e.g., math/algebra.
  • For TableQA and VQA tasks, use the dataset name directly: one of [tabmwp, wtq, hitab, gqa].

Note that the value passed to --task_name must be lowercase.
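
The naming rules above can be sketched as a small helper. This is only an illustration: to_task_name and the two dataset sets are hypothetical names that are not part of this repository. They are built from the directory listing in Setup and the dataset names given above.

```python
# Hypothetical helper, not part of the TroVE repository.
# Dataset names are taken from the directory tree in the Setup section.
MATH_DATASETS = {
    "algebra", "counting_and_probability", "geometry",
    "intermediate_algebra", "number_theory", "prealgebra", "precalculus",
}
OTHER_DATASETS = {"tabmwp", "wtq", "hitab", "gqa"}


def to_task_name(dataset: str) -> str:
    """Map a dataset name (any casing) to a lowercase --task_name value."""
    name = dataset.strip().lower()
    # Accept names that already carry the "math/" prefix.
    if name.startswith("math/"):
        name = name[len("math/"):]
    if name in MATH_DATASETS:
        return f"math/{name}"
    if name in OTHER_DATASETS:
        return name
    raise ValueError(f"unknown dataset: {dataset!r}")


if __name__ == "__main__":
    # Print the command line for a few datasets instead of running it.
    for dataset in ["Algebra", "TabMWP", "GQA"]:
        print(f'python run_trove.py --task_name "{to_task_name(dataset)}"')
```

Printing the commands rather than executing them keeps the sketch free of side effects. The same strings can be pasted into a shell or passed to a job scheduler.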

Baseline Methods: Primitive & Instance

python baseline.py --task_name "math/algebra" --suffix "primitive"  # or "instance"

Note that for the GQA dataset, we implement the locate_objects and visual_qa functions as FastAPI endpoints. You therefore need to launch the server first (as below), then run the TroVE or baseline experiments.

uvicorn server.gqa:app
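
Because the experiments depend on the server being up, it can help to wait until it accepts connections before starting a run. The following is a minimal sketch, not part of this repository. wait_for_port is a hypothetical helper, and the address 127.0.0.1:8000 assumes uvicorn's default host and port, since the command above passes no --host or --port.

```python
import socket
import time


def wait_for_port(host: str, port: int, timeout: float = 30.0) -> bool:
    """Return True once a TCP connection to host:port succeeds, False on timeout."""
    deadline = time.monotonic() + timeout
    while time.monotonic() < deadline:
        try:
            # A successful connect means something is listening on the port.
            with socket.create_connection((host, port), timeout=1.0):
                return True
        except OSError:
            # Not listening yet (or unreachable); retry after a short pause.
            time.sleep(0.2)
    return False


if __name__ == "__main__":
    # Assumption: uvicorn was started with its default bind address.
    if wait_for_port("127.0.0.1", 8000):
        print("GQA server is accepting connections")
    else:
        print("GQA server did not come up in time")
```

This only checks that the port is open. It does not verify that the locate_objects or visual_qa endpoints respond correctly.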

Evaluation

python -m utils.eval --results_path ${RESULTS_PATH}