This is the code for the EMNLP 2022 paper UniRPG: Unified Discrete Reasoning over Table and Text as Program Generation. Our code is based on the TAT-QA repository.
torch==1.7.1
transformers==3.3.0
fastNLP==0.6.0
allennlp==2.0.1
spacy==2.0.1
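These can be installed with pip, for example:
pip install torch==1.7.1 transformers==3.3.0 fastNLP==0.6.0 allennlp==2.0.1 spacy==2.0.1
Depending on the preprocessing code, an English spaCy model may also be required (e.g., python -m spacy download en_core_web_sm).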
The folder UniRPG_full is for training UniRPG with the derivation annotations, while the folder UniRPG_weak is for the setting without the derivation annotations.
First download the pre-trained BART model and put the files in the folder plm (a download sketch is given after the preprocessing commands below), then run the following scripts to preprocess the training/dev/test data.
bash scripts/prepare_data_train.sh
bash scripts/prepare_data_dev.sh
bash scripts/prepare_data_test.sh
The preprocessed train/dev/test data is stored in the folder tag_op/cache/.
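For the BART download step above, a minimal sketch using the transformers API (assuming the checkpoint is facebook/bart-large from the Hugging Face hub; substitute whichever checkpoint you actually intend to fine-tune):

from transformers import BartTokenizer, BartForConditionalGeneration

# facebook/bart-large is an assumption; use the BART checkpoint your scripts expect
tokenizer = BartTokenizer.from_pretrained("facebook/bart-large")
model = BartForConditionalGeneration.from_pretrained("facebook/bart-large")

# save the tokenizer files and weights into the plm folder referenced by the scripts
tokenizer.save_pretrained("plm")
model.save_pretrained("plm")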
Under the weak supervision setting, you should first run the following command to convert multi-span instances to count instances, and then run the above preprocessing scripts.
python3 tag_op/data/count_instance_construction.py
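The rough idea (a purely illustrative sketch, not the actual logic of count_instance_construction.py; the field names below are hypothetical): an instance whose gold answer is a list of spans can additionally be supervised as a counting instance whose answer is the number of spans.

# illustrative only; the real script presumably also rewrites the question/derivation fields
def to_count_instance(instance):
    spans = instance["answer"]        # hypothetical field, e.g. ["France", "Germany", "Italy"]
    counted = dict(instance)
    counted["answer_type"] = "count"  # hypothetical field
    counted["answer"] = len(spans)    # e.g. 3
    return counted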
Run the following script to train UniRPG.
bash scripts/train_bart_large.sh
The trained UniRPG model is saved in the folder checkpoint.
First check the saved model path in the following scripts, and then run them to evaluate the trained model on the dev set.
bash scripts/validate.sh
bash scripts/execute.sh
bash scripts/eval.sh
Please check the saved model path in the following scripts, and then run them to predict programs and execute them to get the answers for the test instances.
bash scripts/predict.sh
bash scripts/execute.sh
@inproceedings{zhou-etal-2022-unirpg,
title = "{U}ni{RPG}: Unified Discrete Reasoning over Table and Text as Program Generation",
author = "Zhou, Yongwei and
Bao, Junwei and
Duan, Chaoqun and
Wu, Youzheng and
He, Xiaodong and
Zhao, Tiejun",
booktitle = "Proceedings of the 2022 Conference on Empirical Methods in Natural Language Processing",
month = dec,
year = "2022",
address = "Abu Dhabi, United Arab Emirates",
publisher = "Association for Computational Linguistics",
url = "https://aclanthology.org/2022.emnlp-main.508",
pages = "7494--7507"
}