Official repository for Large Language Models Are Reasoning Teachers, by Namgyu Ho, Laura Schmid, and Se-young Yun.
🚀 Accepted to ACL 2023.
This repository contains code for (1) running CoT reasoning on OpenAI models, and (2) apply Fine-tune-CoT to train students based on OpenAI models or custom open-source models such as T5, Flan-T5, GPT-2 on your GPUs, based on 🤗 and Pytorch Lightning.
OpenAI API experiments are implemented in the oai
module. Refer to notebooks/example_oai_finetune_cot.ipynb
on how to run Fine-tune-CoT from start to finish.
Custom experiments are implemented in the custom
module, based on PyTorch Lightning. Refer to custom_train.py
and scripts/custom/*.sh
on how to fine-tune models such as T5, Flan-T5, and GPT-2 using Fine-tune-CoT.
pip install -r requirements.txt
python setup.py develop
The code has been tested on Python<=3.10, PyTorch Lightning<=1.9, PyTorch>=2.0
We're proud to share all of our raw experimental data! All data is organized in json or jsonl format, for your pleasure :)
Cloud storage folder links:
dataset.tar.gz
: 12 task datasets compiled in a unified json format- Belongs in
PROJECT/data/dataset/
- Belongs in
completion_data.tar.gz
: Completion data, i.e., inference data, from all teachers and students, for all experiments. About 8GB when uncompressed- Belongs in
PROJECT/saved/completion_data/
- Belongs in
teacher_completion_data.tar.gz
: Completion data from Zero-shot-CoT (with diverse reasoning) on the default teacher modeltext-davinci-002
using the OpenAI API. About 💰 $1000+ worth of goods, with ❤️ from OSI LAB at KAIST . Subset ofcompletion_data.tar.gz
.- Belongs in
PROJECT/saved/completion_data/
.
- Belongs in
finetune_data.tar.gz
: All data used to fine-tune OpenAI students via the fine-tuning API, in jsonl format. These are derived from teacher completion data and can be generated from our code.- Belongs in
PROJECT/saved/finetune_data/
- Belongs in
After downloading the full completion_data.tar.gz
, you can run notebooks/results.ipynb
to generate all result tables and figures from our paper. The code will (re-)evaluate all raw text model outputs contained in the completion data.
Template-based splits for MultiArith and Date Understanding are saved in /data/splits/*__template.json
Few-shot prompts adapted from Wei 2022 are saved in /data/few_shot_cot_prompts.json
{
"metadata": {
"dataset_key": "multiarith"
},
"data": [
{
"sample_index": 0,
"question": "string",
"answer": "string",
"rationale": "string?"
}
]
}
{
"metadata": {
"dataset_key": "multiarith",
"base_model": "curie",
"finetune_key": "zs_cot_multiarith",
"train_key": "ft_cot",
"prediction_template": "ft_cot_token",
},
"data": {
"<sample_index>": [
{
"sample_index": 0,
"completion_index": 0,
"question": "string",
"answer": "string",
"prompt": "string",
"completion": "string",
"finish_reason": "string",
"reasoning_prompt": "string?",
"reasoning_completion": "string?",
"reasoning_finish_reason": "string?",
}
]
}
}
Needs update.
<model_key>
=B_<base_model>_T_<train_key>
saved/
|–– completion_data/
|–– B_<BASE_MODEL>__C_<COMPLETION_KEY>/
|-- D_<DATESET_KEY>.json # base model inference
|-- F_<FINETUNE_KEY>__D_<DATESET_KEY>.json # default fine-tuned model inference
|-- F_<FINETUNE_KEY>__T_<TRAIN_KEY>__D_<DATESET_KEY>.json # custom fine-tuned model inference
|–– finetune_data/
|–– P_<PLATFORM_KEY>/
|–– F_<FINETUNE_KEY>{.*|/}
|–– model_metadata/
|–– B_<base_model>
|–– F_<FINETUNE_KEY>__T_<train_key>.json
saved/
|–– completion_data/
|–– B_text-davinci-002__C_zs_cot/
|–– B_text-davinci-002__C_zs_cot_long/
|–– B_text-davinci-002__C_fs_cot/
|–– B_curie__C_zs_cot/
|–– B_curie__C_fs_cot/
|–– B_curie__C_zs/
|–– B_curie__C_ft_cot/
|–– finetune_data/
|–– F_zs_cot_multiarith/ # text-davinci-002_zs_cot
|–– F_zs_cot_long_multiarith/
|–– model_metadata/
|–– B_curie/
|–– F_zs_cot_multiarith.json