The RL4LMs library can be downloaded at: rl4lms-link.
The necessary data is available at: data-link.
scripts/training/task_configs/: training and evaluation arguments, model and output paths.
rl4f_scripts/: scripts for supervised learning and PPO training.
openai_key: please specify your OpenAI API key (we used gpt-3.5-turbo-instruct).
wandb_key: please specify your Weights & Biases API key.
All scripts are under rl4f_scripts. Specifically, rl4f_scripts/run_interscript_sup.sh runs supervised fine-tuning of a pretrained T5-large to generate critiques for Interscript. For PPO training, run rl4f_scripts/run_alphabetize_ppo.sh.
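Before launching either script, it can help to confirm that both key files are filled in. The sketch below is a hypothetical pre-flight helper, not part of this repository. It assumes that openai_key and wandb_key are plain-text files in the repository root, each holding a single key, and that the scripts are meant to be run with bash from that root. The names read_key, launch, and KEY_FILES are introduced here for illustration only.

```python
import subprocess
from pathlib import Path

# Assumption: both keys live in plain-text files at the repository root.
KEY_FILES = ("openai_key", "wandb_key")


def read_key(path):
    """Return the key stored in `path`, with surrounding whitespace removed."""
    key = Path(path).read_text(encoding="utf-8").strip()
    if not key:
        raise ValueError(f"{path} is empty; please add your API key")
    return key


def launch(script, repo_root="."):
    """Check the key files, then run `script` with bash from `repo_root`."""
    root = Path(repo_root)
    for name in KEY_FILES:
        read_key(root / name)  # raises if a key file is missing or empty
    if not (root / script).is_file():
        raise FileNotFoundError(f"{script} not found under {root}")
    # check=True raises CalledProcessError if the script exits with an error.
    return subprocess.run(["bash", script], cwd=root, check=True)
```

Under these assumptions, calling launch("rl4f_scripts/run_alphabetize_ppo.sh") from the repository root would start PPO training once both key files are in place. Running the shell script directly works just as well; the helper only adds the early checks.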
For future research, we have included an augmented version of the MATH dataset. The augmented data follows the format of the Alphabetization task. See augment_data.ipynb for the exact augmentation method. For more information, see the MATH_RL directory in this repository.