Continual Learning of Critics (CS672 Course Project)

Introduction to the Codebase

The rl4lms library can be downloaded at: rl4lms-link.

The necessary data is available at: data-link.

scripts/training/task_configs/: training and evaluation arguments, model and output paths.

rl4f_scripts/: scripts for supervised learning and PPO training.

openai_key: Specify your OpenAI API key (we used gpt-3.5-turbo-instruct).

wandb_key: Specify your Weights & Biases API key.

Running Experiments

All scripts are under rl4f_scripts/. Specifically, rl4f_scripts/run_interscript_sup.sh runs supervised fine-tuning of a pretrained T5-large model to generate critiques for the Interscript task. For PPO training, run rl4f_scripts/run_alphabetize_ppo.sh.
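The two scripts above can also be launched from a small Python wrapper. The following is only a sketch: the stage names `supervised` and `ppo` and the helpers `build_command` and `run_stage` are hypothetical and not part of this repository, and it assumes the scripts are invoked with `bash` from the repository root without extra arguments.

```python
import subprocess
from pathlib import Path
from typing import List

# Script paths as named in this README, relative to the repository root.
# The stage names used as keys are illustrative, not defined by the repository.
SCRIPTS = {
    "supervised": Path("rl4f_scripts/run_interscript_sup.sh"),
    "ppo": Path("rl4f_scripts/run_alphabetize_ppo.sh"),
}


def build_command(stage: str) -> List[str]:
    """Return the command that launches the script for the given stage."""
    if stage not in SCRIPTS:
        raise ValueError(f"unknown stage {stage!r}; expected one of {sorted(SCRIPTS)}")
    return ["bash", SCRIPTS[stage].as_posix()]


def run_stage(stage: str) -> None:
    """Run the script for a stage, failing early if the file is missing."""
    command = build_command(stage)
    if not SCRIPTS[stage].is_file():
        raise FileNotFoundError(f"{SCRIPTS[stage]} not found; run from the repository root")
    # check=True raises CalledProcessError if the script exits with a non-zero status.
    subprocess.run(command, check=True)
```

Calling `run_stage("supervised")` or `run_stage("ppo")` from the repository root then has the same effect as running the corresponding script directly.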

Augmented MATH Dataset

For future research, we have included an augmented version of the MATH dataset. The augmented data follows the format of the Alphabetization task. See augment_data.ipynb for the exact augmentation method. For more information, see the MATH_RL directory in this repository.

psmiz/KAIST_CS672_Project

Folders and files

Latest commit

History

Repository files navigation

Continual Learning of Critics (CS672 Course Project)

Introduction to the Codebase

Running Experiments

Augmented MATH Dataset

About

Resources

Stars

Watchers

Forks

Releases

Packages 0

Contributors 2

Languages

Packages