Reward-Modeling-Human-Preferences-Actor-Critic

David Dunlap and Ian Randman

Reward Modeling from Human Preferences and Advantage Actor-Critic Reinforcement Learning: A Reproducibility Study

Some dependencies are specific to the OS. Please review the list below for the appropriate commands. Anaconda and ffmpeg must be installed.

brew cask install anaconda (for MacOS)
conda create -n reward-modeling python=3.7 pip
pip install flask
pip install keras
pip install scikit-learn
pip install tensorflow==1.15
pip install gym
pip install gym[atari] OR pip install git+https://github.com/Kojoley/atari-py.git (for Windows) 
conda install -c conda-forge swig
pip install matplotlib
pip install Box2D
brew install ffmpeg (for MacOS)

The entry point to the program is main.py. To change which environments are tested, modify the env_lst list. To change overall specifics about a run, modify the parameters of TrainingSystem in main.py: the record parameter specifies whether runs are recorded (mandatory for giving human feedback), use_reward_model specifies whether to use the learned reward model or the OpenAI-provided reward function, and load_model specifies whether to load a saved model file. A rough configuration sketch is shown below.
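The following is a minimal, hypothetical sketch of such a configuration loop. The parameter names record, use_reward_model, and load_model come from the description above; the import path, the environment names, any other constructor arguments, and the train() call are assumptions and may differ from the actual main.py.

# Hypothetical configuration sketch; the real main.py and TrainingSystem
# signature in this repository may differ.
from training_system import TrainingSystem  # assumed import path

# Environments to test; edit this list to train on different tasks.
env_lst = ['CartPole-v1', 'LunarLander-v2']

for env_name in env_lst:
    system = TrainingSystem(
        env_name,
        record=True,             # record runs (required for giving human feedback)
        use_reward_model=True,   # learn rewards instead of using the OpenAI-provided reward
        load_model=False,        # start fresh rather than loading a saved model file
    )
    system.train()               # assumed training entry point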

To give feedback to the model, run the program and go to localhost:5000.
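As a rough illustration of what the human feedback is used for, the sketch below shows a generic preference-based reward model: a network predicts per-step rewards, and pairwise preferences between recorded segments are fit with a softmax cross-entropy loss over the two segments' predicted returns. This is a minimal, self-contained example of the general technique, not this repository's implementation; all names and sizes are assumptions.

# Generic preference-based reward modeling sketch (not this repo's code).
import tensorflow as tf

OBS_DIM, SEG_LEN = 8, 20  # assumed observation size and segment length

# Reward network: maps a single observation to a scalar reward.
reward_net = tf.keras.Sequential([
    tf.keras.layers.Dense(64, activation='relu', input_shape=(OBS_DIM,)),
    tf.keras.layers.Dense(1),
])

def segment_return(segments):
    # Sum predicted rewards over each segment of shape (batch, SEG_LEN, OBS_DIM).
    flat = tf.reshape(segments, (-1, OBS_DIM))
    per_step = tf.reshape(reward_net(flat), (-1, SEG_LEN))
    return tf.reduce_sum(per_step, axis=1)

def preference_loss(seg_a, seg_b, prefs):
    # prefs is an integer tensor: 0 when segment A was preferred, 1 when B was.
    logits = tf.stack([segment_return(seg_a), segment_return(seg_b)], axis=1)
    return tf.reduce_mean(
        tf.nn.sparse_softmax_cross_entropy_with_logits(labels=prefs, logits=logits))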
