Assessing Generalization in Reward Learning

To read the blog post writeup, click here.

Intro

We want to build scalable, aligned RL agents; one approach is reward learning. An ideal reward learning algorithm should generalize well beyond its training data, so we use the Procgen benchmark to test the generalization properties of some of these algorithms.

Approach

So far, we have investigated two different reward learning architectures, among them T-REX (see the Code section below).

There are others we could have tested, but given limited time we restricted ourselves to these two. We implemented PPO in PyTorch by adapting the baselines package (an upgrade to stable-baselines is pending) so the agent can be trained against a custom, learned reward function instead of the environment's reward.
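
As a rough sketch of the kind of adaptation involved, a learned reward can be substituted for the environment reward with a gym wrapper. The `reward_model` interface below is a hypothetical stand-in (an image-to-scalar PyTorch module); the actual plumbing in this repository lives inside the adapted baselines PPO code rather than a wrapper.

```python
import gym
import torch

class LearnedRewardWrapper(gym.Wrapper):
    """Replace the environment reward with the output of a learned reward model."""

    def __init__(self, env, reward_model, device="cpu"):
        super().__init__(env)
        self.reward_model = reward_model.to(device).eval()
        self.device = device

    def step(self, action):
        # Discard the true reward and score the observation with the model instead.
        obs, _env_reward, done, info = self.env.step(action)
        with torch.no_grad():
            x = torch.as_tensor(obs, dtype=torch.float32, device=self.device)
            x = x.permute(2, 0, 1).unsqueeze(0) / 255.0  # HWC uint8 -> NCHW float
            reward = self.reward_model(x).item()
        return obs, reward, done, info
```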

Procgen has 16 different environments; based on some initial testing, we decided to focus on four of them: coinrun, bigfish, starpilot, and fruitbot.
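
For reference, here is a minimal sketch of instantiating these four environments through Procgen's gym interface. The `num_levels` / `start_level` split between a finite training set and the full level distribution is what makes generalization measurable; the specific values below are illustrative, not our experimental settings.

```python
import gym

ENV_NAMES = ["coinrun", "bigfish", "starpilot", "fruitbot"]

# Train on a fixed, finite set of levels...
train_envs = {
    name: gym.make(f"procgen:procgen-{name}-v0",
                   num_levels=200, start_level=0,
                   distribution_mode="easy")
    for name in ENV_NAMES
}

# ...and test on the full level distribution (num_levels=0 means unbounded).
test_envs = {
    name: gym.make(f"procgen:procgen-{name}-v0",
                   num_levels=0, start_level=0,
                   distribution_mode="easy")
    for name in ENV_NAMES
}
```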

Code

An explanation of how to run the T-REX code is provided in the trex directory.
