REINFORCE

An implementation of the REINFORCE algorithm.

How it works

REINFORCE is a form of policy gradient that uses a Monte Carlo rollout to compute rewards. It accumulates the rewards for the entire episode and then discounts them, weighting earlier rewards more heavily, using the equation:

reward discount
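As a rough illustration of the discounting step, here is a minimal sketch in plain Python. The function name `discount_rewards` and the default `gamma=0.99` are assumptions made for this example, not names taken from this package. It assumes the common recurrence G_t = r_t + gamma * G_{t+1}, computed backwards over one finished episode.

```python
def discount_rewards(rewards, gamma=0.99):
    """Return the discounted return G_t for every step of one episode.

    `rewards` is the list of per-step rewards collected during the rollout.
    `gamma` is a hypothetical default; pick whatever the experiment uses.
    """
    returns = [0.0] * len(rewards)
    running = 0.0
    # Walk backwards so each step reuses the return of the step after it:
    # G_t = r_t + gamma * G_{t+1}
    for t in reversed(range(len(rewards))):
        running = rewards[t] + gamma * running
        returns[t] = running
    return returns


if __name__ == "__main__":
    print(discount_rewards([1.0, 1.0, 1.0], gamma=0.5))  # [1.75, 1.5, 1.0]
```

With a constant reward, the first step ends up with the largest return because it collects every later reward, each scaled down by another factor of `gamma`.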

The loss is the softmax cross-entropy of the actions taken at each episode state, weighted by the discounted rewards, and the policy gradient is the gradient of that loss. Actions are sampled from the softmax distribution.

full equation
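The softmax loss and sampling described above can be sketched in standard-library Python. This is an illustrative sketch under stated assumptions, not this package's implementation. The helper names `softmax`, `sample_action`, and `reinforce_logit_grads` are hypothetical, and the loss is assumed to be the usual return-weighted negative log-likelihood, -sum_t G_t * log pi(a_t | s_t). The gradient is taken with respect to each step's logits only. A real policy would backpropagate it into its parameters.

```python
import math
import random


def softmax(logits):
    """Numerically stable softmax over a list of action logits."""
    m = max(logits)
    exps = [math.exp(x - m) for x in logits]
    total = sum(exps)
    return [e / total for e in exps]


def sample_action(probs):
    """Sample an action index from the softmax distribution."""
    return random.choices(range(len(probs)), weights=probs, k=1)[0]


def reinforce_logit_grads(logits_per_step, actions, returns):
    """Gradient of -sum_t G_t * log pi(a_t | s_t) w.r.t. each step's logits.

    All three arguments are per-step lists for one episode (assumed layout).
    """
    grads = []
    for logits, action, g in zip(logits_per_step, actions, returns):
        probs = softmax(logits)
        # d(-log softmax(logits)[action]) / d logits = probs - onehot(action)
        grads.append(
            [g * (p - (1.0 if i == action else 0.0)) for i, p in enumerate(probs)]
        )
    return grads


if __name__ == "__main__":
    probs = softmax([0.0, 0.0])
    action = sample_action(probs)
    print(reinforce_logit_grads([[0.0, 0.0]], [action], [2.0]))
```

Taking a gradient descent step on these values raises the logit of the chosen action in proportion to its discounted return, which is the update the section describes.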

Examples

See the experiments folder for example implementations.

Roadmap

  • more environments

References