An implementation of the REINFORCE algorithm.
REINFORCE is a form of policy gradient that uses a Monte Carlo rollout to estimate returns. It accumulates the rewards for the entire episode and then discounts them, weighting earlier rewards more heavily, using the equation:

G_t = \sum_{k=t}^{T} \gamma^{k-t} r_k

where \gamma \in (0, 1] is the discount factor and r_k is the reward received at step k.
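For illustration, here is a minimal sketch of that discounting step. The function name `discount_rewards`, the default `gamma=0.99`, and the normalization step are assumptions for this example, not names taken from this repo:

```python
import numpy as np

def discount_rewards(rewards, gamma=0.99):
    """Compute discounted returns G_t = sum_{k=t}^{T} gamma^(k-t) * r_k."""
    returns = np.zeros(len(rewards), dtype=np.float64)
    running = 0.0
    # Walk backwards so each step's return folds in all later rewards.
    for t in reversed(range(len(rewards))):
        running = rewards[t] + gamma * running
        returns[t] = running
    # Normalizing returns is a common variance-reduction trick.
    returns -= returns.mean()
    returns /= returns.std() + 1e-8
    return returns
```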
The policy gradient is computed from the softmax (cross-entropy) loss between the actions taken and the policy's output distribution over the episode states, weighted by the discounted returns. Actions are then sampled from the policy's softmax distribution.
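Below is a minimal PyTorch sketch of this loss and sampling scheme, assuming a discrete-action environment such as CartPole. The network shape, learning rate, and helper names are illustrative, not this repo's actual implementation:

```python
import torch
import torch.nn as nn

# Hypothetical softmax policy; the 4-dim input and 2 actions match CartPole.
policy = nn.Sequential(
    nn.Linear(4, 128), nn.ReLU(),
    nn.Linear(128, 2), nn.Softmax(dim=-1),
)
optimizer = torch.optim.Adam(policy.parameters(), lr=1e-3)

def select_action(state):
    """Sample an action from the policy's softmax output distribution."""
    probs = policy(torch.as_tensor(state, dtype=torch.float32))
    return torch.distributions.Categorical(probs).sample().item()

def reinforce_update(states, actions, returns):
    """One REINFORCE step: loss = -sum_t G_t * log pi(a_t | s_t)."""
    probs = policy(torch.as_tensor(states, dtype=torch.float32))
    log_probs = torch.distributions.Categorical(probs).log_prob(
        torch.as_tensor(actions)
    )
    loss = -(torch.as_tensor(returns, dtype=torch.float32) * log_probs).sum()
    optimizer.zero_grad()
    loss.backward()
    optimizer.step()
```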
See the experiments folder for example implementations.
- TODO: support more environments
- Paper: http://www-anw.cs.umass.edu/~barto/courses/cs687/williams92simple.pdf
- Tutorial: https://www.freecodecamp.org/news/an-introduction-to-policy-gradients-with-cartpole-and-doom-495b5ef2207f
- Tutorial: https://medium.com/@jonathan_hui/rl-policy-gradients-explained-9b13b688b146
- Sample Python code: https://github.com/rlcode/reinforcement-learning/blob/master/2-cartpole/3-reinforce/cartpole_reinforce.py