Implementations of deep reinforcement learning algorithms with TensorFlow Eager, such as:
- PPO with the CLIP objective and GAE (sketched below)
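
As a reminder of what those two pieces are, here is a minimal sketch of GAE and the PPO clipped surrogate loss. It is illustrative only: the function names, signatures, and default hyperparameters are assumptions, not this repository's actual code.

```python
import numpy as np
import tensorflow as tf


def gae_advantages(rewards, values, dones, gamma=0.99, lam=0.95):
    """Generalized advantage estimation for one rollout.

    `values` carries one extra bootstrap value, so
    len(values) == len(rewards) + 1; all inputs are 1-D numpy arrays.
    """
    advantages = np.zeros(len(rewards), dtype=np.float32)
    running = 0.0
    for t in reversed(range(len(rewards))):
        nonterminal = 1.0 - dones[t]
        delta = rewards[t] + gamma * values[t + 1] * nonterminal - values[t]
        running = delta + gamma * lam * nonterminal * running
        advantages[t] = running
    return advantages


def ppo_clip_loss(new_logp, old_logp, advantages, clip_ratio=0.2):
    """PPO clipped surrogate objective, negated so it can be minimized."""
    ratio = tf.exp(new_logp - old_logp)
    clipped = tf.clip_by_value(ratio, 1.0 - clip_ratio, 1.0 + clip_ratio)
    return -tf.reduce_mean(tf.minimum(ratio * advantages, clipped * advantages))
```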
- Build intuitions by reimplementing the algorithms.
- Provide implementations that are compatible with TensorFlow's Eager mode and use functionality from recent TensorFlow versions, rather than obscure implementations branched off older OpenAI code (see the sketch after this list). Specifically:
- use TF summary writers instead of custom loggers;
- use TF's distributions instead of custom sampling and custom log-likelihood calculations;
- use TF datasets instead of custom batching code;
- use Keras models wherever possible.
- Build a foundation that lets me easily experiment with:
- Building on the foundation of:
- Inspired by:
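
To make the "Specifically" list above concrete, here is a minimal sketch of those four pieces in Eager mode. It assumes the TF 2-style APIs (`tf.summary`, `tf.data`, `tf.keras`) and TensorFlow Probability for the distributions; all names and shapes are illustrative, not this repository's actual code.

```python
import tensorflow as tf
import tensorflow_probability as tfp

# Keras model instead of hand-managed variables.
policy = tf.keras.Sequential([
    tf.keras.layers.Dense(64, activation="tanh"),
    tf.keras.layers.Dense(2),  # action logits (CartPole has two actions)
])

# A TF distribution instead of custom sampling and log-likelihoods.
logits = policy(tf.zeros([1, 4]))  # CartPole observations are 4-dimensional
dist = tfp.distributions.Categorical(logits=logits)
action = dist.sample()
log_prob = dist.log_prob(action)

# tf.data instead of custom batching code.
dataset = (tf.data.Dataset
           .from_tensor_slices({"obs": tf.zeros([128, 4]),
                                "act": tf.zeros([128], tf.int32)})
           .shuffle(128)
           .batch(32))

# A TF summary writer instead of a custom logger.
writer = tf.summary.create_file_writer("logs")
with writer.as_default():
    tf.summary.scalar("policy/log_prob", tf.reduce_mean(log_prob), step=0)
```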
CartPole-v0:
- PPO:
  - solved: takes under 100K environment steps on average
  - prefers higher advantage-lambda and gamma, yet isn't very sensitive to value-lambda (though see #15)