Implementations of deep reinforcement learning algorithms with TensorFlow Eager, such as:
- PPO with the CLIP objective and GAE (sketched below)
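
As a reminder of what those two pieces are, here is a minimal sketch of GAE and the PPO clipped surrogate loss. It is illustrative only: the function names, signatures, and default hyperparameters are assumptions, not this repository's actual code.

```python
import numpy as np
import tensorflow as tf


def gae_advantages(rewards, values, dones, gamma=0.99, lam=0.95):
    """Generalized advantage estimation for one rollout.

    `values` carries one extra bootstrap value, so
    len(values) == len(rewards) + 1; all inputs are 1-D numpy arrays.
    """
    advantages = np.zeros(len(rewards), dtype=np.float32)
    running = 0.0
    for t in reversed(range(len(rewards))):
        nonterminal = 1.0 - dones[t]
        delta = rewards[t] + gamma * values[t + 1] * nonterminal - values[t]
        running = delta + gamma * lam * nonterminal * running
        advantages[t] = running
    return advantages


def ppo_clip_loss(new_logp, old_logp, advantages, clip_ratio=0.2):
    """PPO clipped surrogate objective, negated so it can be minimized."""
    ratio = tf.exp(new_logp - old_logp)
    clipped = tf.clip_by_value(ratio, 1.0 - clip_ratio, 1.0 + clip_ratio)
    return -tf.reduce_mean(tf.minimum(ratio * advantages, clipped * advantages))
```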
- Build intuitions by reimplementing the algorithms.
- Provide implementations that are compatible with TensorFlow's Eager mode and use functionality from recent TensorFlow versions, rather than obscure implementations branched off older OpenAI code (see the sketch after this list). Specifically:
- use TF summary writers instead of custom loggers;
- use TF's distributions instead of custom sampling and custom log-likelihood calculations;
- use TF datasets instead of custom batching code;
- use Keras models wherever possible.
- Build a foundation that lets me easily experiment with:
- Building on the foundation of:
- Inspired by:
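
To make the "Specifically" list above concrete, here is a minimal sketch of those four pieces in Eager mode. It assumes the TF 2-style APIs (`tf.summary`, `tf.data`, `tf.keras`) and TensorFlow Probability for the distributions; all names and shapes are illustrative, not this repository's actual code.

```python
import tensorflow as tf
import tensorflow_probability as tfp

# Keras model instead of hand-managed variables.
policy = tf.keras.Sequential([
    tf.keras.layers.Dense(64, activation="tanh"),
    tf.keras.layers.Dense(2),  # action logits (CartPole has two actions)
])

# A TF distribution instead of custom sampling and log-likelihoods.
logits = policy(tf.zeros([1, 4]))  # CartPole observations are 4-dimensional
dist = tfp.distributions.Categorical(logits=logits)
action = dist.sample()
log_prob = dist.log_prob(action)

# tf.data instead of custom batching code.
dataset = (tf.data.Dataset
           .from_tensor_slices({"obs": tf.zeros([128, 4]),
                                "act": tf.zeros([128], tf.int32)})
           .shuffle(128)
           .batch(32))

# A TF summary writer instead of a custom logger.
writer = tf.summary.create_file_writer("logs")
with writer.as_default():
    tf.summary.scalar("policy/log_prob", tf.reduce_mean(log_prob), step=0)
```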
CartPole-v0:
- PPO:
  - solved: takes under 100K environment steps on average
  - prefers higher advantage-lambda and gamma, yet isn't very sensitive to value-lambda (though see #15)