Implementations of RL algorithms and concepts:
- Policy Gradient Methods
- REINFORCE algorithm with MLP
- Actor-Critic Method
- Semi-Gradient TD(λ) with Eligibility Traces
- Function Approximation: Linear and Non-Linear (Deep Neural Networks)
- n-step SARSA: On-Policy TD Control
- n-step Q-learning with Function Approximation
- n-step Expected SARSA with Function Approximation
- Baird’s counterexample: Semi-gradient Off-Policy TD(0) to demonstrate off-policy divergence
- Experience Replay to stabilize training
- Multi-armed Bandit Problem
- Boltzmann (Softmax)
- UCB
- Thompson Sampling
- ε-Greedy
- Policy Iteration
- Value Iteration
- Q-learning: Off-Policy TD Control
- Continuous Random Walk
- Sparse Coarse Coding
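
As a rough illustration of one of the bandit strategies listed above, here is a minimal ε-greedy sketch in plain Python. It is not taken from this repository: the function name `epsilon_greedy_bandit`, its parameters, and the Gaussian reward model with unit variance are all assumptions made for the example.

```python
import random


def epsilon_greedy_bandit(true_means, steps=5000, epsilon=0.1, seed=0):
    """Run epsilon-greedy on a Gaussian bandit (hypothetical helper).

    true_means: assumed mean reward of each arm (rewards have unit variance).
    Returns the per-arm value estimates and pull counts.
    """
    rng = random.Random(seed)  # local RNG so runs are reproducible
    n_arms = len(true_means)
    counts = [0] * n_arms
    estimates = [0.0] * n_arms

    for _ in range(steps):
        if rng.random() < epsilon:
            # Explore: pick an arm uniformly at random.
            arm = rng.randrange(n_arms)
        else:
            # Exploit: pick the arm with the highest current estimate
            # (ties go to the lowest index).
            arm = max(range(n_arms), key=lambda a: estimates[a])

        reward = rng.gauss(true_means[arm], 1.0)

        # Incremental sample-average update: Q <- Q + (r - Q) / n.
        counts[arm] += 1
        estimates[arm] += (reward - estimates[arm]) / counts[arm]

    return estimates, counts


if __name__ == "__main__":
    estimates, counts = epsilon_greedy_bandit([0.1, 0.5, 0.9])
    print("estimates:", [round(q, 2) for q in estimates])
    print("pull counts:", counts)
```

With a small `epsilon`, most pulls should end up on the arm with the highest true mean, while the occasional random pulls keep the estimates of the other arms from going stale.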