1. Use CliffWalking-v0 from OpenAI Gym:
- Create two agents to find the optimal policy using Policy Iteration and Value Iteration.
- Test-run the agents and visualize learning.
2. Use Taxi-v3 from OpenAI Gym:
- Prepare and train your agent using i) On-Policy Monte Carlo and ii) Off-Policy Monte Carlo with Importance Sampling.
- Prepare and train two more agents using i) Q-Learning and ii) SARSA.
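
The difference between the two temporal-difference agents listed above comes down to the bootstrap target. The following is a minimal sketch, not the notebook's actual code: the function names, the `(state, action)`-keyed table `Q`, and the hyperparameter values are all assumptions made for illustration, and it uses only the standard library so it runs without Gym installed.

```python
import random
from collections import defaultdict


def epsilon_greedy(Q, state, n_actions, epsilon, rng):
    """Pick a random action with probability epsilon, else a greedy one."""
    if rng.random() < epsilon:
        return rng.randrange(n_actions)
    values = [Q[(state, a)] for a in range(n_actions)]
    return values.index(max(values))  # ties resolve to the lowest action index


def q_learning_update(Q, s, a, r, s_next, done, n_actions, alpha, gamma):
    """Off-policy TD update: bootstrap from the best action in s_next."""
    best_next = 0.0 if done else max(Q[(s_next, b)] for b in range(n_actions))
    Q[(s, a)] += alpha * (r + gamma * best_next - Q[(s, a)])


def sarsa_update(Q, s, a, r, s_next, a_next, done, alpha, gamma):
    """On-policy TD update: bootstrap from the action actually taken next."""
    next_value = 0.0 if done else Q[(s_next, a_next)]
    Q[(s, a)] += alpha * (r + gamma * next_value - Q[(s, a)])


# Hypothetical numbers: the same transition updated by both rules.
Q_q = defaultdict(float, {(1, 0): 1.0, (1, 1): 3.0})
Q_s = defaultdict(float, {(1, 0): 1.0, (1, 1): 3.0})
q_learning_update(Q_q, 0, 0, -1.0, 1, False, n_actions=2, alpha=0.5, gamma=0.9)
sarsa_update(Q_s, 0, 0, -1.0, 1, 0, False, alpha=0.5, gamma=0.9)
print(Q_q[(0, 0)])  # approximately 0.85, target used max(1.0, 3.0)
print(Q_s[(0, 0)])  # approximately -0.05, target used Q[(1, 0)] = 1.0
```

In a training loop, the action passed as `a_next` to `sarsa_update` would typically come from `epsilon_greedy`, so exploration affects SARSA's target but not Q-Learning's.
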
requirements_doc.pdf gives a more detailed explanation of the requirements and the scope of this repository.

mc_qlearn_sarsa.ipynb aims to implement:
- On-Policy Monte Carlo
- Off-Policy Monte Carlo with Importance Sampling
- Q-Learning
- SARSA
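
For the off-policy Monte Carlo item in the list above, one common formulation is the incremental weighted importance sampling update for control, processed backwards over an episode. The sketch below assumes that formulation; whether the notebook uses weighted or ordinary importance sampling is not stated here. The function name, the `(state, action, reward)` episode layout, and the `behavior_prob` callback are hypothetical.

```python
from collections import defaultdict


def off_policy_mc_update(Q, C, episode, behavior_prob, n_actions, gamma):
    """Update Q from one behavior-policy episode via weighted importance sampling.

    episode: list of (state, action, reward) tuples in time order.
    behavior_prob: callable (state, action) -> probability of that action
        under the behavior policy. The target policy is greedy w.r.t. Q.
    """
    G = 0.0  # discounted return accumulated from the end of the episode
    W = 1.0  # importance weight for the tail of the episode
    for state, action, reward in reversed(episode):
        G = gamma * G + reward
        C[(state, action)] += W
        Q[(state, action)] += (W / C[(state, action)]) * (G - Q[(state, action)])
        values = [Q[(state, a)] for a in range(n_actions)]
        greedy_action = values.index(max(values))
        if action != greedy_action:
            break  # the greedy target policy would not take this action
        W /= behavior_prob(state, action)


# Hypothetical two-step episode under a uniform behavior policy.
Q = defaultdict(float)
C = defaultdict(float)
episode = [(0, 1, -1.0), (1, 0, 10.0)]
off_policy_mc_update(Q, C, episode, lambda s, a: 0.5, n_actions=2, gamma=1.0)
print(Q[(1, 0)], Q[(0, 1)])
```

The early `break` is what makes the target policy deterministic and greedy: once the behavior policy deviates from it, earlier steps of the episode carry zero weight.
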

policy_iter_value_iter.ipynb aims to implement:
- Policy Iteration
- Value Iteration
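
Both planning methods listed above need a known transition model. As a rough sketch, assume the model is a nested dict `P[state][action] = [(prob, next_state, reward, done), ...]`; Gym's toy-text environments are generally understood to expose a table of this shape, but treat that as an assumption and check your installed version. The three-state chain below is a hypothetical stand-in for CliffWalking, and all function names are illustrative.

```python
def action_values(P, V, s, gamma):
    """One-step lookahead: expected return of each action in state s."""
    return [
        sum(p * (r + gamma * (0.0 if done else V[s2])) for p, s2, r, done in P[s][a])
        for a in range(len(P[s]))
    ]


def value_iteration(P, gamma=0.9, theta=1e-8):
    V = [0.0] * len(P)
    while True:
        delta = 0.0
        for s in range(len(P)):
            best = max(action_values(P, V, s, gamma))
            delta = max(delta, abs(best - V[s]))
            V[s] = best
        if delta < theta:
            break
    policy = []
    for s in range(len(P)):
        q = action_values(P, V, s, gamma)
        policy.append(q.index(max(q)))
    return V, policy


def policy_iteration(P, gamma=0.9, theta=1e-8):
    n = len(P)
    V, policy = [0.0] * n, [0] * n
    while True:
        while True:  # policy evaluation for the current policy
            delta = 0.0
            for s in range(n):
                v = action_values(P, V, s, gamma)[policy[s]]
                delta = max(delta, abs(v - V[s]))
                V[s] = v
            if delta < theta:
                break
        stable = True  # greedy policy improvement
        for s in range(n):
            q = action_values(P, V, s, gamma)
            best = q.index(max(q))
            if q[best] > q[policy[s]] + 1e-12:  # switch only on strict improvement
                policy[s], stable = best, False
        if stable:
            return V, policy


# Hypothetical chain: states 0 and 1 cost -1 per step, state 2 is terminal.
# Action 0 moves left (or stays at 0), action 1 moves right.
P = {
    0: {0: [(1.0, 0, -1.0, False)], 1: [(1.0, 1, -1.0, False)]},
    1: {0: [(1.0, 0, -1.0, False)], 1: [(1.0, 2, -1.0, True)]},
    2: {0: [(1.0, 2, 0.0, True)], 1: [(1.0, 2, 0.0, True)]},
}
V_vi, pi_vi = value_iteration(P)
V_pi, pi_pi = policy_iteration(P)
print(pi_vi, pi_pi)
```

On this chain both methods should agree on moving right from states 0 and 1; the action reported for the terminal state is arbitrary because every action there has the same value.
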