Author: Manas Vashistha
- Assignment for the course CS747 Foundations of Intelligent and Learning Agents under Prof Shivaram Kalyanakrishnan.
- Estimating average regrets for multi-armed bandit instances using
- Computing the optimal value functions for MDPs using value iteration, linear programming, and policy improvement.
- Formulating a maze as an MDP to find the shortest path from a starting state to an end state.
- Windy Gridworld task.
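
To give a flavour of the value iteration item above, here is a minimal sketch in Python. It is not the assignment's code: the function name `value_iteration`, the `transitions` layout, and the two-state example MDP are all assumptions made for illustration.

```python
def value_iteration(num_states, num_actions, transitions, gamma, tol=1e-10):
    """Approximate the optimal value function of a finite MDP.

    transitions[s][a] is a list of (next_state, probability, reward)
    triples. This layout is a hypothetical one chosen for the sketch.
    """
    values = [0.0] * num_states
    while True:
        new_values = []
        for s in range(num_states):
            # Bellman optimality backup: best expected return over actions.
            action_values = [
                sum(p * (r + gamma * values[s2])
                    for s2, p, r in transitions[s][a])
                for a in range(num_actions)
            ]
            new_values.append(max(action_values))
        # Stop once the largest change in any state's value is below tol.
        delta = max(abs(n - o) for n, o in zip(new_values, values))
        values = new_values
        if delta < tol:
            return values


# Hypothetical two-state MDP: in state 0, action 0 stays put with reward 0
# and action 1 moves to state 1 with reward 1. State 1 is absorbing with
# reward 0 under both actions.
example_transitions = [
    [[(0, 1.0, 0.0)], [(1, 1.0, 1.0)]],
    [[(1, 1.0, 0.0)], [(1, 1.0, 0.0)]],
]
optimal_values = value_iteration(2, 2, example_transitions, gamma=0.9)
print(optimal_values)
```

The same routine could in principle be applied to a maze encoded as an MDP, for example with a reward of -1 per step so that maximising return corresponds to minimising path length. That encoding is one possible choice, not necessarily the one used in the assignment.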