CS747-FILA

Author: Manas Vashistha

Assignments for the course CS747: Foundations of Intelligent and Learning Agents, taught by Prof. Shivaram Kalyanakrishnan.

Assignment 1

Estimating the average regret on multi-armed bandit instances using $\epsilon$-greedy, UCB, KL-UCB and Thompson Sampling.
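As an illustration of what "average regret" means here, the following is a minimal sketch for $\epsilon$-greedy on a Bernoulli bandit. It is not the code from this repository: the function names `epsilon_greedy_regret` and `average_regret`, the Bernoulli reward model, and the use of expected (pseudo) regret are all assumptions made for the example.

```python
import random


def epsilon_greedy_regret(means, horizon, epsilon, seed=0):
    """Run epsilon-greedy once on a Bernoulli bandit and return its expected regret.

    `means` holds the true success probability of each arm (hypothetical input).
    """
    rng = random.Random(seed)
    n_arms = len(means)
    counts = [0] * n_arms      # number of pulls per arm
    totals = [0.0] * n_arms    # sum of observed rewards per arm
    best_mean = max(means)
    regret = 0.0
    for _ in range(horizon):
        # Explore with probability epsilon, or while some arm is still unpulled.
        if rng.random() < epsilon or 0 in counts:
            arm = rng.randrange(n_arms)
        else:
            # Exploit: pick the arm with the highest empirical mean.
            arm = max(range(n_arms), key=lambda a: totals[a] / counts[a])
        reward = 1.0 if rng.random() < means[arm] else 0.0
        counts[arm] += 1
        totals[arm] += reward
        # Expected regret of this pull: gap between the best arm and the chosen arm.
        regret += best_mean - means[arm]
    return regret


def average_regret(means, horizon, epsilon, runs=50):
    """Average the regret over several independent runs (one seed per run)."""
    total = sum(epsilon_greedy_regret(means, horizon, epsilon, seed=s) for s in range(runs))
    return total / runs
```

A call such as `average_regret([0.2, 0.5, 0.8], horizon=1000, epsilon=0.1)` gives one point on a regret-versus-horizon curve. UCB, KL-UCB and Thompson Sampling would replace only the arm-selection rule inside the loop.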

Computing the optimal value functions of MDPs using value iteration, linear programming and policy improvement.
Formulating a maze as an MDP to find the shortest path from a start state to an end state.
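One possible way to connect the two tasks above is sketched below: a maze is encoded as a deterministic MDP with a reward of -1 per move, value iteration is run on it, and the greedy policy is followed from the start. The grid format (`S` start, `G` goal, `#` wall), the reward scheme, the discount factor of 0.9 and every function name are assumptions for illustration, not the assignment's actual interface.

```python
MOVES = {"up": (-1, 0), "down": (1, 0), "left": (0, -1), "right": (0, 1)}


def maze_to_mdp(grid):
    """Encode a maze (list of strings) as an MDP.

    Returns (transitions, start, goal), where transitions[s] maps an action
    name to a list of (probability, next_state, reward) outcomes.
    """
    cells = [(r, c) for r, row in enumerate(grid) for c, ch in enumerate(row) if ch != "#"]
    index = {cell: i for i, cell in enumerate(cells)}
    start = goal = None
    transitions = []
    for r, c in cells:
        if grid[r][c] == "S":
            start = index[(r, c)]
        if grid[r][c] == "G":
            goal = index[(r, c)]
            transitions.append({})  # terminal state: no actions, value stays 0
            continue
        actions = {}
        for name, (d_row, d_col) in MOVES.items():
            nxt = (r + d_row, c + d_col)
            if nxt in index:  # moves into walls or off the grid are not offered
                actions[name] = [(1.0, index[nxt], -1.0)]
        transitions.append(actions)
    return transitions, start, goal


def q_value(outcomes, values, gamma):
    """Expected return of one action given the current value estimates."""
    return sum(p * (reward + gamma * values[nxt]) for p, nxt, reward in outcomes)


def value_iteration(transitions, gamma=0.9, tol=1e-9):
    """Apply the Bellman optimality update until the values stop changing."""
    values = [0.0] * len(transitions)
    while True:
        new_values = [
            max((q_value(outs, values, gamma) for outs in actions.values()), default=0.0)
            for actions in transitions
        ]
        if max(abs(a - b) for a, b in zip(new_values, values)) < tol:
            return new_values
        values = new_values


def shortest_path(grid, gamma=0.9):
    """Follow the greedy policy from start to goal (assumes the goal is reachable)."""
    transitions, start, goal = maze_to_mdp(grid)
    values = value_iteration(transitions, gamma)
    path, state = [], start
    for _ in range(len(transitions)):  # guard against looping forever
        if state == goal:
            break
        actions = transitions[state]
        best = max(actions, key=lambda a: q_value(actions[a], values, gamma))
        path.append(best)
        state = actions[best][0][1]  # deterministic: single outcome per action
    return path
```

For example, `shortest_path(["S.#", ".#G", "..."])` walks down the left column, along the bottom row and up into the goal. Because every move costs the same, maximising the discounted return is equivalent to minimising the number of steps.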

Windy Gridworld task.
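The README does not describe the task further. The sketch below assumes the standard Windy Gridworld from Sutton and Barto's textbook (a 7x10 grid, an upward wind per column, four moves, reward -1 per step) solved with tabular Sarsa. The grid constants, hyperparameters and function names are assumptions and may differ from what the assignment used.

```python
import random

ROWS, COLS = 7, 10
WIND = [0, 0, 0, 1, 1, 1, 2, 2, 1, 0]  # upward push per column (textbook layout, assumed)
START, GOAL = (3, 0), (3, 7)
ACTIONS = {"up": (-1, 0), "down": (1, 0), "left": (0, -1), "right": (0, 1)}


def step(state, action):
    """Apply a move plus the wind of the current column, clamped to the grid."""
    row, col = state
    d_row, d_col = ACTIONS[action]
    new_row = min(max(row + d_row - WIND[col], 0), ROWS - 1)
    new_col = min(max(col + d_col, 0), COLS - 1)
    return (new_row, new_col)


def sarsa(episodes=200, alpha=0.5, epsilon=0.1, gamma=1.0, seed=0):
    """Tabular Sarsa with an epsilon-greedy policy; returns (Q table, episode lengths)."""
    rng = random.Random(seed)
    q = {((r, c), a): 0.0 for r in range(ROWS) for c in range(COLS) for a in ACTIONS}

    def choose(state):
        if rng.random() < epsilon:
            return rng.choice(list(ACTIONS))
        return max(ACTIONS, key=lambda a: q[(state, a)])

    lengths = []
    for _ in range(episodes):
        state, action, steps = START, choose(START), 0
        while state != GOAL:
            nxt = step(state, action)
            nxt_action = choose(nxt)
            # On-policy target: reward -1 plus the value of the action actually taken next.
            # Q-values at the goal are never updated, so they stay 0.
            target = -1.0 + gamma * q[(nxt, nxt_action)]
            q[(state, action)] += alpha * (target - q[(state, action)])
            state, action, steps = nxt, nxt_action, steps + 1
        lengths.append(steps)
    return q, lengths
```

Plotting the cumulative sum of `lengths` against the episode index gives the usual learning curve for this task: the slope flattens as episodes get shorter.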
