This repository contains the implementation and report for the first coursework of the Reinforcement Learning (RL) module at Imperial College London. The project focuses on solving a Maze environment modeled as a Markov Decision Process (MDP) using various RL techniques.
The coursework involves implementing and evaluating agents using:
- Dynamic Programming: Solving the maze with full knowledge of the transition matrix and reward function.
- Monte Carlo Methods: Learning the optimal policy through episodic sampling without knowledge of the environment dynamics.
- Temporal Difference Learning: Using bootstrapping techniques to approximate value functions and policies.
The Maze environment features absorbing states with varying rewards, obstacles, and stochastic transitions.
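The coursework defines its own maze layout and dynamics inside the notebook; purely as an illustration of how such an MDP can be encoded, here is a minimal hand-built sketch. The state count, action set, transition probabilities, and rewards below are hypothetical and are not the coursework's environment.

```python
import numpy as np

# Hypothetical miniature maze: 4 states in a row, where state 3 is an absorbing
# goal (reward +10) and state 0 is an absorbing penalty state (reward -1).
# Actions: 0 = left, 1 = right. With probability 0.8 the intended move succeeds;
# with probability 0.2 the agent stays where it is (stochastic transitions).
n_states, n_actions = 4, 2
P = np.zeros((n_states, n_actions, n_states))   # P[s, a, s'] = transition probability
R = np.zeros((n_states, n_actions, n_states))   # R[s, a, s'] = reward on that transition

for s in range(1, n_states - 1):                # non-absorbing states only
    P[s, 0, s - 1], P[s, 0, s] = 0.8, 0.2       # "left" succeeds 80% of the time
    P[s, 1, s + 1], P[s, 1, s] = 0.8, 0.2       # "right" succeeds 80% of the time

for s in (0, n_states - 1):                     # absorbing states loop back to themselves
    P[s, :, s] = 1.0

R[:, :, n_states - 1] = 10.0                    # transitions into the goal state pay +10
R[:, :, 0] = -1.0                               # transitions into the penalty state pay -1
```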
- `Coursework1.ipynb`: The main Jupyter Notebook containing the implementation and experiments for the coursework tasks.
- `coursework1.py`: The Python script exported from the notebook for submission and auto-marking.
- `coursework1_report.pdf`: The report containing detailed explanations, results, and analysis.
- Dynamic Programming Agent: Implements policy evaluation and improvement to solve the Maze environment.
- Monte Carlo Agent: Uses first-visit MC control to find an optimal policy by sampling episodes.
- Temporal Difference Agent: Implements TD learning (SARSA/Q-learning) for incremental policy optimization (minimal sketches of the DP and TD updates follow this list).
- Visualization: Generates policy and value function visualizations for analysis.
- Learning Curves: Plots the learning performance of agents across multiple training runs.
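To make the agent descriptions above concrete, here are two minimal, generic sketches of the core updates involved. Both are illustrative only: the function names, the `(P, R)` arrays, and the `env` interface are assumptions, not the coursework's actual classes or signatures.

First, a Bellman expectation backup as used by a Dynamic Programming agent with full knowledge of the dynamics:

```python
import numpy as np

def policy_evaluation(P, R, policy, gamma=0.9, tol=1e-6):
    """Iteratively evaluate a deterministic policy on a known MDP.

    P[s, a, s'] are transition probabilities, R[s, a, s'] are rewards,
    and policy[s] is the action chosen in state s (all hypothetical names).
    """
    n_states = P.shape[0]
    V = np.zeros(n_states)
    while True:
        delta = 0.0
        for s in range(n_states):
            a = policy[s]
            # Bellman expectation backup: V(s) = sum_s' P(s'|s,a) [R(s,a,s') + gamma * V(s')]
            v_new = np.sum(P[s, a] * (R[s, a] + gamma * V))
            delta = max(delta, abs(v_new - V[s]))
            V[s] = v_new
        if delta < tol:
            return V
```

Second, a tabular Q-learning loop of the kind a Temporal Difference agent uses; the `env.reset()` / `env.step()` interface below is assumed for the sake of the example:

```python
import numpy as np

def q_learning(env, n_episodes=1000, alpha=0.1, gamma=0.9, epsilon=0.1):
    """Tabular Q-learning with an epsilon-greedy behaviour policy.

    `env` is assumed to expose reset() -> state, step(action) ->
    (next_state, reward, done), plus integer n_states / n_actions attributes.
    """
    Q = np.zeros((env.n_states, env.n_actions))
    for _ in range(n_episodes):
        state = env.reset()
        done = False
        while not done:
            # Epsilon-greedy action selection
            if np.random.rand() < epsilon:
                action = np.random.randint(env.n_actions)
            else:
                action = int(np.argmax(Q[state]))
            next_state, reward, done = env.step(action)
            # Q-learning update: bootstrap from the greedy value of the next state
            td_target = reward + gamma * np.max(Q[next_state]) * (not done)
            Q[state, action] += alpha * (td_target - Q[state, action])
            state = next_state
    return Q
```

The Monte Carlo agent follows a similar control loop but, instead of bootstrapping from the next state's value, it updates towards the full return of each sampled episode, averaged over first visits to each state-action pair.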
The following Python packages are required to run the notebook:

- numpy
- matplotlib
- seaborn
- jupyter

To install dependencies, run:

```bash
pip install -r requirements.txt
```
- Clone the repository:

  ```bash
  git clone https://github.com/your-username/RL-Maze-Solver.git
  cd RL-Maze-Solver
  ```

- Open the Jupyter Notebook:

  ```bash
  jupyter notebook Coursework1.ipynb
  ```

- Run the notebook cells sequentially to:
  - Define the environment.
  - Implement the agents.
  - Visualize results.
- Dynamic Programming: Achieved optimal policies and value functions with known environment dynamics.
- Monte Carlo Methods: Successfully learned policies through episodic sampling.
- Temporal Difference Learning: Efficiently approximated policies using TD techniques.
Detailed results and analyses are available in `coursework1_report.pdf`.
This project is for academic purposes and follows the coursework submission guidelines of Imperial College London. Please do not directly reuse the code for academic submissions without proper attribution.