Learn how to sail safely through dangerous waters to your goal by creating a generation-based reinforcement learning model using Q-Learning or SARSA.
There are three types of tiles in the deterministic environment:
no waves
- every action moves the boat by 0 additional fields
yellow waves
- every action moves the boat by 1 additional field
red waves
- every action moves the boat by 2 additional fields
When you use the stochastic environment, you can choose the probability of the waves.
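The tile rules can be sketched in Python as follows. This is only an illustration under stated assumptions, not the program's actual code: the names `EXTRA_FIELDS` and `move`, the four-direction action set, a base move of one field, clamping at the grid edge, and the reading of the wave probability as the chance that a wave applies its extra push are all hypothetical.

```python
import random

# Additional fields per tile type, as listed in this README.
EXTRA_FIELDS = {"none": 0, "yellow": 1, "red": 2}

# Assumed action set: one field per action before any wave push.
DELTAS = {"up": (-1, 0), "down": (1, 0), "left": (0, -1), "right": (0, 1)}


def move(pos, action, tile, grid_size, wave_prob=1.0, rng=random):
    """Return the new (row, col) after taking `action` on a `tile`.

    wave_prob=1.0 gives the deterministic behaviour. Lower values are an
    assumed interpretation of the stochastic environment: the wave adds
    its extra fields only with that probability.
    """
    d_row, d_col = DELTAS[action]
    extra = EXTRA_FIELDS[tile] if rng.random() < wave_prob else 0
    steps = 1 + extra
    rows, cols = grid_size
    # Assumption: the boat stops at the edge instead of leaving the grid.
    new_row = min(max(pos[0] + d_row * steps, 0), rows - 1)
    new_col = min(max(pos[1] + d_col * steps, 0), cols - 1)
    return (new_row, new_col)
```

For example, `move((2, 2), "right", "yellow", (6, 6))` moves the boat two fields to the right in this sketch: one for the action and one for the yellow wave.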
You can configure:
- learning algorithm
- learning rate (alpha)
- discount factor (gamma)
- number of generations
- exploration and exploitation proportion
- value of positive/negative rewards
- environment type (stochastic or deterministic)
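For reference, Q-Learning and SARSA differ only in how they estimate the value of the next state. The sketch below shows the standard textbook update rules using the learning rate (alpha) and discount factor (gamma) from the list above. The function names, the dict-of-dicts Q-table, and the epsilon-greedy policy standing in for the exploration/exploitation proportion are assumptions made for illustration, not the program's own code.

```python
import random


def epsilon_greedy(q, state, epsilon, rng=random):
    """Explore with probability epsilon, otherwise pick the best-known action."""
    actions = list(q[state])
    if rng.random() < epsilon:
        return rng.choice(actions)
    return max(actions, key=lambda a: q[state][a])


def q_learning_update(q, s, a, reward, s_next, alpha, gamma, done=False):
    """Off-policy: bootstrap from the best action available in s_next."""
    best_next = 0.0 if done else max(q[s_next].values())
    q[s][a] += alpha * (reward + gamma * best_next - q[s][a])


def sarsa_update(q, s, a, reward, s_next, a_next, alpha, gamma, done=False):
    """On-policy: bootstrap from the action actually chosen in s_next."""
    next_value = 0.0 if done else q[s_next][a_next]
    q[s][a] += alpha * (reward + gamma * next_value - q[s][a])
```

Because SARSA uses the action the agent really takes next, exploratory moves into risky tiles lower its estimates, which tends to produce more cautious routes than Q-Learning in environments with penalties.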
The program automatically generates, shows and exports some related statistics.
The export includes a .csv file listing the Q-value of every action (columns) for each episode (rows).
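A file with that layout could be produced roughly as shown below. This is a minimal sketch, not the program's export code: the function name `write_q_history`, the `episode` index column, the action names, and the sample values are all hypothetical, and it assumes one Q-value per action is recorded for each episode.

```python
import csv
import io


def write_q_history(history, actions, fh):
    """Write one row per episode and one column per action to `fh`."""
    writer = csv.writer(fh)
    writer.writerow(["episode", *actions])
    for episode, q_values in enumerate(history, start=1):
        writer.writerow([episode, *(q_values[a] for a in actions)])


# Made-up sample data, only to show the resulting layout.
actions = ["up", "down", "left", "right"]
history = [
    {"up": 0.0, "down": 0.0, "left": 0.0, "right": 0.5},
    {"up": 0.1, "down": 0.0, "left": 0.0, "right": 0.9},
]

buffer = io.StringIO()
write_q_history(history, actions, buffer)
print(buffer.getvalue())
```

To write to disk instead of an in-memory buffer, pass a file opened with `open(path, "w", newline="")`, which is how the Python `csv` documentation recommends opening files for the writer.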
This program was developed as part of a university project.