Optimize a flying policy for a drone to pick up and deliver a package as quickly as possible when the system dynamics are subject to probabilistic uncertainties (Markov Decision Processes).
Project rated with full marks.
The drone operates in a discretized world of M x N cells and its dynamics are modelled through three state variables: its position on the x-axis, its position on the y-axis, and a binary variable indicating whether the UAV is carrying the package. At time step k the system evolves as follows:
- One of the allowable control inputs uk ∈ {North, South, East, West, Hover} is applied. A control input is not allowed if it leads outside the world boundaries or against a tree (green cells). Angry citizens are scattered around the map and can shoot down the drone with a probability that decreases with their distance from the drone;
- From the new desired position, a gust of wind can occur, moving the drone North, South, East, or West uniformly at random;
- The drone crashes if it ends up outside the grid world boundaries or against a tree, or if it is shot by an angry citizen;
- Whenever the drone crashes, it is brought back to the base station carrying no package. The repair procedure takes Nc time steps.
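The stochastic transition above (control input, then a possible wind gust, then crash checks) can be sketched as follows. The project itself is implemented in MATLAB; this is an illustrative Python sketch, and the grid size, tree/citizen positions, wind probability, and the exact hit-probability model are assumptions, not the assignment's constants.

```python
import random

# Illustrative constants -- NOT the assignment's actual values.
M, N = 5, 5            # grid dimensions
TREES = {(2, 2)}       # green cells
CITIZEN = (4, 4)       # an angry citizen
P_WIND = 0.2           # probability that a gust occurs after the move

MOVES = {'North': (0, 1), 'South': (0, -1), 'East': (1, 0),
         'West': (-1, 0), 'Hover': (0, 0)}

def allowed(pos, u):
    """A control input is disallowed if it leaves the grid or hits a tree."""
    x, y = pos[0] + MOVES[u][0], pos[1] + MOVES[u][1]
    return 0 <= x < M and 0 <= y < N and (x, y) not in TREES

def step(pos, u):
    """Apply control u, then a possible wind gust; return new pos or 'CRASH'."""
    assert allowed(pos, u)
    x, y = pos[0] + MOVES[u][0], pos[1] + MOVES[u][1]
    if random.random() < P_WIND:  # gust moves the drone N/S/E/W uniformly
        dx, dy = random.choice([(0, 1), (0, -1), (1, 0), (-1, 0)])
        x, y = x + dx, y + dy
    if not (0 <= x < M and 0 <= y < N) or (x, y) in TREES:
        return 'CRASH'  # out of bounds or against a tree
    d = abs(x - CITIZEN[0]) + abs(y - CITIZEN[1])
    if random.random() < 1.0 / (1.0 + d):  # hit chance decays with distance
        return 'CRASH'  # shot down by the angry citizen
    return (x, y)
```

In the actual project these stochastic outcomes are not sampled but enumerated, so that each (state, input) pair yields an exact probability distribution over next states.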
Further info: Assignment.pdf.
The drone starts at the base station (yellow cell), needs to pick up the package at the pickup point (purple cell), and deliver it to the delivery point (blue cell). Find the policy minimizing the expected number of time steps needed to achieve the goal, using the following approaches for the infinite-horizon problem:
- Value Iteration;
- Policy Iteration;
- Linear Programming.
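All three approaches compute the same optimal cost-to-go, i.e. the fixed point of the Bellman equation J(i) = min_u [ g(i,u) + Σ_j P(j|i,u) J(j) ]. As a minimal sketch, here is value iteration in Python on a toy stochastic shortest-path chain (the chain, costs, and probabilities are illustrative assumptions, not the project's MDP):

```python
import numpy as np

# Toy stochastic shortest path: states 0..3, state 3 is the (absorbing) goal.
# P[u, i, j] = transition probability; G[i, u] = stage cost (1 per step).
n_states, n_inputs = 4, 2
P = np.zeros((n_inputs, n_states, n_states))
for i in range(n_states - 1):
    P[0, i, i + 1], P[0, i, i] = 0.8, 0.2  # input 0: move right, may slip
    P[1, i, i] = 1.0                       # input 1: hover in place
P[:, 3, 3] = 1.0                           # goal state is absorbing
G = np.ones((n_states, n_inputs))
G[3, :] = 0.0                              # no cost once the goal is reached

# Value iteration: apply the Bellman operator until the update is negligible.
J = np.zeros(n_states)
for _ in range(1000):
    Q = G + np.einsum('uij,j->iu', P, J)   # Q[i, u] = g(i,u) + E[J(next)]
    J_new = Q.min(axis=1)
    if np.max(np.abs(J_new - J)) < 1e-9:
        J = J_new
        break
    J = J_new
policy = Q.argmin(axis=1)                  # greedy policy w.r.t. converged J
```

Policy iteration instead alternates exact policy evaluation (a linear solve) with greedy policy improvement, and the LP formulation maximizes Σ_i J(i) subject to J(i) ≤ g(i,u) + Σ_j P(j|i,u) J(j) for every state-input pair; all three recover the same J and policy.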
Run the whole project from main.m.
Example with a randomly generated map:
source map (left) -> costs and policy up to the pickup (center) -> costs and policy for the delivery (right).