Policy Development Task Specification

Kevin Bradner edited this page May 7, 2019 · 40 revisions

Introduction

oprc_env provides a reinforcement learning environment for the task of multi-agent coverage by a swarm of UAVs. It does not include an agent to perform reinforcement learning on this task (note, however, that the oprc repository aims to supply such an agent).

The purpose of this document is to provide a specification of the interface exposed by the oprc_env modules and the interface required of a reinforcement learning agent that is compatible with this environment. At this point, all code for the project is in Haskell, and all interfaces are defined in terms of Haskell modules. In the future, there are plans to support other popular frameworks for reinforcement learning. In particular, this environment may eventually be compatible with the popular OpenAI Gym.

Quick Task Overview

In this environment, a reinforcement learning agent is tasked with observing the entirety of a two-dimensional search space (representing an area of land with varied terrain) using a team of drones.

Each drone may be controlled independently, and observes a subset of the terrain directly below it. Drones may fly at a high altitude, in which case they can observe a large area of land in low detail. Drones may also fly low, observing a smaller portion of the search space in high detail. A drone may ascend or descend to switch between these two altitudes. Finally, drones may of course move horizontally, so that they can observe new patches of the search space.
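One way to picture the per-drone action space described above is as an algebraic data type. The constructor names below are illustrative assumptions for this sketch; the actual definitions live in the oprc_env modules and may differ.

```haskell
-- Illustrative sketch of a single drone's action space.
-- The real constructors in oprc_env may use different names.
data Direction = North | South | East | West
  deriving (Eq, Show)

data Action
  = Move Direction   -- horizontal movement over the search space
  | Ascend           -- switch to high altitude (wide view, low detail)
  | Descend          -- switch to low altitude (narrow view, high detail)
  | Hover            -- remain in place
  deriving (Eq, Show)
```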

The Policy Typeclass

The most important piece of the environment-agent interface is the policy typeclass:

class Policy p where
  nextMove :: p -> WorldView -> NextActions

See: WorldView, NextActions

Note that the term 'policy' may refer either to an instance of the above typeclass or to the general reinforcement learning notion of a policy.

At minimum, an agent must provide an instance of Policy to interact with oprc_env. To understand what a reasonable instance of Policy looks like, it will help to describe the task, and the data structures used to represent its many parts, in more detail.
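To make the typeclass concrete, here is a minimal, self-contained sketch of a Policy instance. The stand-in definitions of WorldView and NextActions below are placeholders for this example only; the real types are defined in the oprc_env modules.

```haskell
-- Placeholder stand-ins for the real oprc_env types.
data WorldView = WorldView        -- the agent's current observation
type NextActions = [String]       -- the commands issued to the drones

class Policy p where
  nextMove :: p -> WorldView -> NextActions

-- A trivial policy that ignores its observation and always
-- issues the same fixed command.
data HoverPolicy = HoverPolicy

instance Policy HoverPolicy where
  nextMove _ _ = ["Hover"]
```

Any agent, from a hand-written heuristic to a learned neural network policy, interacts with the environment through this single nextMove function.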

The Environment

A fundamental data structure in this project is the Environment. An Environment can be thought of as a map from positions in the search space to the terrain patches found there:

type Environment = Map.Map Position Patch

As the datatype definition suggests, an Environment associates each Position in the search space with a Patch describing the terrain at that location.
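A small example may help. The Position and Patch definitions below are simplified stand-ins for this sketch (the real types are defined in oprc_env); only the use of Data.Map to associate positions with patches follows the type synonym above.

```haskell
import qualified Data.Map as Map

-- Simplified stand-ins for the real oprc_env types.
data Position = Position Int Int deriving (Eq, Ord, Show)
data Patch = Flat | Forest deriving (Eq, Show)

type Environment = Map.Map Position Patch

-- A tiny 2x2 environment built with Map.fromList.
smallEnv :: Environment
smallEnv = Map.fromList
  [ (Position 0 0, Flat)
  , (Position 0 1, Forest)
  , (Position 1 0, Flat)
  , (Position 1 1, Flat)
  ]
```

Because an Environment is an ordinary Map, agents and the environment code can query terrain at a location with standard functions such as Map.lookup.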

This reinforcement learning task terminates when the entire search space has been observed.