
Policy Development Task Specification


Introduction

oprc_env provides a reinforcement learning environment for the task of multi-agent coverage by a swarm of UAVs. It does not itself include an agent to learn this task (note, however, that the oprc repository aims to supply such an agent).

The purpose of this document is to specify the interface exposed by the oprc_env modules and the interface required of a reinforcement learning agent that is compatible with this environment. At present, all code for the project is written in Haskell, and all interfaces are defined in terms of Haskell modules. There are plans to support other popular reinforcement learning frameworks in the future; in particular, this environment may eventually be made compatible with the popular OpenAI Gym.

Interaction Diagram

The most important piece of the environment-agent interface is the Policy typeclass:

```haskell
class Policy p where
  nextMove :: p -> WorldView -> NextActions
```

See: WorldView, NextActions
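
To make the interaction concrete, here is a minimal sketch of the loop this typeclass implies. Because oprc_env's environment-stepping API is not shown on this page, the observation and transition functions are taken as parameters; only Policy, WorldView, and NextActions are assumed from above.

```haskell
-- A sketch of the agent-environment loop implied by the Policy
-- typeclass. The observation and transition functions are parameters
-- here because oprc_env's actual stepping API is not shown on this
-- page; 's' stands for whatever state the environment keeps.
runPolicy :: Policy p
          => p                        -- the agent's policy
          -> (s -> WorldView)         -- how the environment is observed
          -> (NextActions -> s -> s)  -- how actions advance the environment
          -> s                        -- initial environment state
          -> [WorldView]              -- the stream of observations produced
runPolicy pol observe step = go
  where
    go env =
      let view = observe env       -- the agent's (possibly partial) view
          acts = nextMove pol view -- the policy picks the swarm's next actions
      in view : go (step acts env) -- advance one step and recurse
```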

Note that the term 'policy' may refer either to an instance of the above typeclass or to the general reinforcement learning notion of a policy.

At a minimum, an agent must provide an instance of Policy to interact with oprc_env.
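
Such an instance need not be sophisticated. The following hypothetical ConstantPolicy (not part of oprc_env) satisfies the typeclass by ignoring its observation entirely:

```haskell
-- A minimal, hypothetical Policy instance: it stores a fixed set of
-- commands and issues them at every step, ignoring the WorldView.
-- This is only a sketch to show the shape of the interface.
newtype ConstantPolicy = ConstantPolicy NextActions

instance Policy ConstantPolicy where
  nextMove (ConstantPolicy acts) _worldView = acts
```

A learning agent would instead carry its learned parameters in the type implementing Policy and use the WorldView to choose its NextActions.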
