# MPPI implementation with the OpenAI Gym pendulum environment

This repository implements Model Predictive Path Integral (MPPI) control as introduced in the paper *Information Theoretic MPC for Model-Based Reinforcement Learning* (Williams et al., 2017), using the OpenAI Gym pendulum environment as the forward model.

## Requirements

- OpenAI Gym
- numpy
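
Both can typically be installed via pip (package names assumed here):

```
pip install gym numpy
```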

## Gists of the paper

The paper derives an optimal control law as a (noise-)weighted average over sampled trajectories. In particular, the optimization problem is posed as computing the control input that pushes the controlled distribution Q as close as possible to the optimal distribution Q*, which corresponds to minimizing the KL divergence between Q and Q*.
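
In symbols, with U denoting the control sequence over the planning horizon, the objective can be written as (a paraphrase of the paper's formulation):

$$U^{*} = \underset{U}{\operatorname{argmin}} \; \mathbb{D}_{\mathrm{KL}}\left( \mathbb{Q}^{*} \,\Vert\, \mathbb{Q}_{U} \right)$$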

The gists from the paper:

- the noise assumption v_t ∼ N(u_t, Σ) stems from noise in low-level controllers

- the noise term can be pulled out of the Monte-Carlo approximation of the normalizer η and neatly interpreted as a weight for the MC samples in the iterative update law

- given the optimal control input distribution Q*, the optimal control is derived as u*_t = ∫ q*(V) v_t dV

- computing this integral directly is not possible since q* is unknown; instead, importance sampling is used to rewrite it as an expectation over the proposal (controlled) distribution Q:

  u*_t = E_Q[ w(V) v_t ],  with importance weight  w(V) = q*(V) / q(V),

  where the normalizer η of q* can be approximated by the Monte-Carlo estimate given in Algorithm 2, yielding the iterative update law

  u_t ← u_t + Σ_k w(E_k) ε_t^k,

  which amounts to an iterative procedure that improves the MC estimate by using a more accurate importance sampler (see the sketch below)
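
To make the update law concrete, here is a minimal sketch of one MPPI iteration in numpy. It is illustrative only, not the repository's actual implementation: the names `dynamics`, `cost`, and all parameter defaults are assumptions, standing in for a forward model such as the Gym pendulum and a running cost on its states.

```python
import numpy as np

def mppi_update(u, state, dynamics, cost, K=100, lam=1.0, sigma=1.0):
    """One MPPI iteration: improve the control sequence u (shape (T,)).

    dynamics(state, control) -> next_state and cost(state) -> float are
    hypothetical stand-ins for the pendulum forward model and its cost.
    """
    T = len(u)
    eps = np.random.normal(0.0, sigma, size=(K, T))  # noise: v_t = u_t + eps_t, eps ~ N(0, sigma^2)
    S = np.zeros(K)                                  # trajectory costs S(E_k)
    for k in range(K):
        x = state
        for t in range(T):
            x = dynamics(x, u[t] + eps[k, t])        # roll out the noisy control sequence
            S[k] += cost(x)
    S -= S.min()                                     # shift costs for numerical stability of exp
    w = np.exp(-S / lam)                             # unnormalized importance weights
    eta = w.sum()                                    # Monte-Carlo estimate of the normalizer
    w /= eta                                         # normalized weights w(E_k)
    return u + w @ eps                               # u_t <- u_t + sum_k w(E_k) * eps_t^k
```

In a receding-horizon loop one would execute the first control of the returned sequence, shift the sequence by one step, and repeat the update from the next observed state.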