This repository implements Model Predictive Path Integral control (MPPI) as introduced in the paper *Information Theoretic MPC for Model-Based Reinforcement Learning* (Williams et al., 2017), using the OpenAI Gym Pendulum environment as the forward model.
- OpenAI Gym
- numpy
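As a rough sketch of how the Gym Pendulum environment can act as the forward model for rollouts: the environment id, the use of `env.unwrapped.state` to set the internal state, and the cost definition below are illustrative assumptions, not necessarily the repository's exact code.

```python
import gym
import numpy as np

def rollout_cost(env, state, controls):
    """Roll a control sequence through the Pendulum dynamics and return its cost.

    state is the internal pendulum state [theta, theta_dot]; writing it to
    env.unwrapped.state lets the Gym environment serve as a forward model.
    """
    env.reset()
    env.unwrapped.state = np.copy(state)
    total_cost = 0.0
    for u in controls:
        _, reward, _, _ = env.step([u])   # classic Gym API: obs, reward, done, info
        total_cost += -reward             # Gym returns rewards; MPPI accumulates costs
    return total_cost

env = gym.make("Pendulum-v0")
state = np.array([np.pi, 0.0])            # pendulum hanging down, at rest
controls = np.zeros(15)                   # a 15-step candidate control sequence
print(rollout_cost(env, state, controls))
```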
The paper derives an optimal control law as a (noise-)weighted average over sampled trajectories. In particular, the optimization problem is posed as finding the control input that pushes the controlled distribution Q as close as possible to the optimal distribution Q*, which corresponds to minimizing the KL divergence D_KL(Q* ‖ Q).
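Roughly, the two central relations read as follows, in the paper's notation: S(V) is the cost of the trajectory induced by the input sequence V, λ the temperature, p the distribution of the uncontrolled (zero-mean input) system, and Q_{U,Σ} the controlled distribution with mean input sequence U.

```latex
% Optimal distribution over input sequences V (free-energy form):
\[
  q^{*}(V) = \frac{1}{\eta}\,\exp\!\Big(-\tfrac{1}{\lambda} S(V)\Big)\, p(V),
  \qquad
  \eta = \int \exp\!\Big(-\tfrac{1}{\lambda} S(V)\Big)\, p(V)\, dV
\]
% The control sequence U is chosen to push the controlled distribution
% Q_{U,\Sigma} as close as possible to Q^*:
\[
  U^{*} = \operatorname*{arg\,min}_{U}\; \mathrm{D}_{\mathrm{KL}}\!\big(Q^{*} \,\|\, Q_{U,\Sigma}\big)
\]
```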
The key points from the paper:
- the noise assumption v_t ~ N(u_t, Σ) stems from noise in the low-level controllers
- the noise term can be pulled out of the Monte-Carlo approximation (η) and neatly interpreted as a weight for the MC samples in the iterative update law
- given the optimal control input distribution Q*, the optimal control is derived as u*_t = ∫ q*(V) v_t dV
- computing this integral directly is not possible since q* is unknown; instead, importance sampling is used, sampling from the controlled proposal distribution: u*_t = E_Q[w(V) v_t] with importance weight w(V) = q*(V)/q(V). The normalizing constant in w(V) can be approximated by the Monte-Carlo estimate given in Algorithm 2 as η, which yields an iterative update u_t ← Σ_k w(V_k) v_t^k, i.e. the MC estimate is improved by using an increasingly accurate importance sampler (see the numpy sketch after this list)
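A minimal numpy sketch of this noise-weighted update, assuming placeholder `dynamics` and `cost` functions; note that the paper's exact weight also contains a control-dependent term in the exponent, which is omitted here for brevity.

```python
import numpy as np

def mppi_update(u, dynamics, cost, x0, K=100, lam=1.0, sigma=1.0):
    """One MPPI iteration: sample noisy control sequences, roll them out
    through the forward model, and return the noise-weighted average update
    of the nominal control sequence u (shape (T,))."""
    T = len(u)
    eps = sigma * np.random.randn(K, T)         # sampled control perturbations
    S = np.zeros(K)                             # trajectory costs S(V_k)
    for k in range(K):
        x = np.copy(x0)
        for t in range(T):
            x = dynamics(x, u[t] + eps[k, t])   # forward-model rollout
            S[k] += cost(x)
    S -= S.min()                                # shift costs for numerical stability
    w = np.exp(-S / lam)                        # unnormalized importance weights
    w /= w.sum()                                # division by eta (MC normalizer)
    return u + w @ eps                          # noise-weighted average update
```

At each control step one would apply u[0], shift the sequence forward, and repeat; in this repository the forward model is the Gym Pendulum environment rather than an analytic `dynamics` function.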