Data as Demonstrator (DaD) is a meta-learning algorithm that improves the multi-step predictive capabilities of a learned time-series model (e.g. a model of a dynamical system). This method:
- Is simple, easy to implement and can wrap an existing time-series learning procedure.
- Makes no assumption of differentiability. This allows the algorithm to be used on top of a wider range of supervised learning algorithms (e.g. random forests and decision trees).
- Is data-efficient in improving a learned model. Without querying the actual system for additional training data, the method is able to achieve better performance on the multi-step criterion by reusing training data to correct for prediction mistakes.
- Can be shown to have performance guarantees that relate the one-step predictive error to the multi-step error.
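The idea behind these points can be sketched in a few lines. The following is a minimal illustration of the DaD loop for an uncontrolled system, not the code in this repository: it assumes a plain least-squares linear model, stores each trajectory as a `(timesteps, dim)` array, and uses hypothetical helper names (`fit_linear`, `rollout`, `multistep_rmse`, `dad_sketch`). At each iteration the current model is rolled out on the training trajectories, each predicted state is paired with the true next state from the data, and the model is refit on the aggregated dataset.

```python
import numpy as np

def fit_linear(X, Y):
    """Least-squares fit of Y ~ X @ A; stands in for any supervised learner."""
    A, *_ = np.linalg.lstsq(X, Y, rcond=None)
    return A

def rollout(A, x0, steps):
    """Feed the model its own predictions for `steps` steps starting at x0."""
    preds = [x0]
    for _ in range(steps):
        preds.append(preds[-1] @ A)
    return np.stack(preds)  # shape (steps + 1, dim)

def multistep_rmse(A, trajs):
    """RMS error between full rollouts and the true trajectories."""
    errs = [rollout(A, tr[0], len(tr) - 1)[1:] - tr[1:] for tr in trajs]
    return float(np.sqrt(np.mean(np.concatenate(errs) ** 2)))

def dad_sketch(train_trajs, val_trajs, iters=10):
    # Initial one-step dataset: (x_t, x_{t+1}) pairs from the training data.
    X = np.concatenate([tr[:-1] for tr in train_trajs])
    Y = np.concatenate([tr[1:] for tr in train_trajs])
    A = fit_linear(X, Y)
    best_A, best_err = A, multistep_rmse(A, val_trajs)
    for _ in range(iters):
        for tr in train_trajs:
            preds = rollout(A, tr[0], len(tr) - 2)
            # Correction pairs: predicted state at time t -> true state at t+1.
            X = np.concatenate([X, preds[1:]])
            Y = np.concatenate([Y, tr[2:]])
        A = fit_linear(X, Y)
        err = multistep_rmse(A, val_trajs)
        # No monotonic improvement is guaranteed, so keep the best model seen.
        if err < best_err:
            best_A, best_err = A, err
    return best_A, best_err
```

No new data is requested from the system: the correction pairs reuse the true states already present in the training trajectories, and the returned model is the one with the lowest multi-step error on the held-out trajectories.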
This repository contains the implementation of the International Symposium on Experimental Robotics (ISER) 2016 paper:
Improved Learning of Dynamics for Control. Arun Venkatraman, Roberto Capobianco, Lerrel Pinto, Martial Hebert, Daniele Nardi, and J. Andrew Bagnell. ISER 2016.
DaD was originally presented at AAAI 2015:
Improving multi-step prediction of learned time series models. Arun Venkatraman, Martial Hebert, J. Andrew Bagnell. AAAI 2015.
The main code can be found in the DaD folder. Currently, the primary file is DaD/dad_control.py. It contains a class DaDControl which can be used to learn a multi-step predictive model for a controlled dynamical system. Some notes on using this:
- DaDControl requires a learner object that has a .fit and a .predict method that can take both states and controls. We provide wrappers for using sklearn learners in DaD.helpers.learner_wrapper. The demo code has an example.
- The states and controls are passed in as numpy tensors with dimensions [timesteps x dim x num_trajectories].
- Passing in Xtest and Utest is recommended; they should be used as a validation dataset. Since there is no monotonic improvement guarantee with DaD, the algorithm tracks the best performance on Xtest and returns that model. See the demo code for an example of how to split the data.
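As an illustration of the tensor layout and the train/validation split described above, here is a minimal sketch using placeholder random data. The dimensions, the 75/25 split ratio, and the assumption that the control tensor has the same number of timesteps as the state tensor are all hypothetical; the demo code remains the reference for how the repository actually splits its data.

```python
import numpy as np

rng = np.random.default_rng(0)
timesteps, state_dim, control_dim, num_traj = 50, 4, 1, 20

# Placeholder data laid out as [timesteps x dim x num_trajectories].
X = rng.standard_normal((timesteps, state_dim, num_traj))
U = rng.standard_normal((timesteps, control_dim, num_traj))

# Split along the last axis so whole trajectories stay together.
perm = rng.permutation(num_traj)
num_train = int(0.75 * num_traj)
train_idx, test_idx = perm[:num_train], perm[num_train:]

Xtrain, Utrain = X[:, :, train_idx], U[:, :, train_idx]
Xtest, Utest = X[:, :, test_idx], U[:, :, test_idx]

print(Xtrain.shape, Utrain.shape, Xtest.shape, Utest.shape)
```

Splitting by trajectory rather than by timestep keeps each validation rollout independent of the data used for training.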
NOTE: To use DaDControl for time-series problems without controls, one could pass a tensor of zeros for Utrain and Utest. The code should also be easy to modify to remove the controls arguments.
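A zero control tensor for the uncontrolled case could be built as follows. This is a sketch only: `zero_controls` is a hypothetical helper, and it assumes the controls use the same [timesteps x dim x num_trajectories] layout and the same number of timesteps as the states.

```python
import numpy as np

def zero_controls(X, control_dim=1):
    """Return an all-zero control tensor matching X's timesteps and trajectories.

    X is assumed to have shape [timesteps x state_dim x num_trajectories].
    """
    timesteps, _, num_traj = X.shape
    return np.zeros((timesteps, control_dim, num_traj))

# Example with placeholder state data: 30 timesteps, 3 state dims, 8 trajectories.
Xtrain = np.ones((30, 3, 8))
Utrain = zero_controls(Xtrain)
print(Utrain.shape)
```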
A simple demo is provided in the demos folder. The demo tries to learn the dynamics of a cartpole being controlled by a randomly generated linear control policy. It can be run by calling:
python demos/learn_control_demo.py
Example results:
DaD (iters:25). Initial Err: 3.727, Best: 3.211
Err without DaD: 3.175, Err with DaD: 2.484
where the Err shown corresponds to the RMS error for the multi-step prediction.
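For reference, an RMS error over multi-step predictions can be computed as sketched below. This is an assumption about the general form of the metric, with a hypothetical function name; the exact averaging used by the demo may differ.

```python
import numpy as np

def multistep_rms_error(X_true, X_pred):
    """Root-mean-square error over all timesteps, dimensions and trajectories.

    Both inputs are assumed to be [timesteps x dim x num_trajectories] tensors,
    where X_pred comes from rolling the learned model forward from the initial
    state of each trajectory.
    """
    X_true = np.asarray(X_true, dtype=float)
    X_pred = np.asarray(X_pred, dtype=float)
    if X_true.shape != X_pred.shape:
        raise ValueError("X_true and X_pred must have the same shape")
    return float(np.sqrt(np.mean((X_true - X_pred) ** 2)))
```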
Using a more powerful learner can give better results:
from sklearn.neural_network import MLPRegressor
from DaD.helpers import learner_wrapper
# Wrap an MLP with two hidden layers (20 and 10 units) for use as the DaD learner.
learner = learner_wrapper.DynamicsControlDeltaWrapper(
    MLPRegressor(hidden_layer_sizes=(20, 10), activation='tanh',
                 alpha=1e-3, max_iter=int(1e4), warm_start=False))
Using a two-layer network like this can give results such as:
DaD (iters:25). Initial Err: 4.096, Best: 2.069
Err without DaD: 5.534, Err with DaD: 1.983
As we can see, the multi-step error with DaD is lower than with the previous learner (1.983 versus 2.484). Increasing the model complexity or the number of iterations may improve this result further.
NOTE: As of August 2016, this requires the latest sklearn to get access to the MLPRegressor. This can be installed using pip install git+https://github.com/scikit-learn/scikit-learn.git, though it is recommended to do this in a virtualenv to prevent overwriting the release installation.