Bayesian Inverse Reinforcement Learning

The environment is the one shown in Figure 1 of the BIRL paper.

Tested with:

python==3.7.0
numpy==1.15.1
scipy==1.1.0
tqdm==4.26.0
matplotlib==2.2.3

Run with:

python src/birl.py

The script samples rewards for each state.
The optimal policy for the mean of the sampled rewards exactly matched the expert's policy.
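
The check described above (average the sampled rewards, solve for the optimal policy, compare it with the expert's) can be sketched as follows. This is a minimal illustration and not the code in src/birl.py: the 4-state chain, the reward samples, the expert policy, and the names NEXT_STATE, mean_reward, and optimal_policy are all hypothetical, and plain value iteration is assumed as the solver.

```python
# Hypothetical deterministic 4-state chain (NOT the environment from the paper).
# NEXT_STATE[s][a] is the state reached from s with action a (0 = left, 1 = right).
NEXT_STATE = [[0, 1], [0, 2], [1, 3], [2, 3]]
GAMMA = 0.9


def mean_reward(samples):
    """Average the sampled reward vectors state by state."""
    n = len(samples)
    return [sum(column) / n for column in zip(*samples)]


def optimal_policy(rewards, gamma=GAMMA, tol=1e-8):
    """Run value iteration for state rewards, then return the greedy policy."""
    n_states = len(rewards)
    values = [0.0] * n_states
    while True:
        new_values = [
            rewards[s] + gamma * max(values[nxt] for nxt in NEXT_STATE[s])
            for s in range(n_states)
        ]
        delta = max(abs(a - b) for a, b in zip(new_values, values))
        values = new_values
        if delta < tol:
            break
    # Greedy action: move to the successor state with the highest value.
    return [
        max(range(len(NEXT_STATE[s])), key=lambda a: values[NEXT_STATE[s][a]])
        for s in range(n_states)
    ]


# Made-up reward samples (one row per sample, one column per state).
samples = [
    [0.0, 0.1, 0.0, 0.9],
    [0.1, 0.0, 0.2, 1.0],
    [0.0, 0.0, 0.1, 0.8],
]
expert_policy = [1, 1, 1, 1]  # hypothetical expert: always move right

learned_policy = optimal_policy(mean_reward(samples))
matches = learned_policy == expert_policy
print("learned policy:", learned_policy)
print("matches expert:", matches)
```

In this toy setup the mean reward is highest in the last state, so the greedy policy moves right everywhere, as does the assumed expert.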