diff --git a/docs/algorithms/sqil.rst b/docs/algorithms/sqil.rst
index b86e340a6..d680ba3af 100644
--- a/docs/algorithms/sqil.rst
+++ b/docs/algorithms/sqil.rst
@@ -1,10 +1,16 @@
 .. _soft q imitation learning docs:

-=======================
+================================
 Soft Q Imitation Learning (SQIL)
-=======================
+================================

-
+Soft Q Imitation Learning (SQIL) learns to imitate an expert policy from
+demonstrations by running the DQN algorithm with modified rewards. During
+each policy update, half of the batch is sampled from the demonstrations and
+half is sampled from the environment. Transitions from the demonstrations
+are assigned a reward of 1, and transitions from the environment are
+assigned a reward of 0. This encourages the policy to imitate the
+demonstrations while avoiding states that do not appear in them.

 Example
 =======
diff --git a/docs/index.rst b/docs/index.rst
index 0c516c58b..204836d61 100644
--- a/docs/index.rst
+++ b/docs/index.rst
@@ -60,6 +60,7 @@ If you use ``imitation`` in your research project, please cite our paper to help
    algorithms/density
    algorithms/mce_irl
    algorithms/preference_comparisons
+   algorithms/sqil

 .. toctree::
    :maxdepth: 2
@@ -76,6 +77,7 @@ If you use ``imitation`` in your research project, please cite our paper to help
    tutorials/7_train_density
    tutorials/8_train_custom_env
    tutorials/9_compare_baselines
+   tutorials/10_train_sqil
    tutorials/trajectories

 .. toctree::
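The reward relabeling described in the new doc page is simple enough to sketch
directly. The following is a minimal illustration of SQIL's batch construction,
assuming hypothetical in-memory buffers of ``(obs, action, next_obs, done)``
tuples; ``sample_sqil_batch``, ``demo_buffer``, and ``env_buffer`` are
illustrative names, not part of the ``imitation`` library's API.

.. code-block:: python

   import numpy as np

   def sample_sqil_batch(demo_buffer, env_buffer, batch_size, rng):
       """Build a SQIL training batch: half demonstrations, half environment.

       Returns the sampled transitions together with the relabeled rewards
       (1 for expert transitions, 0 for environment transitions).
       """
       half = batch_size // 2
       demo_idx = rng.integers(len(demo_buffer), size=half)
       env_idx = rng.integers(len(env_buffer), size=batch_size - half)

       transitions = [demo_buffer[i] for i in demo_idx]
       transitions += [env_buffer[i] for i in env_idx]
       rewards = np.concatenate([np.ones(half), np.zeros(batch_size - half)])
       return transitions, rewards

   # Placeholder buffers of (obs, action, next_obs, done) tuples.
   rng = np.random.default_rng(seed=0)
   demo_buffer = [(np.zeros(4), 0, np.zeros(4), False)] * 128
   env_buffer = [(np.zeros(4), 1, np.zeros(4), False)] * 128

   # Each (transitions, rewards) pair would feed an ordinary DQN update step.
   batch, rewards = sample_sqil_batch(demo_buffer, env_buffer, batch_size=64, rng=rng)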