From a41ccced1bc5480658d6a97448859b388a3997ce Mon Sep 17 00:00:00 2001
From: nicolengsy
Date: Wed, 14 Oct 2020 00:17:56 -0700
Subject: [PATCH 1/6] Missing format

---
 docs/index.md            |  3 +-
 docs/user/algo_teppo.md  | 78 ++++++++++++++++++++++++++++++++++++++++
 docs/user/references.bib |  8 +++++
 3 files changed, 88 insertions(+), 1 deletion(-)
 create mode 100644 docs/user/algo_teppo.md

diff --git a/docs/index.md b/docs/index.md
index fe2b70496b..adef8949c8 100644
--- a/docs/index.md
+++ b/docs/index.md
@@ -57,7 +57,8 @@ and how to implement new MDPs and new algorithms.
    user/algo_mtppo
    user/algo_vpg
    user/algo_td3
-
+   user/algo_teppo
+
 .. toctree::
    :maxdepth: 2
    :caption: Reference Guide

diff --git a/docs/user/algo_teppo.md b/docs/user/algo_teppo.md
new file mode 100644
index 0000000000..fa85d5e805
--- /dev/null
+++ b/docs/user/algo_teppo.md
@@ -0,0 +1,78 @@
+# Proximal Policy Optimization with Task Embedding (TEPPO)
+
+
+```eval_rst
+.. list-table::
+   :header-rows: 0
+   :stub-columns: 1
+   :widths: auto
+
+   * - **Paper**
+     - Learning Skill Embeddings for Transferable Robot Skills :cite:`hausman2018learning`
+   * - **Framework(s)**
+     - .. figure:: ./images/tf.png
+        :scale: 20%
+        :class: no-scaled-link
+
+        Tensorflow
+   * - **API Reference**
+     - `garage.tf.algos.TEPPO <../_autoapi/garage/torch/algos/index.html#garage.tf.algos.TEPPO>`_
+   * - **Code**
+     - `garage/tf/algos/td3.py <https://github.com/rlworkgroup/garage/blob/master/src/garage/tf/algos/td3.py>`_
+   * - **Examples**
+     - :ref:`te_ppo_metaworld_mt1_push`, :ref:`te_ppo_metaworld_mt10`, :ref:`te_ppo_metaworld_mt50`, :ref:`te_ppo_point`
+```
+
+
+Proximal Policy Optimization Algorithms (PPO) is a family of policy gradient methods which alternate between sampling data through interaction with the environment, and optimizing a "surrogate" objective function using stochastic gradient ascent. TEPPO parameterizes PPO via a shared skill embedding space.
+
+## Default Parameters
+
+```py
+discount=0.99,
+gae_lambda=0.98,
+lr_clip_range=0.01,
+max_kl_step=0.01,
+policy_ent_coeff=1e-3,
+encoder_ent_coeff=1e-3,
+inference_ce_coeff=1e-3
+```
+
+## Examples
+
+### te_ppo_metaworld_mt1_push
+
+```eval_rst
+.. literalinlcude:: ../../examples/tf/te_ppo_metaworld_mt1_push.py
+```
+
+### te_ppo_metaworld_mt10
+
+```eval_rst
+.. literalinlcude:: ../../examples/tf/te_ppo_metaworld_mt10.py
+```
+
+### te_ppo_metaworld_mt50
+
+```eval_rst
+.. literalinlcude:: ../../examples/tf/te_ppo_metaworld_mt50.py
+```
+
+### te_ppo_point
+
+```eval_rst
+.. literalinlcude:: ../../examples/tf/te_ppo_point.py
+```
+
+## References
+
+```eval_rst
+.. bibliography:: references.bib
+   :style: unsrt
+   :filter: docname in docnames
+```
+
+----
+
+*This page was authored by Nicole Shin Ying Ng ([@nicolengsy](https://github.com/nicolengsy)).*
+

diff --git a/docs/user/references.bib b/docs/user/references.bib
index d6b7936098..d9f4707b03 100644
--- a/docs/user/references.bib
+++ b/docs/user/references.bib
@@ -82,3 +82,11 @@ @article{yu2019metaworld
   year={2019},
   journal={arXiv:1910.10897},
 }
+
+@article{hausman2018learning,
+  title={Learning an Embedding Space for Transferable Robot Skills},
+  author={Karol Hausman and Jost Tobias Springenberg and Ziyu Wang and Nicolas Heess and Martin Riedmiller},
+  booktitle={International Conference on Learning Representations},
+  year={2018},
+  url={https://openreview.net/forum?id=rk07ZXZRb},
+}
\ No newline at end of file

From 07049d0049a8f29e29f4b4a33b86c371c314fb0a Mon Sep 17 00:00:00 2001
From: Nicole Ng
Date: Thu, 29 Oct 2020 01:27:21 -0700
Subject: [PATCH 2/6] Complete teppo docs

---
 docs/user/algo_teppo.md  | 12 ++++++------
 docs/user/references.bib |  1 +
 2 files changed, 7 insertions(+), 6 deletions(-)

diff --git a/docs/user/algo_teppo.md b/docs/user/algo_teppo.md
index fa85d5e805..3eee276323 100644
--- a/docs/user/algo_teppo.md
+++ b/docs/user/algo_teppo.md
@@ -18,13 +18,13 @@
    * - **API Reference**
      - `garage.tf.algos.TEPPO <../_autoapi/garage/torch/algos/index.html#garage.tf.algos.TEPPO>`_
    * - **Code**
-     - `garage/tf/algos/td3.py <https://github.com/rlworkgroup/garage/blob/master/src/garage/tf/algos/td3.py>`_
+     - `garage/tf/algos/te_ppo.py <https://github.com/rlworkgroup/garage/blob/master/src/garage/tf/algos/te_ppo.py>`_
    * - **Examples**
      - :ref:`te_ppo_metaworld_mt1_push`, :ref:`te_ppo_metaworld_mt10`, :ref:`te_ppo_metaworld_mt50`, :ref:`te_ppo_point`
 ```
 
 
-Proximal Policy Optimization Algorithms (PPO) is a family of policy gradient methods which alternate between sampling data through interaction with the environment, and optimizing a "surrogate" objective function using stochastic gradient ascent. TEPPO parameterizes PPO via a shared skill embedding space.
+Proximal Policy Optimization Algorithms (PPO) is a family of policy gradient methods which alternate between sampling data through interaction with the environment, and optimizing a "surrogate" objective function using stochastic gradient ascent. TEPPO parameterizes the PPO policy via a shared skill embedding space.
 
 ## Default Parameters
 
@@ -43,25 +43,25 @@ inference_ce_coeff=1e-3
 ### te_ppo_metaworld_mt1_push
 
 ```eval_rst
-.. literalinlcude:: ../../examples/tf/te_ppo_metaworld_mt1_push.py
+.. literalinclude:: ../../examples/tf/te_ppo_metaworld_mt1_push.py
 ```
 
 ### te_ppo_metaworld_mt10
 
 ```eval_rst
-.. literalinlcude:: ../../examples/tf/te_ppo_metaworld_mt10.py
+.. literalinclude:: ../../examples/tf/te_ppo_metaworld_mt10.py
 ```
 
 ### te_ppo_metaworld_mt50
 
 ```eval_rst
-.. literalinlcude:: ../../examples/tf/te_ppo_metaworld_mt50.py
+.. literalinclude:: ../../examples/tf/te_ppo_metaworld_mt50.py
 ```
 
 ### te_ppo_point
 
 ```eval_rst
-.. literalinlcude:: ../../examples/tf/te_ppo_point.py
+.. literalinclude:: ../../examples/tf/te_ppo_point.py
 ```
 
 ## References

diff --git a/docs/user/references.bib b/docs/user/references.bib
index d9f4707b03..aaacdba203 100644
--- a/docs/user/references.bib
+++ b/docs/user/references.bib
@@ -88,5 +88,6 @@ @article{hausman2018learning
   author={Karol Hausman and Jost Tobias Springenberg and Ziyu Wang and Nicolas Heess and Martin Riedmiller},
   booktitle={International Conference on Learning Representations},
   year={2018},
+  journal={},
   url={https://openreview.net/forum?id=rk07ZXZRb},
 }
\ No newline at end of file

From 65062b2420cb5991baa8736c4b5d94d6917b496f Mon Sep 17 00:00:00 2001
From: Nicole Ng
Date: Thu, 29 Oct 2020 01:27:21 -0700
Subject: [PATCH 3/6] Complete teppo docs

---
 docs/user/algo_teppo.md  | 12 ++++++------
 docs/user/references.bib |  2 +-
 2 files changed, 7 insertions(+), 7 deletions(-)

diff --git a/docs/user/algo_teppo.md b/docs/user/algo_teppo.md
index fa85d5e805..3eee276323 100644
--- a/docs/user/algo_teppo.md
+++ b/docs/user/algo_teppo.md
@@ -18,13 +18,13 @@
    * - **API Reference**
      - `garage.tf.algos.TEPPO <../_autoapi/garage/torch/algos/index.html#garage.tf.algos.TEPPO>`_
    * - **Code**
-     - `garage/tf/algos/td3.py <https://github.com/rlworkgroup/garage/blob/master/src/garage/tf/algos/td3.py>`_
+     - `garage/tf/algos/te_ppo.py <https://github.com/rlworkgroup/garage/blob/master/src/garage/tf/algos/te_ppo.py>`_
    * - **Examples**
      - :ref:`te_ppo_metaworld_mt1_push`, :ref:`te_ppo_metaworld_mt10`, :ref:`te_ppo_metaworld_mt50`, :ref:`te_ppo_point`
 ```
 
 
-Proximal Policy Optimization Algorithms (PPO) is a family of policy gradient methods which alternate between sampling data through interaction with the environment, and optimizing a "surrogate" objective function using stochastic gradient ascent. TEPPO parameterizes PPO via a shared skill embedding space.
+Proximal Policy Optimization Algorithms (PPO) is a family of policy gradient methods which alternate between sampling data through interaction with the environment, and optimizing a "surrogate" objective function using stochastic gradient ascent. TEPPO parameterizes the PPO policy via a shared skill embedding space.
 
 ## Default Parameters
 
@@ -43,25 +43,25 @@ inference_ce_coeff=1e-3
 ### te_ppo_metaworld_mt1_push
 
 ```eval_rst
-.. literalinlcude:: ../../examples/tf/te_ppo_metaworld_mt1_push.py
+.. literalinclude:: ../../examples/tf/te_ppo_metaworld_mt1_push.py
 ```
 
 ### te_ppo_metaworld_mt10
 
 ```eval_rst
-.. literalinlcude:: ../../examples/tf/te_ppo_metaworld_mt10.py
+.. literalinclude:: ../../examples/tf/te_ppo_metaworld_mt10.py
 ```
 
 ### te_ppo_metaworld_mt50
 
 ```eval_rst
-.. literalinlcude:: ../../examples/tf/te_ppo_metaworld_mt50.py
+.. literalinclude:: ../../examples/tf/te_ppo_metaworld_mt50.py
 ```
 
 ### te_ppo_point
 
 ```eval_rst
-.. literalinlcude:: ../../examples/tf/te_ppo_point.py
+.. literalinclude:: ../../examples/tf/te_ppo_point.py
 ```
 
 ## References

diff --git a/docs/user/references.bib b/docs/user/references.bib
index d9f4707b03..8601a82a77 100644
--- a/docs/user/references.bib
+++ b/docs/user/references.bib
@@ -83,7 +83,7 @@ @article{yu2019metaworld
   journal={arXiv:1910.10897},
 }
 
-@article{hausman2018learning,
+@inproceedings{hausman2018learning,
   title={Learning an Embedding Space for Transferable Robot Skills},
   author={Karol Hausman and Jost Tobias Springenberg and Ziyu Wang and Nicolas Heess and Martin Riedmiller},
   booktitle={International Conference on Learning Representations},

From 2e9b056f948f2eea604ede2b52bbf869ba082357 Mon Sep 17 00:00:00 2001
From: Nicole Ng
Date: Fri, 30 Oct 2020 01:56:56 -0700
Subject: [PATCH 4/6] Fix pre-commit

---
 docs/index.md | 4 ++--
 1 file changed, 2 insertions(+), 2 deletions(-)

diff --git a/docs/index.md b/docs/index.md
index 0bc4801c49..23e28dca6e 100644
--- a/docs/index.md
+++ b/docs/index.md
@@ -61,10 +61,10 @@
    user/algo_mtppo
    user/algo_vpg
    user/algo_td3
-   user/algo_teppo
+   TEPPO <algo_teppo>
    user/algo_ddpg
    user/algo_cem
-
+
 .. toctree::
    :maxdepth: 2
    :caption: Reference Guide

From 4ee2e66ecf3cf6cad721ae1deb7a00c4e6eeafe6 Mon Sep 17 00:00:00 2001
From: Nicole Ng
Date: Fri, 30 Oct 2020 12:21:38 -0700
Subject: [PATCH 5/6] Fix pre-commit

---
 docs/user/algo_teppo.md | 2 --
 1 file changed, 2 deletions(-)

diff --git a/docs/user/algo_teppo.md b/docs/user/algo_teppo.md
index 3eee276323..b0eaf2b539 100644
--- a/docs/user/algo_teppo.md
+++ b/docs/user/algo_teppo.md
@@ -23,7 +23,6 @@
      - :ref:`te_ppo_metaworld_mt1_push`, :ref:`te_ppo_metaworld_mt10`, :ref:`te_ppo_metaworld_mt50`, :ref:`te_ppo_point`
 ```
 
-
 Proximal Policy Optimization Algorithms (PPO) is a family of policy gradient methods which alternate between sampling data through interaction with the environment, and optimizing a "surrogate" objective function using stochastic gradient ascent. TEPPO parameterizes the PPO policy via a shared skill embedding space.
 
 ## Default Parameters
@@ -75,4 +74,3 @@ inference_ce_coeff=1e-3
 ----
 
 *This page was authored by Nicole Shin Ying Ng ([@nicolengsy](https://github.com/nicolengsy)).*
-

From 51543c2a06e550b630f687231e963552ecf8b17d Mon Sep 17 00:00:00 2001
From: Nicole Ng
Date: Sun, 22 Nov 2020 16:38:10 -0800
Subject: [PATCH 6/6] Fix typo

---
 docs/user/algo_teppo.md | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/docs/user/algo_teppo.md b/docs/user/algo_teppo.md
index b0eaf2b539..605f8b5551 100644
--- a/docs/user/algo_teppo.md
+++ b/docs/user/algo_teppo.md
@@ -16,7 +16,7 @@
 
         Tensorflow
    * - **API Reference**
-     - `garage.tf.algos.TEPPO <../_autoapi/garage/torch/algos/index.html#garage.tf.algos.TEPPO>`_
+     - `garage.tf.algos.TEPPO <../_autoapi/garage/tf/algos/index.html#garage.tf.algos.TEPPO>`_
    * - **Code**
      - `garage/tf/algos/te_ppo.py <https://github.com/rlworkgroup/garage/blob/master/src/garage/tf/algos/te_ppo.py>`_
    * - **Examples**
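
----

For readers skimming this series, a concrete handle on the objective the new page documents: TEPPO optimizes PPO's clipped surrogate, augmented with entropy bonuses on the policy and the task encoder, plus a cross-entropy term that rewards trajectories from which an inference network can recover the latent skill (the roles of `policy_ent_coeff`, `encoder_ent_coeff`, and `inference_ce_coeff` in the Default Parameters above). The following is a minimal NumPy sketch of that objective structure under stand-in batch statistics; it is not garage's implementation, every name in it is invented for illustration, and the sign conventions follow the lower bound in Hausman et al.

```py
import numpy as np

# Default coefficients listed on the documented page.
lr_clip_range = 0.01       # epsilon of the clipped surrogate
policy_ent_coeff = 1e-3    # weight of the policy entropy bonus
encoder_ent_coeff = 1e-3   # weight of the task-encoder entropy bonus
inference_ce_coeff = 1e-3  # weight of the inference cross-entropy term


def te_ppo_objective(log_p_new, log_p_old, advantages,
                     policy_entropy, encoder_entropy, inference_ce):
    """Clipped PPO surrogate plus the task-embedding terms (to maximize)."""
    # Likelihood ratio of the sampled actions under the new vs. old policy.
    ratio = np.exp(log_p_new - log_p_old)
    clipped = np.clip(ratio, 1.0 - lr_clip_range, 1.0 + lr_clip_range)
    # Elementwise minimum: the update cannot profit from pushing the ratio
    # outside the clip range, which is what keeps PPO steps conservative.
    surrogate = np.minimum(ratio * advantages, clipped * advantages).mean()
    # Entropy bonuses keep the policy and the skill encoder stochastic;
    # subtracting the inference cross-entropy (i.e. adding the inference
    # log-likelihood) favors trajectories that identify their latent skill.
    return (surrogate
            + policy_ent_coeff * policy_entropy
            + encoder_ent_coeff * encoder_entropy
            - inference_ce_coeff * inference_ce)


# Toy call with random stand-in batch statistics.
rng = np.random.default_rng(0)
advantages = rng.normal(size=128)  # stand-in GAE(gae_lambda) estimates
log_p_old = rng.normal(-1.0, 0.1, size=128)
log_p_new = log_p_old + rng.normal(0.0, 0.005, size=128)
print(te_ppo_objective(log_p_new, log_p_old, advantages,
                       policy_entropy=1.2, encoder_entropy=0.8,
                       inference_ce=0.5))
```

For the authoritative construction (how the latent skill is sampled per task, appended to observations, and folded into the augmented reward), see `garage/tf/algos/te_ppo.py` and the `examples/tf/te_ppo_*.py` scripts that the page literal-includes.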