From a41ccced1bc5480658d6a97448859b388a3997ce Mon Sep 17 00:00:00 2001
From: nicolengsy
Date: Wed, 14 Oct 2020 00:17:56 -0700
Subject: [PATCH 1/6] Missing format

---
 docs/index.md            |  3 +-
 docs/user/algo_teppo.md  | 78 ++++++++++++++++++++++++++++++++++++++++
 docs/user/references.bib |  8 +++++
 3 files changed, 88 insertions(+), 1 deletion(-)
 create mode 100644 docs/user/algo_teppo.md

diff --git a/docs/index.md b/docs/index.md
index fe2b70496b..adef8949c8 100644
--- a/docs/index.md
+++ b/docs/index.md
@@ -57,7 +57,8 @@ and how to implement new MDPs and new algorithms.
    user/algo_mtppo
    user/algo_vpg
    user/algo_td3
-
+   user/algo_teppo
+
 .. toctree::
    :maxdepth: 2
    :caption: Reference Guide

diff --git a/docs/user/algo_teppo.md b/docs/user/algo_teppo.md
new file mode 100644
index 0000000000..fa85d5e805
--- /dev/null
+++ b/docs/user/algo_teppo.md
@@ -0,0 +1,78 @@
+# Proximal Policy Optimization with Task Embedding (TEPPO)
+
+
+```eval_rst
+.. list-table::
+   :header-rows: 0
+   :stub-columns: 1
+   :widths: auto
+
+   * - **Paper**
+     - Learning Skill Embeddings for Transferable Robot Skills :cite:`hausman2018learning`
+   * - **Framework(s)**
+     - .. figure:: ./images/tf.png
+        :scale: 20%
+        :class: no-scaled-link
+
+        Tensorflow
+   * - **API Reference**
+     - `garage.tf.algos.TEPPO <../_autoapi/garage/torch/algos/index.html#garage.tf.algos.TEPPO>`_
+   * - **Code**
+     - `garage/tf/algos/td3.py <https://github.com/rlworkgroup/garage/blob/master/src/garage/tf/algos/td3.py>`_
+   * - **Examples**
+     - :ref:`te_ppo_metaworld_mt1_push`, :ref:`te_ppo_metaworld_mt10`, :ref:`te_ppo_metaworld_mt50`, :ref:`te_ppo_point`
+```
+
+
+Proximal Policy Optimization Algorithms (PPO) is a family of policy gradient methods which alternate between sampling data through interaction with the environment, and optimizing a "surrogate" objective function using stochastic gradient ascent. TEPPO parameterizes PPO via a shared skill embedding space.
+
+## Default Parameters
+
+```py
+discount=0.99,
+gae_lambda=0.98,
+lr_clip_range=0.01,
+max_kl_step=0.01,
+policy_ent_coeff=1e-3,
+encoder_ent_coeff=1e-3,
+inference_ce_coeff=1e-3
+```
+
+## Examples
+
+### te_ppo_metaworld_mt1_push
+
+```eval_rst
+.. literalinlcude:: ../../examples/tf/te_ppo_metaworld_mt1_push.py
+```
+
+### te_ppo_metaworld_mt10
+
+```eval_rst
+.. literalinlcude:: ../../examples/tf/te_ppo_metaworld_mt10.py
+```
+
+### te_ppo_metaworld_mt50
+
+```eval_rst
+.. literalinlcude:: ../../examples/tf/te_ppo_metaworld_mt50.py
+```
+
+### te_ppo_point
+
+```eval_rst
+.. literalinlcude:: ../../examples/tf/te_ppo_point.py
+```
+
+## References
+
+```eval_rst
+.. bibliography:: references.bib
+   :style: unsrt
+   :filter: docname in docnames
+```
+
+----
+
+*This page was authored by Nicole Shin Ying Ng ([@nicolengsy](https://github.com/nicolengsy)).*
+

diff --git a/docs/user/references.bib b/docs/user/references.bib
index d6b7936098..d9f4707b03 100644
--- a/docs/user/references.bib
+++ b/docs/user/references.bib
@@ -82,3 +82,11 @@ @article{yu2019metaworld
   year={2019},
   journal={arXiv:1910.10897},
 }
+
+@article{hausman2018learning,
+  title={Learning an Embedding Space for Transferable Robot Skills},
+  author={Karol Hausman and Jost Tobias Springenberg and Ziyu Wang and Nicolas Heess and Martin Riedmiller},
+  booktitle={International Conference on Learning Representations},
+  year={2018},
+  url={https://openreview.net/forum?id=rk07ZXZRb},
+}
\ No newline at end of file

From 07049d0049a8f29e29f4b4a33b86c371c314fb0a Mon Sep 17 00:00:00 2001
From: Nicole Ng
Date: Thu, 29 Oct 2020 01:27:21 -0700
Subject: [PATCH 2/6] Complete teppo docs

---
 docs/user/algo_teppo.md  | 12 ++++++------
 docs/user/references.bib |  1 +
 2 files changed, 7 insertions(+), 6 deletions(-)

diff --git a/docs/user/algo_teppo.md b/docs/user/algo_teppo.md
index fa85d5e805..3eee276323 100644
--- a/docs/user/algo_teppo.md
+++ b/docs/user/algo_teppo.md
@@ -18,13 +18,13 @@
    * - **API Reference**
      - `garage.tf.algos.TEPPO <../_autoapi/garage/torch/algos/index.html#garage.tf.algos.TEPPO>`_
    * - **Code**
-     - `garage/tf/algos/td3.py <https://github.com/rlworkgroup/garage/blob/master/src/garage/tf/algos/td3.py>`_
+     - `garage/tf/algos/te_ppo.py <https://github.com/rlworkgroup/garage/blob/master/src/garage/tf/algos/te_ppo.py>`_
    * - **Examples**
      - :ref:`te_ppo_metaworld_mt1_push`, :ref:`te_ppo_metaworld_mt10`, :ref:`te_ppo_metaworld_mt50`, :ref:`te_ppo_point`
 ```
 
 
-Proximal Policy Optimization Algorithms (PPO) is a family of policy gradient methods which alternate between sampling data through interaction with the environment, and optimizing a "surrogate" objective function using stochastic gradient ascent. TEPPO parameterizes PPO via a shared skill embedding space.
+Proximal Policy Optimization Algorithms (PPO) is a family of policy gradient methods which alternate between sampling data through interaction with the environment, and optimizing a "surrogate" objective function using stochastic gradient ascent. TEPPO parameterizes the PPO policy via a shared skill embedding space.
 
 ## Default Parameters
 
@@ -43,25 +43,25 @@ inference_ce_coeff=1e-3
 ### te_ppo_metaworld_mt1_push
 
 ```eval_rst
-.. literalinlcude:: ../../examples/tf/te_ppo_metaworld_mt1_push.py
+.. literalinclude:: ../../examples/tf/te_ppo_metaworld_mt1_push.py
 ```
 
 ### te_ppo_metaworld_mt10
 
 ```eval_rst
-.. literalinlcude:: ../../examples/tf/te_ppo_metaworld_mt10.py
+.. literalinclude:: ../../examples/tf/te_ppo_metaworld_mt10.py
 ```
 
 ### te_ppo_metaworld_mt50
 
 ```eval_rst
-.. literalinlcude:: ../../examples/tf/te_ppo_metaworld_mt50.py
+.. literalinclude:: ../../examples/tf/te_ppo_metaworld_mt50.py
 ```
 
 ### te_ppo_point
 
 ```eval_rst
-.. literalinlcude:: ../../examples/tf/te_ppo_point.py
+.. literalinclude:: ../../examples/tf/te_ppo_point.py
 ```
 
 ## References

diff --git a/docs/user/references.bib b/docs/user/references.bib
index d9f4707b03..aaacdba203 100644
--- a/docs/user/references.bib
+++ b/docs/user/references.bib
@@ -88,5 +88,6 @@ @article{hausman2018learning
   author={Karol Hausman and Jost Tobias Springenberg and Ziyu Wang and Nicolas Heess and Martin Riedmiller},
   booktitle={International Conference on Learning Representations},
   year={2018},
+  journal={},
   url={https://openreview.net/forum?id=rk07ZXZRb},
 }
\ No newline at end of file

From 65062b2420cb5991baa8736c4b5d94d6917b496f Mon Sep 17 00:00:00 2001
From: Nicole Ng
Date: Thu, 29 Oct 2020 01:27:21 -0700
Subject: [PATCH 3/6] Complete teppo docs

---
 docs/user/algo_teppo.md  | 12 ++++++------
 docs/user/references.bib |  2 +-
 2 files changed, 7 insertions(+), 7 deletions(-)

diff --git a/docs/user/algo_teppo.md b/docs/user/algo_teppo.md
index fa85d5e805..3eee276323 100644
--- a/docs/user/algo_teppo.md
+++ b/docs/user/algo_teppo.md
@@ -18,13 +18,13 @@
    * - **API Reference**
      - `garage.tf.algos.TEPPO <../_autoapi/garage/torch/algos/index.html#garage.tf.algos.TEPPO>`_
    * - **Code**
-     - `garage/tf/algos/td3.py <https://github.com/rlworkgroup/garage/blob/master/src/garage/tf/algos/td3.py>`_
+     - `garage/tf/algos/te_ppo.py <https://github.com/rlworkgroup/garage/blob/master/src/garage/tf/algos/te_ppo.py>`_
    * - **Examples**
      - :ref:`te_ppo_metaworld_mt1_push`, :ref:`te_ppo_metaworld_mt10`, :ref:`te_ppo_metaworld_mt50`, :ref:`te_ppo_point`
 ```
 
 
-Proximal Policy Optimization Algorithms (PPO) is a family of policy gradient methods which alternate between sampling data through interaction with the environment, and optimizing a "surrogate" objective function using stochastic gradient ascent. TEPPO parameterizes PPO via a shared skill embedding space.
+Proximal Policy Optimization Algorithms (PPO) is a family of policy gradient methods which alternate between sampling data through interaction with the environment, and optimizing a "surrogate" objective function using stochastic gradient ascent. TEPPO parameterizes the PPO policy via a shared skill embedding space.
 
 ## Default Parameters
 
@@ -43,25 +43,25 @@ inference_ce_coeff=1e-3
 ### te_ppo_metaworld_mt1_push
 
 ```eval_rst
-.. literalinlcude:: ../../examples/tf/te_ppo_metaworld_mt1_push.py
+.. literalinclude:: ../../examples/tf/te_ppo_metaworld_mt1_push.py
 ```
 
 ### te_ppo_metaworld_mt10
 
 ```eval_rst
-.. literalinlcude:: ../../examples/tf/te_ppo_metaworld_mt10.py
+.. literalinclude:: ../../examples/tf/te_ppo_metaworld_mt10.py
 ```
 
 ### te_ppo_metaworld_mt50
 
 ```eval_rst
-.. literalinlcude:: ../../examples/tf/te_ppo_metaworld_mt50.py
+.. literalinclude:: ../../examples/tf/te_ppo_metaworld_mt50.py
 ```
 
 ### te_ppo_point
 
 ```eval_rst
-.. literalinlcude:: ../../examples/tf/te_ppo_point.py
+.. literalinclude:: ../../examples/tf/te_ppo_point.py
 ```
 
 ## References

diff --git a/docs/user/references.bib b/docs/user/references.bib
index d9f4707b03..8601a82a77 100644
--- a/docs/user/references.bib
+++ b/docs/user/references.bib
@@ -83,7 +83,7 @@ @article{yu2019metaworld
   journal={arXiv:1910.10897},
 }
 
-@article{hausman2018learning,
+@inproceedings{hausman2018learning,
   title={Learning an Embedding Space for Transferable Robot Skills},
   author={Karol Hausman and Jost Tobias Springenberg and Ziyu Wang and Nicolas Heess and Martin Riedmiller},
   booktitle={International Conference on Learning Representations},

From 2e9b056f948f2eea604ede2b52bbf869ba082357 Mon Sep 17 00:00:00 2001
From: Nicole Ng
Date: Fri, 30 Oct 2020 01:56:56 -0700
Subject: [PATCH 4/6] Fix pre-commit

---
 docs/index.md | 4 ++--
 1 file changed, 2 insertions(+), 2 deletions(-)

diff --git a/docs/index.md b/docs/index.md
index 0bc4801c49..23e28dca6e 100644
--- a/docs/index.md
+++ b/docs/index.md
@@ -61,10 +61,10 @@
    user/algo_mtppo
    user/algo_vpg
    user/algo_td3
-   user/algo_teppo
+   TEPPO <algo_teppo>
    user/algo_ddpg
    user/algo_cem
-
+
 .. toctree::
    :maxdepth: 2
    :caption: Reference Guide

From 4ee2e66ecf3cf6cad721ae1deb7a00c4e6eeafe6 Mon Sep 17 00:00:00 2001
From: Nicole Ng
Date: Fri, 30 Oct 2020 12:21:38 -0700
Subject: [PATCH 5/6] Fix pre-commit

---
 docs/user/algo_teppo.md | 2 --
 1 file changed, 2 deletions(-)

diff --git a/docs/user/algo_teppo.md b/docs/user/algo_teppo.md
index 3eee276323..b0eaf2b539 100644
--- a/docs/user/algo_teppo.md
+++ b/docs/user/algo_teppo.md
@@ -23,7 +23,6 @@
      - :ref:`te_ppo_metaworld_mt1_push`, :ref:`te_ppo_metaworld_mt10`, :ref:`te_ppo_metaworld_mt50`, :ref:`te_ppo_point`
 ```
 
-
 Proximal Policy Optimization Algorithms (PPO) is a family of policy gradient methods which alternate between sampling data through interaction with the environment, and optimizing a "surrogate" objective function using stochastic gradient ascent. TEPPO parameterizes the PPO policy via a shared skill embedding space.
 
 ## Default Parameters
@@ -75,4 +74,3 @@ inference_ce_coeff=1e-3
 ----
 
 *This page was authored by Nicole Shin Ying Ng ([@nicolengsy](https://github.com/nicolengsy)).*
-

From 51543c2a06e550b630f687231e963552ecf8b17d Mon Sep 17 00:00:00 2001
From: Nicole Ng
Date: Sun, 22 Nov 2020 16:38:10 -0800
Subject: [PATCH 6/6] Fix typo

---
 docs/user/algo_teppo.md | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/docs/user/algo_teppo.md b/docs/user/algo_teppo.md
index b0eaf2b539..605f8b5551 100644
--- a/docs/user/algo_teppo.md
+++ b/docs/user/algo_teppo.md
@@ -16,7 +16,7 @@
 
         Tensorflow
    * - **API Reference**
-     - `garage.tf.algos.TEPPO <../_autoapi/garage/torch/algos/index.html#garage.tf.algos.TEPPO>`_
+     - `garage.tf.algos.TEPPO <../_autoapi/garage/tf/algos/index.html#garage.tf.algos.TEPPO>`_
    * - **Code**
      - `garage/tf/algos/te_ppo.py <https://github.com/rlworkgroup/garage/blob/master/src/garage/tf/algos/te_ppo.py>`_
    * - **Examples**
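
----

For readers skimming this series, a concrete handle on the objective the new page documents: TEPPO optimizes PPO's clipped surrogate, augmented with entropy bonuses on the policy and the task encoder, plus a cross-entropy term that rewards trajectories from which an inference network can recover the latent skill (the roles of `policy_ent_coeff`, `encoder_ent_coeff`, and `inference_ce_coeff` in the Default Parameters above). The following is a minimal NumPy sketch of that objective structure under stand-in batch statistics; it is not garage's implementation, every name in it is invented for illustration, and the sign conventions follow the lower bound in Hausman et al.

```py
import numpy as np

# Default coefficients listed on the documented page.
lr_clip_range = 0.01       # epsilon of the clipped surrogate
policy_ent_coeff = 1e-3    # weight of the policy entropy bonus
encoder_ent_coeff = 1e-3   # weight of the task-encoder entropy bonus
inference_ce_coeff = 1e-3  # weight of the inference cross-entropy term


def te_ppo_objective(log_p_new, log_p_old, advantages,
                     policy_entropy, encoder_entropy, inference_ce):
    """Clipped PPO surrogate plus the task-embedding terms (to maximize)."""
    # Likelihood ratio of the sampled actions under the new vs. old policy.
    ratio = np.exp(log_p_new - log_p_old)
    clipped = np.clip(ratio, 1.0 - lr_clip_range, 1.0 + lr_clip_range)
    # Elementwise minimum: the update cannot profit from pushing the ratio
    # outside the clip range, which is what keeps PPO steps conservative.
    surrogate = np.minimum(ratio * advantages, clipped * advantages).mean()
    # Entropy bonuses keep the policy and the skill encoder stochastic;
    # subtracting the inference cross-entropy (i.e. adding the inference
    # log-likelihood) favors trajectories that identify their latent skill.
    return (surrogate
            + policy_ent_coeff * policy_entropy
            + encoder_ent_coeff * encoder_entropy
            - inference_ce_coeff * inference_ce)


# Toy call with random stand-in batch statistics.
rng = np.random.default_rng(0)
advantages = rng.normal(size=128)  # stand-in GAE(gae_lambda) estimates
log_p_old = rng.normal(-1.0, 0.1, size=128)
log_p_new = log_p_old + rng.normal(0.0, 0.005, size=128)
print(te_ppo_objective(log_p_new, log_p_old, advantages,
                       policy_entropy=1.2, encoder_entropy=0.8,
                       inference_ce=0.5))
```

For the authoritative construction (how the latent skill is sampled per task, appended to observations, and folded into the augmented reward), see `garage/tf/algos/te_ppo.py` and the `examples/tf/te_ppo_*.py` scripts that the page literal-includes.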