Tutorial on comparing algorithm performance #747

Merged
merged 7 commits into from
Jul 5, 2023

Conversation

AdamGleave
Member

See #727

Resubmit of #739 to work around branch permission issues. Credit to @RedTachyon for this PR.

codecov bot commented Jul 5, 2023

Codecov Report

Merging #747 (6677525) into master (90b6aa3) will decrease coverage by 0.02%.
The diff coverage is n/a.

@@            Coverage Diff             @@
##           master     #747      +/-   ##
==========================================
- Coverage   96.42%   96.41%   -0.02%     
==========================================
  Files          93       93              
  Lines        8782     8782              
==========================================
- Hits         8468     8467       -1     
- Misses        314      315       +1     

see 1 file with indirect coverage changes

@AdamGleave AdamGleave merged commit 688e163 into master Jul 5, 2023
@AdamGleave AdamGleave deleted the baselines-tutorial branch July 5, 2023 16:26
RedTachyon added a commit that referenced this pull request Jul 6, 2023
* Pin SB3 version to 1.7.0 (#738)

* Update conftest.py (#742)

* Custom environment tutorial (#746)

* Custom environment tutorial draft

* Update the docs website

* Clean notebook

* Text clarification and new environment

* Decrease training duration to hopefully make CI happy

* Clarify that BC itself does not learn rewards

---------

Co-authored-by: Ariel Kwiatkowski <ariel.j.kwiatkowski@gmail.com>

* Tutorial on comparing algorithm performance (#747)

* Add a new tutorial

* Update index.rst

* Improvements to the tutorial

* Some more caution words

* Fix typos

---------

Co-authored-by: Ariel Kwiatkowski <ariel.j.kwiatkowski@gmail.com>

---------

Co-authored-by: Adam Gleave <adam@gleave.me>
ernestum added a commit that referenced this pull request Aug 10, 2023
* Initial version of the SQIL implementation

* Pin SB3 version to 1.7.0 (#738) (#745)

---------

Co-authored-by: Ariel Kwiatkowski <ariel.j.kwiatkowski@gmail.com>

* Tutorial on comparing algorithm performance (#747)

* Add a new tutorial

* Update index.rst

* Improvements to the tutorial

* Some more caution words

* Fix typos

---------

Co-authored-by: Ariel Kwiatkowski <ariel.j.kwiatkowski@gmail.com>

---------

Co-authored-by: Adam Gleave <adam@gleave.me>

* Some documentation updates (not complete)

* Add a SQIL tutorial

* Reduce tutorial runtime

* Add SQIL description in docs, try to add it to the right places

* Fix docs

* Blacken a tutorial

* Reorder things in docs

* Change the SQIL structure to instead subclass the replay buffer, new test

* Add an empty line

* Simplify the arguments

* Cover another edge case, another test, fixes

* Fix a circular import issue

* Add a performance test - might be slow?

* Fix coverage

* Improve input validation

* Bugfix: have set_demonstrations set rather than return

* Move TransitionMapping from algorithms.base to data.types

* Fix typo: expert_buffer->self.expert_buffer

* Bugfix: use safe_to_numpy rather than assuming th.Tensor

* Fix lint

* Fix unused imports

* Refactor tests

* Bump # of rollouts to try to fix MacOS flakiness

* Simplify SQIL example and tutorial by 1. downloading expert trajectories instead of training an expert and sampling from the expert and 2. passing trajectories instead of transitions to SQIL.

* Improve docstring of SQILReplayBuffer.

* Set the expert_buffer in the constructor.

* Consistently set expert transition reward to 1 and learner transition reward to 0 when adding them to the SQILReplayBuffer instead of modifying them on-the-fly when sampling.

* Fix docstring of SQILReplayBuffer.sample()

* Switch back to the CartPole-v1 environment in the SQIL examples

* Only train for 1k steps in the SQIL example so the doctests don't run for too long.

* Fix cell metadata for tutorial notebook.

* Notebook formatting fixes.

* Fix typing error in SQIL implementation.

* Fix isort issue.

* Clarify that our variant of the SQIL implementation is not really "soft".

* Fix link in experts documentation.

* Remove support for transition mappings.

* Remove data_loader from SQIL test cases.

* Bump number of demonstrations in SQIL performance test to reduce flakiness.

* Adapt hyperparameters in test_sqil_performance to reduce flakiness

* Fix seeds for flaky test_sqil_performance

* Increase coverage in test_sqil.py

* Pass kwargs from SQIL.train through to DQN.learn

- also set default tb_log_name to "SQIL"

* Pass parameters as kwargs for multi-ary methods in sqil.py

* Make test for exceptions raised by SQIL constructor more specific

- also: adjust imports to conform with style guide

---------

Co-authored-by: Adam Gleave <adam@gleave.me>
Co-authored-by: Maximilian Ernestus <maximilian@ernestus.de>
Co-authored-by: Jason Hoelscher-Obermaier <jason.hoelscherobermaier@gmail.com>
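
For readers skimming the SQIL commit log above: the change "set expert transition reward to 1 and learner transition reward to 0 when adding them to the SQILReplayBuffer" can be illustrated with a minimal, library-free sketch. This is not the code merged in this repository. `Transition`, `RelabelingBuffer`, and the 50/50 sampling split are all hypothetical names and assumptions chosen for illustration; the only point it shows is relabeling at insertion time rather than at sampling time.

```python
import random
from dataclasses import dataclass, replace
from typing import Any, List, Sequence


@dataclass(frozen=True)
class Transition:
    """Hypothetical transition record; not a type from the repository."""

    obs: Any
    action: Any
    next_obs: Any
    reward: float
    done: bool


class RelabelingBuffer:
    """Hypothetical buffer that overwrites rewards when transitions are stored.

    Expert transitions are stored with reward 1.0 and learner transitions
    with reward 0.0, so sampling never has to modify rewards on the fly.
    """

    def __init__(self, expert_transitions: Sequence[Transition], seed: int = 0):
        if not expert_transitions:
            raise ValueError("at least one expert transition is required")
        self._rng = random.Random(seed)
        # Relabel once, at insertion time: the original reward is discarded.
        self.expert: List[Transition] = [
            replace(t, reward=1.0) for t in expert_transitions
        ]
        self.learner: List[Transition] = []

    def add(self, transition: Transition) -> None:
        # Learner experience always enters the buffer with reward 0.0.
        self.learner.append(replace(transition, reward=0.0))

    def sample(self, batch_size: int) -> List[Transition]:
        # Assumption: half the batch comes from each source once learner
        # data exists; before that, the whole batch is expert data.
        n_expert = batch_size // 2 if self.learner else batch_size
        batch = self._rng.choices(self.expert, k=n_expert)
        if self.learner:
            batch += self._rng.choices(self.learner, k=batch_size - n_expert)
        return batch
```

Because the stored rewards are already final, `sample` only draws with replacement and returns the transitions unchanged.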
2 participants