
Fix critic network for Discrete CRR #485

Merged: 20 commits from fix_crr_critic into thu-ml:master on Nov 28, 2021

Conversation

@nuance1979 (Collaborator) commented Nov 26, 2021

  • I have marked all applicable categories:
    • exception-raising fix
    • algorithm implementation fix
    • documentation modification
    • new feature
  • I have reformatted the code using make format (required)
  • I have checked the code using make commit-checks (required)
  • If applicable, I have mentioned the relevant/related issue(s)
  • If applicable, I have listed every item in this Pull Request below

This PR:

  • Fixes an inconsistency in the implementation of Discrete CRR: it now uses the `Critic` class for its critic, following the convention of other actor-critic policies;
  • Updates several offline policies to use the `ActorCritic` class for their optimizers, eliminating the randomness caused by parameter sharing between actor and critic (see the sketch after this list);
  • Adds a disable_test option to offline_trainer to turn off testing during training;
  • Updates the Atari offline results in README.md;
  • Moves the Atari offline RL examples to `examples/offline` and the tests to `test/offline`, per review comments.
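Roughly, the first two items amount to the wiring below. This is a minimal sketch rather than code from the PR: the `Net`/`Actor`/`Critic`/`ActorCritic` constructor arguments and the environment shapes are assumptions and may differ between tianshou versions.

```python
import torch
from tianshou.utils.net.common import ActorCritic, Net
from tianshou.utils.net.discrete import Actor, Critic

state_shape, num_actions = (4,), 6                     # hypothetical environment dimensions
feature_net = Net(state_shape, hidden_sizes=[64, 64])  # preprocessing net shared by both heads
actor = Actor(feature_net, num_actions, softmax_output=False)
critic = Critic(feature_net, last_size=num_actions)    # Q-value head over all discrete actions

# ActorCritic registers both heads under a single nn.Module, so .parameters()
# yields each shared parameter exactly once and in a fixed order. Collecting the
# parameters by hand either duplicates the shared layers (when concatenating the
# two parameter lists) or iterates them in a non-deterministic order (when
# deduplicating via a Python set), which is the kind of randomness the item
# above refers to.
actor_critic = ActorCritic(actor, critic)
optim = torch.optim.Adam(actor_critic.parameters(), lr=1e-4)
```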

@codecov-commenter commented Nov 26, 2021

Codecov Report

Merging #485 (0b61ba0) into master (5c5a3db) will decrease coverage by 0.20%.
The diff coverage is 91.25%.


@@            Coverage Diff             @@
##           master     #485      +/-   ##
==========================================
- Coverage   94.24%   94.03%   -0.21%     
==========================================
  Files          61       61              
  Lines        4031     4058      +27     
==========================================
+ Hits         3799     3816      +17     
- Misses        232      242      +10     
Flag        Coverage           Δ
unittests   94.03% <91.25%>    (-0.21%) ⬇️

Flags with carried forward coverage won't be shown.

Impacted Files                              Coverage            Δ
tianshou/trainer/utils.py                   97.05% <87.50%>     (-2.95%) ⬇️
tianshou/trainer/offline.py                 96.22% <90.90%>     (-3.78%) ⬇️
tianshou/trainer/offpolicy.py               97.50% <91.30%>     (-2.50%) ⬇️
tianshou/trainer/onpolicy.py                93.82% <91.30%>     (-2.12%) ⬇️
tianshou/__init__.py                        100.00% <100.00%>   (ø)
tianshou/policy/imitation/discrete_crr.py   93.44% <100.00%>    (ø)
tianshou/utils/logger/tensorboard.py        95.23% <100.00%>    (+0.11%) ⬆️
tianshou/policy/modelfree/trpo.py           88.52% <0.00%>      (-4.92%) ⬇️

Continue to review full report at Codecov.

Legend: Δ = absolute <relative> (impact), ø = not affected, ? = missing data
Powered by Codecov. Last update 5c5a3db...0b61ba0.

@Trinkle23897 (Collaborator) left a comment


Also, could you please move the related result from examples/atari to examples/offline and move the related test from test/discrete to test/offline?

tianshou/trainer/offline.py (outdated inline review comment, resolved)
@nuance1979 (Collaborator, Author)

> Also, could you please move the related result from examples/atari to examples/offline and move the related test from test/discrete to test/offline?

Sure. I added a few __init__.py files in the examples directory so that scripts in examples/offline can import the Atari helpers from examples/atari.
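For illustration, once those `__init__.py` marker files exist, a script under `examples/offline` can reuse the shared Atari helpers roughly as below; the module and symbol names are assumptions, not verified against this PR, and the repo root needs to be on `PYTHONPATH`.

```python
# examples/__init__.py and examples/atari/__init__.py are empty marker files that
# turn the directories into regular Python packages, so a script in
# examples/offline can import from examples/atari:
from examples.atari.atari_network import DQN           # assumed helper names
from examples.atari.atari_wrapper import wrap_deepmind

env = wrap_deepmind("PongNoFrameskip-v4", frame_stack=4)
net = DQN(*env.observation_space.shape, env.action_space.n)
```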

@Trinkle23897 (Collaborator)

How about the other two trainers?

@nuance1979 (Collaborator, Author) commented Nov 28, 2021

> How about the other two trainers?

This functionality is mainly for offline RL, where we don't even have an env and just want to train a model from a buffer; other than that, I think it's rarely necessary or useful to turn off testing during training.

Of course, I can add it if you insist.

@Trinkle23897 (Collaborator)

Well, actually, if the environment simulation is expensive and we already know the approximate number of training steps, excluding the test phase can make training an agent more efficient.

@nuance1979 (Collaborator, Author)

> Well, actually, if the environment simulation is expensive and we already know the approximate number of training steps, excluding the test phase can make training an agent more efficient.

Ok. I'll add it then.
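In the merged version this ended up as support for `test_collector=None` rather than a separate `disable_test` flag (see the merged commit message further below). A minimal sketch of the intended call, assuming the keyword arguments of tianshou's `offline_trainer` at the time; names may differ in other versions.

```python
from tianshou.data import ReplayBuffer
from tianshou.policy import BasePolicy
from tianshou.trainer import offline_trainer


def train_from_buffer(policy: BasePolicy, buffer: ReplayBuffer) -> dict:
    """Train purely from a pre-collected buffer, with the test phase turned off."""
    return offline_trainer(
        policy,
        buffer,
        test_collector=None,    # no evaluation environments; the test phase is skipped
        max_epoch=5,
        update_per_epoch=10000,
        episode_per_test=10,    # unused when test_collector is None
        batch_size=64,
    )
```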

@Trinkle23897 linked an issue on Nov 28, 2021 that may be closed by this pull request
@Trinkle23897 merged commit 3592f45 into thu-ml:master on Nov 28, 2021
@nuance1979 deleted the fix_crr_critic branch on January 23, 2022
BFAnas pushed a commit to BFAnas/tianshou that referenced this pull request on May 5, 2024, with the following message:
- Fixes an inconsistency in the implementation of Discrete CRR: it now uses the `Critic` class for its critic, following the convention of other actor-critic policies;
- Updates several offline policies to use the `ActorCritic` class for their optimizers, eliminating the randomness caused by parameter sharing between actor and critic;
- Adds `writer.flush()` in TensorboardLogger to ensure real-time results (see the sketch after this list);
- Enables `test_collector=None` in 3 trainers to turn off testing during training;
- Updates the Atari offline results in README.md;
- Moves the Atari offline RL examples to `examples/offline` and the tests to `test/offline`, per review comments.
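The `writer.flush()` item above can be illustrated with the plain torch.utils.tensorboard API; this is a minimal sketch of the flushing pattern, not the TensorboardLogger internals.

```python
from torch.utils.tensorboard import SummaryWriter

writer = SummaryWriter("log/offline_crr")
for step in range(1, 1001):
    writer.add_scalar("train/loss", 1.0 / step, global_step=step)
    if step % 100 == 0:
        # Without an explicit flush, events can sit in the writer's buffer and
        # TensorBoard lags behind the run; flushing pushes them to disk so the
        # curves update in (near) real time.
        writer.flush()
writer.close()
```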
Successfully merging this pull request may close the linked issue: Visualize result in real time.