
Fix critic network for Discrete CRR #485

Merged: 20 commits from fix_crr_critic into thu-ml:master on Nov 28, 2021

Conversation

@nuance1979 (Collaborator) commented Nov 26, 2021

  • I have marked all applicable categories:
    • exception-raising fix
    • algorithm implementation fix
    • documentation modification
    • new feature
  • I have reformatted the code using make format (required)
  • I have checked the code using make commit-checks (required)
  • If applicable, I have mentioned the relevant/related issue(s)
  • If applicable, I have listed every item in this Pull Request below

This PR:

  • Fixes an inconsistency in the implementation of Discrete CRR: it now uses the `Critic` class for its critic, following the convention of other actor-critic policies;
  • Updates several offline policies to use the `ActorCritic` class for their optimizers, eliminating the randomness caused by parameter sharing between actor and critic (see the sketch after this list);
  • Adds a disable_test option to offline_trainer to turn off testing during training;
  • Updates the Atari offline results in README.md;
  • Moves the Atari offline RL examples to `examples/offline` and the tests to `test/offline`, per review comments.
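Roughly, the first two items amount to the wiring below. This is a minimal sketch rather than code from the PR: the `Net`/`Actor`/`Critic`/`ActorCritic` constructor arguments and the environment shapes are assumptions and may differ between tianshou versions.

```python
import torch
from tianshou.utils.net.common import ActorCritic, Net
from tianshou.utils.net.discrete import Actor, Critic

state_shape, num_actions = (4,), 6                     # hypothetical environment dimensions
feature_net = Net(state_shape, hidden_sizes=[64, 64])  # preprocessing net shared by both heads
actor = Actor(feature_net, num_actions, softmax_output=False)
critic = Critic(feature_net, last_size=num_actions)    # Q-value head over all discrete actions

# ActorCritic registers both heads under a single nn.Module, so .parameters()
# yields each shared parameter exactly once and in a fixed order. Collecting the
# parameters by hand either duplicates the shared layers (when concatenating the
# two parameter lists) or iterates them in a non-deterministic order (when
# deduplicating via a Python set), which is the kind of randomness the item
# above refers to.
actor_critic = ActorCritic(actor, critic)
optim = torch.optim.Adam(actor_critic.parameters(), lr=1e-4)
```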

@codecov-commenter commented Nov 26, 2021

Codecov Report

Merging #485 (0b61ba0) into master (5c5a3db) will decrease coverage by 0.20%.
The diff coverage is 91.25%.


@@            Coverage Diff             @@
##           master     #485      +/-   ##
==========================================
- Coverage   94.24%   94.03%   -0.21%     
==========================================
  Files          61       61              
  Lines        4031     4058      +27     
==========================================
+ Hits         3799     3816      +17     
- Misses        232      242      +10     
Flag        Coverage           Δ
unittests   94.03% <91.25%>    (-0.21%) ⬇️

Flags with carried forward coverage won't be shown.

Impacted Files                              Coverage            Δ
tianshou/trainer/utils.py                   97.05% <87.50%>     (-2.95%) ⬇️
tianshou/trainer/offline.py                 96.22% <90.90%>     (-3.78%) ⬇️
tianshou/trainer/offpolicy.py               97.50% <91.30%>     (-2.50%) ⬇️
tianshou/trainer/onpolicy.py                93.82% <91.30%>     (-2.12%) ⬇️
tianshou/__init__.py                        100.00% <100.00%>   (ø)
tianshou/policy/imitation/discrete_crr.py   93.44% <100.00%>    (ø)
tianshou/utils/logger/tensorboard.py        95.23% <100.00%>    (+0.11%) ⬆️
tianshou/policy/modelfree/trpo.py           88.52% <0.00%>      (-4.92%) ⬇️

Continue to review full report at Codecov.

Legend: Δ = absolute <relative> (impact), ø = not affected, ? = missing data
Powered by Codecov. Last update 5c5a3db...0b61ba0.

@Trinkle23897 (Collaborator) left a comment


Also, could you please move the related result from examples/atari to examples/offline and move the related test from test/discrete to test/offline?

tianshou/trainer/offline.py (outdated inline review comment, resolved)
@nuance1979 (Collaborator, Author)

> Also, could you please move the related result from examples/atari to examples/offline and move the related test from test/discrete to test/offline?

Sure. I added a few __init__.py files in the examples directory so that scripts in examples/offline can import the Atari helpers from examples/atari.
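For illustration, once those `__init__.py` marker files exist, a script under `examples/offline` can reuse the shared Atari helpers roughly as below; the module and symbol names are assumptions, not verified against this PR, and the repo root needs to be on `PYTHONPATH`.

```python
# examples/__init__.py and examples/atari/__init__.py are empty marker files that
# turn the directories into regular Python packages, so a script in
# examples/offline can import from examples/atari:
from examples.atari.atari_network import DQN           # assumed helper names
from examples.atari.atari_wrapper import wrap_deepmind

env = wrap_deepmind("PongNoFrameskip-v4", frame_stack=4)
net = DQN(*env.observation_space.shape, env.action_space.n)
```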

@Trinkle23897 (Collaborator)

How about the other two trainers?

@nuance1979 (Collaborator, Author) commented Nov 28, 2021

> How about the other two trainers?

This functionality is mainly for offline RL, where we don't even have an env and just want to train a model from a buffer; other than that, I think it's rarely necessary or useful to turn off testing during training.

Of course, I can add it if you insist.

@Trinkle23897 (Collaborator)

Well, actually, if the environment simulation is expensive and we already know the approximate number of training steps, excluding the test phase can make training an agent more efficient.

@nuance1979 (Collaborator, Author)

> Well, actually, if the environment simulation is expensive and we already know the approximate number of training steps, excluding the test phase can make training an agent more efficient.

Ok. I'll add it then.
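In the merged version this ended up as support for `test_collector=None` rather than a separate `disable_test` flag (see the merged commit message further below). A minimal sketch of the intended call, assuming the keyword arguments of tianshou's `offline_trainer` at the time; names may differ in other versions.

```python
from tianshou.data import ReplayBuffer
from tianshou.policy import BasePolicy
from tianshou.trainer import offline_trainer


def train_from_buffer(policy: BasePolicy, buffer: ReplayBuffer) -> dict:
    """Train purely from a pre-collected buffer, with the test phase turned off."""
    return offline_trainer(
        policy,
        buffer,
        test_collector=None,    # no evaluation environments; the test phase is skipped
        max_epoch=5,
        update_per_epoch=10000,
        episode_per_test=10,    # unused when test_collector is None
        batch_size=64,
    )
```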

@Trinkle23897 linked an issue on Nov 28, 2021 that may be closed by this pull request
@Trinkle23897 merged commit 3592f45 into thu-ml:master on Nov 28, 2021
@nuance1979 deleted the fix_crr_critic branch on January 23, 2022
BFAnas pushed a commit to BFAnas/tianshou that referenced this pull request on May 5, 2024, with the following message:
- Fixes an inconsistency in the implementation of Discrete CRR: it now uses the `Critic` class for its critic, following the convention of other actor-critic policies;
- Updates several offline policies to use the `ActorCritic` class for their optimizers, eliminating the randomness caused by parameter sharing between actor and critic;
- Adds `writer.flush()` in TensorboardLogger to ensure real-time results (see the sketch after this list);
- Enables `test_collector=None` in 3 trainers to turn off testing during training;
- Updates the Atari offline results in README.md;
- Moves the Atari offline RL examples to `examples/offline` and the tests to `test/offline`, per review comments.
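The `writer.flush()` item above can be illustrated with the plain torch.utils.tensorboard API; this is a minimal sketch of the flushing pattern, not the TensorboardLogger internals.

```python
from torch.utils.tensorboard import SummaryWriter

writer = SummaryWriter("log/offline_crr")
for step in range(1, 1001):
    writer.add_scalar("train/loss", 1.0 / step, global_step=step)
    if step % 100 == 0:
        # Without an explicit flush, events can sit in the writer's buffer and
        # TensorBoard lags behind the run; flushing pushes them to disk so the
        # curves update in (near) real time.
        writer.flush()
writer.close()
```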
Successfully merging this pull request may close the linked issue: Visualize result in real time.