Fix critic network for Discrete CRR #485
Conversation
Codecov Report
@@            Coverage Diff             @@
##           master     #485      +/-   ##
==========================================
- Coverage   94.24%   94.03%   -0.21%
==========================================
  Files          61       61
  Lines        4031     4058      +27
==========================================
+ Hits         3799     3816      +17
- Misses        232      242      +10
Also, could you please move the related result from `examples/atari` to `examples/offline`, and move the related test from `test/discrete` to `test/offline`?
Sure. I added a few …
How about the other two trainers?
This functionality is mainly for offline RL, where we don't even have an env and just want to train a model; other than that, I think it's rarely necessary or useful to turn off testing during training. Of course I can add it if you insist.
Well, actually, if the environment simulation is expensive and we already know the approximate number of training steps needed, excluding the test phase can make training an agent more efficient.
OK, I'll add it then.
- Fixes an inconsistency in the implementation of Discrete CRR: it now uses the `Critic` class for its critic, following the conventions of other actor-critic policies (see the sketch after this list);
- Updates several offline policies to use the `ActorCritic` class for their optimizer, eliminating the randomness caused by parameter sharing between actor and critic;
- Adds `writer.flush()` in TensorboardLogger to ensure real-time results;
- Enables `test_collector=None` in the three trainers to turn off testing during training;
- Updates the Atari offline results in README.md;
- Moves Atari offline RL examples to `examples/offline` and tests to `test/offline`, per review comments.
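For context, here is a minimal sketch of what the fixed setup could look like. Module paths and constructor arguments follow the tianshou 0.4.x API as I understand it; the shapes and hyperparameters are illustrative and not taken from this PR.

```python
# Sketch only: a Discrete CRR policy whose critic is the generic `Critic` head,
# with a single optimizer built over `ActorCritic(actor, critic)`.
# Shapes and hyperparameters are illustrative; argument names assume tianshou 0.4.x.
import torch
from tianshou.policy import DiscreteCRRPolicy
from tianshou.utils.net.common import ActorCritic, Net
from tianshou.utils.net.discrete import Actor, Critic

state_shape, n_actions = (4,), 2
net = Net(state_shape, hidden_sizes=[64, 64])         # shared feature extractor
actor = Actor(net, n_actions, softmax_output=False)   # outputs per-action logits
critic = Critic(net, last_size=n_actions)             # one Q-value per action

# ActorCritic wraps both heads in one nn.Module, so .parameters() yields each
# shared parameter exactly once and in a deterministic order.
optim = torch.optim.Adam(ActorCritic(actor, critic).parameters(), lr=1e-3)

policy = DiscreteCRRPolicy(actor, critic, optim, discount_factor=0.99)
```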
`make format` (required)
`make commit-checks` (required)

This PR:
- Fixes Discrete CRR to use the `Critic` class for its critic, following conventions in other actor-critic policies;
- Updates several offline policies to use the `ActorCritic` class for their optimizer to eliminate randomness caused by parameter sharing between actor and critic;
- Adds a `disable_test` option to `offline_trainer` to turn off testing during training (see the sketch below);
- Moves Atari offline RL examples to `examples/offline` and tests to `test/offline`, per review comments.
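Per the final description above, the `disable_test` option ended up being expressed as passing `test_collector=None`. Below is a minimal sketch of that usage, assuming the tianshou 0.4.x `offline_trainer` keyword names; the dataset path is a placeholder and `policy` is assumed to be built as in the earlier Discrete CRR sketch.

```python
# Sketch only: offline training with the test phase turned off.
# Assumes tianshou 0.4.x argument names; "expert_buffer.hdf5" is a placeholder
# and `policy` is assumed to be built as in the Discrete CRR sketch above.
from tianshou.data import ReplayBuffer
from tianshou.trainer import offline_trainer

buffer = ReplayBuffer.load_hdf5("expert_buffer.hdf5")  # pre-collected offline data

result = offline_trainer(
    policy,
    buffer,
    test_collector=None,    # no test env/collector: the test phase is skipped
    max_epoch=10,
    update_per_epoch=1000,
    episode_per_test=1,     # irrelevant when test_collector is None
    batch_size=64,
)
print(result)  # dict of training statistics
```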