Adjust locations of setting the policy in train/eval mode #1123
Conversation
…in mode in appropriate places
I'll add the context manager for policies (maybe torch already has one), as discussed, and then this is good to go. Thanks for the extension @maxhuettenrauch!
There's better ways to deal with deprecations that we shall use in the future
A test is not a script and should not be used as such. Also marked the pistonball test as skipped, since it doesn't actually test anything.
Renamed the `is_eval` kwarg
@ChenDRAG Once finalized (there's still a bit of review and discussion to be done), I'd merge this PR without squashing if you don't mind. It contains somewhat unrelated changes, and the history would be easier to bisect at commit level if we don't squash.
Notebooks and documentation have not been updated and still use the old way of setting train/eval mode.
…mplementation, as this is less prone to errors
New method `training_step`, which:
* collects training data (method `_collect_training_data`)
* performs "test in train" (method `_test_in_train`)
* performs the policy update

The old method named `train_step` performed only the first two points and has now been split into two separate methods.
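The described split could be outlined roughly as follows. Only the method names `training_step`, `_collect_training_data`, and `_test_in_train` come from the commit message; the class, the update method's name, and all bodies are placeholders:

```python
class TrainerSketch:
    """Hypothetical outline of the training_step restructuring (bodies are placeholders)."""

    def _collect_training_data(self):
        # Placeholder: would collect transitions from the environment.
        return {"transitions": []}

    def _test_in_train(self, data):
        # Placeholder: would run evaluation episodes mid-training and
        # report whether an early stop was triggered.
        return False

    def _policy_update(self, data):
        # Placeholder: would perform gradient updates on the policy.
        return {"loss": 0.0}

    def training_step(self):
        # One training step = data collection + test-in-train + policy update.
        data = self._collect_training_data()
        should_stop = self._test_in_train(data)
        if should_stop:
            return None
        return self._policy_update(data)
```

The point of the split is that data collection and test-in-train can now be reused or overridden independently of the update step.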
Force-pushed from 5ba10d3 to 3184f73.
* Remove flag `eval_mode` from `Collector.collect`
* Replace flag `is_eval` in `BasePolicy` with `is_within_training_step` (negating usages) and set it appropriately in `BaseTrainer`
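A hedged sketch of how such a flag might interact with `deterministic_eval`. The class and helper here are hypothetical illustrations, not Tianshou's actual API; only the attribute names `is_within_training_step` and `deterministic_eval` come from the PR text:

```python
from contextlib import contextmanager


class PolicySketch:
    """Hypothetical policy carrying the is_within_training_step flag."""

    def __init__(self, deterministic_eval=True):
        self.deterministic_eval = deterministic_eval
        # Set by the trainer, not by torch's module.training flag.
        self.is_within_training_step = False

    def acts_deterministically(self):
        # deterministic_eval applies only OUTSIDE a training step,
        # decoupling action selection from the torch training flag.
        return self.deterministic_eval and not self.is_within_training_step


@contextmanager
def within_training_step(policy):
    # Hypothetical helper: how a trainer might scope the flag
    # around data collection and policy updates.
    policy.is_within_training_step = True
    try:
        yield
    finally:
        policy.is_within_training_step = False
```

With this scheme, exploratory (stochastic) actions are used during training-time collection even when `deterministic_eval=True`.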
Force-pushed from 3184f73 to c35be8d.
Conflicts: CHANGELOG.md
* Add class `ExperimentCollection` to improve usability
* Remove parameters from `ExperimentBuilder.build`
* Renamed `ExperimentBuilder.build_default_seeded_experiments` to `build_seeded_collection`, changing the return type to `ExperimentCollection`
* Replace `temp_config_mutation` (which was not appropriate for the public API) with method `copy` (which performs a safe deep copy)
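The copy-over-mutation idea in the last point can be illustrated with a toy class (hypothetical, not the actual `ExperimentBuilder` API): instead of temporarily mutating shared state, callers get an independent deep copy they can alter freely.

```python
import copy


class BuilderSketch:
    """Toy stand-in showing a safe deep-copy method replacing temporary mutation."""

    def __init__(self, params):
        self.params = params

    def copy(self):
        # Deep copy so nested containers are not shared with the original.
        return copy.deepcopy(self)


base = BuilderSketch({"lr": 1e-3})
variant = base.copy()
variant.params["lr"] = 3e-4  # does not affect base.params
```

A shallow copy would not suffice here, since `params` is mutable and would still be shared.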
Force-pushed from e50c67c to d8e5631.
Conflicts: CHANGELOG.md
For some reason, `env.spec.reward_threshold` is now None (some change in upstream code). Also added a better pytest skip message.
Adjusted notebooks, log messages, and docs accordingly. Removed the now obsolete `in_eval_mode` and the private context manager in `Trainer`.
Force-pushed from 2685dca to e94a5c0.
This is done; will merge without squashing when tests have finished.
Addresses #1122:

* Added new flag `is_within_training_step`, which is enabled by the training algorithm when within a training step, where a training step encompasses training data collection and policy updates. This flag is now used by algorithms to decide whether their `deterministic_eval` setting should indeed apply, instead of the torch training flag (which was abused!).
* `no_grad=False` was removed.

Further changes:

* `BaseCollector`
* `in_eval_mode` and `in_train_mode` for torch modules
* `reset` of `Collector`s now returns `obs` and `info`
* `no_grad` no longer accepted as kwarg of `collect`
* `0.5.1` (will likely not affect anyone) and the unused `warnings` module
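The new `reset` return convention can be sketched as follows. The stub class and values are hypothetical; the `(obs, info)` pair mirrors the Gymnasium-style `Env.reset` signature:

```python
class CollectorSketch:
    """Hypothetical illustration: reset returns (obs, info) like a Gymnasium env."""

    def reset(self):
        obs = [0.0]  # placeholder initial observation batch
        info = {}    # placeholder info dict
        return obs, info


# Callers now unpack both values instead of receiving only obs (or nothing).
obs, info = CollectorSketch().reset()
```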