
Adjust locations of setting the policy in train/eval mode #1123

Merged: 35 commits merged into thu-ml:master on May 6, 2024

Conversation

@maxhuettenrauch (Collaborator) commented Apr 24, 2024

Addresses #1122:

  • Introduced a new flag is_within_training_step, which is enabled by the training algorithm while within a training step, where a training step encompasses training-data collection and policy updates. Algorithms now use this flag, rather than the torch training flag (which was previously abused for this purpose), to decide whether their deterministic_eval setting should apply (see the sketch after this list).
  • The policy's training/eval mode (which should control torch-level learning only) no longer needs to be set in user code in order to control collector behaviour (this never made sense). The respective calls have been removed.
  • The policy should, in fact, always be in evaluation mode during data collection, as there is no reason to ever have gradient tracking enabled for any type of rollout. We therefore explicitly set the policy to evaluation mode in Collector.collect. Furthermore, it never makes sense to compute gradients during collection, so the option to pass no_grad=False was removed.
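A hypothetical sketch (not the actual tianshou code) of the two mechanisms above: the is_within_training_step flag deciding whether deterministic_eval applies, and collection unconditionally running in eval mode without gradient tracking. The class and function names here are illustrative assumptions:

```python
import torch
from torch.distributions import Categorical


class SketchPolicy(torch.nn.Module):
    """Illustrative policy with a deterministic_eval setting."""

    def __init__(self, deterministic_eval: bool = True) -> None:
        super().__init__()
        self.deterministic_eval = deterministic_eval
        # Enabled by the training algorithm for the duration of a training step.
        self.is_within_training_step = False
        self.net = torch.nn.Linear(4, 2)

    def compute_action(self, obs: torch.Tensor) -> torch.Tensor:
        dist = Categorical(logits=self.net(obs))
        # Decide based on the new flag, not on self.training (the torch flag,
        # which should only control layers such as dropout and batch norm).
        if self.deterministic_eval and not self.is_within_training_step:
            return dist.probs.argmax(dim=-1)
        return dist.sample()


def collect(policy: SketchPolicy, obs: torch.Tensor) -> torch.Tensor:
    """Rollouts never need gradients, so eval mode and no_grad are unconditional."""
    policy.eval()
    with torch.no_grad():
        return policy.compute_action(obs)


# Example: collect(SketchPolicy(), torch.randn(3, 4))
```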

Further changes:

  • Added a base class for collectors: BaseCollector.
  • Added new utility context managers in_eval_mode and in_train_mode for torch modules (see the sketch after this list).
  • Collector.reset now returns obs and info.
  • no_grad is no longer accepted as a kwarg of collect.
  • Removed the deprecations of 0.5.1 (will likely not affect anyone) and the unused warnings module.
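A minimal sketch of what such context managers might look like (restoring the module's previous mode on exit is an assumption; note that a later commit in this thread removes in_eval_mode again as obsolete):

```python
from collections.abc import Iterator
from contextlib import contextmanager

import torch


@contextmanager
def in_eval_mode(module: torch.nn.Module) -> Iterator[None]:
    """Temporarily switch the module to eval mode, restoring its previous mode on exit."""
    was_training = module.training
    module.train(False)
    try:
        yield
    finally:
        module.train(was_training)


@contextmanager
def in_train_mode(module: torch.nn.Module) -> Iterator[None]:
    """Temporarily switch the module to train mode, restoring its previous mode on exit."""
    was_training = module.training
    module.train(True)
    try:
        yield
    finally:
        module.train(was_training)
```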

@MischaPanch (Collaborator) commented:
I'll add the context manager for policies (maybe torch already has one), as discussed, and then this is good to go. Thanks for the extension @maxhuettenrauch !

@MischaPanch MischaPanch self-requested a review April 25, 2024 16:10
Michael Panchenko added 5 commits April 26, 2024 14:42
There are better ways to deal with deprecations, which we shall use in the future
A test is not a script and should not be used as such

Also marked pistonball test as skipped since it doesn't actually test anything
@MischaPanch MischaPanch marked this pull request as ready for review April 26, 2024 15:40
@MischaPanch MischaPanch requested a review from opcode81 April 26, 2024 16:25
MischaPanch previously approved these changes Apr 26, 2024

@MischaPanch (Collaborator) commented Apr 26, 2024

@ChenDRAG Once finalized (there's still a bit of review and discussion to be done), I'd merge this PR without squashing if you don't mind. It contains somewhat unrelated changes, and the history would be easier to bisect at the commit level if we don't squash.

@opcode81 (Collaborator) commented May 2, 2024

Notebooks and documentation have not been updated and still use the old way of setting train/eval mode.

…mplementation, as this is less prone to errors
opcode81 added 2 commits May 3, 2024 10:12
  New method training_step, which
    * collects training data (method _collect_training_data)
    * performs "test in train" (method _test_in_train)
    * performs the policy update
  The old method train_step performed only the first two points
  and has now been split into two separate methods
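A hypothetical skeleton of the refactored flow (not the actual tianshou code; the helper bodies, the stats dict, and the toggling of is_within_training_step around the step are illustrative assumptions based on this thread):

```python
class TrainerSkeleton:
    """Illustrative trainer whose training_step collects data, tests in train, and updates."""

    def __init__(self, policy) -> None:
        self.policy = policy

    def training_step(self) -> dict:
        # Everything below happens "within the training step", so the flag
        # introduced by this PR is enabled for its duration.
        self.policy.is_within_training_step = True
        try:
            collect_stats = self._collect_training_data()
            self._test_in_train(collect_stats)
            self._update_policy(collect_stats)
            return collect_stats
        finally:
            self.policy.is_within_training_step = False

    def _collect_training_data(self) -> dict:
        # Would delegate to the collector and return collection statistics.
        return {"n/ep": 0, "n/st": 0}

    def _test_in_train(self, collect_stats: dict) -> None:
        # Would run evaluation episodes when a stopping criterion might be met.
        pass

    def _update_policy(self, collect_stats: dict) -> None:
        # Would perform gradient updates on the policy.
        pass
```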
@opcode81 opcode81 force-pushed the policy-train-eval branch from 5ba10d3 to 3184f73 on May 3, 2024 08:12
opcode81 added 2 commits May 3, 2024 15:18
  * Remove flag `eval_mode` from Collector.collect
  * Replace flag `is_eval` in BasePolicy with `is_within_training_step` (negating usages)
    and set it appropriately in BaseTrainer
Michael Panchenko and others added 11 commits May 5, 2024 15:14
  * Add class ExperimentCollection to improve usability
  * Remove parameters from ExperimentBuilder.build
  * Rename ExperimentBuilder.build_default_seeded_experiments to build_seeded_collection,
    changing the return type to ExperimentCollection
  * Replace temp_config_mutation (which was not appropriate for the public API) with
    method copy (which performs a safe deep copy)
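The last point is illustrated by this minimal sketch of the design choice (a hypothetical builder, not the actual tianshou API; the class name and the list return type are assumptions):

```python
from copy import deepcopy


class ExperimentBuilderSketch:
    """Illustrative builder showing copy() as a replacement for temp_config_mutation."""

    def __init__(self, seed: int = 0) -> None:
        self.seed = seed

    def copy(self) -> "ExperimentBuilderSketch":
        # A safe deep copy: callers can derive variants without mutating
        # (even temporarily) the original builder's configuration.
        return deepcopy(self)

    def build_seeded_collection(self, num_experiments: int) -> list["ExperimentBuilderSketch"]:
        # One independent, differently seeded copy per experiment; the real
        # method would build experiments and return an ExperimentCollection.
        builders = []
        for seed in range(num_experiments):
            builder = self.copy()
            builder.seed = seed
            builders.append(builder)
        return builders
```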
@MischaPanch MischaPanch force-pushed the policy-train-eval branch from e50c67c to d8e5631 on May 5, 2024 20:29
Michael Panchenko added 4 commits May 5, 2024 22:33
For some reason, env.spec.reward_threshold is now None - some change in upstream code

Also added a better pytest skip message
Adjusted notebooks, log messages, and docs accordingly. Removed the now-obsolete in_eval_mode and the private context manager in Trainer
@MischaPanch MischaPanch force-pushed the policy-train-eval branch from 2685dca to e94a5c0 on May 6, 2024 17:23
@MischaPanch (Collaborator) commented:
This is done, will merge without squashing when tests have finished

@MischaPanch MischaPanch merged commit 26b867e into thu-ml:master May 6, 2024
4 checks passed