[Question] When should TimeFeatureWrapper be used? #79
Hello,

In the tuned hyperparameter .yml files, I noticed that some environments are wrapped with TimeFeatureWrapper. This is the case for most environments trained with TD3 (and TRPO) but not for the other algorithms. How do you decide when an environment should be wrapped in a TimeFeatureWrapper?

I understand from this paper that the wrapper is necessary for environments with a fixed number of time steps, so that they respect the Markov property.

To give more context, I would like to compare the performance of TD3 and A2C on the same environment over an equal number of time steps per episode. If I train with TimeFeatureWrapper, the episode lengths are not guaranteed to be equal, so comparing the mean reward per episode no longer makes sense. If I train without the wrapper, I may violate the Markov property.

Thanks,
Pierre
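For context, a minimal sketch of the setup being asked about, assuming the TimeFeatureWrapper from the zoo's utils/wrappers.py and the stable-baselines TD3 API; the environment and hyperparameters here are illustrative only:

```python
import gym
from stable_baselines import TD3

# assumes this script runs from the rl-baselines-zoo repository root
from utils.wrappers import TimeFeatureWrapper

# Pendulum-v0 has a fixed 200-step episode length (via gym's TimeLimit),
# i.e. the kind of environment the wrapper is intended for
env = TimeFeatureWrapper(gym.make("Pendulum-v0"))

model = TD3("MlpPolicy", env, verbose=1)
model.learn(total_timesteps=50000)
```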
Comments

As a rule of thumb, use it on every environment with a fixed episode length. The impact is more or less significant depending on the algorithm (and some hyperparameters in the zoo are not completely up to date, hence the inconsistency).
The wrapper just adds a feature; it should not change the environment, and a termination condition can also be satisfied before the max episode length.
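To make "just adds a feature" concrete, here is a simplified illustrative wrapper (a sketch, not the zoo's actual implementation; it assumes a Box observation space and the old gym 4-tuple step API). Note that `done` is passed through untouched, so early termination still works:

```python
import gym
import numpy as np


class MinimalTimeFeature(gym.Wrapper):
    """Append the normalized remaining time to the observation.

    Illustrative sketch only; the zoo's TimeFeatureWrapper is more complete.
    """

    def __init__(self, env, max_steps=1000):
        super().__init__(env)
        assert isinstance(env.observation_space, gym.spaces.Box)
        low = np.concatenate((env.observation_space.low, [0.0]))
        high = np.concatenate((env.observation_space.high, [1.0]))
        self.observation_space = gym.spaces.Box(low=low, high=high, dtype=np.float32)
        self._max_steps = max_steps
        self._current_step = 0

    def reset(self, **kwargs):
        self._current_step = 0
        return self._augment(self.env.reset(**kwargs))

    def step(self, action):
        self._current_step += 1
        obs, reward, done, info = self.env.step(action)
        # done comes straight from the wrapped env: the time feature
        # neither adds nor removes any termination condition
        return self._augment(obs), reward, done, info

    def _augment(self, obs):
        # 1.0 at the start of the episode, 0.0 at the step limit
        time_feature = 1.0 - self._current_step / self._max_steps
        return np.concatenate((obs, [time_feature])).astype(np.float32)
```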
Thanks! Ok, I understand that with this wrapper it's no longer possible to have a fixed episode length. However, this poses a problem during evaluation, when computing the mean cumulative return over X episodes: the longer episodes will run for more time steps and will receive a higher reward than those terminated earlier (this is the case in my environment). How can I ensure a fair evaluation when using TimeFeatureWrapper? (Note: in my environment, the only termination condition is the maximum episode length.)
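On the evaluation point, one way to keep the comparison honest (a hypothetical helper, not part of the zoo) is to report the mean episode length next to the mean return, so any length discrepancy between runs is visible rather than silently folded into the score:

```python
import numpy as np


def evaluate(model, env, n_episodes=10):
    """Return (mean episodic return, mean episode length) over n_episodes."""
    returns, lengths = [], []
    for _ in range(n_episodes):
        obs, done = env.reset(), False
        ep_return, ep_len = 0.0, 0
        while not done:
            # stable-baselines predict() returns (action, hidden_state)
            action, _ = model.predict(obs, deterministic=True)
            obs, reward, done, _ = env.step(action)
            ep_return += reward
            ep_len += 1
        returns.append(ep_return)
        lengths.append(ep_len)
    return np.mean(returns), np.mean(lengths)
```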
Why?
Because the environment's termination condition (the maximum episode length) is overridden by the wrapper.
It is not overridden: the max episode length is only used to compute the time feature, not to compute termination, and the wrapper uses the limit defined by the env if present: https://github.com/araffin/rl-baselines-zoo/blob/master/utils/wrappers.py#L48
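The "use the one defined by the env if present" logic at the linked line boils down to something like this sketch (assuming gym's TimeLimit wrapper; the private attribute name matches older gym versions):

```python
from gym.wrappers import TimeLimit


def get_max_steps(env, default=1000):
    # if the env is wrapped in gym's TimeLimit, reuse its episode limit;
    # otherwise fall back to a user-provided default
    if isinstance(env, TimeLimit):
        return env._max_episode_steps
    return default
```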
Apologies, I forgot to reply. The issue I had with the episode length was due to an implementation problem in my custom environment, not to the TimeFeatureWrapper. Thanks for the clarification above about when to use the wrapper. I'm happy to have this issue closed.