
[Question] When should TimeFeatureWrapper be used? #79

Closed
PierreExeter opened this issue May 11, 2020 · 6 comments
Labels
question Further information is requested

Comments

@PierreExeter
Contributor

PierreExeter commented May 11, 2020

Hello,

In the tuned hyperparameters yml files, I noticed that some environments are wrapped with TimeFeatureWrapper. This is the case for most environments trained with TD3 (and TRPO) but not for the other algorithms. How do you decide when the environment should be wrapped in a TimeFeatureWrapper?

I understand from this paper that this wrapper is necessary for environments with a fixed number of time steps so that they respect the Markov property.

To give more context, I would like to compare the performance of TD3 and A2C on the same environment over an equal number of time steps per episode.
If I train with TimeFeatureWrapper, the episode lengths are not guaranteed to be equal, so comparing the mean reward per episode no longer makes sense.
If I train without the wrapper, I may violate the Markov property.

Thanks,
Pierre

@araffin
Owner

araffin commented May 11, 2020

How do you decide when the environment should be wrapped in a TimeFeatureWrapper?
I understand from this paper that this wrapper is necessary for environments with a fixed number of time steps so that they respect the Markov property.

As a rule of thumb, use it on every environment with a fixed episode length. The size of the impact depends on the algorithm (and some hyperparameters in the zoo are not completely up to date, hence the inconsistency).

If I train with TimeFeatureWrapper, the episode lengths are not guarantied to be equal so comparing the mean reward per episode doesn't make sense anymore.

The wrapper just adds a feature to the observation; it should not change the environment's dynamics, and a termination condition can also be satisfied before the maximum episode length.
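To illustrate the point above, here is a minimal sketch of the idea (not the zoo's actual code): the wrapper appends a normalised "remaining time" feature to each observation and never touches the `done` flag, so episode lengths are exactly what the underlying environment produces. `DummyEnv` and `TimeFeatureSketch` are hypothetical stand-ins.

```python
import numpy as np

class DummyEnv:
    """Toy stand-in for a gym env with a fixed 150-step time limit."""
    max_episode_steps = 150

    def reset(self):
        self._t = 0
        return np.zeros(3)

    def step(self, action):
        self._t += 1
        done = self._t >= self.max_episode_steps  # env's own termination
        return np.zeros(3), 1.0, done, {}

class TimeFeatureSketch:
    """Appends the remaining-time fraction to the observation.

    It never modifies `done`, so it cannot change episode lengths;
    it only restores the Markov property for time-limited envs.
    """
    def __init__(self, env):
        self.env = env
        self._max = env.max_episode_steps
        self._t = 0

    def reset(self):
        self._t = 0
        return self._augment(self.env.reset())

    def step(self, action):
        self._t += 1
        obs, reward, done, info = self.env.step(action)
        return self._augment(obs), reward, done, info

    def _augment(self, obs):
        time_feature = 1.0 - self._t / self._max  # 1 at start, 0 at the limit
        return np.append(obs, time_feature)
```

The observation grows by one dimension; everything else (rewards, termination) passes through unchanged.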

@araffin araffin added the question Further information is requested label May 11, 2020
@PierreExeter
Contributor Author

PierreExeter commented May 11, 2020

Thanks!

Ok I understand that with this wrapper, it's no longer possible to have a fixed episode length.

However, this poses a problem during evaluation, when computing the mean cumulative return over X episodes. The longer episodes run for more time steps and receive a higher reward than those terminated earlier (this is the case in ReacherBulletEnv-v0, where a positive reward is given at each time step). In the enjoy.py script, it is only possible to evaluate for a fixed number of time steps. With this setting, some evaluation episodes will receive a higher reward than others, depending on when they are terminated.

How can I ensure a fair evaluation when using TimeFeatureWrapper?

(Note, in my environment, the only termination condition is defined by max_episode_steps=150)
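One common way to sidestep this issue is to average the return over complete episodes rather than over a fixed number of time steps, so that every episode contributes exactly one full return. A minimal sketch under that assumption (`FixedLengthEnv` and the trivial policy are hypothetical stand-ins, not zoo code):

```python
import numpy as np

class FixedLengthEnv:
    """Toy stand-in: +1 reward per step, terminates after 150 steps."""
    def reset(self):
        self._t = 0
        return np.zeros(3)

    def step(self, action):
        self._t += 1
        return np.zeros(3), 1.0, self._t >= 150, {}

def evaluate(env, policy, n_episodes=5):
    """Mean cumulative return over complete episodes.

    Each episode is run to its own termination, so the comparison
    stays fair even when episode lengths differ.
    """
    returns = []
    for _ in range(n_episodes):
        obs, done, total = env.reset(), False, 0.0
        while not done:
            obs, reward, done, _ = env.step(policy(obs))
            total += reward
        returns.append(total)
    return float(np.mean(returns))
```

Counting episodes instead of time steps means a run that terminates early is still scored as one full episode, rather than padding the budget with extra steps from a fresh episode.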

@araffin
Owner

araffin commented May 11, 2020

Ok I understand that with this wrapper, it's no longer possible to have a fixed episode length.

Why?

@PierreExeter
Contributor Author

Because the environment's termination condition (max_episode_steps=150) is overridden by max_steps=1000 from the TimeFeatureWrapper class. Should I force done=True after 150 steps when evaluating? Sorry if I'm missing the point.

@araffin
Owner

araffin commented May 11, 2020

Because the environment's termination condition (max_episode_steps=150) is overridden by max_steps=1000

It is not overridden; it is only used to compute the feature, not to compute termination:
https://github.com/araffin/rl-baselines-zoo/blob/master/utils/wrappers.py#L71

and it uses the one defined by th env if present: https://github.com/araffin/rl-baselines-zoo/blob/master/utils/wrappers.py#L48
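The fallback logic described above can be sketched like this (a simplification, not the wrapper's actual code): prefer the time limit declared by the env's spec, and only fall back to the wrapper's default when the env declares none. The resulting value is used purely to normalise the time feature.

```python
def get_time_limit(env, default_max_steps=1000):
    """Return the env's declared time limit if present, else the default.

    Sketch of the lookup only; the value is used to scale the time
    feature, never to force termination.
    """
    declared = getattr(getattr(env, "spec", None), "max_episode_steps", None)
    return declared if declared is not None else default_max_steps
```

So an env registered with `max_episode_steps=150` keeps its 150-step limit; `max_steps=1000` only matters when no limit is declared.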

@PierreExeter
Contributor Author

Apologies, I forgot to reply. The issue I had with the episode length was due to an implementation problem in my custom environment, not to the TimeFeatureWrapper. Thanks for the clarification above about when to use the wrapper. I'm happy to have this issue closed.
