
[Question] When should TimeFeatureWrapper be used? #79

Closed
PierreExeter opened this issue May 11, 2020 · 6 comments
Labels
question Further information is requested

Comments

@PierreExeter
Contributor

PierreExeter commented May 11, 2020

Hello,

In the tuned hyperparameters yml files, I noticed that some environments are wrapped with TimeFeatureWrapper. This is the case for most environments trained with TD3 (and TRPO) but not for the other algorithms. How do you decide when the environment should be wrapped in a TimeFeatureWrapper?

I understand from this paper that this wrapper is necessary for environments with a fixed number of time steps so that they respect the Markov property.

To give more context, I would like to compare the performance of TD3 and A2C on the same environment over an equal number of time steps per episode.
If I train with TimeFeatureWrapper, the episode lengths are not guaranteed to be equal, so comparing the mean reward per episode no longer makes sense.
If I train without the wrapper, I may violate the Markov property.

Thanks,
Pierre

@araffin
Owner

araffin commented May 11, 2020

How do you decide when the environment should be wrapped in a TimeFeatureWrapper?
I understand from this paper that this wrapper is necessary for environments with a fixed number of time steps so that they respect the Markov property.

As a rule of thumb, use it on every environment with a fixed episode length. The size of the impact depends on the algorithm (and some hyperparameters in the zoo are not completely up to date, hence the inconsistency).

If I train with TimeFeatureWrapper, the episode lengths are not guarantied to be equal so comparing the mean reward per episode doesn't make sense anymore.

The wrapper just adds a feature to the observation; it should not change the environment's dynamics, and a termination condition can also be satisfied before the maximum episode length.
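To illustrate the point above, here is a minimal sketch of the idea (not the zoo's actual code): the wrapper appends a normalised "remaining time" feature to each observation and never touches the `done` flag, so episode lengths are exactly what the underlying environment produces. `DummyEnv` and `TimeFeatureSketch` are hypothetical stand-ins.

```python
import numpy as np

class DummyEnv:
    """Toy stand-in for a gym env with a fixed 150-step time limit."""
    max_episode_steps = 150

    def reset(self):
        self._t = 0
        return np.zeros(3)

    def step(self, action):
        self._t += 1
        done = self._t >= self.max_episode_steps  # env's own termination
        return np.zeros(3), 1.0, done, {}

class TimeFeatureSketch:
    """Appends the remaining-time fraction to the observation.

    It never modifies `done`, so it cannot change episode lengths;
    it only restores the Markov property for time-limited envs.
    """
    def __init__(self, env):
        self.env = env
        self._max = env.max_episode_steps
        self._t = 0

    def reset(self):
        self._t = 0
        return self._augment(self.env.reset())

    def step(self, action):
        self._t += 1
        obs, reward, done, info = self.env.step(action)
        return self._augment(obs), reward, done, info

    def _augment(self, obs):
        time_feature = 1.0 - self._t / self._max  # 1 at start, 0 at the limit
        return np.append(obs, time_feature)
```

The observation grows by one dimension; everything else (rewards, termination) passes through unchanged.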

@araffin araffin added the question Further information is requested label May 11, 2020
@PierreExeter
Contributor Author

PierreExeter commented May 11, 2020

Thanks!

Ok I understand that with this wrapper, it's no longer possible to have a fixed episode length.

However, this poses a problem during evaluation, when computing the mean cumulative return over X episodes. The longer episodes run for more time steps and receive a higher reward than those terminated earlier (this is the case in ReacherBulletEnv-v0, where a positive reward is given at each time step). In the enjoy.py script, it is only possible to evaluate for a fixed number of time steps. With this setting, some evaluation episodes will receive a higher reward than others, depending on when they are terminated.

How can I ensure a fair evaluation when using TimeFeatureWrapper?

(Note, in my environment, the only termination condition is defined by max_episode_steps=150)
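One common way to sidestep this issue is to average the return over complete episodes rather than over a fixed number of time steps, so that every episode contributes exactly one full return. A minimal sketch under that assumption (`FixedLengthEnv` and the trivial policy are hypothetical stand-ins, not zoo code):

```python
import numpy as np

class FixedLengthEnv:
    """Toy stand-in: +1 reward per step, terminates after 150 steps."""
    def reset(self):
        self._t = 0
        return np.zeros(3)

    def step(self, action):
        self._t += 1
        return np.zeros(3), 1.0, self._t >= 150, {}

def evaluate(env, policy, n_episodes=5):
    """Mean cumulative return over complete episodes.

    Each episode is run to its own termination, so the comparison
    stays fair even when episode lengths differ.
    """
    returns = []
    for _ in range(n_episodes):
        obs, done, total = env.reset(), False, 0.0
        while not done:
            obs, reward, done, _ = env.step(policy(obs))
            total += reward
        returns.append(total)
    return float(np.mean(returns))
```

Counting episodes instead of time steps means a run that terminates early is still scored as one full episode, rather than padding the budget with extra steps from a fresh episode.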

@araffin
Owner

araffin commented May 11, 2020

Ok I understand that with this wrapper, it's no longer possible to have a fixed episode length.

Why?

@PierreExeter
Contributor Author

Because the environment's termination condition (max_episode_steps=150) is overridden by max_steps=1000 from the TimeFeatureWrapper class. Should I force done=True after 150 steps when evaluating? Sorry if I'm missing the point.

@araffin
Owner

araffin commented May 11, 2020

Because the environment's termination condition (max_episode_steps=150) is overridden by max_steps=1000

It is not overridden; it is only used to compute the feature, not to compute termination:
https://github.com/araffin/rl-baselines-zoo/blob/master/utils/wrappers.py#L71

and it uses the one defined by th env if present: https://github.com/araffin/rl-baselines-zoo/blob/master/utils/wrappers.py#L48
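The fallback logic described above can be sketched like this (a simplification, not the wrapper's actual code): prefer the time limit declared by the env's spec, and only fall back to the wrapper's default when the env declares none. The resulting value is used purely to normalise the time feature.

```python
def get_time_limit(env, default_max_steps=1000):
    """Return the env's declared time limit if present, else the default.

    Sketch of the lookup only; the value is used to scale the time
    feature, never to force termination.
    """
    declared = getattr(getattr(env, "spec", None), "max_episode_steps", None)
    return declared if declared is not None else default_max_steps
```

So an env registered with `max_episode_steps=150` keeps its 150-step limit; `max_steps=1000` only matters when no limit is declared.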

@PierreExeter
Contributor Author

Apologies, I forgot to reply. The issue I had with the episode length was due to an implementation problem in my custom environment, not to the TimeFeatureWrapper. Thanks for the clarification above about when to use the wrapper. I'm happy to have this issue closed.
