Filtering out artificial terminal states #863
In many gym environments, like MountainCarContinuous, there is an episode step limit. This leads to episode termination before the trajectory has actually ended (which in this case means reaching the top of the hill).

Saving these experiences to the buffer without changing the artificial terminals to False, for example in here, leads to an error in computing TD errors. The agent's prediction of future rewards should still be taken into account while the real end of the trajectory has not been reached.

This is why some implementations, like OpenAI Spinning Up, change these terminal states before saving the experience, like this:
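For reference, the Spinning Up off-policy implementations (DDPG/TD3/SAC) mask the flag roughly as in the paraphrased sketch below (pre-0.26 gym step API; the replay-buffer call is a hypothetical placeholder):

```python
import gym

env = gym.make("MountainCarContinuous-v0")
max_ep_len = env.spec.max_episode_steps  # the TimeLimit horizon of this env

obs, ep_len = env.reset(), 0
for _ in range(10_000):
    action = env.action_space.sample()  # stand-in for a real policy
    next_obs, reward, done, info = env.step(action)
    ep_len += 1

    # Ignore the "done" signal if it comes from hitting the time horizon,
    # i.e. an artificial terminal that is not based on the agent's state.
    done = False if ep_len == max_ep_len else done

    # replay_buffer.store(obs, action, reward, next_obs, done)  # hypothetical buffer

    obs = next_obs
    if done or ep_len == max_ep_len:
        obs, ep_len = env.reset(), 0
```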
Comments

Hello, thanks for pointing out that problem. Actually, the right way would be to check for `info['TimeLimit.truncated']`.
So if I get it right, this is the right way to filter out artificial terminal flags: `done = False if info['TimeLimit.truncated'] else done`. Am I right? And do you have any plan to add this to stable-baselines or stable-baselines3?
Looks good ;)

Not for now, as the time feature is sufficient and it avoids including additional complexity in the code (it gets a little more complex when using multiple environments).
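For context, the "time feature" means appending the normalized remaining time to the observation, so the value function can tell apart otherwise identical states near the time limit (sb3-contrib ships a maintained TimeFeatureWrapper). A minimal sketch of the idea, assuming a Box observation space:

```python
import numpy as np
import gym
from gym import spaces


class SimpleTimeFeatureWrapper(gym.Wrapper):
    """Append the normalized remaining time to a Box observation.

    Illustrative only; sb3-contrib's TimeFeatureWrapper is the
    maintained implementation.
    """

    def __init__(self, env: gym.Env, max_steps: int = 1000):
        super().__init__(env)
        self._max_steps = max_steps
        self._step = 0
        low = np.append(env.observation_space.low, 0.0)
        high = np.append(env.observation_space.high, 1.0)
        self.observation_space = spaces.Box(low=low, high=high, dtype=np.float32)

    def reset(self, **kwargs):
        self._step = 0
        return self._augment(self.env.reset(**kwargs))

    def step(self, action):
        self._step += 1
        obs, reward, done, info = self.env.step(action)
        return self._augment(obs), reward, done, info

    def _augment(self, obs):
        # Remaining time in [0, 1]: 1.0 right after reset, 0.0 at the limit.
        remaining = 1.0 - self._step / self._max_steps
        return np.append(obs, remaining).astype(np.float32)
```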
I created a branch on SB3, but it is in fact a bit more tricky than expected (notably because, for A2C/PPO or any n-step method, we would need to keep track of two types of termination signals)...
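To make the "two termination signals" point concrete, here is an illustrative sketch (not SB3 code) of n-step returns that treat the two cases differently: a real terminal stops the bootstrap, while a time-limit terminal bootstraps from the critic's value of the last observation.

```python
import numpy as np


def n_step_returns(rewards, next_values, terminated, truncated, gamma=0.99):
    """Discounted returns over a rollout of length T.

    rewards[t]     : reward received at step t
    next_values[t] : critic estimate V(s_{t+1}) for the observation returned
                     by env.step at step t (on truncation this must be the
                     true terminal observation, not the reset observation)
    terminated[t]  : True if the episode really ended at step t
    truncated[t]   : True if the episode was cut by the time limit at step t
    """
    T = len(rewards)
    returns = np.zeros(T)
    running = next_values[-1]  # bootstrap when the rollout is cut mid-episode
    for t in reversed(range(T)):
        if terminated[t]:
            running = 0.0             # real terminal: no bootstrapping
        elif truncated[t]:
            running = next_values[t]  # artificial terminal: bootstrap from V
        returns[t] = rewards[t] + gamma * running
        running = returns[t]
    return returns
```

If I read the fix later adopted in SB3 correctly (see the issue linked below), it folds the same bootstrap into the data instead: when `TimeLimit.truncated` is set, `gamma * V(terminal_observation)` is added to that step's reward before the usual return computation.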
@araffin what is the status of this?
Answered here: DLR-RM/stable-baselines3#829