Investigate DQN's regression in MountainCar-v0 #156

Closed
vwxyzjn opened this issue Apr 9, 2022 · 1 comment

Comments

Owner

vwxyzjn commented Apr 9, 2022

Problem Description

In the previous version of Open RL Benchmark, we clearly observed that our dqn.py was able to solve MountainCar-v0 (see link). However, I could no longer reproduce this result with the latest dqn.py using the exact same hyperparameters. See here for the regression report.


Looking into the root cause

After looking into this further, it turns out the "culprit" is SB3's replay buffer. Upstream, SB3's replay buffer started to properly distinguish truncation from termination (see DLR-RM/stable-baselines3#243): with handle_timeout_termination=True, a transition that ends only because the time limit was reached is not treated as terminal, so its TD target still bootstraps from the next state. By disabling this handling via handle_timeout_termination=False, I was able to reproduce past performance... ironically (see https://wandb.ai/costa-huang/cleanRL/reports/MountainCar-v0-Regression-Investigation--VmlldzoxODEyMzgw).
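To make the difference concrete, here is a minimal sketch of how the flag changes the DQN target. It is plain Python rather than SB3 code: Transition and td_targets are hypothetical names, and next_q_max is assumed to be max over a' of Q_target(s', a'), already computed. The masking is meant to mirror what SB3's ReplayBuffer does when handle_timeout_termination=True (effective done = done * (1 - timeout)).

```python
from dataclasses import dataclass
from typing import List


@dataclass
class Transition:
    reward: float
    next_q_max: float  # hypothetical: max_a' Q_target(s', a'), assumed precomputed
    done: bool         # episode ended (terminated OR truncated by the time limit)
    timeout: bool      # episode ended only because the time limit was hit


def td_targets(batch: List[Transition], gamma: float = 0.99,
               handle_timeout_termination: bool = True) -> List[float]:
    """Hypothetical helper computing r + gamma * max Q(s', .) * (1 - done_eff)."""
    targets = []
    for t in batch:
        if handle_timeout_termination:
            # Truncation is not a true terminal state: keep bootstrapping.
            effective_done = t.done and not t.timeout
        else:
            # Old behaviour: any episode end cuts off the bootstrap term.
            effective_done = t.done
        targets.append(t.reward + gamma * t.next_q_max * (1.0 - float(effective_done)))
    return targets


# MountainCar-v0 gives -1 per step; a failed episode ends via the time limit.
truncated = Transition(reward=-1.0, next_q_max=-50.0, done=True, timeout=True)
print(td_targets([truncated]))                                    # bootstraps: about -50.5
print(td_targets([truncated], handle_timeout_termination=False))  # [-1.0]
```

In MountainCar-v0, nearly every unsuccessful episode ends through the time limit, so this flag changes the target for a large share of episode-ending transitions, which is consistent with it having a visible effect on learning with otherwise identical hyperparameters.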


Where to go from here

I don't think finding proper hyperparameters for dqn.py should block #121, but this is something we can look into in the future.

vwxyzjn changed the title from "Investigate DQN" to "Investigate DQN's regression in MountainCar-v0" on Apr 9, 2022
Owner Author

vwxyzjn commented May 10, 2022

Closed by #173.

vwxyzjn closed this as completed on May 10, 2022
qsh-zh mentioned this issue on Aug 6, 2022