Problem Description
In the previous version of Open RL Benchmark, we clearly observed that our `dqn.py` was able to solve `MountainCar-v0` (see link). However, I could no longer reproduce this result with the latest `dqn.py` using the exact same hyperparameters. See here for the regression report.
Looking into the root cause
After looking into this further, it turns out the "culprit" is SB3's replay buffer. The upstream SB3 replay buffer now properly handles truncation vs. termination (see DLR-RM/stable-baselines3#243). Ironically, by disabling that handling via `handle_timeout_termination=False`, I was able to reproduce past performance (see https://wandb.ai/costa-huang/cleanRL/reports/MountainCar-v0-Regression-Investigation--VmlldzoxODEyMzgw).
Where to go from here
I don't think finding proper hyperparameters for `dqn.py` should block #121, but this is something we can look into in the future.
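For reference, here is a rough sketch of what the truncation handling discussed under the root cause changes in the one-step DQN target. This is not SB3's actual code: `td_target` and its arguments are hypothetical names, and it assumes the flag only decides whether a time-limit truncation counts as a terminal transition when bootstrapping.

```python
def td_target(reward, next_q_max, done, timeout,
              gamma=0.99, handle_timeout_termination=True):
    """One-step DQN target for a single transition (hypothetical helper)."""
    if handle_timeout_termination:
        # A time-limit truncation is not a true terminal state,
        # so the target keeps bootstrapping from the next state.
        done = done and not timeout
    # Zero out the bootstrap term only for transitions treated as terminal.
    return reward + gamma * next_q_max * (0.0 if done else 1.0)


# Illustrative numbers only: the last step of an episode that was cut off
# by the time limit, with reward -1 and max_a Q(s', a) = -50.
with_handling = td_target(-1.0, -50.0, done=True, timeout=True)
without_handling = td_target(-1.0, -50.0, done=True, timeout=True,
                             handle_timeout_termination=False)

# A genuine termination (goal reached) gives the same target either way.
true_terminal = td_target(-1.0, -50.0, done=True, timeout=False)
```

With the handling on, the truncated step still bootstraps from the next state's value. With it off, the target collapses to the immediate reward, which is how truncated episodes were treated before the upstream change.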