Investigate DQN's regression in `MountainCar-v0` #156

vwxyzjn · 2022-04-09T20:44:02Z

Problem Description

In the previous version of Open RL Benchmark, we clearly observed that our dqn.py was able to solve MountainCar-v0 (see link). However, I could no longer reproduce this result with the latest dqn.py using the exact same hyperparameters. See here for the regression report.

Looking into the root cause

After looking into this further, it turns out the "culprit" is SB3's replay buffer. The upstream SB3 replay buffer now properly distinguishes truncation from termination (see DLR-RM/stable-baselines3#243). Ironically, by disabling this handling via `handle_timeout_termination=False`, I was able to reproduce the past performance (see https://wandb.ai/costa-huang/cleanRL/reports/MountainCar-v0-Regression-Investigation--VmlldzoxODEyMzgw).
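To make the difference concrete, here is a minimal NumPy sketch of how the one-step DQN target changes depending on whether a time-limit truncation is treated as a real terminal state. The function name `td_target` and all the numbers are illustrative assumptions, not code from dqn.py or from SB3; only the flag name `handle_timeout_termination` comes from the discussion above.

```python
import numpy as np


def td_target(rewards, next_q_max, dones, timeouts, gamma=0.99,
              handle_timeout_termination=True):
    """One-step target: r + gamma * max_a Q(s', a) * (1 - done).

    `timeouts` marks transitions that ended only because of the time limit.
    This is a hypothetical helper for illustration, not the SB3 implementation.
    """
    rewards = np.asarray(rewards, dtype=np.float32)
    next_q_max = np.asarray(next_q_max, dtype=np.float32)
    dones = np.asarray(dones, dtype=np.float32)
    timeouts = np.asarray(timeouts, dtype=np.float32)
    if handle_timeout_termination:
        # A truncated episode did not reach a true terminal state,
        # so clear the done flag and keep bootstrapping from s'.
        dones = dones * (1.0 - timeouts)
    return rewards + gamma * next_q_max * (1.0 - dones)


# Made-up final transition of an episode cut off by the time limit:
# reward -1, and the target network estimates max_a Q(s', a) = -50.
with_handling = td_target([-1.0], [-50.0], [1.0], [1.0])
without_handling = td_target([-1.0], [-50.0], [1.0], [1.0],
                             handle_timeout_termination=False)
print(with_handling, without_handling)
```

With the handling disabled, the truncated transition is treated as terminal, so its target collapses to the immediate reward instead of bootstrapping from the next state. That changes the learning signal, which is consistent with the same hyperparameters behaving differently in the two settings.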

Where to go from here

I don't think finding proper hyperparameters for dqn.py should block #121, but this is something we can look into in the future.

vwxyzjn · 2022-05-10T14:42:56Z

closed by #173

vwxyzjn changed the title ~~Investigate DQN~~ Investigate DQN's regression in MountainCar-v0 Apr 9, 2022

vwxyzjn closed this as completed May 10, 2022

qsh-zh mentioned this issue Aug 6, 2022 in "DQN on MountainCar" #255 (closed)
