Reinforce implementation looks to use old data without importance sampling #1

sritee · 2019-05-26T20:52:51Z

A standard implementation of REINFORCE without importance sampling should update the parameters using only data collected by the current policy. However, in reinforce.py, the data buffer doesn't seem to be reset after every policy update. Thoughts?

seungeunrho · 2019-05-27T00:13:31Z

Hi, thanks for the comment!
I think you can find the code for resetting the data buffer on line 37 of reinforce.py.
When training ends, the buffer is emptied, and new data is collected with the updated policy.
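
For readers following along, here is a minimal sketch of the pattern described above: collect a rollout, update once, then empty the buffer so the next update only sees data from the updated policy. This is not the code from reinforce.py. The class `ReinforceAgent`, its methods, and the toy two-action reward are all assumptions made for illustration, and it uses plain Python rather than a deep learning library.

```python
import math
import random


class ReinforceAgent:
    """Toy softmax policy over two actions (hypothetical, for illustration only)."""

    def __init__(self, lr=0.1, gamma=0.99):
        self.theta = [0.0, 0.0]  # one preference per action, no state input
        self.lr = lr
        self.gamma = gamma
        self.data = []  # (action, reward) pairs gathered by the current policy

    def probs(self):
        # Numerically stable softmax over the action preferences.
        m = max(self.theta)
        exps = [math.exp(t - m) for t in self.theta]
        z = sum(exps)
        return [e / z for e in exps]

    def act(self):
        return 0 if random.random() < self.probs()[0] else 1

    def put_data(self, item):
        self.data.append(item)

    def train_net(self):
        probs = self.probs()  # policy that generated the buffered data
        grad = [0.0, 0.0]
        G = 0.0
        # Walk the rollout backwards to accumulate discounted returns.
        for action, reward in reversed(self.data):
            G = reward + self.gamma * G
            for a in range(2):
                # d/d theta_a of log pi(action) is 1[a == action] - pi(a).
                indicator = 1.0 if a == action else 0.0
                grad[a] += (indicator - probs[a]) * G
        # Gradient ascent on expected return.
        self.theta = [t + self.lr * g for t, g in zip(self.theta, grad)]
        # Empty the buffer so the next update uses only fresh on-policy data.
        self.data = []


def run(episodes=50, steps=10):
    """Assumed toy task: action 1 pays reward 1.0, action 0 pays 0.0."""
    agent = ReinforceAgent()
    for _ in range(episodes):
        for _ in range(steps):
            action = agent.act()
            agent.put_data((action, 1.0 if action == 1 else 0.0))
        agent.train_net()
    return agent


if __name__ == "__main__":
    trained = run()
    print(trained.probs())
```

If the final `self.data = []` were removed, later updates would mix in transitions sampled from earlier parameters, which is the off-policy situation the original question raises.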

sritee closed this as completed May 27, 2019

seungeunrho pushed a commit that referenced this issue Jul 21, 2019

Merge pull request #1 from seungeunrho/master

3c02cf0

Update repo with correct implementation of A3C
