Reinforce implementation looks to use old data without importance sampling #1

Closed
sritee opened this issue May 26, 2019 · 1 comment

Comments


sritee commented May 26, 2019

The traditional implementation of REINFORCE, without importance sampling, should use only data collected by the current policy to update the parameters. However, in reinforce.py, the data buffer doesn't seem to be reset after every policy update. Thoughts?

@seungeunrho
Owner

Hi, thanks for the comment!
I think you can find the code for resetting the data buffer at line 37 of reinforce.py.
When the training step ends, it empties the buffer, and new data is then collected with the updated policy.
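
For readers following the thread, here is a minimal sketch of the pattern being described: collect one episode with the current policy, apply one REINFORCE update, then clear the buffer so no stale data is reused. It is not the code from reinforce.py. The class `TinyReinforce`, its methods, the stateless two-action softmax policy, and the values of `GAMMA` and `lr` are all assumptions chosen to keep the example dependency-free.

```python
import math

GAMMA = 0.98  # hypothetical discount factor, chosen for illustration


class TinyReinforce:
    """Stateless two-action softmax policy trained with REINFORCE (illustrative)."""

    def __init__(self, lr=0.1):
        self.logits = [0.0, 0.0]
        self.lr = lr
        self.data = []  # (reward, action) pairs gathered by the CURRENT policy

    def probs(self):
        # Numerically stable softmax over the two logits.
        m = max(self.logits)
        exps = [math.exp(l - m) for l in self.logits]
        total = sum(exps)
        return [e / total for e in exps]

    def put_data(self, reward, action):
        self.data.append((reward, action))

    def train_net(self):
        probs = self.probs()
        grad = [0.0, 0.0]
        ret = 0.0
        # Walk the episode backwards to accumulate discounted returns.
        for reward, action in reversed(self.data):
            ret = reward + GAMMA * ret
            for k in range(2):
                # d/d logit_k of log pi(action) for a softmax policy.
                indicator = 1.0 if k == action else 0.0
                grad[k] += (indicator - probs[k]) * ret
        # Gradient ascent on the expected return.
        self.logits = [l + self.lr * g for l, g in zip(self.logits, grad)]
        # Discard the episode: it was sampled from the pre-update policy, so
        # reusing it would require importance sampling.
        self.data = []
```

In a training loop under these assumptions, each episode would call `put_data` once per step and `train_net` once at the end, so the next episode always starts from an empty buffer. Without the final `self.data = []`, later updates would mix in transitions from older policies, which is the concern raised in the issue.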

@sritee sritee closed this as completed May 27, 2019
seungeunrho pushed a commit that referenced this issue Jul 21, 2019
Update repo with correct implementation of A3C