The traditional implementation of REINFORCE, without importance sampling, should use only data collected by the current policy to update the parameters. However, in reinforce.py, the data buffer doesn't seem to be reset after every policy update. Thoughts?
Hi, thanks for the comment!
I think you can find the code that resets the data buffer on line 37 of reinforce.py.
When training ends, it empties the buffer, and new data is then collected with the updated policy.
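For readers without the file open, here is a minimal sketch of that pattern. It is not the actual contents of reinforce.py: the class name `ReinforceAgent`, the methods `put_data` and `train_net`, and the `(reward, log_prob)` buffer layout are all assumptions made for illustration, and the gradient step is left as a comment so the example runs without any deep learning library.

```python
class ReinforceAgent:
    """Hypothetical agent; only the buffer handling is illustrated."""

    def __init__(self, gamma=0.98):
        self.gamma = gamma
        # Holds (reward, log_prob) pairs collected by the current policy only.
        self.data = []

    def put_data(self, item):
        self.data.append(item)

    def train_net(self):
        # Walk the episode backwards to accumulate the discounted return G.
        G = 0.0
        losses = []
        for reward, log_prob in reversed(self.data):
            G = reward + self.gamma * G
            losses.append(-log_prob * G)
        # A real implementation would backpropagate these losses and
        # step the optimizer here.

        # Empty the buffer so the next update only sees data gathered
        # with the updated policy (keeps the update on-policy).
        self.data = []
        return losses


agent = ReinforceAgent(gamma=0.5)
agent.put_data((1.0, -0.1))
agent.put_data((1.0, -0.2))
losses = agent.train_net()
print(len(agent.data))
```

Because the reset happens inside the training call, each update consumes exactly one batch of on-policy data, which is what plain REINFORCE without importance sampling requires.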