add PER support #23

Merged (2 commits) on Mar 20, 2020

Conversation

xuxiyang1993 (Contributor)

Hi,

I implemented the Prioritized Experience Replay (PER) described in the MuZero paper, where the priority of each transition is computed from the difference between the predicted value and the true n-step target value. In the replay buffer, a game_history is sampled according to its game priority, which is the mean of the transition priorities in that game_history; a position within the game is then sampled according to the transition priorities. The priorities are calculated in the update_weights() function and then updated via the replay_buffer.update_priorities() function.
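
To make the two-level sampling concrete, here is a rough sketch (the buffer layout and names are simplified for illustration, not the exact code in this PR):

```python
import numpy

# Illustrative sketch: sample a game by its mean priority, then a position
# within that game by transition priority (hypothetical buffer structure).
def sample_game_and_position(buffer):
    # Game priority = mean of the transition priorities in that game_history.
    game_priorities = numpy.array(
        [numpy.mean(game.priorities) for game in buffer]
    )
    game_probs = game_priorities / game_priorities.sum()
    game_index = numpy.random.choice(len(buffer), p=game_probs)

    # Within the chosen game, sample a position by transition priority.
    game = buffer[game_index]
    pos_probs = numpy.array(game.priorities) / numpy.sum(game.priorities)
    position = numpy.random.choice(len(game.priorities), p=pos_probs)
    return game_index, position
```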

Let me know if it looks good to you!

@werner-duvaud (Owner)

Hi,

Thank you, I was just working on it.

I created a branch to continue the work on it.

I just have a question on one line:
Why did you choose 'wrap' mode in numpy.put()? I may be wrong, but I have the impression that the priorities of the absorbing states are out of bounds and overwrite the priorities at the beginning of the list.
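
For illustration, this is the kind of wrap-around I have in mind (a toy array and made-up indices, not the actual buffer):

```python
import numpy

priorities = numpy.zeros(5)
# With mode='wrap', the out-of-bound indices 5 and 6 wrap around
# and overwrite positions 0 and 1 instead of raising an error.
numpy.put(priorities, [5, 6], [0.9, 0.8], mode='wrap')
print(priorities)  # [0.9 0.8 0.  0.  0. ]
```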

Also, do you have a reference for using the mean as the game priority?

I tried to list the things that still have to be added. I'm going to work on it, feel free to contribute.

  • Make an option to choose whether or not to use prioritized replay. (I will commit it soon)
  • Add the loss scaling using the importance sampling ratio, as in the sketch after this list. (I'm having trouble figuring out how to do this without turning the buffer into a very long list with all the steps of each game)
  • Maybe assign an initial priority based on the difference between root.value and the predicted value in MCTS (or 1 as you did; it could be a parameter).
  • Add the possibility of adjusting alpha.
  • Find a way to avoid duplicating support_to_scalar.
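
For the importance sampling and alpha items, this is roughly the standard PER formula I have in mind (a standalone sketch with made-up values, not tied to the current buffer layout):

```python
import numpy

# Sketch of PER sampling probabilities with alpha and importance-sampling
# weights (alpha, beta and the priority values below are illustrative).
alpha, beta = 1.0, 1.0

priorities = numpy.array([0.5, 2.0, 0.1, 1.2])
probs = priorities ** alpha
probs /= probs.sum()

# Importance-sampling weights correct the bias of non-uniform sampling,
# normalized by the maximum weight so the scaling only shrinks the loss.
weights = (1.0 / (len(priorities) * probs)) ** beta
weights /= weights.max()

index = numpy.random.choice(len(priorities), p=probs)
# The per-sample loss would then be multiplied by weights[index].
```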

Here are the references I am working from; I would be interested in any other references you have:
Prioritized Experience Replay
Distributed Prioritized Experience Replay

@werner-duvaud werner-duvaud changed the base branch from master to prioritized_replay March 20, 2020 22:54
@werner-duvaud werner-duvaud merged commit 4d54162 into werner-duvaud:prioritized_replay Mar 20, 2020
egafni pushed a commit to egafni/muzero-general that referenced this pull request Apr 15, 2021
EpicLiem pushed a commit to EpicLiem/muzero-general-chess-archive that referenced this pull request Feb 4, 2023