add PER support #23

Merged (2 commits) on Mar 20, 2020

Conversation

xuxiyang1993 (Contributor)

Hi,

I implemented the Prioritized Experience Replay (PER) described in the MuZero paper, where the priority of each transition is computed from the difference between the predicted value and the true n-step target value. In the replay buffer, a game_history is sampled according to its game priority, which is the mean of the transition priorities in that game_history; a position within the game is then sampled according to the transition priorities. The priorities are calculated in the update_weights() function and then updated via the replay_buffer.update_priorities() function.
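
To make the two-level sampling concrete, here is a rough sketch (the buffer layout and names are simplified for illustration, not the exact code in this PR):

```python
import numpy

# Illustrative sketch: sample a game by its mean priority, then a position
# within that game by transition priority (hypothetical buffer structure).
def sample_game_and_position(buffer):
    # Game priority = mean of the transition priorities in that game_history.
    game_priorities = numpy.array(
        [numpy.mean(game.priorities) for game in buffer]
    )
    game_probs = game_priorities / game_priorities.sum()
    game_index = numpy.random.choice(len(buffer), p=game_probs)

    # Within the chosen game, sample a position by transition priority.
    game = buffer[game_index]
    pos_probs = numpy.array(game.priorities) / numpy.sum(game.priorities)
    position = numpy.random.choice(len(game.priorities), p=pos_probs)
    return game_index, position
```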

Let me know if it looks good to you!

@werner-duvaud (Owner)

Hi,

Thank you, I was just working on it.

I created a branch to continue the work on it.

I just have a question on one line:
Why did you choose 'wrap' mode in numpy.put()? I may be wrong, but I have the impression that the priorities of the absorbing states are out of bounds and overwrite the priorities at the beginning of the list.
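
For illustration, this is the kind of wrap-around I have in mind (a toy array and made-up indices, not the actual buffer):

```python
import numpy

priorities = numpy.zeros(5)
# With mode='wrap', the out-of-bound indices 5 and 6 wrap around
# and overwrite positions 0 and 1 instead of raising an error.
numpy.put(priorities, [5, 6], [0.9, 0.8], mode='wrap')
print(priorities)  # [0.9 0.8 0.  0.  0. ]
```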

Also, do you have a reference for using the mean as the game priority?

I tried to list the things that still have to be added. I'm going to work on it, feel free to contribute.

  • Make an option to choose whether or not to use prioritized replay. (I will commit it soon)
  • Add the loss scaling using the importance sampling ratio, as in the sketch after this list. (I'm having trouble figuring out how to do this without turning the buffer into a very long list with all the steps of each game)
  • Maybe assign an initial priority based on the difference between root.value and the predicted value in MCTS (or 1 as you did; it could be a parameter).
  • Add the possibility of adjusting alpha.
  • Find a way to avoid duplicating support_to_scalar.
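
For the importance sampling and alpha items, this is roughly the standard PER formula I have in mind (a standalone sketch with made-up values, not tied to the current buffer layout):

```python
import numpy

# Sketch of PER sampling probabilities with alpha and importance-sampling
# weights (alpha, beta and the priority values below are illustrative).
alpha, beta = 1.0, 1.0

priorities = numpy.array([0.5, 2.0, 0.1, 1.2])
probs = priorities ** alpha
probs /= probs.sum()

# Importance-sampling weights correct the bias of non-uniform sampling,
# normalized by the maximum weight so the scaling only shrinks the loss.
weights = (1.0 / (len(priorities) * probs)) ** beta
weights /= weights.max()

index = numpy.random.choice(len(priorities), p=probs)
# The per-sample loss would then be multiplied by weights[index].
```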

Here are the references I am working from; I would be interested in any other references you have:
Prioritized Experience Replay
Distributed Prioritized Experience Replay

@werner-duvaud werner-duvaud changed the base branch from master to prioritized_replay March 20, 2020 22:54
@werner-duvaud werner-duvaud merged commit 4d54162 into werner-duvaud:prioritized_replay Mar 20, 2020
egafni pushed a commit to egafni/muzero-general that referenced this pull request Apr 15, 2021
EpicLiem pushed a commit to EpicLiem/muzero-general-chess-archive that referenced this pull request Feb 4, 2023