Part of the code for the paper "Balanced prioritized experience replay in off-policy reinforcement learning".
The BPER.py file contains the core idea of this paper: how the priority of each experience is updated. I have added annotations within the code. For the rest of the code, you can refer to an implementation of PER [1]; the only difference is the mechanism used to update experience priorities.
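Because everything except the priority update follows PER [1], here is a minimal sketch of standard proportional PER for orientation. It is not the BPER rule. It uses PER's definitions: priority p_i = |delta_i| + eps, sampling probability P(i) = p_i^alpha / sum_k p_k^alpha, and importance-sampling weight w_i = (N * P(i))^(-beta), normalised by the largest weight. The class name `ProportionalReplay`, its method names and the default hyperparameters are hypothetical and do not come from BPER.py. The `update_priorities` method marks the step that BPER.py would change. For brevity, sampling is O(N). PER itself uses a sum-tree to make sampling efficient.

```python
import random


class ProportionalReplay:
    """Hypothetical minimal proportional PER buffer (Schaul et al., 2016).

    Not the BPER update rule: BPER.py changes only how priorities are updated,
    which corresponds to update_priorities() below. Sampling is O(N) for clarity;
    a real implementation would use a sum-tree.
    """

    def __init__(self, capacity, alpha=0.6, beta=0.4, eps=1e-6):
        self.capacity = capacity
        self.alpha = alpha      # how strongly priorities skew sampling
        self.beta = beta        # strength of importance-sampling correction
        self.eps = eps          # keeps every priority strictly positive
        self.data = []
        self.priorities = []
        self.pos = 0            # next slot to overwrite once the buffer is full

    def add(self, transition):
        # New transitions get the current max priority so they are likely to be replayed.
        max_p = max(self.priorities, default=1.0)
        if len(self.data) < self.capacity:
            self.data.append(transition)
            self.priorities.append(max_p)
        else:
            self.data[self.pos] = transition
            self.priorities[self.pos] = max_p
        self.pos = (self.pos + 1) % self.capacity

    def probabilities(self):
        # P(i) = p_i^alpha / sum_k p_k^alpha
        scaled = [p ** self.alpha for p in self.priorities]
        total = sum(scaled)
        return [s / total for s in scaled]

    def sample(self, batch_size, rng=random):
        probs = self.probabilities()
        n = len(self.data)
        idxs = rng.choices(range(n), weights=probs, k=batch_size)
        # w_i = (N * P(i))^-beta, divided by the largest possible weight so w_i <= 1.
        max_w = (n * min(probs)) ** (-self.beta)
        weights = [(n * probs[i]) ** (-self.beta) / max_w for i in idxs]
        return idxs, [self.data[i] for i in idxs], weights

    def update_priorities(self, idxs, td_errors):
        # Standard PER rule: p_i = |delta_i| + eps. This is the step BPER replaces.
        for i, delta in zip(idxs, td_errors):
            self.priorities[i] = abs(delta) + self.eps
```

A typical training step calls `sample`, scales each sample's TD loss by its weight, and then passes the new TD errors back to `update_priorities`. To try BPER's rule inside a PER codebase, you would swap the body of that last method.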
Please note that BPER may not be suitable for every environment. Feel free to give it a try, and if it improves your results, that would be fantastic!
[1] Schaul T, Quan J, Antonoglou I, Silver D. Prioritized Experience Replay. ICLR 2016.