This repository contains the code for the trust region competitive policy optimisation (TRCoPO) algorithm. The paper on competitive policy gradient can be found here, and the code for the Competitive Policy Gradient (CoPG) algorithm can be found here.
Experiment videos are available here.
- The code is tested on Python 3.5.2.
- Only the Markov Soccer experiment requires the OpenSpiel library; all other experiments can be run directly.
- Requires torch.utils.tensorboard for logging.
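A quick way to verify the dependencies is a short import check (a minimal sketch; the module names are the standard ones for PyTorch and OpenSpiel, nothing from this repository):

```python
# Minimal dependency check; OpenSpiel is only needed for Markov Soccer.
import sys
print(sys.version)  # the code is tested on Python 3.5.2

import torch
from torch.utils.tensorboard import SummaryWriter  # used for logging results

try:
    import pyspiel  # Python module provided by OpenSpiel
except ImportError:
    print("OpenSpiel not installed: required only for the Markov Soccer experiment")
```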
.
├── notebooks
│ ├── RockPaperScissors.ipynb
│ ├── MatchingPennies.ipynb
├── game # Each game has a separate folder with this structure
│ ├── game.py
│ ├── copg_game.py
│ ├── gda_game.py
│ ├── network.py
├── copg_optim
│ ├── copg.py
│ ├── critic_functions.py
│ ├── utils.py
├── car_racing_simulator
└── ...
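The exact architectures live in each game's network.py. As a rough illustration only (the class name and layer sizes below are hypothetical, not taken from the repository), a policy network for a discrete-action game could look like this:

```python
import torch
import torch.nn as nn

class Policy(nn.Module):
    """Hypothetical stand-in for a network defined in a game's network.py."""
    def __init__(self, state_dim, num_actions):
        super(Policy, self).__init__()
        self.layers = nn.Sequential(
            nn.Linear(state_dim, 64),
            nn.Tanh(),
            nn.Linear(64, num_actions),
        )

    def forward(self, state):
        # Return a categorical distribution over actions so the training code
        # can sample actions and compute log-probabilities.
        logits = self.layers(state)
        return torch.distributions.Categorical(logits=logits)
```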
- [Jupyter notebooks] are the best place to start; they contain demonstrations and results.
- The folder [copg_optim] contains the optimization code.
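For orientation, the gda_game.py script in each game folder corresponds to the simultaneous gradient descent-ascent (GDA) baseline, while copg_game.py trains with the competitive optimizer from [copg_optim]. The snippet below is a self-contained sketch of plain GDA on the Matching Pennies payoff matrix; it illustrates the baseline idea only and is not this repository's implementation:

```python
import torch

# Matching Pennies payoff to player 1 (the game is zero-sum).
A = torch.tensor([[1., -1.],
                  [-1., 1.]])

x = torch.zeros(2, requires_grad=True)  # player 1 policy logits
y = torch.zeros(2, requires_grad=True)  # player 2 policy logits
lr = 0.1

for _ in range(200):
    p1 = torch.softmax(x, dim=0)
    p2 = torch.softmax(y, dim=0)
    payoff = p1 @ A @ p2                      # expected payoff to player 1
    gx, gy = torch.autograd.grad(payoff, [x, y])
    with torch.no_grad():
        x += lr * gx                          # player 1 ascends its payoff
        y -= lr * gy                          # player 2 descends it (zero-sum)

print(torch.softmax(x, dim=0), torch.softmax(y, dim=0))
```

On games like this, plain GDA is known to oscillate around the mixed equilibrium rather than converge to it, which is the kind of behaviour the competitive updates in this repository are designed to address.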
Open a Jupyter notebook and run it to see the results, or run an experiment from the command line:
git clone "address"
cd trcopo
cd RockPaperScissors
python3 trcopo_rps.py
cd ..
cd tensorboard
tensorboard --logdir .
You can view the results in TensorBoard.
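The training scripts produce those TensorBoard event files via torch.utils.tensorboard. As a minimal sketch of that logging pattern (the log directory and tag name below are made up; the real ones are set inside each training script):

```python
from torch.utils.tensorboard import SummaryWriter

writer = SummaryWriter('tensorboard/demo_run')            # hypothetical log directory
for iteration in range(10):
    writer.add_scalar('reward/player1', 0.0, iteration)   # hypothetical tag and value
writer.close()
```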