This repository contains the code for the experiments in the paper *Learning to Incentivize Other Learning Agents*. Baselines are included.
## Requirements

- Python 3.6
- TensorFlow >= 1.12
- OpenAI Gym == 0.10.9
## Setup

- Clone and `pip install` Sequential Social Dilemma, which is a fork of the original open-source implementation.
- Clone and `pip install` LOLA if you wish to run this baseline.
- Clone this repository and run `$ pip install -e .` from the root.
## Code structure

- `alg/`: implementation of LIO and the PG/AC baselines.
- `env/`: implementation of the Escape Room game and wrappers around the SSD environment.
- `results/`: results of training are stored in subfolders here. Each independent training run creates a subfolder that contains the final TensorFlow model and reward log files. For example, 5 parallel independent training runs would create `results/cleanup/10x10_lio_0`, ..., `results/cleanup/10x10_lio_4` (depending on configurable strings in the config files).
- `utils/`: utility methods.
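The per-run subfolder naming described above can be sketched as follows; `run_dir_names` is a hypothetical helper for illustration, not part of the repository:

```python
import os

def run_dir_names(base="results", env="cleanup", tag="10x10_lio", n_runs=5):
    """Build the per-run subfolder paths described above: one folder
    per independent training run, indexed by run number."""
    return [os.path.join(base, env, "%s_%d" % (tag, i)) for i in range(n_runs)]

# Five parallel runs yield results/cleanup/10x10_lio_0 ... results/cleanup/10x10_lio_4.
print(run_dir_names())
```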
## Train LIO in the Escape Room game

- Set config values in `alg/config_room_lio.py`.
- `cd` into the `alg` folder.
- Execute the training script: `$ python train_multiprocess.py lio er`. Default settings conduct 5 parallel runs with different seeds.
- For a single run, execute `$ python train_lio.py er`.
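The parallel-runs behavior can be illustrated with a short sketch of the general pattern; `train` and the seed offsets here are hypothetical stand-ins, not the actual contents of `train_multiprocess.py`:

```python
from multiprocessing import Process

def make_seeds(base_seed=12340, n_runs=5):
    # Each independent run gets its own seed (base_seed is an assumed value).
    return [base_seed + i for i in range(n_runs)]

def train(seed, run_id):
    # Stand-in for a single training run (e.g. what train_lio.py does),
    # which would write its results into its own subfolder.
    print("run %d with seed %d" % (run_id, seed))

def run_parallel(n_runs=5):
    # Launch n_runs independent training processes and wait for all of them.
    procs = []
    for i, seed in enumerate(make_seeds(n_runs=n_runs)):
        p = Process(target=train, args=(seed, i))
        p.start()
        procs.append(p)
    for p in procs:
        p.join()

if __name__ == "__main__":
    run_parallel()
```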
## Train LIO in the SSD environment

- Set config values in `alg/config_ssd_lio.py`.
- `cd` into the `alg` folder.
- Execute the training script: `$ python train_multiprocess.py lio ssd`.
- For a single run, execute `$ python train_ssd.py`.
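The command-line interface used above (an algorithm name followed by an environment name) can be sketched like this; the `pg`/`ac` choices are assumptions based on the baselines mentioned earlier, and the actual argument handling in the scripts may differ:

```python
import argparse

def parse_cli(argv=None):
    # Positional arguments mirror the commands above, e.g.
    # `python train_multiprocess.py lio er` or `python train_multiprocess.py lio ssd`.
    parser = argparse.ArgumentParser(description="Launch training runs.")
    parser.add_argument("alg", choices=["lio", "pg", "ac"],
                        help="algorithm: LIO or a PG/AC baseline (assumed names)")
    parser.add_argument("env", choices=["er", "ssd"],
                        help="environment: Escape Room (er) or SSD (ssd)")
    return parser.parse_args(argv)

args = parse_cli(["lio", "er"])
print(args.alg, args.env)
```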
## Citation

```
@article{yang2020learning,
  title={Learning to incentivize other learning agents},
  author={Yang, Jiachen and Li, Ang and Farajtabar, Mehrdad and Sunehag, Peter and Hughes, Edward and Zha, Hongyuan},
  journal={Advances in Neural Information Processing Systems},
  volume={33},
  pages={15208--15219},
  year={2020}
}
```
## License

See LICENSE.

SPDX-License-Identifier: MIT