Kalah reinforcement learning using AlphaGo Zero methods.
This project is based on three main resources:
- DeepMind's October 19th, 2017 publication: Mastering the Game of Go without Human Knowledge.
- The great Reversi adaptation of the DeepMind ideas that @mokemokechicken built in his repo: https://github.com/mokemokechicken/reversi-alpha-zero
- The Connect4 version created by @Zeta36: https://github.com/Zeta36/connect4-alpha-zero
This is the 4-stone variant found on play-mancala.com. After about a day of training, it is able to beat me (I consider myself a good Kalah player). More stats to follow.
My goal: beat GMKalah (https://github.com/johnnyvf24/GMKalah-AI), a traditional alpha-beta program.
- Python 3.6.3
- tensorflow-gpu: 1.3.0
- Keras: 2.0.8
This AlphaGo Zero implementation consists of three workers: `self`, `opt` and `eval`.

- `self` is Self-Play: it generates training data through self-play using the BestModel.
- `opt` is Trainer: it trains the model and produces next-generation models.
- `eval` is Evaluator: it evaluates whether the latest next-generation model is better than the BestModel; if so, it replaces the BestModel.
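Taken together, the three workers form a continual improvement loop. The sketch below is only a schematic of that cycle; every function is a stub, all names are illustrative, and in the real project each worker runs as its own process.

```python
# Schematic of the worker cycle; every function below is a stub
# standing in for a separate worker process in the real project.

def self_play(best):
    # "self" worker: generate game records by self-play with the BestModel.
    return ["game-record"]

def train(data):
    # "opt" worker: fit a next-generation model on the accumulated data.
    return "next-generation-model"

def evaluate(candidate, best):
    # "eval" worker: head-to-head match over roughly 200 games.
    return True  # stub: pretend the candidate won

best_model, play_data = "best-model", []
for generation in range(3):  # in reality the workers run concurrently, indefinitely
    play_data += self_play(best_model)
    candidate = train(play_data)
    if evaluate(candidate, best_model):
        best_model = candidate  # the winner becomes the new BestModel
```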
For evaluation, you can play Kalah against the BestModel.
`play_gui` is Play Game vs BestModel, displayed with ASCII characters.
- `data/model/model_best_*`: BestModel.
- `data/model/next_generation/*`: next-generation models.
- `data/play_data/play_*.json`: generated training data.
- `logs/main.log`: log file.
If you want to train the model from the beginning, delete the above directories.
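If you prefer to script the reset, here is a minimal sketch in Python, assuming the repository layout listed above:

```python
import shutil
from pathlib import Path

root = Path(".")  # repository root
# Paths from the list above; removing them restarts training from scratch.
targets = [*root.glob("data/model/model_best_*"),
           root / "data/model/next_generation",
           root / "data/play_data",
           root / "logs/main.log"]
for target in targets:
    if target.is_dir():
        shutil.rmtree(target)
    elif target.exists():
        target.unlink()
```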
`pip install -r requirements.txt`
If you want to use the GPU:

`pip install tensorflow-gpu`
Create a `.env` file and write the following:

`KERAS_BACKEND=tensorflow`
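To confirm that Keras picked up the TensorFlow backend (assuming the variable is actually set in your environment, e.g. exported in your shell or loaded from `.env` by the project), a quick check:

```python
import keras
# Prints "tensorflow" when the KERAS_BACKEND setting has taken effect.
print(keras.backend.backend())
```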
To train the model, execute Self-Play, Trainer, and Evaluator.
`python src/kalah_zero/run.py self`
When executed, Self-Play will start using the BestModel. If no BestModel exists, a new random model is created and becomes the BestModel. (A toy sketch of the self-play bookkeeping follows the options below.)
- `--new`: create a new BestModel
- `--type mini`: use the mini config for testing (see `src/kalah_zero/configs/mini.py`)
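At a high level, the `self` worker plays games against itself, recording for every position the move distribution found by search and, once the game ends, the final outcome. The toy sketch below shows only that bookkeeping; the random choices stand in for the real MCTS guided by the BestModel, and all names are illustrative:

```python
import random

def self_play_game(num_moves=20, num_actions=6):
    # Toy illustration of how self-play yields (state, policy, value) examples.
    history = []
    for t in range(num_moves):
        state = f"state-{t}"  # stand-in for an encoded board position
        visits = [random.random() for _ in range(num_actions)]
        total = sum(visits)
        policy = [v / total for v in visits]  # normalized MCTS visit counts
        history.append((state, policy))
    z = random.choice([1, -1, 0])  # final result from the first player's view
    # Each recorded position is labeled with the game outcome, sign-flipped
    # for the side to move (ignoring Kalah's extra-turn rule for simplicity).
    return [(s, pi, z if i % 2 == 0 else -z) for i, (s, pi) in enumerate(history)]

examples = self_play_game()
print(len(examples), examples[0])
```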
`python src/kalah_zero/run.py opt`
When executed, training will start. The base model is loaded from the latest saved next-generation model; if none exists, the BestModel is used. The trained model is saved every 2000 steps (mini-batches) after each epoch. (A sketch of the training objective follows the options below.)
- `--type mini`: use the mini config for testing (see `src/kalah_zero/configs/mini.py`)
- `--total-step`: specify the total number of steps (mini-batches). The total step count affects the learning rate of training.
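Following the AlphaGo Zero paper, the network is trained with a combined objective: mean squared error on the value head and cross-entropy on the policy head, plus L2 weight regularization. Below is a minimal Keras sketch of that setup with a deliberately tiny, hypothetical model and an assumed 14-cell board encoding; the project's real architecture and hyperparameters live in its configs:

```python
from keras.layers import Dense, Input
from keras.models import Model
from keras.optimizers import SGD
from keras.regularizers import l2

# Hypothetical tiny two-headed network; the real model is defined by the
# project (see src/kalah_zero/configs/).
board = Input(shape=(14,))  # assumed encoding: 12 pits + 2 stores for Kalah(6, 4)
hidden = Dense(64, activation="relu", kernel_regularizer=l2(1e-4))(board)
policy = Dense(6, activation="softmax", name="policy")(hidden)  # one move per pit
value = Dense(1, activation="tanh", name="value")(hidden)       # outcome in [-1, 1]

model = Model(board, [policy, value])
model.compile(optimizer=SGD(lr=0.01, momentum=0.9),
              loss={"policy": "categorical_crossentropy",
                    "value": "mean_squared_error"})
```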
`python src/kalah_zero/run.py eval`
When executed, evaluation will start. It pits the BestModel against the latest next-generation model over about 200 games. If the next-generation model wins, it becomes the new BestModel. (A sketch of the replacement rule follows the option below.)
- `--type mini`: use the mini config for testing (see `src/kalah_zero/configs/mini.py`)
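The replacement rule is a simple win-rate test. In this sketch, `play_game` is a stub, and the 55% threshold is the figure from the AlphaGo Zero paper; whether this project uses the same number is an assumption, as its actual threshold would be set in its config:

```python
import random

def play_game(challenger, champion):
    # Stub: in the real worker each game is a full MCTS-vs-MCTS match.
    return random.choice([0, 1])  # 1 if the challenger wins

def should_replace(challenger, champion, n_games=200, threshold=0.55):
    wins = sum(play_game(challenger, champion) for _ in range(n_games))
    win_rate = wins / n_games
    # Replace the BestModel only if the challenger clearly outperforms it.
    return win_rate >= threshold

print(should_replace("next-generation", "best"))
```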
`python src/kalah_zero/run.py play_gui`
Displays an ASCII representation of the board and lets a human play against the agent.
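For reference, here is a hypothetical ASCII rendering of the opening Kalah(6, 4) position (six pits per side, four stones each, two stores); the program's actual display may differ:

```python
def render(board):
    # board: 14 counts - pits 0-5 and store 6 belong to South,
    # pits 7-12 and store 13 belong to North.
    north = "  ".join(f"{n:2d}" for n in reversed(board[7:13]))
    south = "  ".join(f"{n:2d}" for n in board[0:6])
    print("      " + north)
    print(f"[{board[13]:2d}]" + " " * 26 + f"[{board[6]:2d}]")
    print("      " + south)

render([4] * 6 + [0] + [4] * 6 + [0])  # opening Kalah(6, 4) position
```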