Kalah reinforcement learning using AlphaGo Zero methods.
This project is based on three main resources:
- DeepMind's October 19th, 2017 publication: Mastering the Game of Go without Human Knowledge.
- The great Reversi adaptation of the DeepMind ideas that @mokemokechicken built in his repo: https://github.com/mokemokechicken/reversi-alpha-zero
- The Connect4 version created by @Zeta36: https://github.com/Zeta36/connect4-alpha-zero
This is the 4-stone variant found on play-mancala.com. After about a day of training, it is able to beat me (I consider myself a good Kalah player). More stats to follow.
My goal: beat GMKalah (https://github.com/johnnyvf24/GMKalah-AI), a traditional alpha-beta program.
- Python 3.6.3
- tensorflow-gpu: 1.3.0
- Keras: 2.0.8
This AlphaGo Zero implementation consists of three workers: `self`, `opt` and `eval`.

- `self` is Self-Play: it generates training data through self-play using the BestModel.
- `opt` is Trainer: it trains the model and produces next-generation models.
- `eval` is Evaluator: it evaluates whether the latest next-generation model is better than the BestModel; if so, it replaces the BestModel.
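Taken together, the three workers form a continual improvement loop. The sketch below is only a schematic of that cycle; every function is a stub, all names are illustrative, and in the real project each worker runs as its own process.

```python
# Schematic of the worker cycle; every function below is a stub
# standing in for a separate worker process in the real project.

def self_play(best):
    # "self" worker: generate game records by self-play with the BestModel.
    return ["game-record"]

def train(data):
    # "opt" worker: fit a next-generation model on the accumulated data.
    return "next-generation-model"

def evaluate(candidate, best):
    # "eval" worker: head-to-head match over roughly 200 games.
    return True  # stub: pretend the candidate won

best_model, play_data = "best-model", []
for generation in range(3):  # in reality the workers run concurrently, indefinitely
    play_data += self_play(best_model)
    candidate = train(play_data)
    if evaluate(candidate, best_model):
        best_model = candidate  # the winner becomes the new BestModel
```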
For evaluation, you can play Kalah against the BestModel.
`play_gui` is Play Game vs BestModel, displayed with ASCII characters.
- `data/model/model_best_*`: BestModel.
- `data/model/next_generation/*`: next-generation models.
- `data/play_data/play_*.json`: generated training data.
- `logs/main.log`: log file.
If you want to train the model from the beginning, delete the above directories.
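If you prefer to script the reset, here is a minimal sketch in Python, assuming the repository layout listed above:

```python
import shutil
from pathlib import Path

root = Path(".")  # repository root
# Paths from the list above; removing them restarts training from scratch.
targets = [*root.glob("data/model/model_best_*"),
           root / "data/model/next_generation",
           root / "data/play_data",
           root / "logs/main.log"]
for target in targets:
    if target.is_dir():
        shutil.rmtree(target)
    elif target.exists():
        target.unlink()
```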
`pip install -r requirements.txt`
If you want to use the GPU:

`pip install tensorflow-gpu`
Create a `.env` file and write the following:

`KERAS_BACKEND=tensorflow`
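To confirm that Keras picked up the TensorFlow backend (assuming the variable is actually set in your environment, e.g. exported in your shell or loaded from `.env` by the project), a quick check:

```python
import keras
# Prints "tensorflow" when the KERAS_BACKEND setting has taken effect.
print(keras.backend.backend())
```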
To train the model, execute Self-Play, Trainer, and Evaluator.
`python src/kalah_zero/run.py self`
When executed, Self-Play will start using the BestModel. If no BestModel exists, a new random model is created and becomes the BestModel. (A toy sketch of the self-play bookkeeping follows the options below.)
- `--new`: create a new BestModel
- `--type mini`: use the mini config for testing (see `src/kalah_zero/configs/mini.py`)
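At a high level, the `self` worker plays games against itself, recording for every position the move distribution found by search and, once the game ends, the final outcome. The toy sketch below shows only that bookkeeping; the random choices stand in for the real MCTS guided by the BestModel, and all names are illustrative:

```python
import random

def self_play_game(num_moves=20, num_actions=6):
    # Toy illustration of how self-play yields (state, policy, value) examples.
    history = []
    for t in range(num_moves):
        state = f"state-{t}"  # stand-in for an encoded board position
        visits = [random.random() for _ in range(num_actions)]
        total = sum(visits)
        policy = [v / total for v in visits]  # normalized MCTS visit counts
        history.append((state, policy))
    z = random.choice([1, -1, 0])  # final result from the first player's view
    # Each recorded position is labeled with the game outcome, sign-flipped
    # for the side to move (ignoring Kalah's extra-turn rule for simplicity).
    return [(s, pi, z if i % 2 == 0 else -z) for i, (s, pi) in enumerate(history)]

examples = self_play_game()
print(len(examples), examples[0])
```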
`python src/kalah_zero/run.py opt`
When executed, training will start. The base model is loaded from the latest saved next-generation model; if none exists, the BestModel is used. The trained model is saved every 2000 steps (mini-batches) after each epoch. (A sketch of the training objective follows the options below.)
- `--type mini`: use the mini config for testing (see `src/kalah_zero/configs/mini.py`)
- `--total-step`: specify the total number of steps (mini-batches). The total step count affects the learning rate of training.
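Following the AlphaGo Zero paper, the network is trained with a combined objective: mean squared error on the value head and cross-entropy on the policy head, plus L2 weight regularization. Below is a minimal Keras sketch of that setup with a deliberately tiny, hypothetical model and an assumed 14-cell board encoding; the project's real architecture and hyperparameters live in its configs:

```python
from keras.layers import Dense, Input
from keras.models import Model
from keras.optimizers import SGD
from keras.regularizers import l2

# Hypothetical tiny two-headed network; the real model is defined by the
# project (see src/kalah_zero/configs/).
board = Input(shape=(14,))  # assumed encoding: 12 pits + 2 stores for Kalah(6, 4)
hidden = Dense(64, activation="relu", kernel_regularizer=l2(1e-4))(board)
policy = Dense(6, activation="softmax", name="policy")(hidden)  # one move per pit
value = Dense(1, activation="tanh", name="value")(hidden)       # outcome in [-1, 1]

model = Model(board, [policy, value])
model.compile(optimizer=SGD(lr=0.01, momentum=0.9),
              loss={"policy": "categorical_crossentropy",
                    "value": "mean_squared_error"})
```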
`python src/kalah_zero/run.py eval`
When executed, evaluation will start. It pits the BestModel against the latest next-generation model over about 200 games. If the next-generation model wins, it becomes the new BestModel. (A sketch of the replacement rule follows the option below.)
- `--type mini`: use the mini config for testing (see `src/kalah_zero/configs/mini.py`)
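The replacement rule is a simple win-rate test. In this sketch, `play_game` is a stub, and the 55% threshold is the figure from the AlphaGo Zero paper; whether this project uses the same number is an assumption, as its actual threshold would be set in its config:

```python
import random

def play_game(challenger, champion):
    # Stub: in the real worker each game is a full MCTS-vs-MCTS match.
    return random.choice([0, 1])  # 1 if the challenger wins

def should_replace(challenger, champion, n_games=200, threshold=0.55):
    wins = sum(play_game(challenger, champion) for _ in range(n_games))
    win_rate = wins / n_games
    # Replace the BestModel only if the challenger clearly outperforms it.
    return win_rate >= threshold

print(should_replace("next-generation", "best"))
```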
`python src/kalah_zero/run.py play_gui`
Displays an ASCII representation of the board and lets a human play against the agent.
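For reference, here is a hypothetical ASCII rendering of the opening Kalah(6, 4) position (six pits per side, four stones each, two stores); the program's actual display may differ:

```python
def render(board):
    # board: 14 counts - pits 0-5 and store 6 belong to South,
    # pits 7-12 and store 13 belong to North.
    north = "  ".join(f"{n:2d}" for n in reversed(board[7:13]))
    south = "  ".join(f"{n:2d}" for n in board[0:6])
    print("      " + north)
    print(f"[{board[13]:2d}]" + " " * 26 + f"[{board[6]:2d}]")
    print("      " + south)

render([4] * 6 + [0] + [4] * 6 + [0])  # opening Kalah(6, 4) position
```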