Skip to content

johnnyvf24/kalah-alpha-zero

Folders and files

NameName
Last commit message
Last commit date

Latest commit

 

History

3 Commits
 
 
 
 
 
 
 
 
 
 

Repository files navigation

About

Kalah reinforcement learning by AlphaGo Zero methods.

This project is based in two main resources:

  1. DeepMind's Oct19th publication: Mastering the Game of Go without Human Knowledge.
  2. The great Reversi development of the DeepMind ideas that @mokemokechicken did in his repo: https://github.com/mokemokechicken/reversi-alpha-zero
  3. The Connect4 version created by @Zeta36 : https://github.com/Zeta36/connect4-alpha-zero

This is the 4 stone version that is found in play-mancala.com. After about a day of training it is able to beat me (I consider myself a good Kalah player). More stats to follow.

My Goal: beat GMKalah https://github.com/johnnyvf24/GMKalah-AI a traditional Alpha-beta program.

Environment

  • Python 3.6.3
  • tensorflow-gpu: 1.3.0
  • Keras: 2.0.8

Modules

Reinforcement Learning

This AlphaGo Zero implementation consists of three worker self, opt and eval.

  • self is Self-Play to generate training data by self-play using BestModel.
  • opt is Trainer to train model, and generate next-generation models.
  • eval is Evaluator to evaluate whether the next-generation model is better than BestModel. If better, replace BestModel.

Evaluation

For evaluation, you can play chess with the BestModel.

  • play_gui is Play Game vs BestModel using ASCII character encoding.

Data

  • data/model/model_best_*: BestModel.
  • data/model/next_generation/*: next-generation models.
  • data/play_data/play_*.json: generated training data.
  • logs/main.log: log file.

If you want to train the model from the beginning, delete the above directories.

How to use

Setup

install libraries

pip install -r requirements.txt

If you want use GPU,

pip install tensorflow-gpu

set environment variables

Create .env file and write this.

KERAS_BACKEND=tensorflow

Basic Usages

For training model, execute Self-Play, Trainer and Evaluator.

Self-Play

python src/kalah_zero/run.py self

When executed, Self-Play will start using BestModel. If the BestModel does not exist, new random model will be created and become BestModel.

options

  • --new: create new BestModel
  • --type mini: use mini config for testing, (see src/kalah_zero/configs/mini.py)

Trainer

python src/kalah_zero/run.py opt

When executed, Training will start. A base model will be loaded from latest saved next-generation model. If not existed, BestModel is used. Trained model will be saved every 2000 steps(mini-batch) after epoch.

options

  • --type mini: use mini config for testing, (see src/kalah_zero/configs/mini.py)
  • --total-step: specify total step(mini-batch) numbers. The total step affects learning rate of training.

Evaluator

python src/kalah_zero/run.py eval

When executed, Evaluation will start. It evaluates BestModel and the latest next-generation model by playing about 200 games. If next-generation model wins, it becomes BestModel.

options

  • --type mini: use mini config for testing, (see src/kalah_zero/configs/mini.py)

Play Game

python src/kalah_zero/run.py play_gui

Displays a ASCII representation of the board and allows a human to play against the agent.

About

AlphaGo Zero methods for the game of Mancala

Resources

License

Stars

Watchers

Forks

Releases

No releases published

Packages

No packages published

Languages