
# About reinforcement learning

TamaGo can run Gumbel AlphaZero-style reinforcement learning.

## Prerequisites for reinforcement learning

GNUGo is used to correct the results of self-play games during reinforcement learning runs. Reinforcement learning can proceed without GNUGo, so installing it is optional. However, I recommend using GNUGo, because TamaGo's win/loss judgment at the end of a game is very messy.

To install GNUGo on Ubuntu, simply execute the following command:

```
apt install gnugo
```

## Hyperparameters for reinforcement learning

Hyperparameters for reinforcement learning are defined in learning_param.py.

| Hyperparameter | Description | Example value | Note |
| --- | --- | --- | --- |
| RL_LEARNING_RATE | Learning rate for reinforcement learning. | 0.01 | Lowering this value once learning has progressed to some extent works well. |
| BATCH_SIZE | Mini-batch size for training. | 256 | Reduce this value if GPU memory is limited. |
| MOMENTUM | Momentum parameter for the optimizer. | 0.9 | |
| WEIGHT_DECAY | Weight of L2 regularization. | 1e-4 (0.0001) | |
| DATA_SET_SIZE | Number of data samples stored in each npz file. | BATCH_SIZE * 4000 | |
| RL_VALUE_WEIGHT | Weight of the value loss relative to the policy loss. | 1.0 | This must be more than 0.0. |
| SELF_PLAY_VISITS | Number of visits per move for self-play. | 16 | This must be more than 1. |
| NUM_SELF_PLAY_WORKERS | Number of self-play workers. | 4 | |
| NUM_SELF_PLAY_GAMES | Number of self-play games to generate. | 10000 | |

These example values are chosen to confirm that reinforcement learning progresses well, so please use them as they are at first, then change them gradually while checking the learning status.
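The table above can be read as a sketch of how these constants might appear in learning_param.py (a hypothetical illustration using the documented names and example values, not the file's actual contents):

```python
# Hypothetical sketch of learning_param.py; names and example values
# are taken from the hyperparameter table above.
RL_LEARNING_RATE = 0.01        # lower this once learning has progressed
BATCH_SIZE = 256               # reduce if GPU memory is limited
MOMENTUM = 0.9                 # momentum parameter for the optimizer
WEIGHT_DECAY = 1e-4            # weight of L2 regularization
DATA_SET_SIZE = BATCH_SIZE * 4000  # samples stored per npz file
RL_VALUE_WEIGHT = 1.0          # must be more than 0.0
SELF_PLAY_VISITS = 16          # must be more than 1
NUM_SELF_PLAY_WORKERS = 4
NUM_SELF_PLAY_GAMES = 10000
```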

## Definition of the neural network structure

The neural network is defined in the following four files.

| File | Definition |
| --- | --- |
| nn/network/dual_net.py | Neural network definition. |
| nn/network/res_block.py | Residual block definition. |
| nn/network/head/policy_head.py | Policy head definition. |
| nn/network/head/value_head.py | Value head definition. |

If you want to change the network structure, I recommend changing the values of filters or blocks first.
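To see why filters and blocks are the natural knobs, here is a hypothetical illustration (not TamaGo's actual code) of how they scale the size of a residual tower built from 3x3 convolutions:

```python
# Hypothetical illustration: how `filters` and `blocks` scale the
# parameter count of a residual tower (bias terms omitted).
def conv_params(in_ch: int, out_ch: int, k: int = 3) -> int:
    """Number of weights in a k x k convolution."""
    return in_ch * out_ch * k * k

def tower_params(filters: int, blocks: int, input_planes: int = 6) -> int:
    """Rough parameter count of a shared residual tower:
    one input convolution plus two convolutions per residual block."""
    total = conv_params(input_planes, filters)
    total += blocks * 2 * conv_params(filters, filters)
    return total
```

Doubling `filters` roughly quadruples the per-block cost, while `blocks` scales it linearly, so the two settings trade off depth against width.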

## TamaGo's reinforcement learning process

The reinforcement learning process runs in the following order.

  1. Execute a specified number of self-play games using the existing neural network model.
  2. Adjust the self-play game results using GNUGo (optional).
  3. Train the neural network using the SGF files generated by the self-play process.
  4. Repeat steps 1 to 3.

The reinforcement learning pipeline is defined in pipeline.sh.
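The steps above might be sketched as a loop like the following (a hypothetical outline, not the actual contents of pipeline.sh; the iteration count and options are illustrative):

```shell
#!/bin/sh
# Hypothetical outline of the reinforcement learning loop:
# each iteration generates self-play games with the current model,
# optionally rescores them with GNUGo, then retrains the model.
ITERATIONS=10
for i in $(seq "$ITERATIONS"); do
    # 1. Self-play with the existing model
    python selfplay_main.py --model model/rl-model.bin --save-dir archive
    # 2. (Optional) correct game results with GNUGo here
    # 3. Train on the generated SGF files
    python train.py --rl true --kifu-dir archive
done
```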

## Command line options for selfplay_main.py

| Option | Description | Example value | Default value | Note |
| --- | --- | --- | --- | --- |
| --save-dir | Directory path to save SGF files generated by the self-play process. | save_dir | archive | |
| --process | Number of self-play workers. | 2 | NUM_SELF_PLAY_WORKERS | |
| --num-data | Number of self-play games to generate. | 5000 | NUM_SELF_PLAY_GAMES | |
| --size | Go board size. | 9 | 9 | |
| --use-gpu | Flag to use a GPU. | true | true | Value is true or false. |
| --visits | Number of visits per move for self-play. | 100 | SELF_PLAY_VISITS | |
| --model | Path to a model file. | model/rl-model.bin | model/rl-model.bin | |
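Putting the options above together, a self-play run could be launched like this (values are illustrative; adjust them to your setup):

```shell
# Example self-play invocation using the options documented above.
python selfplay_main.py \
    --save-dir archive \
    --process 4 \
    --num-data 10000 \
    --size 9 \
    --use-gpu true \
    --visits 16 \
    --model model/rl-model.bin
```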

## Command line options for train.py

| Option | Description | Example value | Default value | Note |
| --- | --- | --- | --- | --- |
| --kifu-dir | Path to a directory containing SGF files. | /home/user/sgf_files | None | |
| --size | Go board size. | 5 | 9 | |
| --use-gpu | Flag to use a GPU. | true | true | Value is true or false. |
| --rl | Flag to execute reinforcement learning. | false | false | |
| --window-size | Window size for reinforcement learning. | 500000 | 300000 | |
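Likewise, a training run on the generated games could look like this (values are illustrative; adjust them to your setup):

```shell
# Example training invocation using the options documented above.
python train.py \
    --kifu-dir archive \
    --size 9 \
    --use-gpu true \
    --rl true \
    --window-size 300000
```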