
# About reinforcement learning

TamaGo can run Gumbel AlphaZero-style reinforcement learning.

## Prerequisites for reinforcement learning

GNUGo is used to correct the results of self-play games during reinforcement learning runs. Reinforcement learning can proceed without GNUGo, so installing it is optional. However, I recommend using GNUGo, because TamaGo's win/loss judgment at the end of a game is very messy.

To install GNUGo on Ubuntu, simply execute the following command:

```
apt install gnugo
```

## Hyperparameters for reinforcement learning

Hyperparameters for reinforcement learning are defined in learning_param.py.

| Hyperparameter | Description | Example value | Note |
| --- | --- | --- | --- |
| RL_LEARNING_RATE | Learning rate for reinforcement learning. | 0.01 | Lowering this value once learning has progressed to some extent works well. |
| BATCH_SIZE | Mini-batch size for training. | 256 | Reduce this value if GPU memory is limited. |
| MOMENTUM | Momentum parameter for the optimizer. | 0.9 | |
| WEIGHT_DECAY | Weight of L2 regularization. | 1e-4 (0.0001) | |
| DATA_SET_SIZE | Number of data samples stored in each npz file. | BATCH_SIZE * 4000 | |
| RL_VALUE_WEIGHT | Weight of the value loss relative to the policy loss. | 1.0 | This must be more than 0.0. |
| SELF_PLAY_VISITS | Number of visits per move for self-play. | 16 | This must be more than 1. |
| NUM_SELF_PLAY_WORKERS | Number of self-play workers. | 4 | |
| NUM_SELF_PLAY_GAMES | Number of self-play games to generate. | 10000 | |

These example values are chosen to confirm that reinforcement learning progresses well, so please use them as they are at first, then change them gradually while checking the learning status.
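The table above can be read as a sketch of how these constants might appear in learning_param.py (a hypothetical illustration using the documented names and example values, not the file's actual contents):

```python
# Hypothetical sketch of learning_param.py; names and example values
# are taken from the hyperparameter table above.
RL_LEARNING_RATE = 0.01        # lower this once learning has progressed
BATCH_SIZE = 256               # reduce if GPU memory is limited
MOMENTUM = 0.9                 # momentum parameter for the optimizer
WEIGHT_DECAY = 1e-4            # weight of L2 regularization
DATA_SET_SIZE = BATCH_SIZE * 4000  # samples stored per npz file
RL_VALUE_WEIGHT = 1.0          # must be more than 0.0
SELF_PLAY_VISITS = 16          # must be more than 1
NUM_SELF_PLAY_WORKERS = 4
NUM_SELF_PLAY_GAMES = 10000
```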

## Definition of the neural network structure

The neural network is defined in the following four files.

| File | Definition |
| --- | --- |
| nn/network/dual_net.py | Neural network definition. |
| nn/network/res_block.py | Residual block definition. |
| nn/network/head/policy_head.py | Policy head definition. |
| nn/network/head/value_head.py | Value head definition. |

If you want to change the network structure, I recommend changing the values of filters or blocks first.
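To see why filters and blocks are the natural knobs, here is a hypothetical illustration (not TamaGo's actual code) of how they scale the size of a residual tower built from 3x3 convolutions:

```python
# Hypothetical illustration: how `filters` and `blocks` scale the
# parameter count of a residual tower (bias terms omitted).
def conv_params(in_ch: int, out_ch: int, k: int = 3) -> int:
    """Number of weights in a k x k convolution."""
    return in_ch * out_ch * k * k

def tower_params(filters: int, blocks: int, input_planes: int = 6) -> int:
    """Rough parameter count of a shared residual tower:
    one input convolution plus two convolutions per residual block."""
    total = conv_params(input_planes, filters)
    total += blocks * 2 * conv_params(filters, filters)
    return total
```

Doubling `filters` roughly quadruples the per-block cost, while `blocks` scales it linearly, so the two settings trade off depth against width.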

## TamaGo's reinforcement learning process

The reinforcement learning process runs in the following order.

  1. Execute a specified number of self-play games using the existing neural network model.
  2. Adjust the self-play game results using GNUGo (optional).
  3. Train the neural network using the SGF files generated by the self-play process.
  4. Repeat steps 1 to 3.

The reinforcement learning pipeline is defined in pipeline.sh.
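The steps above might be sketched as a loop like the following (a hypothetical outline, not the actual contents of pipeline.sh; the iteration count and options are illustrative):

```shell
#!/bin/sh
# Hypothetical outline of the reinforcement learning loop:
# each iteration generates self-play games with the current model,
# optionally rescores them with GNUGo, then retrains the model.
ITERATIONS=10
for i in $(seq "$ITERATIONS"); do
    # 1. Self-play with the existing model
    python selfplay_main.py --model model/rl-model.bin --save-dir archive
    # 2. (Optional) correct game results with GNUGo here
    # 3. Train on the generated SGF files
    python train.py --rl true --kifu-dir archive
done
```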

## Command line options for selfplay_main.py

| Option | Description | Example value | Default value | Note |
| --- | --- | --- | --- | --- |
| --save-dir | Directory path to save SGF files generated by the self-play process. | save_dir | archive | |
| --process | Number of self-play workers. | 2 | NUM_SELF_PLAY_WORKERS | |
| --num-data | Number of self-play games to generate. | 5000 | NUM_SELF_PLAY_GAMES | |
| --size | Go board size. | 9 | 9 | |
| --use-gpu | Flag to use a GPU. | true | true | Value is true or false. |
| --visits | Number of visits per move for self-play. | 100 | SELF_PLAY_VISITS | |
| --model | Path to a model file. | model/rl-model.bin | model/rl-model.bin | |
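Putting the options above together, a self-play run could be launched like this (values are illustrative; adjust them to your setup):

```shell
# Example self-play invocation using the options documented above.
python selfplay_main.py \
    --save-dir archive \
    --process 4 \
    --num-data 10000 \
    --size 9 \
    --use-gpu true \
    --visits 16 \
    --model model/rl-model.bin
```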

## Command line options for train.py

| Option | Description | Example value | Default value | Note |
| --- | --- | --- | --- | --- |
| --kifu-dir | Path to a directory containing SGF files. | /home/user/sgf_files | None | |
| --size | Go board size. | 5 | 9 | |
| --use-gpu | Flag to use a GPU. | true | true | Value is true or false. |
| --rl | Flag to execute reinforcement learning. | false | false | |
| --window-size | Window size for reinforcement learning. | 500000 | 300000 | |
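Likewise, a training run on the generated games could look like this (values are illustrative; adjust them to your setup):

```shell
# Example training invocation using the options documented above.
python train.py \
    --kifu-dir archive \
    --size 9 \
    --use-gpu true \
    --rl true \
    --window-size 300000
```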