FAQ
No, self-play games were not used for training in releases <= 0.6.0. The reason is that increasing the speed of the MCTS and applying improvements to it are expected to drastically reduce the number of self-play games needed.
The engine has been ported to C++, and over the course of reinforcement learning approximately 2.37 million self-play games were generated using three V100 GPUs.
The neural network loss function is defined by two objectives. First, it tries to predict the move the human played in the position, also called the policy. The set of all possible moves is represented as a one-hot encoded vector in which each entry corresponds to a certain move in UCI notation (see LABELS in constants.py).
The policy loss is defined as a categorical cross-entropy loss.
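A minimal sketch of the policy objective described above, in plain Python. The four-entry `LABELS` list here is a hypothetical stand-in for the much longer list in constants.py, and the helper names and logits are made up for illustration; this is not the project's training code.

```python
import math

# Hypothetical, shortened stand-in for the LABELS list in constants.py.
# "N@f3" is a crazyhouse drop move written in UCI notation.
LABELS = ["e2e4", "d2d4", "g1f3", "N@f3"]


def one_hot(move_uci, labels):
    """Return a vector with 1.0 at the index of the played move, 0.0 elsewhere."""
    target = [0.0] * len(labels)
    target[labels.index(move_uci)] = 1.0
    return target


def softmax(logits):
    """Turn raw network outputs into a probability distribution."""
    peak = max(logits)  # subtract the maximum for numerical stability
    exps = [math.exp(x - peak) for x in logits]
    total = sum(exps)
    return [e / total for e in exps]


def categorical_cross_entropy(target, predicted, eps=1e-12):
    """Cross-entropy between a one-hot target and a predicted distribution."""
    return -sum(t * math.log(p + eps) for t, p in zip(target, predicted))


# The human played g1f3; the (made-up) network logits favour the same move.
target = one_hot("g1f3", LABELS)
predicted = softmax([0.5, 0.2, 2.0, -1.0])
policy_loss = categorical_cross_entropy(target, predicted)
```

Because the target is one-hot, the loss reduces to the negative log-probability the network assigns to the move that was actually played.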
Second, the network tries to predict the correct outcome of the game, which is represented by a single scalar value between -1 and +1.
- -1: loss from the current player's perspective
- 0: draw
- +1: win from the current player's perspective
This loss term is expressed by a mean squared error (MSE), often also called L2 loss.
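The value objective can be sketched in the same way. The function names, the convention of storing the result from White's point of view, and the sample predictions are assumptions of this example rather than details taken from the CrazyAra code base.

```python
def value_target(result_for_white, white_to_move):
    """Convert a game result into the perspective of the player to move.

    result_for_white: +1 if White won, 0 for a draw, -1 if White lost
    (an assumed storage convention for this sketch).
    """
    return result_for_white if white_to_move else -result_for_white


def mean_squared_error(targets, predictions):
    """Average of the squared differences between targets and predictions."""
    squared = [(t - p) ** 2 for t, p in zip(targets, predictions)]
    return sum(squared) / len(squared)


# Three positions: two from a game White won, one from a drawn game.
targets = [
    value_target(1, white_to_move=True),   # winner to move
    value_target(1, white_to_move=False),  # loser to move
    value_target(0, white_to_move=True),   # drawn game
]
predictions = [0.8, -0.5, 0.1]  # made-up network outputs in [-1, +1]
value_loss = mean_squared_error(targets, predictions)
```

The same final result therefore yields opposite training targets depending on which side is to move in the sampled position.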
Of course, there are many contradicting samples in the training data, especially in the opening. People often played different moves from the same position, and the same position sometimes led to a win and sometimes to a loss. The network should ideally converge to the best fit for the training data and roughly correspond to the lichess.org opening explorer. However, simply memorising all the training data and replicating the opening explorer should be avoided. This phenomenon is called overfitting. It can be compared to compiling a telephone book instead of actually learning the task, and it shows up as a poor ability to generalize. In crazyhouse the number of possible moves is higher than in chess, and it's common to reach a new position between move 7 and 20.
Yes, you can take a look at Strength evaluation v0.3.1
For v0.3.1 it was 300 NPS (nodes per second) using an AMD® Ryzen 7 1700 eight-core processor × 16 and GeForce GTX 1080 Ti.
For v0.7.0 it was 8k-10k NPS using an AMD® Ryzen 7 1700 eight-core processor × 16 and GeForce GTX 1080 Ti with TensorRT support.
How did CrazyAra become stronger between version v0.1.0 and v0.3.1 if the same training data was used?
Between v0.1.0 and v0.3.1, a new neural network architecture called RISE was trained, which performed better on the training and test datasets. In addition, the speed of the MCTS search was increased and the MCTS itself was adapted.
The strength of the raw network, which always plays the move with the highest probability from the policy distribution, is expected to be 2280 Elo in bullet, which corresponds to the rating of the average player in the training data.
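Playing with the raw network, as described above, amounts to an argmax over the policy distribution without any search. A small sketch follows; the dictionary representation of the policy and the restriction to a list of legal moves are assumptions of this example.

```python
def greedy_move(policy, legal_moves):
    """Return the legal move with the highest policy probability.

    policy: dict mapping UCI move strings to probabilities (hypothetical format).
    legal_moves: list of UCI move strings that are legal in the position.
    """
    return max(legal_moves, key=lambda move: policy.get(move, 0.0))


# Made-up policy output for some position.
policy = {"e2e4": 0.55, "d2d4": 0.30, "N@f3": 0.15}
best = greedy_move(policy, ["d2d4", "N@f3", "e2e4"])
```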
The main technique is based on the DeepMind paper.
First the root node which describes the current position is expanded.
Each child node represents a possible legal move.
The neural network prior policy distribution defines how much each child node will be explored.
From a general perspective, the exploration is defined by the prior policy distribution of the moves as well as the values of the future board states, also called Q-values. The trade-off between the two is defined by the CPUCT variable. MCTS tries to calculate the lines which are best for both players. Each search thread follows a line until it reaches either a terminal node (win, loss, draw) or the end of the currently explored line. Then this position is evaluated by the neural network and a new node is added to the search tree. The value evaluation of this board position is back-propagated through the tree and the statistics of all visited nodes are updated. A so-called random rollout for reaching an actual terminal node isn't done in the newest version of MCTS proposed by DeepMind. At the end, the posterior policy distribution is created from the number of visits of each child node divided by the total number of visits of all child nodes. In simple words, after exploration the child node with the most visits during the search is chosen.
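The selection, back-propagation and final policy steps described above can be sketched as follows. This uses the PUCT rule in the form published for AlphaGo Zero/AlphaZero, not necessarily the exact variant in CrazyAra. The `Node` class, the `cpuct=2.5` default, the `+ 1` under the square root and the hand-picked evaluation values are all assumptions of this example.

```python
import math


class Node:
    def __init__(self, prior):
        self.prior = prior      # prior probability from the network policy
        self.visits = 0         # number of times this node was visited
        self.value_sum = 0.0    # sum of back-propagated values
        self.children = {}      # UCI move string -> Node

    def q_value(self):
        """Mean value of this node, seen from the player who moved into it."""
        return self.value_sum / self.visits if self.visits else 0.0


def select_child(node, cpuct=2.5):
    """Pick the child maximising Q + U; cpuct weights prior against Q-value."""
    total = sum(child.visits for child in node.children.values())

    def score(item):
        _, child = item
        # "+ 1" keeps the prior term non-zero before any visit (sketch assumption).
        u = cpuct * child.prior * math.sqrt(total + 1) / (1 + child.visits)
        return child.q_value() + u

    return max(node.children.items(), key=score)


def backup(path, value):
    """Update all nodes on the path; value is from the view of the player
    to move at the newly evaluated position, so the sign flips at each ply."""
    for node in reversed(path):
        value = -value
        node.visits += 1
        node.value_sum += value


def posterior_policy(root):
    """Visits of each child divided by the total visits of all children."""
    total = sum(child.visits for child in root.children.values())
    return {move: child.visits / total for move, child in root.children.items()}


root = Node(prior=1.0)
root.children = {"e2e4": Node(0.7), "d2d4": Node(0.3)}

first_move, first_child = select_child(root)
backup([first_child], value=0.2)     # made-up network evaluation
second_move, second_child = select_child(root)
backup([second_child], value=-0.5)   # made-up network evaluation
policy = posterior_policy(root)
```

In this toy run the move with the higher prior is explored first. After its evaluation turns out unfavourable, the second simulation switches to the other move, which illustrates how CPUCT balances the prior against the Q-values.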
You can find a description of the search here: README.m
For a more detailed description please refer to the official DeepMind papers:
Mastering the game of Go with deep neural networks and tree search
For CrazyAra there have been some changes to the MCTS algorithm, which include the use of Q-values for final move selection, time management, a transposition table, and search tree pruning. Future versions will likely introduce further improvements to the MCTS algorithm.
The prefix "Crazy" comes from the chess variant "crazyhouse". "Ara" is derived from a genus of South American parrots (Ara). Parrots are intelligent animals which can learn from and mimic humans. In the case of CrazyAra, it learned from human-played crazyhouse games. Originally we wanted to name the bot "DeepLee", after "Deep Learning" and the multiple-time world champion "JannLee", but this name was already taken.