
♟️ Chess AI



This repository contains the code to train and test an autoregressive transformer model on chess games from scratch. I also used it to train the open-source DimChess-0.3B model.





🤖 DimChess-0.3B

Using this repository, I trained DimChess-0.3B, a small 0.3B-parameter chess model, on 14M chess games with my personal RTX 3090 GPU in ≈260 hours.


🏗️ Architecture

The model is based on the transformer architecture (only the decoder part) from the paper Attention is All You Need by Google Brain (2017), with a few improvements.


Here are the main parameters of the architecture:

| Parameter | Value |
|:---|:---|
| Embedding dimension | 1,024 |
| Number of layers | 24 |
| Heads dimension | 64 |
| Feed forward hidden dimension | 2,730 |
| Number of heads | 16 |
| Number of grouped heads | 4 |
| Context length | 2,048 |
| Vocab sizes | 74, 13, 2, 2, 2, 2 |

The resulting model has 264,436,736 trainable parameters and fits on a single RTX 3090 GPU for mixed-precision training with a batch size of 4. For inference only, the model should fit on most modern GPUs.
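As a rough sketch, these parameters could be gathered in a config like the one below (the field names are illustrative, not the repository's):

```python
from dataclasses import dataclass

@dataclass
class ModelConfig:
    # Values from the table above; field names are my own, not the repository's.
    embedding_dim: int = 1024
    num_layers: int = 24
    head_dim: int = 64
    ffn_hidden_dim: int = 2730
    num_heads: int = 16                 # 16 query heads × 64 dims = 1,024 = embedding_dim
    num_kv_heads: int = 4               # grouped-query attention: 4 shared key/value heads
    context_length: int = 2048
    vocab_sizes: tuple[int, ...] = (74, 13, 2, 2, 2, 2)  # one vocabulary per token layer
```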


💾 Data

The dataset I made to train this model is composed of 14M chess games from high-level players, for a total of 1.2B moves played between 1600 and 2024. You can download it on Hugging Face 🤗.
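If you use the Hugging Face datasets library, loading it could look like this (a sketch; the dataset ID below is a placeholder for the one linked above):

```python
from datasets import load_dataset

# Placeholder ID: use the actual dataset linked above on Hugging Face.
games = load_dataset("user/chess-games-dataset", split="train")
print(games[0])  # one game record
```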

For the tokenization, I created a custom multi-layer move-to-token tokenizer with 6 different vocabularies:

  1. Board position: 74 tokens for the 64 squares of the chess board + 10 control tokens
  2. Piece type: 13 tokens for the 12 different pieces + 1 null token
  3. Capture: 2 tokens for the capture state
  4. En passant: 2 tokens for the en passant state
  5. Check: 2 tokens for the check state
  6. Checkmate: 2 tokens for the checkmate state

A move is usually composed of 3 tokens (each token containing 6 layers):

  1. The board position of the piece to move with its piece type (null for the other layers)
  2. The board position of the destination square with the piece type (which can differ in case of promotion) and the state layers depending on the move
  3. The <m/> token (null for the other layers)

If the move is a castle, 2 tokens are added before the <m/> token for the rook move.
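To make the scheme concrete, here is a toy encoding of the move 1. e4 (a pawn from e2 to e4) under the layout described above. The square and piece numbering, and the placement of <m/> in the position layer, are my own assumptions, not the repository's actual tokenizer:

```python
# One token = one value per vocabulary layer:
# (position, piece, capture, en_passant, check, checkmate)
# Assumptions: squares are numbered a1=0 ... h8=63, piece ids 1-12 with 0 = null,
# and <m/> is one of the 10 control tokens in the position layer.

E2, E4 = 12, 28        # e2 = rank 1 × 8 + file 4; e4 = rank 3 × 8 + file 4
WHITE_PAWN = 1         # illustrative piece id
MOVE_END = 64          # illustrative id for the <m/> control token
NULL = 0

e4_move = [
    (E2, WHITE_PAWN, NULL, NULL, NULL, NULL),   # 1. the piece to move, on e2
    (E4, WHITE_PAWN, NULL, NULL, NULL, NULL),   # 2. destination square + move states
    (MOVE_END, NULL, NULL, NULL, NULL, NULL),   # 3. the <m/> token
]
```

For a castle, two extra tokens of the same shape would describe the rook's source and destination squares before the <m/> token.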


🦾 Training

For the training, I used the AdamW optimizer with a warmup and cosine decay learning rate schedule. Here are the main hyperparameters:

| Hyperparameter | Value |
|:---|:---|
| Batch size (tokens) | 524,288 |
| Optimizer | AdamW |
| Learning rate | 6.0 × 10⁻⁴ |
| Warmup steps | 2,000 |
| Decay steps | 28,000 |
| β₁ | 0.9 |
| β₂ | 0.95 |
| ε | 10⁻⁸ |
| Weight decay | 0.1 |
| Gradient clipping | 1.0 |
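As an illustration, a warmup plus cosine decay schedule with these values could look like the sketch below (my own code, not the repository's; the final learning-rate floor is an assumption, since it isn't stated here). Note that 2,000 warmup steps + 28,000 decay steps match the 30,000 total steps reported below.

```python
import math

MAX_LR = 6.0e-4
WARMUP_STEPS = 2_000
DECAY_STEPS = 28_000
MIN_LR = MAX_LR / 10  # assumption: the final LR floor isn't stated in the README

def learning_rate(step: int) -> float:
    if step < WARMUP_STEPS:
        # Linear warmup from 0 up to the peak learning rate.
        return MAX_LR * (step + 1) / WARMUP_STEPS
    if step >= WARMUP_STEPS + DECAY_STEPS:
        return MIN_LR
    # Cosine decay from MAX_LR down to MIN_LR over DECAY_STEPS steps.
    progress = (step - WARMUP_STEPS) / DECAY_STEPS
    return MIN_LR + 0.5 * (MAX_LR - MIN_LR) * (1 + math.cos(math.pi * progress))
```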

I trained the model on my personal RTX 3090 GPU for ≈4 epochs, using mixed precision and gradient accumulation to increase the speed and reduce the memory usage:

| Training summary | |
|:---|:---|
| Tokens | 15,728,640,000 |
| Steps | 30,000 |
| FLOPs | 2.5 × 10¹⁹ |
| Duration | 256 hours |
| Final loss | 0.63 |
| Final accuracy | 79.9% |
| Final Elo | 1,741 ± 11 |
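These numbers imply the accumulation factor: 524,288 tokens per batch at a context of 2,048 is 256 sequences per optimizer step, i.e. 64 accumulation steps with a micro-batch of 4. A minimal PyTorch sketch of this pattern (assuming float16 autocast with a gradient scaler; not the repository's actual loop):

```python
import torch

ACCUMULATION_STEPS = 64  # 524,288 tokens / (2,048 context × 4 micro-batch)

scaler = torch.cuda.amp.GradScaler()

def training_step(model, optimizer, batches):
    optimizer.zero_grad(set_to_none=True)
    for inputs, targets in batches:  # 64 micro-batches of 4 sequences each
        with torch.autocast(device_type="cuda", dtype=torch.float16):
            # Assumption: the model returns the loss; divide so gradients
            # average over the accumulation steps.
            loss = model(inputs, targets) / ACCUMULATION_STEPS
        scaler.scale(loss).backward()
    # Unscale before clipping so the 1.0 threshold applies to true gradients.
    scaler.unscale_(optimizer)
    torch.nn.utils.clip_grad_norm_(model.parameters(), 1.0)
    scaler.step(optimizer)
    scaler.update()
```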


🧪 Tests

I tested the model against the Stockfish 16 chess engine configured with the UCI_Elo parameter (from ≈1,300 to ≈3,200). The first 3 moves of each side were chosen randomly to create different games. Here are the results:



Using these results, I estimated the Elo of the model to be around 1,741 (±11), but the Stockfish UCI Elo metric is a bit unclear, so I don't know to what extent it makes sense to compare it to FIDE, Lichess, or Chess.com ratings.
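For reference, a strength-limited game against Stockfish can be set up with the python-chess library roughly as below (a sketch: the random model_move stand-in and the engine path are placeholders, not the repository's code):

```python
import random
import chess
import chess.engine

def model_move(board: chess.Board) -> chess.Move:
    # Placeholder for the transformer's move selection.
    return random.choice(list(board.legal_moves))

engine = chess.engine.SimpleEngine.popen_uci("stockfish")  # path to your binary
engine.configure({"UCI_LimitStrength": True, "UCI_Elo": 1700})

board = chess.Board()
while not board.is_game_over():
    if board.turn == chess.WHITE:
        board.push(model_move(board))
    else:
        result = engine.play(board, chess.engine.Limit(time=0.1))
        board.push(result.move)

print(board.result())
engine.quit()
```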


🎛️ Weights

The trained weights of the model are available on Google Drive: just download the .pt file of the model and put it in the models folder.
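Loading the checkpoint then amounts to something like this (a sketch; the file name is illustrative and the repository may wrap the state dict differently):

```python
import torch

# File name is illustrative: use the .pt file downloaded from Google Drive.
checkpoint = torch.load("models/dimchess_0.3b.pt", map_location="cpu")
```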


📦 Dependencies


Run the following command to install the dependencies:

```shell
pip install -r requirements.txt
```

⚠️ You may need to use a specific command for PyTorch if you want to use CUDA

⚠️ You may need to manually install a Flash Attention release for Windows

⚠️ You may need to download the Stockfish engine to make the Stockfish library work


🦾 Training

  • Set the STOCKFISH_PATH constant in chess_ai/settings.py to the path of your Stockfish engine (see the example after this list)

  • Run the create_data.ipynb file to create the dataset

  • Run the training.ipynb file (you can stop the training at any time and resume it later thanks to the checkpoints)

  • If you don't have an overpriced 24GB GPU like me, the default settings (those used to train DimChess-0.3B) may not work for you. You can try to:

    • Reduce the batch size (less stable and a worse minimum loss)
    • Increase the accumulation steps (fixes the previous problems but is slower)
    • Reduce some architecture parameters (a worse minimum loss)
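For the first step, the settings change might look like this (the path is an example, point it at your own Stockfish binary):

```python
# chess_ai/settings.py — example value
STOCKFISH_PATH = "/usr/local/bin/stockfish"
```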

⚗️ Testing

  • Run the testing.ipynb file to use the models you downloaded or trained

🙏 Credits