This repository contains the code to train and test autoregressive transformer models on chess games from scratch. I used it to train DimChess-0.3B, an open-source 0.3B-parameter chess model, on 14M chess games with my personal RTX 3090 GPU in ≈260 hours.
The model is based on the transformer architecture (only the decoder part) from the paper Attention is All You Need by Google Brain (2017), with a few improvements:
- I replaced the default normalization layer with Root Mean Square Layer Normalization (RMSNorm) from the paper Root Mean Square Layer Normalization by the University of Edinburgh (2019) (see the sketch after this list)
- I moved the normalization layers before the transformer blocks (instead of after) as in the paper On Layer Normalization in the Transformer Architecture by Microsoft Research (2020)
- I replaced the ReLU activation with the SwiGLU activation from the paper GLU Variants Improve Transformer by Google (2020)
- I implemented Grouped-Query Attention (GQA) from the paper GQA: Training Generalized Multi-Query Transformer Models from Multi-Head Checkpoints by Google Research (2023)
- I replaced the absolute positional embedding by the Rotary Position Embedding (RoPE) from the paper RoFormer: Enhanced Transformer with Rotary Position Embedding by Zhuiyi Technology (2023)
- I added multiple input and output embeddings to use different token vocabularies at the same time (board position, piece type, capture...)
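As an illustration, here is a minimal PyTorch sketch of two of these components, RMSNorm and a SwiGLU feed-forward block; the class and argument names are my own and may not match the repository's code:

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

class RMSNorm(nn.Module):
    """Root Mean Square Layer Normalization (Zhang & Sennrich, 2019)."""
    def __init__(self, dim: int, eps: float = 1e-8):
        super().__init__()
        self.eps = eps
        self.weight = nn.Parameter(torch.ones(dim))

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        # Scale by the inverse root mean square of the features, then apply the learned gain.
        rms = x.pow(2).mean(dim=-1, keepdim=True).add(self.eps).rsqrt()
        return x * rms * self.weight

class SwiGLU(nn.Module):
    """SwiGLU feed-forward block (Shazeer, 2020): W2(Swish(W x) * V x)."""
    def __init__(self, dim: int, hidden_dim: int):
        super().__init__()
        self.w = nn.Linear(dim, hidden_dim, bias=False)
        self.v = nn.Linear(dim, hidden_dim, bias=False)
        self.w2 = nn.Linear(hidden_dim, dim, bias=False)

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        return self.w2(F.silu(self.w(x)) * self.v(x))
```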
Here are the main parameters of the architecture:
| Parameter | Value |
|---|---|
| Embedding dimension | 1,024 |
| Number of layers | 24 |
| Heads dimension | 64 |
| Feed forward hidden dimension | 2,730 |
| Number of heads | 16 |
| Number of grouped heads | 4 |
| Context length | 2,048 |
| Vocab sizes | 74, 13, 2, 2, 2, 2 |
The resulting model has 264,436,736 trainable parameters and fits on a single RTX 3090 GPU with a batch size of 4 for training using mixed precision. For inference only, the model will probably fit on any modern GPU.
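As a sanity check, a rough parameter-count estimate can be derived from the table above. The decomposition below is my own assumption (SwiGLU with three weight matrices, GQA with 4 key/value heads, separate input and output embeddings, two RMSNorm gains per layer plus a final one), but it happens to reproduce the reported count:

```python
emb_dim, n_layers, head_dim = 1024, 24, 64
n_heads, n_kv_heads, ffn_hidden = 16, 4, 2730
vocab_sizes = [74, 13, 2, 2, 2, 2]

attention = emb_dim * (n_heads * head_dim)           # query projection
attention += 2 * emb_dim * (n_kv_heads * head_dim)   # key and value projections (GQA)
attention += (n_heads * head_dim) * emb_dim          # output projection

feed_forward = 3 * emb_dim * ffn_hidden              # SwiGLU: W, V and W2 matrices
norms = 2 * emb_dim                                  # two RMSNorm gains per layer

per_layer = attention + feed_forward + norms
embeddings = 2 * sum(vocab_sizes) * emb_dim          # input + output embeddings

total = n_layers * per_layer + embeddings + emb_dim  # + final RMSNorm
print(f"{total:,}")                                  # 264,436,736
```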
The dataset I made to train this model is composed of 14M chess games from high-level players, for a total of 1.2B moves played between 1600 and 2024. You can download it on Hugging Face 🤗.
For the tokenization, I created a custom multi-layer move-to-token tokenizer with 6 different vocabularies:
- Board position: 74 tokens for the 64 squares of the chess board + 10 control tokens
- Piece type: 13 tokens for the 12 different pieces + 1 null token
- Capture: 2 tokens for the capture state
- En passant: 2 tokens for the en passant state
- Check: 2 tokens for the check state
- Checkmate: 2 tokens for the checkmate state
A move is usually composed of 3 tokens (each token containing 6 layers):
- The board position of the piece to move, with the piece type (`null` for the other layers)
- The board position of the destination square, with the piece type (can be different in case of promotion) and the different states depending on the move
- The `<m/>` token (`null` for the other layers)

If the move is a castle, 2 tokens are added before the `<m/>` token for the rook move.
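To make this concrete, here is a purely illustrative sketch of what the layered encoding of the opening move 1. e4 could look like; the field names and values are my own and not the repository's exact token IDs:

```python
# Each token carries one value per vocabulary layer:
# (board position, piece type, capture, en passant, check, checkmate).
move_e4 = [
    # 1) Source square of the moving piece, with its piece type (state layers are null).
    {"position": "e2",   "piece": "white_pawn", "capture": 0, "en_passant": 0, "check": 0, "checkmate": 0},
    # 2) Destination square, with the piece type and the move's states.
    {"position": "e4",   "piece": "white_pawn", "capture": 0, "en_passant": 0, "check": 0, "checkmate": 0},
    # 3) End-of-move marker, null for the other layers.
    {"position": "<m/>", "piece": "null",       "capture": 0, "en_passant": 0, "check": 0, "checkmate": 0},
]
```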
For the training, I used stochastic gradient descent with the AdamW optimizer and warmup plus cosine decay learning rate schedules. Here are the main hyperparameters (a sketch of this setup follows the table):
| Hyperparameter | Value |
|---|---|
| Batch size (tokens) | 524,288 |
| Optimizer | AdamW |
| Learning rate | 6.0 × 10⁻⁴ |
| Warmup steps | 2,000 |
| Decay steps | 28,000 |
| β₁ | 0.9 |
| β₂ | 0.95 |
| ε | 10⁻⁸ |
| Weight decay | 0.1 |
| Gradient clipping | 1.0 |
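For reference, here is a minimal PyTorch sketch of this optimizer and learning-rate schedule; the function name, the placeholder model and the minimum learning rate are my own assumptions, not necessarily what the repository does:

```python
import math
import torch

def lr_at_step(step, max_lr=6e-4, warmup_steps=2_000, decay_steps=28_000, min_lr=6e-5):
    """Linear warmup followed by cosine decay (the min_lr value is an assumption)."""
    if step < warmup_steps:
        return max_lr * (step + 1) / warmup_steps
    if step >= decay_steps:
        return min_lr
    progress = (step - warmup_steps) / (decay_steps - warmup_steps)
    return min_lr + 0.5 * (max_lr - min_lr) * (1.0 + math.cos(math.pi * progress))

model = torch.nn.Linear(1024, 95)  # placeholder standing in for the chess transformer
optimizer = torch.optim.AdamW(
    model.parameters(), lr=6e-4, betas=(0.9, 0.95), eps=1e-8, weight_decay=0.1
)

for step in range(30_000):
    for group in optimizer.param_groups:
        group["lr"] = lr_at_step(step)
    # ... forward pass, backward pass and optimizer step go here ...
```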
I trained the model on my personal RTX 3090 GPU for ≈4 epochs, using mixed precision and gradient accumulation to increase speed and reduce memory usage (a sketch of the training step follows the summary):
| Training summary | |
|---|---|
| Tokens | 15,728,640,000 |
| Steps | 30,000 |
| FLOPs | 2.5 × 10¹⁹ |
| Duration | 256 hours |
| Final loss | 0.63 |
| Final accuracy | 79.9 % |
| Final Elo | 1,741 ± 11 |
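Below is a minimal sketch of what one mixed-precision training step with gradient accumulation and gradient clipping can look like; the placeholder model, the dummy data and the accumulation factor are my own examples, not the repository's actual values:

```python
import torch
import torch.nn.functional as F

model = torch.nn.Linear(1024, 95).cuda()  # placeholder standing in for the chess transformer
optimizer = torch.optim.AdamW(model.parameters(), lr=6e-4, betas=(0.9, 0.95), weight_decay=0.1)
scaler = torch.cuda.amp.GradScaler()
accumulation_steps = 32                   # micro-batches accumulated per optimizer step (example value)

optimizer.zero_grad(set_to_none=True)
for _ in range(accumulation_steps):
    inputs = torch.randn(4, 1024, device="cuda")              # dummy micro-batch
    targets = torch.randint(0, 95, (4,), device="cuda")
    with torch.autocast(device_type="cuda", dtype=torch.float16):
        loss = F.cross_entropy(model(inputs), targets) / accumulation_steps
    scaler.scale(loss).backward()                             # accumulate scaled gradients

scaler.unscale_(optimizer)                                    # unscale before clipping
torch.nn.utils.clip_grad_norm_(model.parameters(), 1.0)       # gradient clipping at 1.0
scaler.step(optimizer)
scaler.update()
```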
I tested the model against the Stockfish 16 chess engine configured with the `UCI_Elo` parameter (from ≈1,300 to ≈3,200); the first 3 moves of each side were chosen randomly to create different games. Here are the results:
Using these results, I estimated the Elo of the model to be around 1,741 (±11), but the Stockfish UCI Elo metric is a bit unclear, so I don't know to what extent it makes sense to compare it to FIDE, Lichess or Chess.com ratings.
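For illustration, here is a sketch of how such a match could be set up with the python-chess library; the Stockfish path and the `model_pick_move` helper are placeholders, not the repository's actual code:

```python
import random
import chess
import chess.engine

def model_pick_move(board: chess.Board) -> chess.Move:
    # Placeholder: the real code would query DimChess-0.3B for the next move.
    return random.choice(list(board.legal_moves))

# Adjust the path to your local Stockfish binary.
engine = chess.engine.SimpleEngine.popen_uci("/usr/bin/stockfish")
engine.configure({"UCI_LimitStrength": True, "UCI_Elo": 1700})

board = chess.Board()
while not board.is_game_over():
    if board.turn == chess.WHITE:
        move = model_pick_move(board)
    else:
        move = engine.play(board, chess.engine.Limit(time=0.1)).move
    board.push(move)

print(board.result())
engine.quit()
```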
The trained weights of the model are available on Google Drive: you just need to download the `.pt` file of the model and put it in the `models` folder.
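If you want to inspect the checkpoint manually, here is a minimal sketch; the file name is hypothetical and the checkpoint structure depends on how the repository saves its models:

```python
import torch

# Hypothetical file name; use the .pt file you downloaded into the models folder.
checkpoint = torch.load("models/dimchess_0.3b.pt", map_location="cpu")

# Print the top-level structure (e.g. a state dict and training metadata).
print(type(checkpoint))
if isinstance(checkpoint, dict):
    print(list(checkpoint.keys()))
```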
Run the following command to install the dependencies:
```shell
$ pip install -r requirements.txt
```
- Set the `STOCKFISH_PATH` constant in `chess_ai/settings.py` to the path of your Stockfish engine
- Run the `create_data.ipynb` file to create the dataset
- Run the `training.ipynb` file (you can stop the training at any time and resume it later thanks to the checkpoints)
- If you don't have an overpriced 24GB GPU like me, the default settings (those used to train DimChess-0.3B) may not work for you. You can try to:
  - Reduce the batch size (less stable and a worse lowest point)
  - Increase the accumulation steps (fixes the previous problems but is slower)
  - Reduce some architecture parameters (a worse lowest point)
- Run the `testing.ipynb` file to use the models you downloaded or trained
- Angel Uriot: Creator of the project.