This is a simple demonstration of how an RL agent can learn to play tic-tac-toe by playing against itself and updating its value table with the temporal-difference (TD) method.
It is inspired by the tic-tac-toe example in the introduction of Reinforcement Learning: An Introduction (second edition) by Sutton and Barto.
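The self-play update described above can be sketched with the standard TD(0) rule, V(s) <- V(s) + alpha * (V(s') - V(s)), where each board position's value is nudged toward the value of the position that followed it. This is a minimal illustration with hypothetical names (ValueTable, tdUpdate, the board-string encoding); it is not the code in main.go.

```go
package main

import "fmt"

// ValueTable maps a board encoding to the agent's estimated value of that
// position. The string encoding here is made up for illustration.
type ValueTable map[string]float64

// tdUpdate moves the value of state s toward the value of its successor next:
// V(s) <- V(s) + alpha * (V(next) - V(s)).
func tdUpdate(v ValueTable, s, next string, alpha float64) {
	v[s] += alpha * (v[next] - v[s])
}

func main() {
	v := ValueTable{
		"X.O|.X.|...": 0.5, // unexplored state: default estimate
		"X.O|.X.|..X": 1.0, // X has won: maximum value
	}
	// After a self-play move from the first state to the winning state,
	// back the win's value up into the preceding state.
	tdUpdate(v, "X.O|.X.|...", "X.O|.X.|..X", 0.1)
	fmt.Printf("%.2f\n", v["X.O|.X.|..."]) // prints 0.55
}
```

Repeated over many self-play episodes, these small backups propagate win/loss information from terminal positions to earlier ones, which is what the training loop relies on.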
Run go run main.go
from the repository root. Set the number of training episodes in main.go to control the strength of the agent; after about 10,000 training episodes, the agent plays optimally.