Name		Name	Last commit message	Last commit date
parent directory ..
agents		agents
core		core
data		data
heuristics		heuristics
neural_networks		neural_networks
tests		tests
util		util
ConnectX.ipynb		ConnectX.ipynb
Dockerfile		Dockerfile
README.md		README.md
config.py		config.py
connectx.iml		connectx.iml
requirements.in		requirements.in
requirements.txt		requirements.txt
runner.py		runner.py
training_montycarlo.py		training_montycarlo.py

README.md

Connect X / Connect 4

This project explores various approaches to solving the classic Connect 4 game on Kaggle

Random Agent

Leaderboard Score: 360-415

agents/RandomAgent/RandomAgent.py

This is the simplest agent possible. Validate the list of legal moves, and then pick one at random.

This agent is very fast, but mainly provides a baseline score for other agents.

Rules Agent

Leaderboard Score: 424

This was a first attempt to create a simple agent with manually defined rules.

Strategies included:

Always play middle column if no other columns have been played
Look for an immediately winning move, then play it
If the opponent has a winning move next turn, then block it.
Look for possible double attacks either 2 or 3 moves away
Else extend the longest existing line

The work here included exploration of how to manipulate the Kaggle data structures, implementing base classes required for playing the game, and this also provided inspiration for future heuristics. The implementation of these rules was not thoroughly tested and may contain bugs.

This agent is also very quick, but doesn't look very far ahead. Scores slightly better than the range for RandomAgents, but this approach is not highly competitive on the leaderboard.

Minimax, Negamax and AlphaBeta

AlphaBeta Agent

Leaderboard Score: 1040 (top 8%) | AlphaBetaAgent + LibertiesHeuristic

This is an object oriented implementation of the MiniMax with AlphaBeta pruning algorithm and iterative deepening using a custom heuristic.

This implementation converts the board into a numpy array. Whilst convenient for programming, this might be slower (due to object copying) than performing operations directly on list primitives. Peter Cnudde (1st in this Competition) claims that bitboards are the fastest internal representation to work and capable of reaching depth 12-16.

Timeouts

Attempting to use signals (util/call_with_timeout_ms.py) to throw an exception on timeout didn't seem to work when run inside Kaggle submit.

The fastest way to exit a recursive loop is by throwing an exception.

Loop exit times are usually in the 0.1-3ms range, but on rare occasions the code will take longer than 1000ms+. There can be upto 49 turns in a game, so a Timeout on any of these will result in total game loss with an Err score.

Empirical observation of the Kaggle Leaderboard results for the effect of safety_time values:

0.75s = kaggle error rate of 26/154 = 15%
1.15s = kaggle error rate of 16/360 = 4.5%

The root cause of these random timeouts was discovered as the garbage collector. This can be solved by running gc.disable() as the first action of the agent, then gc.enable() to schedule garbage collection during our opponents turn (or the inter-turn period).

With the gc disabled, safety_time=10ms is reliable on localhost but fails on Kaggle Submit, so currently playing safe with safety_time=2000ms.

AlphaBeta Bitboard

Leaderboard Score: 1080 (top 5%) | AlphaBetaAgentBitboard + BitboardGameoversHeuristic

agents/AlphaBetaAgent/AlphaBetaBitboard.py

Focusing attention back on raw performance resulted in a bitboard implementation. A 84 bit number was used, divided into 2 x 42 bit subsections. The first half the bitboard stored if a square was empty or contained a piece (0==empty, 1==filled). The second half the bitboard stored which player's token was in each square (0==p1, 1==p2).

The gameover/utility function was implemented by creating bitmasks for all 69 possible win lines, for fast iteration to see if all squares where filled, and then if so where they all filled by the same player.

A simple but fast heuristic was designed that emulated the gameover/utility methodology, but also matched on empty squares, signifying the number of potential connect4 lines that could be created.

The pure python implementation (without numba) was able to get to depth=7 on the first move, climbing to depth=12 during the midgame, compared with depth=5-7 for the object oriented AlphaBetaAgent. This translates to a depth advantage of 2-5.

AlphaBetaBitboard is able to beat the depth=4 Kaggle Negamax agent quite easily.

AlphaBetaBitboard with the original first-pass bitboard heuristic scored a 55% winrate against object-oriented AlphaBetaAgent with its slower (but stronger) LibertiesHeuristic. A small improvement using math.log2() % 1 == 0 to heavily discount single square lines improved the winrate to 70%. Adding in a bonus double_attack_score for overlapping connect4 lines reduced the score at first to 55% with a large percentage of draws. This was tweaked to only include connect4s that overlapped via a single square (mostly those pointing in different directions) which boosted the winrate to 80%. Hyperparameter optimization resulting in double_attack_score=0.5 finally achieved a 100% winrate.

Attempting to optimise AlphaBetaBitboard with numba did not result in any significant performance increase.

Negamax Bitboard

Leaderboard Score: 1010 (top 10%) | Negamax + BitboardGameoversHeuristic

agents/Negamax/Negamax.py

Negamax is similar conceptually to Minimax, but has a single recursive function. This implementation was written in a functional rather than OO style for numba @njit optimization but without caching.

I still have not managed to get numba working inside kaggle submit, so this actually runs slower than the cached version of AlphaBeta.

Unit tests for board positions highlighted that the bitboard_gameovers_heuristic() was not an admissible heuristic, and thus failing to see some forced moves due to AlphaBeta pruning, so this was disabled.

Minimax Bitboard

Leaderboard Score: 1030 (top 8%) | Negamax + BitboardGameoversHeuristic

agents/AlphaBetaAgent/MinimaxBitboard.py

This was a version of AlphaBeta Bitboard with alpha beta pruning disabled, inspired by the unit test failures for Negamax.

However this approach scores worse on the leaderboard compared to the version with alpha-beta pruning enabled. This either indicates that greater depth is still superior to accuracy, or else there is a bug in the Negamax alpha-pruning code.

Monty Carlo Tree Search

Pure Monty Carlo Search

Leaderboard Scores:

1075 (top 6%) | MontyCarloPure + Cached Data
1065 (top 7%) | MontyCarloPure

Source:

agents/MontyCarlo/MontyCarloPure.py

Whereas Minimax / Negamax performs a breadth first search of the game state tree, Monty Carlo selectively expands the tree deeper in areas where success is more probable. Tree/graph expansion follows a similar shape to A* search.

The algorithm starts with a root node with unexpanded children.

Expansion happens in two parts. First is node selection, where is the tree is traversed from the root node, choosing the child node with the highest UCB score, which includes an additional term to encourage exploration. When an unexpanded leaf node is reached, a random simulation of the game is run from that position with the score and total counts backpropergated along the tree path.

Expansions are repeatedly run until the timeout expires, and the child of root node with the highest score is chosen as the returned as the agent action.

Monty Carlo Heuristic Search

Leaderboard Scores:

1110 (top 4%) | MontyCarloHeuristic + BitboardGameoversHeuristic + Cached Data
1110 (top 4%) | MontyCarloHeuristic + BitboardGameoversHeuristic + Cached Data
1070 (top 6%) | MontyCarloHeuristic + BitboardGameoversHeuristic

Source:

agents/MontyCarlo/MontyCarloPure.py

This approach replaces random simulation (producing an integer score of 0|1) with a sigmoided version of bitboard_gameovers_heuristic() (producing a floating point score between 0 and 1).

Scaling the sigmoid by factor of 6 (division), produced the highest winrate compared to other numbers. This means a heuristic value of +6 would return a sigmoid score of +0.73, which is pretty winning. Smaller differences in heuristic score would result in a value much closer to 0.5 draw. If a terminal state in the game tree is reached, an integer score of 0|1 is returned.

Compared to random simulation, a heuristic provides more indepth knowledge about a position, and it is significantly faster to compute than running a full game simulation. This means many more expansions can be run within the same time limit.

Ant Colony Heuristic Search

Leaderboard Score: 920 (top 18%) | AntColonyTreeSearch + BitboardGameoversHeuristic

This is a experimental variation of the Monty Carlo tree search, but instead of using max(UCB_score), the score is treated as a pheromone trail and weighted random selection is used.

Maybe this approach needs further experimentation to get right, but currently it produces results significantly worse than Pure Monty Carlo and Minimax/Negamax.

Ideas:

remove the distinction between path selection and simulation, and follow the pheromones trail until terminal state
store pheromones in a dictionary using mirror hash
leave an additional strong pheromone trail recording actual game result.
have the ant colony play continually play against itself with a very short timeout
decay old pheromones as a moving average

Cached Data

Much of the MontyCarlo runtime is spent expanding the tree and computing simulations/heuristics. Some of this could be precomputed to extend the depth of search possible during the 8 second turn timer.

The state tree needed to be persisted to disk, in a pickle.zip.base64 format suitable for embedding as text in a python script, then reloaded upon initialization.

Several hours of localhost runtime where spent playing the MontyCarlo agents against themselves and other agents including hundreds of matches against random_agent. This effectively generated an opening book and precached engdame values for some expected lines of play.

In theory, kaggle allows 100Mb of submission data, but in practice data files larger than 5-10Mb cause kaggle submission errors. The original datafile generated by the above process was 47Mb which was too large. By pruning the tree of any nodes that had not been fully expanded, the filesize was reduced to a workable 5Mb.

This cached datafile significantly improved winrates on the leaderboard, both for Random Simulation and Heuristic versions of Monty Carlo Tree Search.

Heuristics

Liberties Heuristic

Leaderboard Score: 1040 (top 8%) | AlphaBetaAgent + LibertiesHeuristic

heuristics/LibertiesHeuristic.py

This was a first attempt at writing a heuristic function for AlphaBeta.

It iterates over the cells of board, detecting lines still capable of making a connect 4 and
then counting the number of liberties/extensions available for each line using the formula:

return np.sum([ len(extension)**1.25 for extension in self.extensions ]) / ( self.game.inarow**2 )

This doesn't take into account the odd/even height of available liberties. The implementation is not perfectly efficient and can't be Numba'ed easily.

Occasionally the heuristic does seem to miss obvious blocking moves against an opponent who is able to make Connect4 on the next move.

The profiler reports the heuristic accounts for 93% of runtime which is overly expensive and AlphaBeta is usually only able to reach depth 5 using this heuristic.

Even still AlphaBeta + LibertiesHeuristic does score in the top 8% Kaggle Leaderboard

Bitboard Gameovers Heuristic

Leaderboard Scores:

1110 (top 4%) | MontyCarloHeuristic + BitboardGameoversHeuristic + Cached Data
1080 (top 5%) | AlphaBetaAgentBitboard + BitboardGameoversHeuristic
1070 (top 5%) | MontyCarloHeuristic + BitboardGameoversHeuristic

Code:

heuristics/BitboardHeuristic.py

This was a first attempt to write a performance optimized heuristic using vectorized bitshifting

The heuristic, as the name suggests, is based on the the gameovers() function which has a precomputed array of bitmasks for all possible connect 4 locations, then tests to see if the player pieces have filled in all those bits.

The heuristic presents a simplified problem by counting the number of possible connect 4 lines remaining by including empty squares when comparing against the bitmask. A bitmask containing only a single bit can be quickly detected using
log2() % 1 == 0.

Lines with multiple spaces on either end are rewarded by double counting, as three tokens in a row with spaces on one end will also match "2 in a row + 2 spaces" and "1 in a row + 3 spaces".

Lines containing only a single player piece are discounted with a score of 0.1, compared to 1 for a multibit mask of player piece.

An additional double_attack_score of 0.5 is given if any two gameover bitmasks happen to overlap by a single square. This does not offer true double attack detection, but it rewards designing intersecting lines that are somewhat correlated with double attacks. In practice it improved the winrate vs AlphaBetaAgent with LibertiesHeuristic from 50% to near 100%.

The heuristic returns the difference between the players bitmask count and the opponents bitmask count.

Having discovered that Monty Carlo Tree search works well with a heuristic, and in preperation for the idea of a AlphaZero style neural network based heuristic, an attempt was m

Bitsquares Heuristic

Leaderboard Scores:

Code:

heuristics/BitsquaresHeuristic.py

This takes a similar approach to Bitboard Gameovers Heuristic, in that it checks for every possible connect 4 bitmask, containing only player owned or empty squares that is not blocked by opponent pieces.

The score is based counting the number of bits in each line and squaring the bitcount.

Hyperparameter tuning discovered that using a power of 1.75 rather than 2 improved the winrate against bitboard_gameovers_heuristic() from 48% to 94%, without using any double_attack_score logic.

Optimal Reward Values given: n ** 1.75:

1-in-a-row = 1
2-in-a-row = 3.4
3-in-a-row = 6.8
4-in-a-row = inf

OddEven Heuristic

Leaderboard Scores:

Code:

This heuristic rewards three additional conditions based on he even/odd height of empty squares relative to player turn and the endgame would play out if there was a series of forced moves in the final column that would result in an endgame win/loss/draw.

Alone the heuristic only scores a 5% winrate vs bitsquares_heuristic(), but the two can be added together as a combined heuristic for improved results over baseline.

The rewards are for:

reward_3_pair=0 - A pair of 3-in-a-rows, with each having an empty bit at an even or odd height from the current floor. This leads to a chance of double attack, but did not seem to provide a winrate advantage in practice.
reward_3_endgame=1 - A 3 in a row with an empty space that would be playable if found in the last endgame column. The was the strongest rule, that produced a winrate of 70-80% against bitsquares_heuristic() alone, but higher or lower values for the hyperparameter could produce winrates as low as 30%.
reward_2_endgame=0.5 | 0.05 - This is similar to reward_3_endgame but checks for 2-in-a-row lines with both spaces correctly alighned for the endgame. Unit tests where written to demonstrate detection of starting positions that would eventually result in winning/drawn/lost endgames. Used alone at a value of 0.5, it could produce 70% winrates against bitsquares_heuristic() alone, but when combined with reward_3_endgame=1 the winrate would drop to 40% unless set to a very low number.

This advanced heuristic is competing against its own performance cost. bitsquares_heuristic() runs in 0.1ms, but _oddeven_heuristic() takes an additional 0.25ms, even after avoiding recomputing unnecessary data. If set to return a 0 score, the additional computational cost of _oddeven_heuristic() alone sets the baseline winrate to 35% vs bitsquares_heuristic().

Numba

An attempt was made refactor using numba @njit for performance optimization. This probably would have been eaiser to do had the code been written designed with numba in mind, as @njit requires forceobj=True when accessing object oriented instances, which removes most of the performance gains and can even make things slower.

Only a small number of functions in LibertiesHeuristic.py where able to be optimized this way, and still work upon kaggle submit, resulting in maybe a 10% performance improvement in benchmarks.

Numba Notes:

numba doesn't like object oriented programming or frozenset()
@jitclass doesn't like class properties or @cached_property
numba sometimes requires rewriting code such as list comprehensions explicitly
cache=True doesn't work for Kaggle submit due to a read-only filesystem.
@numba.generated_jit(nopython=True, nogil=True, cache=False, forceobj=False, parallel=True) has been shown to work on kaggle submit, which triggers a compile at startup rather than runtime

Memoization in numba may be possible by using a int64 memory address pointer

Work in progress: create testcases for which @njit functionality works inside kaggle submit

https://www.kaggle.com/jamesmcguigan/connectx-numba-testcases

Tests

Unit Tests

Unit tests validate that individual functions work as expected.

Integration Tests

Integration testing can be applied to AI algorithms by giving them game puzzles to solve, especially in positions where a human can verify that there is only one (or two) winning/losing moves.

The simplest is being one move away from connect 4 and seeing if the agent can either find the winning move, or block the opponent from that square. More complicated positions include being able to spot a double attack, which requires a Minimax search depth of 4, or knowing which column to play during an endgame.

A second form of integration tests is a live matchup against the inbuilt kaggle agents: random and negamax. Any leaderboard worthy agent should be able score a near 100% winrate against these opponents, so a logic mistake in the algorithm will show up as a test failure.

Future Ideas

Create an opening book for Minimax
Create an endgames table
Deep learning / Decision Tree / XGBoost
- As a heuristic to predict winning positions
- As an agent to predict the next move
- In a GAN style league to train against other Agents
Add table of contents to README using asciidoc

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

connectx

connectx

README.md

Connect X / Connect 4

Random Agent

Rules Agent

Minimax, Negamax and AlphaBeta

AlphaBeta Agent

Timeouts

AlphaBeta Bitboard

Negamax Bitboard

Minimax Bitboard

Monty Carlo Tree Search

Pure Monty Carlo Search

Monty Carlo Heuristic Search

Ant Colony Heuristic Search

Cached Data

Heuristics

Liberties Heuristic

Bitboard Gameovers Heuristic

Bitsquares Heuristic

OddEven Heuristic

Numba

Tests

Unit Tests

Integration Tests

Future Ideas

Files

connectx

Directory actions

More options

Directory actions

More options

Latest commit

History

connectx

Folders and files

parent directory

README.md

Connect X / Connect 4

Random Agent

Rules Agent

Minimax, Negamax and AlphaBeta

AlphaBeta Agent

Timeouts

AlphaBeta Bitboard

Negamax Bitboard

Minimax Bitboard

Monty Carlo Tree Search

Pure Monty Carlo Search

Monty Carlo Heuristic Search

Ant Colony Heuristic Search

Cached Data

Heuristics

Liberties Heuristic

Bitboard Gameovers Heuristic

Bitsquares Heuristic

OddEven Heuristic

Numba

Tests

Unit Tests

Integration Tests

Future Ideas