AI research environment for the game of Snake, written in Python 2.7. Part of OpenAI's Requests for Research 2.0.
Check out corresponding Medium articles:
Slitherin - Solving the Classic Game of Snake🐍 with AI🤖 (Part 1: Domain Specific Solvers)
Slitherin - Solving the Classic Game of Snake🐍 with AI🤖 (Part 2: General Purpose Solvers)
Slitherin - Solving the Classic Game of Snake🐍 with AI🤖 (Part 3: Genetic Evolution)
- Clone the repo.
- Go to the project's root folder.
- Install the required packages:

pip install -r requirements.txt

- Launch slitherin. I recommend starting with the help mode to see all available modes:

python slitherin.py --help
- The snake has to move either forward, left, or right.
- The snake dies when it hits a wall or itself.
- For every eaten fruit, the snake's length increases by 1 and a new fruit is generated at a random unoccupied place.
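The relative-movement rule and the death conditions above can be sketched as follows. This is an illustrative sketch, not the actual slitherin API; all names here are hypothetical.

```python
# Relative actions: the snake rotates its absolute direction vector.
LEFT, FORWARD, RIGHT = -1, 0, 1

def turn(direction, action):
    """Rotate a (dx, dy) direction vector by a relative action."""
    dx, dy = direction
    if action == LEFT:
        return (dy, -dx)   # 90 degrees one way
    if action == RIGHT:
        return (-dy, dx)   # 90 degrees the other way
    return (dx, dy)        # FORWARD keeps the direction

def is_dead(head, body, grid_size):
    """The snake dies when it hits a wall or its own body."""
    x, y = head
    hit_wall = not (0 <= x < grid_size and 0 <= y < grid_size)
    return hit_wall or head in body
```

Note that turning left and then right cancels out, which is a quick sanity check for the rotation logic.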
All mode previews show the current score in the format Mode Name (min/avg/max). All modes are benchmarked on a 12x12 grid.
These algorithms use domain-specific data such as the snake's position, direction, and neighbors.
python slitherin.py --shortest_path_bfs
Generates the shortest path from the snake's head to the fruit using the BFS algorithm.
Optimal performance in the early stages, but as the snake grows, its body becomes an unavoidable obstacle for the leading head.
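The BFS idea can be sketched as below: explore the grid level by level from the head, treating the snake's body as blocked cells, and reconstruct the path via parent links. This is a sketch of the technique, not the slitherin implementation.

```python
from collections import deque

def bfs_shortest_path(start, goal, blocked, grid_size):
    """Shortest path on a grid via BFS; `blocked` holds the snake's body cells."""
    queue = deque([start])
    parent = {start: None}
    while queue:
        cell = queue.popleft()
        if cell == goal:
            # Reconstruct the path by walking the parent links back to start.
            path = []
            while cell is not None:
                path.append(cell)
                cell = parent[cell]
            return path[::-1]
        x, y = cell
        for nx, ny in ((x + 1, y), (x - 1, y), (x, y + 1), (x, y - 1)):
            nxt = (nx, ny)
            if (0 <= nx < grid_size and 0 <= ny < grid_size
                    and nxt not in parent and nxt not in blocked):
                parent[nxt] = cell
                queue.append(nxt)
    return None  # fruit unreachable (head boxed in by the body)
```

Returning `None` when the fruit is unreachable is exactly the failure mode described above: once the body surrounds the head, no shortest path exists.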
python slitherin.py --shortest_path_dfs
Generates the shortest path from the snake's head to the fruit using the DFS algorithm.
Performs worse than BFS due to the graph's cyclicity.
python slitherin.py --longest_path
First, it generates the shortest path (BFS) between the snake's head and the fruit. Then, for each pair of consecutive points in the path, it tries to extend the distance between them with the available actions.
The snake dies when its body lies on the generated path.
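One way to sketch the extension step: for each consecutive pair of path cells, try to replace the direct step with a two-cell detour, repeating until no detour fits. This is an illustrative greedy sketch under assumed conventions, not the slitherin implementation.

```python
def extend_path(path, blocked, grid_size):
    """Greedily lengthen a grid path by detouring each direct step
    through two free neighboring cells, until no extension applies."""
    occupied = set(path) | set(blocked)
    i = 0
    while i < len(path) - 1:
        (x1, y1), (x2, y2) = path[i], path[i + 1]
        # A horizontal step can detour vertically and vice versa.
        offsets = ((0, 1), (0, -1)) if y1 == y2 else ((1, 0), (-1, 0))
        for dx, dy in offsets:
            a, b = (x1 + dx, y1 + dy), (x2 + dx, y2 + dy)
            in_grid = all(0 <= c < grid_size for c in a + b)
            if in_grid and a not in occupied and b not in occupied:
                path[i + 1:i + 1] = [a, b]   # detour: ... p1, a, b, p2 ...
                occupied |= {a, b}
                break
        else:
            i += 1   # no extension possible here, move on to the next pair
    return path
```

On an empty 2x2 grid, the one-step path from (0,0) to (1,0) extends into a path covering all four cells, which hints at how the same idea yields the Hamiltonian paths of the next mode.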
python slitherin.py --hamilton
Generates the longest path between the snake's head and its tail.
In the vast majority of cases, such a path covers the whole environment, creating a Hamiltonian path and thus solving the game of Snake with a perfect score.
Each Deep Neural Net mode shares the same model structure:
- input layer with 5 neurons [action_vector, left_neighbor, forward_neighbor, right_neighbor, angle_to_fruit]
- hidden layer with 125 neurons (ReLU 6 activation)
- output layer with 1 neuron (value for a given action_vector)
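The 5-125-1 structure above can be sketched in NumPy as below. This is an illustrative, untrained stand-in (random weights, hypothetical class and parameter names), not the project's actual model code.

```python
import numpy as np

def relu6(x):
    # ReLU6 clips activations to the [0, 6] range.
    return np.clip(x, 0.0, 6.0)

class ActionValueNet(object):
    """Sketch of the described 5-125-1 network: it scores a single
    candidate action given local observations around the head."""

    def __init__(self, rng=np.random):
        self.w1 = rng.randn(5, 125) * 0.1
        self.b1 = np.zeros(125)
        self.w2 = rng.randn(125, 1) * 0.1
        self.b2 = np.zeros(1)

    def predict(self, action, left, forward, right, angle_to_fruit):
        x = np.array([action, left, forward, right, angle_to_fruit])
        hidden = relu6(x.dot(self.w1) + self.b1)
        return float(hidden.dot(self.w2) + self.b2)
```

At play time the snake would evaluate `predict` once per relative action and follow the highest-valued one.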
python slitherin.py --deep_neural_net
python slitherin.py --deep_neural_net_trainer
The training phase consists of performing random gameplays followed by evaluation and backpropagation of the performed actions and their results.
Rewards:
- 0.7 for eating the fruit
- 0.1 for moving towards the fruit
- -0.2 for moving away from the fruit
- -1.0 for dying
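The reward scheme above can be written directly as a small function. The argument names and the distance metric (e.g. Manhattan distance to the fruit) are assumptions for illustration.

```python
def reward(ate_fruit, died, old_distance, new_distance):
    """Reward for one move, per the scheme listed above."""
    if died:
        return -1.0
    if ate_fruit:
        return 0.7
    # Compare distance to the fruit before vs. after the move.
    return 0.1 if new_distance < old_distance else -0.2
```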
As expected, the DNN solver performs well in the early stages: the snake goes straight to the fruit and doesn't get stuck in cycles. However, as it gets longer, it starts to have problems going around itself. With the current model structure (data about only the nearest surroundings), the snake doesn't develop any sense of orientation within the whole environment.
python slitherin.py --deep_neural_net_monte_carlo
For each possible action, a DNN-driven gameplay is generated. The gameplay with the highest score determines the final move.
Very slow and inefficient in the beginning, but favorable in the late stages: DNN-driven simulations allow the snake to choose relatively wise long-term moves.
These algorithms do not use any domain-specific data.
python slitherin.py --human
Used for debugging, development, and fun :).
python slitherin.py --random
It's always good to start benchmarking against randomness (or at least pseudo-randomness).
As expected, very low performance.
python slitherin.py --monte_carlo
For each move, it performs a set of 1000 random run simulations, groups them by the initial action, and finally picks the action that started the gameplays with the highest average score.
Slow and weak performance.
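The grouping-by-initial-action step can be sketched as below. `simulate(first_action)` is a hypothetical stand-in for one random rollout returning its score; it is not the slitherin API.

```python
import random

def monte_carlo_action(actions, simulate, n_runs=1000):
    """Pick the action whose random rollouts score best on average."""
    totals = {a: 0.0 for a in actions}
    counts = {a: 0 for a in actions}
    for _ in range(n_runs):
        a = random.choice(actions)   # rollouts start with a random action
        totals[a] += simulate(a)
        counts[a] += 1
    # Group by initial action and compare average scores.
    return max(actions, key=lambda a: totals[a] / max(counts[a], 1))
```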
python slitherin.py --deep_neural_net_genetic_evolution
python slitherin.py --deep_neural_net_genetic_evolution_trainer
The initial population starts with random weights. In the selection phase, the top 0.1 of the population is picked for the uniform crossover stage. In the crossover phase, parents are paired using roulette selection (the higher the score, the higher the probability of breeding). Finally, in the mutation phase, 0.01 of the weights of all offspring are mutated to random values. A new population is then created entirely from the newly bred offspring, and the above cycle repeats until convergence, which usually happens around the 25th generation with an average score of 22.
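One generation of this selection/crossover/mutation cycle can be sketched as below, treating each genome as a flat list of weights. This is an illustrative sketch of the named operators, not the project's trainer code.

```python
import random

def evolve(population, scores, elite_frac=0.1, mutation_rate=0.01):
    """One generation: select the top `elite_frac`, breed via
    roulette-selected parents with uniform crossover, then mutate
    a `mutation_rate` fraction of each offspring's weights."""
    ranked = sorted(zip(scores, population), key=lambda p: p[0], reverse=True)
    n_elite = max(2, int(len(population) * elite_frac))
    elite = [genome for _, genome in ranked[:n_elite]]
    elite_scores = [score for score, _ in ranked[:n_elite]]

    def roulette():
        # Higher score -> higher chance of being picked as a parent.
        pick = random.uniform(0, sum(elite_scores))
        running = 0.0
        for score, genome in zip(elite_scores, elite):
            running += score
            if running >= pick:
                return genome
        return elite[-1]

    offspring = []
    while len(offspring) < len(population):
        mom, dad = roulette(), roulette()
        # Uniform crossover: each weight comes from either parent.
        child = [random.choice(pair) for pair in zip(mom, dad)]
        # Mutation: reset a small fraction of weights to random values.
        child = [random.uniform(-1, 1) if random.random() < mutation_rate else w
                 for w in child]
        offspring.append(child)
    return offspring
```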
Performance is relatively satisfactory. Snake correctly learned that taking the shortest path to the fruit isn't a good solution in the late stages, but ultimately still gets trapped within its own body.
A multiplayer (multi-agent) version of slitherin is currently being developed.
Stay tuned!
Greg (Grzegorz) Surma