Applying the ELO system to predict March Madness games

NickHilton/March-Madness

March Madness

This project is an attempt to model the famously unpredictable March Madness tournament.

Background

March Madness refers to two annual end-of-season college basketball tournaments that determine the men's and women's national champions.

Every year, millions of people around the country fill out brackets to predict the winners of the tournament. Everyone has their own way of picking winners, from form guides and expert opinions to which mascot they prefer.

The sheer volume of data collected during games makes the tournament a natural target for data science. Kaggle runs a competition (men's, women's) in which data scientists predict the results of the tournament.

Competition winners are determined by the sum of the Brier losses of their predictions against the actual results.

For each possible match-up, you provide the probability that Team A beats Team B. Once the game has been played, your score for it is the Brier loss: the squared difference between your prediction and the result (1 if Team A won, 0 otherwise). Because the error is squared, confident predictions that turn out to be wrong are penalised more harshly.
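As a small illustration of this scoring rule, the sketch below computes the Brier loss for a single game and sums it over several predictions. The function names are illustrative only; they are not taken from this repository or from Kaggle.

```python
def brier_loss(prediction, outcome):
    """Squared error between a predicted probability and a 0/1 result."""
    if not 0.0 <= prediction <= 1.0:
        raise ValueError("prediction must be a probability in [0, 1]")
    if outcome not in (0, 1):
        raise ValueError("outcome must be 0 (Team A lost) or 1 (Team A won)")
    return (prediction - outcome) ** 2


def total_brier_loss(predictions, outcomes):
    """Sum of the per-game Brier losses (lower is better)."""
    if len(predictions) != len(outcomes):
        raise ValueError("predictions and outcomes must have the same length")
    return sum(brier_loss(p, o) for p, o in zip(predictions, outcomes))


# A confident wrong guess costs far more than a cautious wrong guess:
confident_wrong = brier_loss(0.9, 0)  # about 0.81
cautious_wrong = brier_loss(0.6, 0)   # about 0.36
```

For a fixed set of games, ranking submissions by the sum or by the mean of these losses gives the same order, since the two differ only by a constant factor.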

This project implements an ELO-type rating system, most famously used for chess world rankings, to track team ratings over time and predict tournament results.

It uses SQLAlchemy and sqlite3 to manage the databases. The data is downloaded from Kaggle's March Madness competition website; credit to them for the datasets.
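For readers new to this kind of rating system, here is a minimal sketch of the textbook update rule. It is not this project's implementation: the function names are made up for illustration, and the `k` and `scale` defaults are the conventional chess-style values, assumed here rather than taken from the repository (the project tunes its own parameters, as described under Building Models).

```python
def expected_score(rating_a, rating_b, scale=400.0):
    """Predicted probability that team A beats team B under the logistic curve."""
    return 1.0 / (1.0 + 10 ** ((rating_b - rating_a) / scale))


def update_ratings(rating_a, rating_b, a_won, k=20.0, scale=400.0):
    """Return the new (rating_a, rating_b) after one game.

    k and scale are assumed defaults, not values used by this project.
    """
    exp_a = expected_score(rating_a, rating_b, scale)
    actual_a = 1.0 if a_won else 0.0
    delta = k * (actual_a - exp_a)
    # The winner gains exactly what the loser gives up.
    return rating_a + delta, rating_b - delta


# Two evenly rated teams: the predicted probability is 0.5,
# and the winner takes k / 2 points from the loser.
new_a, new_b = update_ratings(1500.0, 1500.0, a_won=True)
```

The output of `expected_score` is a probability, so it can be submitted directly as the prediction that Team A beats Team B.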

Getting Started

  1. Clone the repository from GitHub.

  2. Download up-to-date data from the most recent Kaggle competition, putting the men's data in the data_male folder and the women's data in the data_female folder.

  3. Install requirements

pip install -r requirements/base.txt

  4. Load data into databases

# Initialise database
sqlite3 DATABASE

# Set the DATABASE_URL environment variable so that it points at this database

# This will load all initial data and migrate the databases to their initial state using SQLAlchemy
python database_scripts/load_all.py
python database_scripts/load_seeds.py
python database_scripts/load_matches_teams_map.py

# Now upgrade database again
alembic upgrade head

# Now run the SQL in `score_diff.sql` against your database

# Finally run the last python script
python database_scripts/update_all_match_stats.py
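
The `score_diff.sql` step above can also be run from Python instead of the sqlite3 shell. The helper below is a sketch using only the standard library. `run_sql_file` is a hypothetical name, not a script in this repository, and the paths in the usage comment are placeholders to adjust to your setup.

```python
import sqlite3
from pathlib import Path


def run_sql_file(db_path, sql_path):
    """Execute every statement in sql_path against the SQLite database at db_path."""
    script = Path(sql_path).read_text()
    conn = sqlite3.connect(db_path)
    try:
        # executescript runs multiple semicolon-separated statements in one call.
        conn.executescript(script)
        conn.commit()
    finally:
        conn.close()


# Example usage (placeholder paths):
# run_sql_file("DATABASE", "score_diff.sql")
```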

Building Models

Testing Parameters to the ELO Model

To test a grid of parameters, edit the parameters you want to test in elo_run/param_tuning.py, then run python elo_run/param_tuning.py.

This runs the tests for the specified parameters and saves the results to the evaluations table in the database.
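
To make "a grid of parameters" concrete, the sketch below expands a dictionary of candidate values into every combination. The parameter names (`k_factor`, `season_carryover`) and the helper are hypothetical; check elo_run/param_tuning.py for the names and structure the project actually uses.

```python
from itertools import product

# Hypothetical parameter names and candidate values, for illustration only.
param_grid = {
    "k_factor": [10, 20, 30],
    "season_carryover": [0.5, 0.75],
}


def iter_param_sets(grid):
    """Yield one dict per combination of the candidate values in grid."""
    names = list(grid)
    for values in product(*(grid[name] for name in names)):
        yield dict(zip(names, values))


param_sets = list(iter_param_sets(param_grid))
# 3 values x 2 values gives 6 parameter sets to evaluate.
```

The number of combinations is the product of the list lengths, so each extra parameter multiplies the run time of the tuning script.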

Evaluating param sets

The notebook notebooks/Evaluate systems.ipynb lets you evaluate different parameter sets and pick one for your final model.

Creating submission files for the Kaggle competition

The notebook notebooks/Season-predictions-Updated-Ratings.ipynb lets you generate a submission file for the Kaggle competition. It starts from the final ELO ratings, then updates the ratings probabilistically based on your predictions for each round before predicting the next round.
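
The notebook is the reference for how this update works. As one possible reading only, the sketch below assumes that a team can appear in the next round only if it won the previous one, so its rating going into that round is its rating conditional on that win. The function names and the `k` and `scale` defaults are assumptions, and the notebook's actual procedure may differ.

```python
def expected_score(rating_a, rating_b, scale=400.0):
    """Predicted probability that team A beats team B."""
    return 1.0 / (1.0 + 10 ** ((rating_b - rating_a) / scale))


def rating_if_advanced(rating, opponent_rating, k=20.0, scale=400.0):
    """Rating a team would carry into the next round, assuming it won this one.

    Assumed interpretation: apply the usual winner's update, k * (1 - p),
    where p is the predicted probability of that win.
    """
    p_win = expected_score(rating, opponent_rating, scale)
    return rating + k * (1.0 - p_win)


# An underdog that advances gains more than a favourite that advances,
# because its win was less expected.
favourite_gain = rating_if_advanced(1700.0, 1500.0) - 1700.0
underdog_gain = rating_if_advanced(1500.0, 1700.0) - 1500.0
```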

Further Reading

For an adaptation of the ELO system to the Premier League, see this paper, which explores the topic in more detail.
