- Download the .zip files for the leagues you want in your database from this Google Drive link (last updated 9/Nov/2024).
- Unzip the downloaded files into the `web_pages/` folder. This creates a folder containing the HTML of the fbref web page for every league match since the start of the 2017-2018 season for your chosen leagues. The folders must follow the structure `./web_pages/<league>/<season>/file`.
- If you have downloaded more than just the `Premier_League.zip` file, change the `competitions` parameter of the `main()` function in `main.py`, e.g. to `main(competitions=["La_Liga", "Ligue_1", "Premier_League"])`.
- Run `main.py`. This checks fbref for any newly played matches in your specified leagues and, if any are found, adds them to the `web_pages/` folder. It then parses these pages and adds them to the `master.db` database file.
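Before running `main.py`, it can help to confirm that the unzipped files really follow the `./web_pages/<league>/<season>/file` layout and to see which league folders you have, since those are the names the `competitions` list refers to. The sketch below is a hypothetical helper (`scan_web_pages` is not part of the repo), and it assumes each zip unpacks to a league folder named the same way as the values in the `main(competitions=[...])` example.

```python
from pathlib import Path


def scan_web_pages(root="web_pages"):
    """Hypothetical helper: check the <league>/<season>/<file> layout.

    Returns (competitions, misplaced): the sorted names of league folders
    that contain correctly placed files, and the relative paths of any
    files sitting at the wrong depth.
    """
    root_path = Path(root)
    if not root_path.is_dir():
        raise FileNotFoundError(f"{root_path} is not a directory")

    competitions = set()
    misplaced = []
    for path in root_path.rglob("*"):
        if not path.is_file():
            continue
        rel = path.relative_to(root_path)
        if len(rel.parts) == 3:
            # rel.parts is (league, season, filename)
            competitions.add(rel.parts[0])
        else:
            misplaced.append(rel.as_posix())
    return sorted(competitions), sorted(misplaced)
```

Run from the repo root as `competitions, misplaced = scan_web_pages("web_pages")`. If `misplaced` is empty, `competitions` is a candidate value for the `competitions` list in `main.py`, provided the folder names match what `main()` expects.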
You can then explore this data with SQL queries in a program such as DB Browser for SQLite. An overview of the database structure can be found here.
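If you would rather query from Python than from a GUI, the standard-library `sqlite3` module can open the same file. The sketch below assumes `master.db` is a SQLite database (the kind of file DB Browser for SQLite opens) and makes no assumptions about the table names: it lists each table with its row count, a quick way to get oriented before reading the database overview. `table_row_counts` is a hypothetical helper, not part of the repo.

```python
import sqlite3
from contextlib import closing
from pathlib import Path


def table_row_counts(db_path="master.db"):
    """Hypothetical helper: return {table_name: row_count} for a SQLite file."""
    # Open read-only via a URI so a wrong path raises an error instead of
    # silently creating a new, empty database file.
    uri = Path(db_path).resolve().as_uri() + "?mode=ro"
    with closing(sqlite3.connect(uri, uri=True)) as conn:
        names = [
            row[0]
            for row in conn.execute(
                "SELECT name FROM sqlite_master "
                "WHERE type = 'table' AND name NOT LIKE 'sqlite_%' "
                "ORDER BY name"
            )
        ]
        counts = {}
        for name in names:
            # Table names cannot be bound as SQL parameters, so quote the
            # identifier (doubling any embedded double quotes) instead.
            quoted = '"' + name.replace('"', '""') + '"'
            counts[name] = conn.execute(f"SELECT COUNT(*) FROM {quoted}").fetchone()[0]
        return counts
```

The same call works on either file, e.g. `table_row_counts("master.db")` or `table_row_counts("premier_league.db")`. Opening read-only is a deliberate choice here, so exploratory queries cannot modify the database that `main.py` maintains.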
Note that the `master.db` file in this repo contains data for all of the top 6 leagues, and the latest version of `premier_league.db` (containing only Premier League data) is also available at the Google Drive link.