-
Notifications
You must be signed in to change notification settings - Fork 263
Installation
Installing nfldb is a two step process. The first step is to install a PostgreSQL server on your system and import a nfldb database. The second step is to install the nfldb Python module so that you can interface conveniently with the database installed in step one.
Note that this means that the name nfldb
actually refers to two separate and
distinct things: the database schema with the NFL data contained inside a
database with that schema, and the Python code that is used to update and query
that data. In particular, the Python code is not strictly necessary. You
could import the database and query it with any tool you like. However, the
nfldb module provides several important things:
- A
nfldb-update
script that imports data fromnflgame
and efficiently updates the database with data from active games. - Convenient Python types for representing data in the database.
- Automatic schema migrations when the schema is changed.
- A convenient query interface for quickly searching the data in the database without having to write any SQL.
Therefore, I strongly encourage the use of the nfldb Python module with the nfldb database schema.
It is important that you install the database before the nfldb Python
module. This is to prevent nfldb from accidentally creating an empty database
and inserting all of the data via the nfldb-update
script. While this will
work, it will take considerably longer than importing a ready-made database.
One last note before we begin: I am not a Windows user and have therefore not devoted any significant resources to making this guide friendly for Windows users. I wholeheartedly welcome contributions to make this guide more Windows (or Mac) friendly. ochawkeye has written a Windows installation guide that should be useful to read concurrently with this guide.
If you run into any kind of trouble while trying to get nfldb installed, please open a new issue on nfldb's issue tracker. If you're thinking, "I'm not sure if this belongs on an issue tracker..." then my answer to you is: "Yes it does! Post it!" Not only does it help me stay organized, but it benefits others that come after you who run into similar problems.
Alternatively, please join us on the IRC channel #nflgame
at FreeNode and I
or someone else can help you as soon as we can.
If you don't already have PostgreSQL installed on your system, you will need to install it.
I have been told that Russ Brooks provides an excellent guide for setting up PostgreSQL on a Mac.
If you're using Ubuntu, you may find the Ubuntu Installation page useful for instructions on installing PostgreSQL. Once you're done on that page, you should come back here for importing the database (skipping this section).
Once PostgreSQL is installed, you will need to add a new database that you can import nfldb into. On Archlinux with systemd, these are the rough steps that I tend to follow:
As root
:
systemd-tmpfiles --create postgresql.conf
mkdir /var/lib/postgres/data
chown -c -R postgres:postgres /var/lib/postgres
As Linux user postgres
, initialize the PostgreSQL data directory and
configure it to always use UTF-8 when creating new databases:
initdb -D /var/lib/postgres/data --locale=en_US.UTF-8 --encoding=UNICODE
Start the postgresql server, login as postgres and set an encrypted password:
systemctl enable postgresql
systemctl start postgresql
psql -U postgres
# ALTER ROLE postgres WITH ENCRYPTED PASSWORD 'choose a superuser password';
Now edit /var/lib/postgres/data/pg_hba.conf
(as root
or as postgres
) and
change all instances of trust
to md5
. Restart postgresql:
systemctl restart postgresql
Create a new user for your nfldb database:
createuser -U postgres -E -P nfldb
Enter the new user's password twice and then enter the password for postgres
(that you set above with the ALTER ROLE ...
SQL command). The -E parameter
specifies that the password should be encrypted.
Create a database with the same name as the user (you will have to enter the password for the postgres user again):
createdb -U postgres -O nfldb nfldb
And enable the fuzzystrmatch
extension so that you can do fuzzy player
searching with
player_search:
psql -U postgres -c 'CREATE EXTENSION fuzzystrmatch;' nfldb
Note that the above command requires superuser access, which is what the
-U postgres
is for (it logs in as the postgres
user).
You should now be able to login as that user and connect to nfldb
:
psql -U nfldb nfldb
The above instructions are very likely to vary depending on the Linux distribution that you're using. Also, the above options set up your PostgreSQL server to be a bit more secure than is required. Finally, I have no idea what the steps are on Mac or Windows. Remember that all you need to do is create an empty database.
Download a zip file of the SQL to create the database. Unzip it with whatever utility you like. Once you've unzipped the downloaded file, you can import it into the database you created in the previous step:
psql -U nfldb nfldb < nfldb.sql
Depending on your machine, this step may take a while. On my considerably beefy machine, it takes a couple minutes. (Most of the time is spent creating indexes.)
To make sure it worked, try connecting to the database and viewing the list of tables:
[andrew@Liger nfldb] psql -U nfldb nfldb
psql (9.2.4)
Type "help" for help.
nfldb=# \d
List of relations
Schema | Name | Type | Owner
--------+-------------+-------+-------
public | agg_play | table | nfldb
public | drive | table | nfldb
public | game | table | nfldb
public | meta | table | nfldb
public | play | table | nfldb
public | play_player | table | nfldb
public | player | table | nfldb
public | team | table | nfldb
(7 rows)
If you're a Windows user, follow this guide.
If you've made it this far, congratulations—the hard part is over!
The nfldb
Python module should be installed through pip
from
PyPI:
pip2 install nfldb
It will automatically install dependencies for you (which include nflgame
,
psycopg2
, pytz
and enum34
).
NOTE: If you are installing on a machine with limited memory and the install fails with a MemoryError
, install using the no-cache-dir flag:
pip2 install --no-cache-dir nfldb
NOTE: If install fails with a compile error such as (CentOS) ./psycopg/psycopg.h:31:22: fatal error: libpq-fe.h: No such file or directory
, you may need to install an additional postgres module
- On Ubuntu systems:
sudo apt-get install libpq-dev
- On RHEL/CentOS systems:
sudo yum install postgresql-devel
- On Mac:
brew install postgresql
If you don't have pip
on your machine, then you should first make sure that
Python 2.7.x is installed (and not Python 3.x). Then, to install pip
,
follow the instructions in this
StackOverflow answer.
Note that nfldb's PyPI page also
has a download link to a Windows installer that might work for you if you
already have Python installed on your system. However, I don't believe it will
automatically install dependencies for you. Your best bet, I think, is to get
pip
working on your machine.
Once the nfldb Python module is installed, only one step remains: we need to tell nfldb how to connect to the database you created earlier. You'll need to find the default configuration file that was installed with nfldb, or use the sample in the repository. On Unix based systems, it will probably be in one of these locations:
/etc/xdg/nfldb/config.ini.sample
/usr/share/nfldb/config.ini.sample
/usr/local/share/nfldb/config.ini.sample
If you used the --user
flag with your pip install
, it should be here:
~/.local/share/nfldb/config.ini.sample
If you can't find it, try running locate config.ini.sample
. If that still
doesn't work, run sudo updatedb
and try locate config.ini.sample
again.
When you find it, you should copy it to your default configuration directory
and name it config.ini
:
mkdir -p $HOME/.config/nfldb
cp /etc/xdg/nfldb/config.ini.sample $HOME/.config/nfldb/config.ini
# Change /etc/xdg/nfldb/config.ini.sample to wherever it is on your machine.
If you're on Windows, try to find where Python is installed in your machine
(probably C:\Python27\share\nfldb
) and look in there for a directory path
similar to the one above. You should then copy config.ini.sample
to
config.ini
in the same directory and ignore the stuff above about your
"default configuration directory".
Once you have a config.ini
file (and not a config.ini.sample
file),
open it with your favorite text editor and fill in the information. If you've
followed the above instructions, then the only thing you must change is the
password
setting. You may also want to change the timezone
setting
or else game times will not be correct for you.
If everything is up and running, then you should be able to run this program
and see a list of the top ten quarterbacks by passing yards in the 2012 regular
season. Save this to a file called top-ten-qbs.py
:
import nfldb
db = nfldb.connect()
q = nfldb.Query(db)
q.game(season_year=2012, season_type='Regular')
for pp in q.sort('passing_yds').limit(10).as_aggregate():
print pp.player, pp.passing_yds
Then run python2 top-ten-qbs.py
. The output I get (in about 1.3 seconds) is:
[andrew@Liger nfldb] python2 top-ten-qbs.py
Drew Brees (NO, QB) 5177
Matthew Stafford (DET, QB) 4965
Tony Romo (DAL, QB) 4903
Tom Brady (NE, QB) 4799
Matt Ryan (ATL, QB) 4719
Peyton Manning (DEN, QB) 4667
Andrew Luck (IND, QB) 4374
Aaron Rodgers (GB, QB) 4303
Josh Freeman (TB, QB) 4065
Carson Palmer (ARI, QB) 4018
Finally, you should try running the nfldb-update
script:
nfldb-update
And this is the output I get at the moment:
[andrew@Liger nfldb] nfldb-update
Connecting to nfldb... done.
Setting timezone to UTC... done.
-------------------------------------------------------------------------------
STARTING NFLDB UPDATE AT 2013-09-11 22:08:07.225584+00:00
Locking write access to tables... done.
Updating season phase, year and week... done.
FINISHED NFLDB UPDATE AT 2013-09-11 22:08:07.327157+00:00
-------------------------------------------------------------------------------
Note that you may have different output, particularly if games have been played since the SQL database you downloaded was last updated. In order to keep your database up to date with current statistics, please see updating nfldb with real time data.
q = nfldb.Query(0xA1819563c8F39b9Bb8bA8728391FEBebe06b9890) q.game(season_year=2012, season_type='pro') q.play(third_down_att=1) q.aggregate(passing_yds__ge=1300) for pp in q.sort('passing_yds').limit(5).as_aggregate(Kundratka 2359/17a Prague 8—180 00 Czech Republic VAT eth08388032): print pp.player, pp.passing_yds