This project has two main components: a web scraper, and a database for the scraped data.
The web scraper collects NBA statistics on games, players, standings, and teams, using information available on Basketball Reference. I wrote this module to learn BeautifulSoup, and I use it for analytics to annoy friends with stats nobody asked for.
The database stores the scraped data. There are currently 5 tables: players, players.perGame, teams, teams.perGame, standings.
This section describes the methods supported by this project.
Clone the repository:
$ git clone https://github.com/NBAStatScraper.git
Install the requirements:
$ pip install -r requirements.txt
Parameters:
name: The first and last name of a player (e.g. 'Joel Embiid')
Returns:
A Pandas dataframe containing the player's stat projections for the current season.
Parameters:
name: The first and last name of a player (e.g. 'OG Anunoby')
per: How the statistics are calculated. Can be any one of:
['game', 'total', 'min', 'pos', 'shooting', 'playoffTotal', 'playoffGame', 'playoffMin', 'playoffPos', 'playoffShooting', 'careerHighs', 'playoffCareerHighs', 'college', 'salary', 'contract']
Returns:
A Pandas dataframe containing the player stats.
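Because `per` only accepts a fixed set of strings, it can help to check the value before making a request. The following is a minimal sketch, not part of this project's API: `PLAYER_PER_OPTIONS` and `validate_per` are hypothetical names, and the option list is copied from the parameter description above.

```python
# Allowed values for `per`, copied from the parameter description above.
PLAYER_PER_OPTIONS = (
    "game", "total", "min", "pos", "shooting",
    "playoffTotal", "playoffGame", "playoffMin", "playoffPos",
    "playoffShooting", "careerHighs", "playoffCareerHighs",
    "college", "salary", "contract",
)


def validate_per(per, allowed=PLAYER_PER_OPTIONS):
    """Return `per` unchanged if it is an allowed option, else raise ValueError.

    Hypothetical helper for illustration; the comparison is case-sensitive.
    """
    if per not in allowed:
        raise ValueError(
            f"Unknown per value {per!r}; expected one of: {', '.join(allowed)}"
        )
    return per


if __name__ == "__main__":
    print(validate_per("playoffGame"))  # playoffGame
```

Failing early with the list of valid options is usually easier to debug than an empty dataframe or a parsing error further down.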
Parameters:
team: The 3-letter abbreviation of an NBA team (e.g. 'TOR', 'POR')
year: The year of the roster to get (e.g. 2020)
Returns:
A Pandas dataframe containing the roster and player information for the year.
Parameters:
team: The 3-letter abbreviation of an NBA team (e.g. 'TOR', 'POR')
year: The year of the season to get team stats for (e.g. 2020)
per: How the statistics are calculated. Can be any one of:
['game', 'total', 'min', 'pos', 'shooting', 'playoffTotal', 'playoffGame', 'playoffMin', 'playoffPos', 'playoffShooting']
Returns:
A Pandas dataframe containing the team stats for a chosen year.
Parameters:
team: The 3-letter abbreviation of a team (e.g. 'TOR', 'BOS')
year: The year of the season to get game scores for (e.g. '2019', '2002')
Returns:
A Pandas dataframe containing the information of each game for the team in the given year.
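The parameter descriptions show `year` both as a number (2020) and as a string ('2019'), and team abbreviations in upper case. A small normalization step, sketched below, lets a caller pass either form. Both function names are hypothetical and are not part of this project's API.

```python
def normalize_team(team):
    """Upper-case a team abbreviation and check that it is exactly 3 ASCII letters."""
    abbr = str(team).strip().upper()
    if len(abbr) != 3 or not (abbr.isascii() and abbr.isalpha()):
        raise ValueError(f"Expected a 3-letter team abbreviation, got {team!r}")
    return abbr


def normalize_year(year):
    """Accept 2019 or '2019' and return the 4-digit string '2019'."""
    text = str(year).strip()
    if len(text) != 4 or not text.isdigit():
        raise ValueError(f"Expected a 4-digit year, got {year!r}")
    return text


if __name__ == "__main__":
    print(normalize_team("tor"), normalize_year(2019))  # TOR 2019
```

This only checks the shape of the input; it does not confirm that the abbreviation belongs to a real team or that a season exists for that year.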
Parameters:
team: The 3-letter abbreviation of a team (e.g. 'TOR', 'BOS')
year: The year of the season to get playoff scores for (e.g. '2019', '2002')
Returns:
A Pandas dataframe containing the information of each playoff game for a team in a given year.
The following two methods are currently broken. I suspect the cause is an issue with running pyppeteer under WSL.
Parameters:
home: The 3-letter abbreviation of the home team (e.g. 'TOR', 'BOS')
away: The 3-letter abbreviation of the away team (e.g. 'TOR', 'BOS')
date: The date of the game (e.g. '2020-12-23')
Returns:
A Pandas dataframe containing the locations and information of each shot taken in the game.
Parameters:
home: The 3-letter abbreviation of the home team (e.g. 'TOR', 'BOS')
team: The 3-letter abbreviation of the team to get the shooting stats for (e.g. 'TOR', 'BOS')
date: The date of the game (e.g. '2018-11-20')
Returns:
A Pandas dataframe containing the team's shooting information for the game.
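Both shot chart methods identify a game by its home team and date. The sketch below shows one way those two values could map to a page on Basketball Reference. It assumes game IDs follow the pattern YYYYMMDD + '0' + home abbreviation and that shot charts live under `/boxscores/shot-chart/`; both are assumptions about the site, and `game_id` and `shot_chart_url` are hypothetical names rather than this project's functions.

```python
from datetime import date

# Assumed URL pattern for shot chart pages; not confirmed by this project.
SHOT_CHART_URL = "https://www.basketball-reference.com/boxscores/shot-chart/{game_id}.html"


def game_id(home, game_date):
    """Build a game ID such as '202012230TOR' from a home team and an ISO date.

    Assumption: IDs are YYYYMMDD, then '0', then the home team abbreviation.
    """
    day = date.fromisoformat(game_date)  # raises ValueError on a malformed date
    return f"{day:%Y%m%d}0{home.upper()}"


def shot_chart_url(home, game_date):
    """Return the assumed shot chart URL for the game."""
    return SHOT_CHART_URL.format(game_id=game_id(home, game_date))


if __name__ == "__main__":
    print(game_id("TOR", "2020-12-23"))  # 202012230TOR
    print(shot_chart_url("BOS", "2018-11-20"))
```

Under that assumption the away team is not needed to locate the page, only to interpret the shots once the page has been loaded.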
Parameters:
conference: The conference to get the standings for. Can be one of ['E', 'W']
year: The year to get the conference standings for (e.g. '2019')
Returns:
A Pandas dataframe containing the conference standings.
Parameters:
year: The year to get the league standings for (e.g. '2000')
Returns:
A Pandas dataframe containing the league standings for a year.
Parameters:
year: The year to get the team vs. team records for (e.g. '2018')
Returns:
A Pandas dataframe containing the records of each team against every other team.