Background

College basketball is notoriously hard to predict. There's a reason why the annual tournament is widely known as March Madness.

Yet there is still tremendous interest in predicting the outcome of games. From sports betting to office pools, there are many avenues for people to make their own predictions.

Goal

While there are many models out there, I would like to build my own version to predict basketball outcomes. Specifically, I will develop a model to predict the point differential (or spread) for a college basketball game.

Methodology

Data
- Data was collected from basketball-reference.com.
- Through web scraping, pulled season game logs for each school for the years 2014-2019
Data cleaning and feature engineering
- Data needed to be cleaned before use.
- Data manipulation to obtain other needed stats, such as:
  - Spread between teams for each game
  - Running totals of each team's history, to have a snapshot of prior performance for each game played
  - Aligning both teams' information for each single game
  - Removing duplicate games, so each game shows up only once
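
As a rough illustration of the cleaning steps listed above, the sketch below computes the spread, a running prior-game average, the aligned opponent stat, and a de-duplicated game table on a toy game log. The column names (`team`, `opp`, `pts`, `opp_pts`, and the derived ones) and the values are assumptions made for this example; they are not taken from the notebooks.

```python
import pandas as pd

# Toy game log: one row per team per game, so every game appears twice.
# Column names and values are hypothetical.
log = pd.DataFrame({
    "date": pd.to_datetime(["2019-01-01", "2019-01-01", "2019-01-05", "2019-01-05"]),
    "team": ["A", "B", "A", "B"],
    "opp": ["B", "A", "B", "A"],
    "pts": [70, 60, 80, 90],
    "opp_pts": [60, 70, 90, 80],
})
log = log.sort_values(["team", "date"]).reset_index(drop=True)

# Spread from the row team's point of view.
log["spread"] = log["pts"] - log["opp_pts"]

# Running average of points using only games played BEFORE the current one.
# shift(1) drops the current game so the feature cannot leak the outcome.
log["pts_avg_prior"] = log.groupby("team")["pts"].transform(
    lambda s: s.shift(1).expanding().mean()
)

# Align both teams' information on one row by joining the opponent's stats.
opp_side = log[["date", "team", "pts_avg_prior"]].rename(
    columns={"team": "opp", "pts_avg_prior": "opp_pts_avg_prior"}
)
games = log.merge(opp_side, on=["date", "opp"], how="left")

# Keep one row per game: only the row whose team sorts before its opponent.
games = games[games["team"] < games["opp"]].reset_index(drop=True)
```

Each team's first game of a season has no prior history, so its running average is missing there; how those rows are handled (dropped, or filled from a previous season) is a separate modeling choice.
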
Modeling
- Linear Regression was used to develop the model
- See the presentation for a summary of results, as well as the appendix for a review of the linear regression assumptions
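
For readers who want a concrete picture of the modeling step, here is a minimal sketch of an ordinary least squares fit on standardized features. It uses NumPy only and synthetic data; the three features, the coefficients, and the noise level are all made up, and the actual notebook may fit the regression with a different library.

```python
import numpy as np

rng = np.random.default_rng(0)

# Synthetic stand-in for the engineered features (hypothetical): differences
# between the two teams' prior-game averages for three stats.
X = rng.normal(size=(200, 3))
true_coef = np.array([4.0, -2.0, 1.0])
y = 3.0 + X @ true_coef + rng.normal(scale=0.5, size=200)  # point spread

# Standardize the features, then add an intercept column.
mu, sigma = X.mean(axis=0), X.std(axis=0)
Xs = (X - mu) / sigma
A = np.column_stack([np.ones(len(Xs)), Xs])

# Ordinary least squares fit.
beta, *_ = np.linalg.lstsq(A, y, rcond=None)

# In-sample fit quality.
pred = A @ beta
rmse = float(np.sqrt(np.mean((y - pred) ** 2)))
r2 = float(1.0 - np.sum((y - pred) ** 2) / np.sum((y - y.mean()) ** 2))
```
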

Deliverables

Technologies Used

Jupyter Notebook
Python
Libraries
- Pandas
- NumPy
- Matplotlib
- Seaborn
- Beautiful Soup

Future Work

Cross Validation with various seasons
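
One possible way to do this is leave-one-season-out cross-validation: hold out each season in turn, fit on the rest, and score on the held-out season. The sketch below is an assumption about how that could look, using synthetic data and a plain NumPy least squares fit; `fit_ols` and `predict` are helper names invented for the example.

```python
import numpy as np

rng = np.random.default_rng(1)

# Synthetic data: 6 seasons with 50 games each and two made-up features.
seasons = np.repeat(np.arange(2014, 2020), 50)
X = rng.normal(size=(len(seasons), 2))
y = 2.0 * X[:, 0] - 1.0 * X[:, 1] + rng.normal(scale=1.0, size=len(seasons))

def fit_ols(X, y):
    """Least squares fit with an intercept; returns [intercept, slopes...]."""
    A = np.column_stack([np.ones(len(X)), X])
    beta, *_ = np.linalg.lstsq(A, y, rcond=None)
    return beta

def predict(beta, X):
    return beta[0] + X @ beta[1:]

# Hold out one season at a time and record the out-of-sample RMSE.
rmse_by_season = {}
for held_out in np.unique(seasons):
    train = seasons != held_out
    test = ~train
    beta = fit_ols(X[train], y[train])
    err = y[test] - predict(beta, X[test])
    rmse_by_season[int(held_out)] = float(np.sqrt(np.mean(err ** 2)))
```

A stricter variant for a forecasting setting would train only on seasons earlier than the held-out one.
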
SOS Adjusted Stats
- A separate strength-of-schedule (SOS) variable is not needed if I can create SOS-adjusted stats that go into the model directly. See:
  
  https://medium.com/analyzing-ncaa-college-basketball-with-gcp/fitting-it-in-adjusting-team-metrics-for-schedule-strength-4e8239be0530
  
  This would require adding Team and Opponent as one-hot variables and running a linear regression after every single day to generate the adjusted metrics to be used for the next day. To truly account for schedule strength in a predictive model, every stat would need an SOS-adjusted version, computed from only the historical data available prior to each game.
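
A minimal sketch of the one-hot idea, under stated assumptions: points scored are modeled as an overall mean plus a scoring-team effect plus an opponent effect, fitted once by least squares on a made-up six-game schedule. The team labels, scores, and variable names are hypothetical, and a real version would refit using only games played before each date.

```python
import numpy as np

# Toy results: (team, opponent, points scored by team). Values are made up.
games = [
    ("A", "B", 80), ("B", "A", 70),
    ("A", "C", 90), ("C", "A", 60),
    ("B", "C", 75), ("C", "B", 65),
]
teams = sorted({t for t, _, _ in games})
idx = {t: i for i, t in enumerate(teams)}
n = len(teams)

# Design matrix: intercept, one-hot scoring team, one-hot opponent.
A = np.zeros((len(games), 1 + 2 * n))
y = np.zeros(len(games))
for row, (team, opp, pts) in enumerate(games):
    A[row, 0] = 1.0
    A[row, 1 + idx[team]] = 1.0
    A[row, 1 + n + idx[opp]] = 1.0
    y[row] = pts

# The one-hot blocks are collinear with the intercept, so lstsq returns the
# minimum-norm solution. Centering each block removes that ambiguity.
beta, *_ = np.linalg.lstsq(A, y, rcond=None)
off_raw, def_raw = beta[1:1 + n], beta[1 + n:]
mu = beta[0] + off_raw.mean() + def_raw.mean()
off = off_raw - off_raw.mean()

# Schedule-adjusted points per game: expected score against an average defense.
adj_offense = {t: float(mu + off[idx[t]]) for t in teams}
raw_offense = {t: float(np.mean([p for s, _, p in games if s == t])) for t in teams}
```

In this toy schedule team A's raw average is 85 points, but its adjusted figure is lower because it never has to face its own above-average defense.
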
Add recency bias
- Calculate stats using only more recent games (for example, restricted to the last 5 or 10 games, or using some kind of weighted average)
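
Both options can be sketched with pandas, assuming a per-team game log sorted by date. The window of 3 games and `alpha=0.5` are arbitrary values picked for the example, and the column names are hypothetical.

```python
import pandas as pd

# Toy log for a single team, already in chronological order.
log = pd.DataFrame({
    "team": ["A"] * 5,
    "game_no": [1, 2, 3, 4, 5],
    "pts": [60, 70, 80, 90, 100],
})

by_team = log.groupby("team")["pts"]

# Mean of the last 3 games played before the current one.
log["pts_last3"] = by_team.transform(
    lambda s: s.shift(1).rolling(window=3, min_periods=1).mean()
)

# Exponentially weighted mean of prior games; a larger alpha puts more
# weight on the most recent games.
log["pts_ewm"] = by_team.transform(
    lambda s: s.shift(1).ewm(alpha=0.5).mean()
)
```
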
Refine Advanced Stats and % stats
- Simple averages of per-game values were used to calculate the running advanced stats and % stats. These could be made more technically accurate by recalculating the numbers from their actual formulas.
- This is most obvious for % calculations. For example, if a team makes 10 of 10 free throws in one game but goes 0 for 20 in the next, the true FT% is 10/30, or about 33%, whereas I have calculated 50% because that is the average of 100% and 0%.
- Leaving this for now, but I can come back to update it later.
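
The free throw example works out as follows; the variable names are just for illustration.

```python
# (made, attempted) free throws per game, matching the two-game example.
games = [(10, 10), (0, 20)]

# Simple average of per-game percentages (the current approach).
avg_of_pcts = sum(made / att for made, att in games) / len(games)

# Percentage computed from cumulative makes and attempts.
total_made = sum(made for made, _ in games)
total_att = sum(att for _, att in games)
true_pct = total_made / total_att

print(f"{avg_of_pcts:.3f} vs {true_pct:.3f}")  # 0.500 vs 0.333
```

Keeping running totals of makes and attempts, rather than a running mean of percentages, also avoids a division by zero for a game with no attempts.
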
Compare my predictions to the line from other models
- http://www.thepredictiontracker.com/
- I have not yet reconciled how to match up games with the data pulled from predictiontracker. The issue is with neutral-site games, where one team is still designated as home, which makes it hard to match them up correctly.
Create interaction of pace and ORtg/DRtg
More interactions - difference between the teams playing?
Time Series effect?
Run a correlation matrix on the variables to group highly correlated ones, so that just one from each group can be picked
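
A small sketch of that idea with pandas, on synthetic data: compute the absolute correlation matrix and list the pairs above a cutoff. The feature names and the 0.9 threshold are assumptions chosen for the example.

```python
import numpy as np
import pandas as pd

rng = np.random.default_rng(2)
base = rng.normal(size=500)

# Synthetic features: pts_per_game is built to nearly duplicate ortg.
df = pd.DataFrame({
    "ortg": base,
    "pts_per_game": base + rng.normal(scale=0.1, size=500),
    "pace": rng.normal(size=500),
})

corr = df.corr().abs()

# Collect each pair of features whose absolute correlation exceeds the cutoff.
threshold = 0.9
cols = list(corr.columns)
high_pairs = [
    (a, b, float(corr.loc[a, b]))
    for i, a in enumerate(cols)
    for b in cols[i + 1:]
    if corr.loc[a, b] > threshold
]
```

Here `high_pairs` flags `ortg` and `pts_per_game` as a group from which only one feature would be kept.
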
PCA for understanding like features
Distance to Court (proxy for fan support/away games being more difficult)
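
If latitude and longitude were available for each school and game site (this data is not part of the current dataset), travel distance could be approximated with the haversine great-circle formula. The function name and the coordinates below are hypothetical.

```python
from math import asin, cos, radians, sin, sqrt

def haversine_miles(lat1, lon1, lat2, lon2):
    """Great-circle distance in miles between two (lat, lon) points in degrees."""
    r = 3958.8  # mean Earth radius in miles
    phi1, phi2 = radians(lat1), radians(lat2)
    dphi = phi2 - phi1
    dlmb = radians(lon2 - lon1)
    a = sin(dphi / 2) ** 2 + cos(phi1) * cos(phi2) * sin(dlmb / 2) ** 2
    return 2 * r * asin(sqrt(a))

# Hypothetical campus and arena coordinates.
away_trip = haversine_miles(36.0, -78.9, 39.0, -95.3)
home_game = haversine_miles(36.0, -78.9, 36.0, -78.9)
```
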
Run the model again without standardized variables so that the coefficients are more interpretable
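
Refitting is one route; for a plain least squares model the standardized coefficients can also be converted back to original units directly, as this sketch on synthetic data suggests. The helper `ols` and all numbers are assumptions made for illustration.

```python
import numpy as np

rng = np.random.default_rng(3)

# Synthetic features on different scales (for example points and possessions).
X = rng.normal(loc=[70.0, 100.0], scale=[5.0, 10.0], size=(300, 2))
y = 1.0 + 0.5 * X[:, 0] - 0.2 * X[:, 1] + rng.normal(scale=0.1, size=300)

def ols(X, y):
    """Least squares fit with an intercept; returns [intercept, slopes...]."""
    A = np.column_stack([np.ones(len(X)), X])
    beta, *_ = np.linalg.lstsq(A, y, rcond=None)
    return beta

# Fit on standardized features.
mu, sigma = X.mean(axis=0), X.std(axis=0)
beta_std = ols((X - mu) / sigma, y)

# Convert back to original units:
#   slope_raw = slope_std / sigma
#   intercept_raw = intercept_std - sum(slope_std * mu / sigma)
slopes_raw = beta_std[1:] / sigma
intercept_raw = beta_std[0] - np.sum(beta_std[1:] * mu / sigma)

# Direct fit on the unstandardized features, for comparison.
beta_raw = ols(X, y)
```

The converted slopes read as "points of spread per one unit of the stat", which is usually the easier interpretation.
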

Future file cleanup

Organize the data files
Add more detail on the analysis


mrallenchen/NCAAMBB
