Welcome to dreamTeam repository

About

This is a Mini-Project for SC1015 (Introduction to Data Science and Artificial Intelligence) which focuses on datas from NBA.com.

Check out our Project Video and slide deck too

For detailed walkthrough, please view the source code in order from:

Problem Definition

README.md

Data Collection and Preperation

WebScraping.ipynb
Features.md
FeatureSelection.ipynb
The notebooks in this portion is used to extract data from NBA.com to form our datasets. For data extraction, we utilised selenium and beautifulSoup to scrape the data from NBA.com. After extracting the data, the feature selection notebook sees us plot correlation matrixes and graphs to identify features that have high correlation to drop. As these features have high correlation, dropping one of them would help us increase the accuracy of our model. We also perform various data cleaning tasks such as checking for null values and renaming columns to be more descriptive. We also held onto various features despite high correlations in order to not adversely impact the interpretability of our eventual model.

Exploratory Data Analysis / Visualisation

FeatureSelection.ipynb
PCA.ipynb
EDA.ipynb
The FeatureSelection Notebook also serves as a form of data analysis and visualisation as we see the relationship between the various stats and recognise what stats may be useful in categorising the play styles of players. After dropping some features in FeatureSelection, we carried out Principal Component Analysis (PCA) to reduce the dimensions of the data further, in particular we limited the number of principal components to 5, despite only achieving a 53% explained variance in order to ensure our components were interpretable. After doing the PCA, we performed futher exploratory analysis using our chosen principal components and gleaned additional insights into how our data may be clustered (found in EDA.ipynb).

Use of Machine Learning

hierarchicalClustering.ipynb
K-MeansClustering.ipynb
After carrying out exploratory data analysis/ visualisation, we went on to use machine learning models specifically clustering to derive a solutions to our problem. We utilsied Hierarchical Agglomerative Clustering and K-Means Clustering algorithm to obtain different clusters of players.

Final Insights

clusteringEDA.ipynb
Lastly, in this notebook we utilised the clusters that we obtained and applied it to the win losses record of the NBA teams in 2020-2021 season. We managed to obtain which clusters of players did top teams have as compared to the weaker teams. We also added some concluding thoughts on the problem, solution and models used.

Contributors

@CSA100 (Anand Chaanan Singh) - Exploratory Data Analysis/ Visualization
@zonpig (Bryan Lim Kai Wen) - Data Collection and Preparation
@crustyapples (Advait Bharat Deshpande) - Use of Machine Learning

Background

With the evolution of the sport, players have greater access to training facilities and have been able to build stronger capabilities to improve their play.

This has made it increasingly difficult for managers to predict the outcomes for the teams that they draft. They are no longer able to predict their teams’ success based on traditional methods, such as their height, strength, shooting abilities etc.

The most successful teams are no longer a combination of the best Point Guards, Shooting Guards, Small Forwards, Power Forwards and Centres. More unorthodox groups can achieve great success too.

Problem Definition:

Hence, we ask:

How might we provide more insightful recommendations on team composition to todays managers?

To address this question, we set out to uncover the kind of players that todays high-performing basketball teams are composed of, by using machine learning techniques to reveal the true meta of NBA

Models Used

Hierarchical Agglomerative Clustering
K-Means Clustering

Conclusion

Challenges:

Do not know the true underlying clusters to compare against -Limited practical significance derived from statistical metrics such as Silhouette Coefficient

Insights:

Directly evaluating results in NBA's context. Models were able to generate useful and logical recommendations.
Data from NBA.com seems to contain significantly more offensive playstyle statistics that may have limited the model's ability to identify strong defensive clusters.
K-Means gave less insight on the differences between top and bottom teams
Top teams had a composition that was in fact distinct from other teams

What did we learn from this project?

Using Selenium and BeautifulSoup for web scrapping
Hierarchical Agglomerative Clustering and K-Means Clustering
Collaborating using GitHub

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

Repository files navigation

Welcome to dreamTeam repository

About

For detailed walkthrough, please view the source code in order from:

Contributors

Background

Problem Definition:

Models Used

Conclusion

What did we learn from this project?

References

About

Releases

Packages

Languages

Name		Name	Last commit message	Last commit date
Latest commit History 171 Commits
.ipynb_checkpoints		.ipynb_checkpoints
Data		Data
.DS_Store		.DS_Store
BCF3_Group1_DeshpandeLimSingh.pptx		BCF3_Group1_DeshpandeLimSingh.pptx
EDA.ipynb		EDA.ipynb
FeatureSelection.ipynb		FeatureSelection.ipynb
Features.md		Features.md
K-MeansClustering.ipynb		K-MeansClustering.ipynb
PCA.ipynb		PCA.ipynb
README.md		README.md
Web Scrapping.ipynb		Web Scrapping.ipynb
clusteringEDA.ipynb		clusteringEDA.ipynb
hierarchicalClustering.ipynb		hierarchicalClustering.ipynb

zonpig/SC1015-DSAI-nbaDreamTeam

Folders and files

Latest commit

History

Repository files navigation

Welcome to dreamTeam repository

About

For detailed walkthrough, please view the source code in order from:

Contributors

Background

Problem Definition:

Models Used

Conclusion

What did we learn from this project?

References

About

Resources

Stars

Watchers

Forks

Releases

Packages 0

Languages

Packages