This repository contains a Jupyter notebook that serves as a template for my data science projects. Feel free to use it for yours as well.
You can find the notebook in this repository as template.ipynb.
-
- Description of the project
- Description of the goals
- Table of Contents
-
- Installation of required modules (think about using a requirements.txt file)
- Importing the necessary modules
- Setup of various settings that will be used throughout the project. Some examples:
  - Configure the figure_format
  - Set up logging if needed
  - Set seaborn / matplotlib themes
  - Set pandas max columns options
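
The settings listed above could look roughly like this in the first code cell. This is a minimal sketch, not the contents of template.ipynb: the logger name `template`, the `ggplot` style and the log format are assumptions chosen for illustration.

```python
import logging

import matplotlib.pyplot as plt
import pandas as pd

# In a notebook, the inline figure format is usually set with an IPython magic
# (shown as a comment because it is not valid in a plain .py file):
#   %config InlineBackend.figure_format = "retina"

# Logging: timestamped INFO-level messages. "template" is a hypothetical logger name.
logging.basicConfig(
    level=logging.INFO,
    format="%(asctime)s %(levelname)s %(name)s: %(message)s",
)
logger = logging.getLogger("template")

# Plot theme: a built-in matplotlib style. With seaborn installed,
# sns.set_theme() would play the same role.
plt.style.use("ggplot")

# Show every column when displaying wide DataFrames.
pd.set_option("display.max_columns", None)

logger.info("Notebook settings applied")
```

Keeping all of these in one cell near the top makes it easy to see, and change, what affects every later output.
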
-
- Loading the dataset
- First exploration
  - Head command to see the columns and data
  - Describe command to see the ranges of numerical data
  - Info command as a first quick null check
- Data Cleaning
  - Transforming data types
  - Handling null values appropriately
- Merging tables of data to use in EDA
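
As a rough illustration of the loading, exploration and cleaning steps above, here is a small pandas sketch. The `orders` and `customers` tables and all of their columns are made up for the example (a real project would typically start from something like `pd.read_csv`), and filling nulls with the median is only one of several reasonable strategies.

```python
import pandas as pd

# Hypothetical toy tables standing in for a real dataset.
orders = pd.DataFrame({
    "order_id": [1, 2, 3, 4],
    "customer_id": [10, 11, 10, 12],
    "amount": ["19.99", "5.00", None, "42.50"],
    "ordered_at": ["2024-01-03", "2024-01-04", "2024-01-04", "2024-01-07"],
})
customers = pd.DataFrame({
    "customer_id": [10, 11, 12],
    "country": ["DE", "FR", "DE"],
})

# First exploration: columns and sample rows, value ranges, dtypes and null counts.
print(orders.head())
print(orders.describe(include="all"))
orders.info()

# Data cleaning: convert text columns to proper numeric and datetime types.
orders["amount"] = pd.to_numeric(orders["amount"])
orders["ordered_at"] = pd.to_datetime(orders["ordered_at"])

# Handle nulls: here the missing amount is filled with the column median.
orders["amount"] = orders["amount"].fillna(orders["amount"].median())

# Merge the tables for use in the EDA; validate guards against duplicate customers.
merged = orders.merge(customers, on="customer_id", how="left", validate="many_to_one")
print(merged)
```
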
-
- Univariate exploration
- Multivariate exploration
- Correlations
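
The three exploration steps above can be sketched with plain pandas. The data below is synthetic and the column names (`height_cm`, `weight_kg`, `group`) are assumptions for illustration. In a notebook you would usually pair these summaries with plots such as histograms and scatter plots.

```python
import numpy as np
import pandas as pd

# Synthetic data: weight depends linearly on height plus noise.
rng = np.random.default_rng(0)
n = 200
height = rng.normal(170, 10, n)
df = pd.DataFrame({
    "height_cm": height,
    "weight_kg": 0.9 * height - 85 + rng.normal(0, 5, n),
    "group": rng.choice(["a", "b"], size=n),
})

# Univariate exploration: one variable at a time.
print(df["height_cm"].describe())
print(df["group"].value_counts())

# Multivariate exploration: numeric variables broken down by a categorical one.
print(df.groupby("group")[["height_cm", "weight_kg"]].mean())

# Correlations: pairwise Pearson correlation of the numeric columns.
corr = df.select_dtypes("number").corr()
print(corr.round(2))
```
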
-
- Repeat for every hypothesis:
  - Describe the target populations
  - Describe the null and alternative hypotheses
  - Set the significance level
  - Describe assumptions
  - Describe choice of test
  - Describe the results
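
For a single hypothesis, the steps above might look like the sketch below. It assumes two independent, roughly normal samples and uses Welch's t-test from SciPy. The groups are synthetic, and both the test and the 0.05 significance level are example choices, not a recommendation for every project.

```python
import numpy as np
from scipy import stats

# Target populations: two synthetic groups standing in for real samples.
rng = np.random.default_rng(0)
group_a = rng.normal(loc=50.0, scale=5.0, size=200)
group_b = rng.normal(loc=53.0, scale=5.0, size=200)

# H0: both groups have the same mean. H1: the means differ (two-sided).
alpha = 0.05  # significance level, fixed before looking at the data

# Assumptions: independent samples, approximately normal within each group.
# Choice of test: Welch's t-test, which does not assume equal variances.
result = stats.ttest_ind(group_a, group_b, equal_var=False)

# Results: compare the p-value with the significance level.
reject_null = result.pvalue < alpha
print(f"t = {result.statistic:.2f}, p = {result.pvalue:.4g}, reject H0: {reject_null}")
```
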
-
- Define one or more prediction goals (repeat next steps for every goal)
  - Load the input data that you need
  - Data preprocessing
    - Address multicollinearity if strong correlations were found during the EDA
    - Think about using dimensionality reduction
    - Label / one-hot encoding
    - Standard scaling
    - Normalization
    - Train-test splitting
  - Model selection and training
    - Explain what model you'll be using
    - Hyperparameter tuning
    - Model training
  - Model evaluation
    - Evaluate the model using the metrics of choice
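
One prediction goal could be worked through along these lines with scikit-learn. This is a sketch under assumptions: the data is synthetic, the feature names (`x1`, `x2`, `segment`) are invented, and logistic regression with accuracy as the metric is just an example of a model and metric of choice.

```python
import numpy as np
import pandas as pd
from sklearn.compose import ColumnTransformer
from sklearn.linear_model import LogisticRegression
from sklearn.metrics import accuracy_score
from sklearn.model_selection import GridSearchCV, train_test_split
from sklearn.pipeline import Pipeline
from sklearn.preprocessing import OneHotEncoder, StandardScaler

# Input data: synthetic binary classification problem with one categorical feature.
rng = np.random.default_rng(42)
n = 400
X = pd.DataFrame({
    "x1": rng.normal(size=n),
    "x2": rng.normal(size=n),
    "segment": rng.choice(["a", "b", "c"], size=n),
})
signal = X["x1"] + 0.5 * X["x2"] + (X["segment"] == "a") + rng.normal(scale=0.3, size=n)
y = (signal > 0).astype(int)

# Train-test split, stratified so both sets keep the class balance.
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.25, stratify=y, random_state=42
)

# Preprocessing: standard scaling for numeric columns, one-hot encoding for categories.
preprocess = ColumnTransformer([
    ("num", StandardScaler(), ["x1", "x2"]),
    ("cat", OneHotEncoder(handle_unknown="ignore"), ["segment"]),
])
pipeline = Pipeline([
    ("preprocess", preprocess),
    ("model", LogisticRegression(max_iter=1000)),
])

# Hyperparameter tuning and training: 5-fold grid search over the regularization strength.
search = GridSearchCV(pipeline, {"model__C": [0.1, 1.0, 10.0]}, cv=5, scoring="accuracy")
search.fit(X_train, y_train)

# Evaluation on the held-out test set.
accuracy = accuracy_score(y_test, search.predict(X_test))
print(f"best C = {search.best_params_['model__C']}, test accuracy = {accuracy:.3f}")
```

Putting the scaler and encoder inside the pipeline means they are fitted only on the training folds, which helps avoid leaking information from the test data into preprocessing.
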
-
- Provide an overview of the entire project with key takeaways
-
- List the possible improvements that you see