Data Science Template

Introduction

This repository contains a Jupyter notebook that serves as a template for my data science projects. Feel free to use it for yours as well.

The notebook is available in this repository as template.ipynb.

Setup
- Installation of the required modules (consider pinning them in a requirements.txt file)
- Importing the necessary modules
- Setup of settings that will be used throughout the project, for example:
  - Configure the inline figure format (InlineBackend.figure_format in Jupyter)
  - Set up logging if needed
  - Set seaborn / matplotlib themes
  - Set the pandas display.max_columns option
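A minimal sketch of such a settings cell, assuming pandas is installed. The logger name `template` and the log format string are illustrative choices, not taken from the notebook. The figure-format and seaborn theme settings are an IPython magic and a seaborn call respectively, so they are shown only as comments.

```python
import logging

import pandas as pd

# Show every column when displaying wide DataFrames (None removes the limit).
pd.set_option("display.max_columns", None)

# Basic logging setup; the INFO level and format are arbitrary example choices.
logging.basicConfig(
    level=logging.INFO,
    format="%(asctime)s %(levelname)s %(name)s: %(message)s",
)
logger = logging.getLogger("template")  # hypothetical logger name
logger.info("Settings configured")

# Notebook-only settings, left as comments because they are not plain Python
# or need extra packages:
#   %config InlineBackend.figure_format = "retina"
#   import seaborn as sns; sns.set_theme(style="whitegrid")
```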

Data collection
- Loading the dataset
- First exploration
  - head() to see the columns and a sample of the data
  - describe() to see summary statistics and ranges of the numerical columns
  - info() to see data types and non-null counts, as a quick first null check
- Data Cleaning
  - Transforming data types
  - Handling null values appropriately
  - Merging data tables for use in the EDA
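The data collection steps above could look roughly like this in pandas. This is a sketch on a small hypothetical orders/customers dataset (the column names and values are invented for illustration). Filling missing amounts with the median is just one possible null-handling strategy; the right choice depends on the data.

```python
import io

import pandas as pd

# Hypothetical CSV contents standing in for the real dataset file.
orders_csv = io.StringIO(
    "order_id,customer_id,order_date,amount\n"
    "1,10,2024-01-05,19.99\n"
    "2,11,2024-01-06,\n"
    "3,10,2024-01-07,5.50\n"
)
customers = pd.DataFrame({"customer_id": [10, 11], "country": ["BE", "NL"]})

# Loading the dataset (in a real project: pd.read_csv("path/to/file.csv")).
orders = pd.read_csv(orders_csv)

# First exploration
print(orders.head())      # column names and the first rows
print(orders.describe())  # count, mean, std, min/max and quartiles of numeric columns
orders.info()             # dtypes and non-null counts (quick null check)

# Data cleaning: transform types, then handle nulls.
orders["order_date"] = pd.to_datetime(orders["order_date"])  # str -> datetime64
orders["amount"] = orders["amount"].fillna(orders["amount"].median())

# Merge the customer table in; validate= raises if the key relationship is wrong.
df = orders.merge(customers, on="customer_id", how="left", validate="many_to_one")
print(df)
```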

EDA: Exploratory Data Analysis
- Univariate exploration
- Multivariate exploration
- Correlations
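A short sketch of the three EDA steps using pandas only, on hypothetical synthetic data (advertising spend, sales and a region label are invented for illustration). In the notebook these summaries would typically be paired with plots such as histograms, scatter plots and a correlation heatmap.

```python
import numpy as np
import pandas as pd

# Hypothetical data: sales roughly proportional to spend, plus noise.
rng = np.random.default_rng(seed=0)
spend = rng.uniform(0, 100, size=200)
df = pd.DataFrame({
    "spend": spend,
    "sales": 3.0 * spend + rng.normal(0, 10, size=200),
    "region": rng.choice(["north", "south"], size=200),
})

# Univariate exploration: one variable at a time.
print(df["spend"].describe())
print(df["region"].value_counts())

# Multivariate exploration: a numeric variable broken down by a categorical one.
print(df.groupby("region")["sales"].agg(["mean", "std"]))

# Correlations between the numeric columns (Pearson by default).
corr = df.select_dtypes("number").corr()
print(corr)
```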

Statistical Analysis
- Repeat for every hypothesis:
  - Describe the target populations
  - Describe the null and alternative hypotheses
  - Set the significance level
  - Describe (and check) the assumptions of the test
  - Justify the choice of test
  - Describe the results
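One pass of this loop might look like the sketch below, assuming SciPy is available. The two groups are simulated, hypothetical data. Welch's t-test and the Shapiro-Wilk normality check are example choices for comparing two means, not the only valid ones.

```python
import numpy as np
from scipy import stats

# Target populations (hypothetical): e.g. session length in minutes for two page variants.
rng = np.random.default_rng(seed=1)
group_a = rng.normal(loc=10.0, scale=2.0, size=50)
group_b = rng.normal(loc=11.5, scale=2.0, size=50)

# H0: the population means are equal. H1: they differ (two-sided).
alpha = 0.05  # significance level, fixed before looking at the results

# Assumption check: approximate normality in each group (Shapiro-Wilk).
for name, sample in (("A", group_a), ("B", group_b)):
    _, p_norm = stats.shapiro(sample)
    print(f"Shapiro-Wilk p-value for group {name}: {p_norm:.3f}")

# Choice of test: Welch's t-test, which does not assume equal variances.
t_stat, p_value = stats.ttest_ind(group_a, group_b, equal_var=False)

# Results
reject_h0 = p_value < alpha
print(f"t = {t_stat:.2f}, p = {p_value:.4f}, reject H0: {reject_h0}")
```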

Machine Learning
- Define one or more prediction goals (repeat next steps for every goal)
  - Load the input data that you need
  - Data preprocessing
    - Address multicollinearity if strong correlations were found during the EDA
    - Consider dimensionality reduction
    - Label / one-hot encoding
    - Standard scaling
    - Normalization
    - Train/test splitting (fit scalers and encoders on the training split only, to avoid data leakage)
  - Model selection and training
    - Explain which model you will use and why
    - Hyperparameter tuning
    - Model training
  - Model evaluation
    - Evaluate the model using the metrics of choice
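The preprocessing, training and evaluation steps above can be sketched with a scikit-learn Pipeline. Here the data is split first and the encoders and scalers live inside the pipeline, so they are only fitted on training data. The dataset, the logistic regression model, the `C` grid and the metrics are all hypothetical example choices.

```python
import numpy as np
import pandas as pd
from sklearn.compose import ColumnTransformer
from sklearn.linear_model import LogisticRegression
from sklearn.metrics import accuracy_score, f1_score
from sklearn.model_selection import GridSearchCV, train_test_split
from sklearn.pipeline import Pipeline
from sklearn.preprocessing import OneHotEncoder, StandardScaler

# Hypothetical input data: two numeric features, one categorical, binary target.
rng = np.random.default_rng(seed=42)
n = 300
X = pd.DataFrame({
    "age": rng.integers(18, 70, size=n),
    "income": rng.normal(50_000, 15_000, size=n),
    "plan": rng.choice(["basic", "premium"], size=n),
})
y = ((X["income"] > 50_000) & (X["plan"] == "premium")).astype(int)

# Split first so preprocessing is fitted on the training data only.
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.2, random_state=42, stratify=y
)

# Preprocessing: scale numeric columns, one-hot encode the categorical column.
preprocess = ColumnTransformer([
    ("num", StandardScaler(), ["age", "income"]),
    ("cat", OneHotEncoder(handle_unknown="ignore"), ["plan"]),
])

# Model: logistic regression, used here only as a simple baseline.
model = Pipeline([
    ("preprocess", preprocess),
    ("clf", LogisticRegression(max_iter=1000)),
])

# Hyperparameter tuning with 5-fold cross-validation on the training set;
# the best configuration is then refitted on the whole training set.
search = GridSearchCV(model, param_grid={"clf__C": [0.1, 1.0, 10.0]}, cv=5, scoring="f1")
search.fit(X_train, y_train)

# Evaluation on the held-out test set.
y_pred = search.predict(X_test)
print("best params:", search.best_params_)
print("accuracy:", accuracy_score(y_test, y_pred))
print("F1:", f1_score(y_test, y_pred))
```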

Summary
- Provide an overview of the entire project with key takeaways

Improvements
- List the possible improvements that you see

kkalera/DataScienceTemplate

Folders and files

Latest commit

History

Repository files navigation

Data Science Template

Table of Contents :

Introduction

Setup

Data collection

EDA: Exploratory Data Analysis

Statistical Analysis

Machine Learning

Summary

Improvements

About

Resources

License

Stars

Watchers

Forks

Releases

Packages 0

Languages

Packages