GitHub - joaoDragado/dSc_roadmap: A conceptual roadmap to the available statistical approaches to managing & interpreting data

Data Science Roadmap

Embarking on a foolhardy journey into the world of statistical inference & predictive analytics.

Introduction

The repository contains an outline of the techniques available to an aspiring data scientist to manipulate & glean information from data.

This repo was born out of necessity: the world of statistics has seen tremendous growth and has branched out into paths that, while sharing the same aim, clearly diverge in philosophy & approach. Trying to intelligently organise these (and their supporting code bases) quickly becomes a job in itself.

The idea is that a clear conceptual roadmap to all approaches, coupled with their implementations on real-life examples, will assist the practitioner in answering the following questions:

"What kind of dataset do I need to investigate my claim/hypothesis?"
"Which statistical method/tool/test should I use with this dataset?"
"How can I interpret the results of my tests?"
"Are my results significant? Is my analysis worthy?"

The directory tree of the repository tries to distinguish between these different approaches, which might aid in sorting through the breadth of available choices and, hopefully, in selecting the appropriate one(s).

Each concept is (ideally) demonstrated via a real-life example / short project.

Table of Contents

Descriptive a.k.a. Exploratory Data Analysis

Collect summary information: mean, standard deviation, outliers, etc.
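As a minimal sketch of what such a summary might look like, the snippet below computes the mean, the sample standard deviation, and a simple outlier list using only the Python standard library. The `describe` helper, the 1.5 x IQR outlier rule, and the sample numbers are illustrative assumptions, not code or data taken from this repository.

```python
import statistics


def describe(values):
    """Return basic summary statistics and IQR-based outliers (hypothetical helper)."""
    ordered = sorted(values)
    half = len(ordered) // 2
    q1 = statistics.median(ordered[:half])   # median of the lower half
    q3 = statistics.median(ordered[-half:])  # median of the upper half
    iqr = q3 - q1
    low, high = q1 - 1.5 * iqr, q3 + 1.5 * iqr  # assumed outlier fences
    return {
        "mean": statistics.mean(ordered),
        "stdev": statistics.stdev(ordered),  # sample standard deviation
        "median": statistics.median(ordered),
        "outliers": [v for v in ordered if v < low or v > high],
    }


# Made-up sample in which one value sits far away from the rest.
sample = [12, 14, 15, 15, 16, 17, 18, 95]
summary = describe(sample)
print(summary)
```

In this made-up sample the single extreme value pulls the mean well above the median, which is the kind of pattern a descriptive pass is meant to surface before any modelling.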
Inferential

The predictive aspect of statistics: using a sample to draw conclusions beyond the data at hand.
- Classical
  - Frequentist
  - Bayesian
- Machine_Learning
  - Supervised
    
    Labeled data are provided
    - Regression
      
      Aims to predict a continuous-valued output, i.e. the answer lies within a range.
    - Classification
      
      Aims to predict a discrete-valued output, i.e. the answer will be True/False, Red/Blue/Green, etc.
  - Unsupervised
    
    The data are unlabeled; the model needs to distinguish significant features within the dataset.
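The difference between the two Supervised branches in the outline above can be sketched in a few lines of plain Python. The example below is only an illustration under stated assumptions: `fit_line` and `nearest_centroid` are hypothetical helpers, the data are invented, and the notebooks in this repository may well use different models and libraries.

```python
def fit_line(xs, ys):
    """Ordinary least squares with one feature; returns (slope, intercept)."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    sxx = sum((x - mean_x) ** 2 for x in xs)
    slope = sxy / sxx
    return slope, mean_y - slope * mean_x


def nearest_centroid(xs, labels):
    """Return a classifier that picks the label whose class mean is closest."""
    groups = {}
    for x, label in zip(xs, labels):
        groups.setdefault(label, []).append(x)
    centroids = {label: sum(v) / len(v) for label, v in groups.items()}
    return lambda x: min(centroids, key=lambda label: abs(x - centroids[label]))


# Regression: the labels are numbers, so the prediction is a number in a range.
xs = [1, 2, 3, 4]
ys = [3, 5, 7, 9]  # invented data lying exactly on y = 2x + 1
slope, intercept = fit_line(xs, ys)
print("predicted y at x=5:", slope * 5 + intercept)

# Classification: the labels are categories, so the prediction is one of them.
sizes = [1.0, 1.2, 0.8, 4.0, 4.2, 3.8]
colours = ["Red", "Red", "Red", "Blue", "Blue", "Blue"]
classify = nearest_centroid(sizes, colours)
print("predicted classes:", classify(1.1), classify(3.5))
```

Both models learn from labeled pairs; what changes is the type of the label and therefore the type of the answer.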

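To make the Frequentist / Bayesian split under Classical a little more concrete, here is a small sketch that estimates the bias of a coin both ways. The coin scenario, the function names, and the uniform Beta(1, 1) prior are assumptions chosen for illustration; they do not come from this repository.

```python
from fractions import Fraction


def frequentist_estimate(successes, trials):
    """Maximum-likelihood estimate of a proportion: the observed frequency."""
    return Fraction(successes, trials)


def bayesian_posterior(successes, trials, prior_a=1, prior_b=1):
    """Update a Beta(prior_a, prior_b) prior with binomial data.

    Returns the (a, b) parameters of the Beta posterior.
    """
    return prior_a + successes, prior_b + (trials - successes)


def beta_mean(a, b):
    """Mean of a Beta(a, b) distribution."""
    return Fraction(a, a + b)


# Invented data: 7 heads in 10 tosses.
heads, tosses = 7, 10
mle = frequentist_estimate(heads, tosses)
post_a, post_b = bayesian_posterior(heads, tosses)
posterior_mean = beta_mean(post_a, post_b)

print("frequentist point estimate:", float(mle))
print("posterior parameters:", (post_a, post_b))
print("Bayesian posterior mean:", float(posterior_mean))
```

The frequentist answer is a single number derived from the data alone, while the Bayesian answer is a whole distribution that blends the data with the prior; with this little data the prior visibly pulls the estimate toward 0.5.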
Directory Tree

Generated via the tree command.

.
├── Descriptive
├── Inferential
│   ├── Classical
│   │   ├── Bayesian
│   │   └── Frequentist
│   │       └── Common_Statistical_Tests.html
│   └── Machine_Learning
│       ├── Choosing_ML_Algorithm_RoadMap.png
│       ├── data
│       │   ├── baseball.csv
│       │   ├── boston.csv
│       │   ├── nba_test.csv
│       │   ├── nba_train.csv
│       │   ├── quality.csv
│       │   ├── stevens.csv
│       │   ├── wine.csv
│       │   └── wine_test.csv
│       ├── Supervised
│       │   ├── Classification
│       │   │   ├── Classification_via_ItSL.html
│       │   │   ├── Medical_Diagnosis_via_Logistic_Regression.ipynb
│       │   │   └── Supreme_Court_Opinions.ipynb
│       │   └── Regression
│       │       ├── Boston_House_Prices--Decision_Trees.ipynb
│       │       ├── Linear_Regression_via_ItSL.html
│       │       ├── NBA_Moneyball.ipynb
│       │       └── Predicting_Wine_prices.ipynb
│       └── Unsupervised
└── README.md

11 directories, 18 files
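Where the tree utility is not installed, a summary line like the one above can be approximated in Python. This is a rough sketch, assuming that hidden entries (names starting with a dot) should be skipped, as tree does by default; `count_tree` is a hypothetical helper, and the temporary directory only imitates a tiny slice of the layout.

```python
import os
import tempfile


def count_tree(root):
    """Count non-hidden directories and files below root (hypothetical helper)."""
    n_dirs = n_files = 0
    for _, dirnames, filenames in os.walk(root):
        # Prune hidden directories in place so os.walk does not descend into them.
        dirnames[:] = [d for d in dirnames if not d.startswith(".")]
        n_dirs += len(dirnames)
        n_files += sum(1 for f in filenames if not f.startswith("."))
    return n_dirs, n_files


# Demo on a throwaway directory rather than on the real repository.
with tempfile.TemporaryDirectory() as root:
    os.makedirs(os.path.join(root, "Inferential", "Classical"))
    open(os.path.join(root, "README.md"), "w").close()
    open(os.path.join(root, ".gitignore"), "w").close()  # hidden, not counted
    demo_counts = count_tree(root)

print("{} directories, {} files".format(*demo_counts))
```

Running `count_tree(".")` from the repository root should give numbers comparable to the summary line, although the exact counts depend on what has been added since the listing was generated.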

Dependencies - Libraries used

Python 3.6 (code should also work for Python 2.7)
Anaconda Python Data Science Distribution: the easiest way to install & manage all scientific libraries used in this repo.

Reference Links

Efron Bradley, Hastie Trevor, "Computer Age Statistical Inference: Algorithms, Evidence, and Data Science", 2016, Cambridge University Press
