Embarking on a foolhardy journey into the world of statistical inference & predictive analytics.
The repository contains an outline of the techniques available to an aspiring data scientist to manipulate & glean information from data.
This repo was born out of necessity: the world of Statistics has seen tremendous growth and has branched into paths that, while sharing the same aim, clearly diverge in philosophy & approach. Trying to organise these intelligently (along with their supporting code bases) quickly becomes a job in itself.
The idea is that a clear conceptual roadmap of all approaches, coupled with their implementations on real-life examples, will assist the practitioner in answering the following questions:
- "What kind of dataset do I need to investigate my claim/hypothesis?"
- "Which statistical method/tool/test should I use with this dataset?"
- "How can I interpret the results of my tests?"
- "Are my results significant? Is my analysis worthy?"
The directory tree of the repository tries to distinguish between these different approaches, which should help in sorting through the breadth of available choices and, hopefully, in selecting the appropriate one(s).
Each concept is (ideally) demonstrated via a real-life example / short project.
- Descriptive, a.k.a. Exploratory Data Analysis: collect summary information such as the mean, standard deviation, outliers, etc.
- Inferential: draw conclusions that reach beyond the data at hand; this is where the predictive aspect of statistics lives.
  - Classical
    - Frequentist
    - Bayesian
  - Machine_Learning
    - Supervised: labeled data are provided.
      - Regression: aims to predict a continuous-valued output, i.e. the answer lies within a range.
      - Classification: aims to predict a discrete-valued output, i.e. the answer is True/False, Red/Blue/Green, etc.
    - Unsupervised: the data are unlabeled, so the model needs to discover significant features or structure within the dataset on its own.
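As a minimal illustration of the Descriptive step, the sketch below computes the mean, the sample standard deviation and a crude list of outliers using only Python 3's standard `statistics` module. The sample values, the function name `summarize` and the `z_cutoff=2.0` threshold are assumptions made for this example, not code or settings taken from the repo.

```python
import statistics


def summarize(values, z_cutoff=2.0):
    """Return the mean, sample standard deviation and z-score outliers of values.

    z_cutoff is an illustrative assumption, not a universal rule.
    """
    mean = statistics.mean(values)
    stdev = statistics.stdev(values)  # sample (n - 1) standard deviation
    if stdev == 0:
        # All values are identical, so nothing can be flagged as an outlier.
        return {"mean": mean, "stdev": stdev, "outliers": []}
    # Flag values lying more than z_cutoff standard deviations from the mean.
    outliers = [x for x in values if abs(x - mean) / stdev > z_cutoff]
    return {"mean": mean, "stdev": stdev, "outliers": outliers}


# Hypothetical measurements with one suspicious value.
sample = [10, 12, 11, 13, 12, 11, 10, 12, 40]
print(summarize(sample))
```

A z-score rule is only one way to flag outliers. In small samples a single extreme value inflates the standard deviation (with n = 9 no point can exceed |z| = 8/3), which is why a cutoff of 3 would flag nothing here and why robust alternatives such as Tukey's IQR fences are often preferred.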
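The difference between the two Supervised branches, Regression (continuous output) and Classification (discrete output), can be seen by fitting both kinds of model to the same feature. This is a deliberately minimal, standard-library sketch assuming Python 3: regression uses one-feature ordinary least squares, and classification uses a simple nearest-centroid rule (chosen here for brevity; it is not the logistic regression or decision-tree approach named in the notebook titles). The hours/score/outcome data and the function names are invented for illustration.

```python
def fit_line(xs, ys):
    """Ordinary least squares with one feature: y is approximated by slope * x + intercept."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxx = sum((x - mean_x) ** 2 for x in xs)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    slope = sxy / sxx
    intercept = mean_y - slope * mean_x
    return slope, intercept


def fit_nearest_centroid(xs, labels):
    """Nearest-centroid classifier: store the mean feature value of each class."""
    sums, counts = {}, {}
    for x, label in zip(xs, labels):
        sums[label] = sums.get(label, 0.0) + x
        counts[label] = counts.get(label, 0) + 1
    centroids = {label: sums[label] / counts[label] for label in sums}

    def predict(x):
        # Assign the class whose centroid is closest to x.
        return min(centroids, key=lambda label: abs(x - centroids[label]))

    return predict


# Hypothetical data: hours studied, exam score (continuous) and outcome (discrete).
hours = [1, 2, 3, 4, 5, 6]
scores = [52, 55, 61, 64, 70, 73]
outcome = ["fail", "fail", "fail", "pass", "pass", "pass"]

slope, intercept = fit_line(hours, scores)
classify = fit_nearest_centroid(hours, outcome)

# Regression answers "how much?", classification answers "which one?".
print("predicted score for 7 h:", slope * 7 + intercept)
print("predicted outcome for 7 h:", classify(7))
```

The same input feature feeds both models; only the type of the target (a number versus a label) decides which branch of the tree the problem belongs to.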
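For the Unsupervised branch, where no labels are available, k-means clustering is one common way for a model to find structure on its own. The sketch below is a plain Lloyd's-algorithm implementation in Python 3's standard library, assuming 2-D numeric points and a number of clusters `k` chosen in advance; the points, the function names and the default `max_iterations`/`seed` values are hypothetical.

```python
import math
import random


def distance(a, b):
    """Euclidean distance between two 2-D points."""
    return math.hypot(a[0] - b[0], a[1] - b[1])


def k_means(points, k, max_iterations=100, seed=0):
    """Plain k-means (Lloyd's algorithm) on 2-D points; no labels are used."""
    rng = random.Random(seed)
    centroids = rng.sample(points, k)  # start from k distinct data points
    clusters = []
    for _ in range(max_iterations):
        # Assignment step: attach every point to its nearest centroid.
        clusters = [[] for _ in range(k)]
        for p in points:
            nearest = min(range(k), key=lambda i: distance(p, centroids[i]))
            clusters[nearest].append(p)
        # Update step: move each centroid to the mean of its cluster
        # (an empty cluster keeps its previous centroid).
        new_centroids = [
            (sum(x for x, _ in c) / len(c), sum(y for _, y in c) / len(c)) if c else centroids[i]
            for i, c in enumerate(clusters)
        ]
        if new_centroids == centroids:
            break  # converged: the assignments can no longer change
        centroids = new_centroids
    return centroids, clusters


# Hypothetical unlabeled points forming two well-separated groups.
points = [(1.0, 1.0), (1.5, 2.0), (1.0, 1.5), (8.0, 8.0), (8.5, 9.0), (9.0, 8.0)]
centroids, clusters = k_means(points, k=2)
print(centroids)
print(clusters)
```

The result depends on the random initialisation and on fixing `k` beforehand; on less clear-cut data it is common to rerun with several seeds and compare the within-cluster spread.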
The directory structure below was generated via the tree bash command:
.
├── Descriptive
├── Inferential
│   ├── Classical
│   │   ├── Bayesian
│   │   └── Frequentist
│   │       └── Common_Statistical_Tests.html
│   └── Machine_Learning
│       ├── Choosing_ML_Algorithm_RoadMap.png
│       ├── data
│       │   ├── baseball.csv
│       │   ├── boston.csv
│       │   ├── nba_test.csv
│       │   ├── nba_train.csv
│       │   ├── quality.csv
│       │   ├── stevens.csv
│       │   ├── wine.csv
│       │   └── wine_test.csv
│       ├── Supervised
│       │   ├── Classification
│       │   │   ├── Classification_via_ItSL.html
│       │   │   ├── Medical_Diagnosis_via_Logistic_Regression.ipynb
│       │   │   └── Supreme_Court_Opinions.ipynb
│       │   └── Regression
│       │       ├── Boston_House_Prices--Decision_Trees.ipynb
│       │       ├── Linear_Regression_via_ItSL.html
│       │       ├── NBA_Moneyball.ipynb
│       │       └── Predicting_Wine_prices.ipynb
│       └── Unsupervised
└── README.md
11 directories, 18 files
- Python 3.6 (code should also work for Python 2.7)
- Anaconda Python Data Science Distribution: the easiest way to install & manage all the scientific libraries used in this repo.
- Efron, Bradley & Hastie, Trevor. "Computer Age Statistical Inference: Algorithms, Evidence, and Data Science". Cambridge University Press, 2016.