1. Data preprocessing and exploratory analysis

deal with missing values and categorical variables (use one-hot encoding)
check for correlated variables
analyse distributions of attributes
visualise the data using two or three principal components
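The preprocessing steps above can be sketched with pandas and scikit-learn. This is a minimal illustration under assumptions, not the repository's notebook code. The toy table and its column names (`age`, `symptom_duration`, `city`, `diagnosis`) are hypothetical stand-ins for `data.csv`. The imputation strategy (median for numeric columns, most frequent value for categorical ones) and the 0.9 correlation threshold are illustrative choices.

```python
import numpy as np
import pandas as pd
from sklearn.decomposition import PCA
from sklearn.preprocessing import StandardScaler

# Hypothetical toy table standing in for data.csv; the real column names differ.
df = pd.DataFrame({
    "age": [47, 55, np.nan, 61, 38, 52],
    "symptom_duration": [3.0, 1.5, 2.0, np.nan, 4.0, 2.5],
    "city": ["A", "B", "A", "C", "B", np.nan],
    "diagnosis": [1, 0, 1, 0, 0, 1],
})

# Missing values: median for numeric features, most frequent value for categorical ones.
numeric_cols = df.select_dtypes(include="number").columns.drop("diagnosis")
categorical_cols = df.select_dtypes(exclude="number").columns
df[numeric_cols] = df[numeric_cols].fillna(df[numeric_cols].median())
for col in categorical_cols:
    df[col] = df[col].fillna(df[col].mode()[0])

# One-hot encoding: one 0/1 column per category (city_A, city_B, city_C).
df_encoded = pd.get_dummies(df, columns=list(categorical_cols), dtype=int)
features = df_encoded.drop(columns="diagnosis")

# Correlated variables: keep the upper triangle so each pair appears once.
corr = features.corr().abs()
upper = corr.where(np.triu(np.ones(corr.shape), k=1).astype(bool))
pairs = upper.stack()
high_corr_pairs = pairs[pairs > 0.9]

# Distributions: summary statistics and skewness per attribute.
summary = features.describe()
skewness = features.skew()

# PCA on standardised features; scatter X_pca[:, 0] against X_pca[:, 1],
# coloured by df_encoded["diagnosis"], to visualise class separation.
X_scaled = StandardScaler().fit_transform(features)
pca = PCA(n_components=2)
X_pca = pca.fit_transform(X_scaled)
print(pca.explained_variance_ratio_)
```

Standardising before PCA matters because otherwise the components are dominated by whichever attributes have the largest numeric range.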

2. Logistic Regression model
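A minimal sketch of the logistic regression baseline, assuming scikit-learn. `make_classification` generates synthetic data as a stand-in for the encoded mesothelioma features, and `max_iter=1000` is an arbitrary choice. Keeping the scaler inside the pipeline means that, under cross-validation, it is fit on the training folds only.

```python
from sklearn.datasets import make_classification
from sklearn.linear_model import LogisticRegression
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import StandardScaler

# Synthetic stand-in for the encoded dataset (assumption).
X, y = make_classification(n_samples=300, n_features=10, n_informative=4, random_state=0)

log_reg = make_pipeline(StandardScaler(), LogisticRegression(max_iter=1000))
log_reg.fit(X, y)

# With standardised inputs, coefficient magnitudes give a rough feature ranking.
coefs = log_reg.named_steps["logisticregression"].coef_.ravel()
ranking = sorted(enumerate(coefs), key=lambda item: abs(item[1]), reverse=True)
for index, weight in ranking[:5]:
    print(f"feature {index}: {weight:+.3f}")
```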

3. More advanced models

3.1. Random Forest
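A possible random forest setup, again on synthetic `make_classification` data standing in for the real features; `n_estimators=200` is an illustrative value. Scikit-learn exposes impurity-based importances through `feature_importances_`, which are normalised to sum to 1.

```python
from sklearn.datasets import make_classification
from sklearn.ensemble import RandomForestClassifier

# Synthetic stand-in for the encoded dataset (assumption).
X, y = make_classification(n_samples=300, n_features=10, n_informative=4, random_state=0)

forest = RandomForestClassifier(n_estimators=200, random_state=0)
forest.fit(X, y)

# Impurity-based importances, printed from most to least important.
importances = forest.feature_importances_
for index in importances.argsort()[::-1][:5]:
    print(f"feature {index}: {importances[index]:.3f}")
```

Impurity-based importances can favour features with many distinct values, so they are worth cross-checking against permutation importance.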

3.2. Support Vector Machine
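A sketch of an SVM classifier, assuming an RBF kernel with `C=1.0` (both illustrative) and synthetic data. `probability=True` makes `predict_proba` available (needed for log loss) by fitting Platt scaling internally. An RBF kernel has no per-feature weights, so this example swaps in permutation importance: the drop in held-out ROC-AUC when each feature is shuffled.

```python
from sklearn.datasets import make_classification
from sklearn.inspection import permutation_importance
from sklearn.model_selection import train_test_split
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import StandardScaler
from sklearn.svm import SVC

# Synthetic stand-in for the encoded dataset (assumption).
X, y = make_classification(n_samples=300, n_features=10, n_informative=4, random_state=0)
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.25, stratify=y, random_state=0
)

# SVMs are sensitive to feature scale, so standardise inside the pipeline.
svm = make_pipeline(StandardScaler(), SVC(kernel="rbf", C=1.0, probability=True, random_state=0))
svm.fit(X_train, y_train)

# Shuffle each feature 10 times and record the mean drop in ROC-AUC on the test split.
result = permutation_importance(
    svm, X_test, y_test, scoring="roc_auc", n_repeats=10, random_state=0
)
for index in result.importances_mean.argsort()[::-1][:5]:
    print(f"feature {index}: {result.importances_mean[index]:.3f}")
```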

3.3. Feed Forward Neural Network
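One way to build a small feed-forward network is scikit-learn's `MLPClassifier`. This is an assumption: the repository may use a different framework. The two hidden layers of 32 and 16 ReLU units and `alpha=1e-3` (L2 penalty) are illustrative. With `early_stopping=True`, 10% of the training data is held out and training stops when the validation score stops improving.

```python
from sklearn.datasets import make_classification
from sklearn.neural_network import MLPClassifier
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import StandardScaler

# Synthetic stand-in for the encoded dataset (assumption).
X, y = make_classification(n_samples=300, n_features=10, n_informative=4, random_state=0)

mlp = make_pipeline(
    StandardScaler(),
    MLPClassifier(
        hidden_layer_sizes=(32, 16),
        activation="relu",
        alpha=1e-3,
        early_stopping=True,
        max_iter=500,
        random_state=0,
    ),
)
mlp.fit(X, y)

# Predicted probability of the positive class for each sample.
proba = mlp.predict_proba(X)[:, 1]
```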

For each model:

standardise the dataset
use the same cross-validation scheme
inspect feature importance
evaluate the model with log loss, ROC-AUC and F1 score
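The shared evaluation protocol can be sketched with `cross_validate`. One fixed `StratifiedKFold` object (5 folds, `random_state=42`, both arbitrary choices) is reused for every model, so all models are scored on identical splits. Standardisation sits inside each pipeline. The synthetic data and the two example models are assumptions for illustration.

```python
from sklearn.datasets import make_classification
from sklearn.ensemble import RandomForestClassifier
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import StratifiedKFold, cross_validate
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import StandardScaler

# Synthetic stand-in for the encoded dataset (assumption).
X, y = make_classification(n_samples=300, n_features=10, n_informative=4, random_state=0)

# Same splitter for all models; stratification keeps class ratios in every fold.
cv = StratifiedKFold(n_splits=5, shuffle=True, random_state=42)
scoring = {"log_loss": "neg_log_loss", "roc_auc": "roc_auc", "f1": "f1"}

models = {
    "logistic_regression": make_pipeline(StandardScaler(), LogisticRegression(max_iter=1000)),
    "random_forest": make_pipeline(StandardScaler(), RandomForestClassifier(random_state=0)),
}

results = {}
for name, model in models.items():
    scores = cross_validate(model, X, y, cv=cv, scoring=scoring)
    # Scorers follow a greater-is-better convention, so log loss comes back negated.
    results[name] = {
        "log_loss": -scores["test_log_loss"].mean(),
        "roc_auc": scores["test_roc_auc"].mean(),
        "f1": scores["test_f1"].mean(),
    }
    print(name, {metric: round(value, 3) for metric, value in results[name].items()})
```

Log loss and ROC-AUC are computed from predicted probabilities, while F1 uses hard predictions at the default 0.5 threshold.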

Resources

dataset
paper
Python Machine Learning by Sebastian Raschka
interesting talks:
- Machine Learning 101 by Kyle Kastner (+ GitHub repo)
- Classification using Pandas and Scikit-Learn by Skipper Seabold (+ GitHub repo)
- Machine Learning with Scikit-Learn by Jake VanderPlas (+ GitHub repo)
- Neural Nets for Newbies by Melanie Warrick (+ GitHub repo)

Keywords:

dimensionality reduction
cross-validation
supervised vs unsupervised learning
regression vs classification
confidence intervals and p-values

cheminfIBB/MM-diagnosis

Folders and files

Latest commit

History

Repository files navigation

1. Data preprocessing and exploratory analysis

2. Logistic Regression model

3. More advanced models

3.1 Random Forest

3.2. Support Vector Machine

3.3. Feed Forward Neural Network

Resources

Keywords:

Tools

About

Resources

Stars

Watchers

Forks

Releases

Packages 0

Contributors 4

Languages

Packages