practical_supervised_machine_learning

Introduction

Resources for a practical on supervised machine learning

Code

The R Markdown (.Rmd) scripts and R scripts constitute the practical.

cross_validation_practical.Rmd shows the basics of cross-validation.

Ch8-baggboost-lab.Rmd shows the basics of decision trees and random forests (bagging and boosting).

If you have time, you can try code in the additional_code folder:

  • caret_rf.Rmd shows the basics of using the caret package in R to build a machine learning pipeline (a minimal sketch of such a pipeline follows this list).
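For a flavour of what a caret pipeline looks like, here is a minimal sketch of training a cross-validated model with caret. The built-in iris data set and the choice of a random forest are illustrative assumptions, not the contents of the actual script.

    # Minimal caret pipeline sketch (illustrative; not the contents of caret_rf.Rmd)
    library(caret)

    # Use a built-in data set purely for illustration
    data(iris)

    # 5-fold cross-validation
    ctrl <- trainControl(method = "cv", number = 5)

    # Train a random forest classifier on Species
    fit <- train(Species ~ ., data = iris, method = "rf", trControl = ctrl)

    print(fit)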

Installation

Clone or download this repository.

Then install R and RStudio.

https://www.rstudio.com/products/rstudio/download/preview/

OR

follow the instructions here:

https://cambiotraining.github.io/intro-r/#Setup_instructions

From the command line, run the R script installer.R to install all required packages:

R --no-save < installer.R

OR

run the script installer.R in RStudio.
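For reference, an installer script of this kind typically just installs the packages the practical needs. The sketch below is hypothetical; the package list in the real installer.R may differ.

    # Hypothetical installer sketch; the actual installer.R may list different packages
    packages <- c("caret", "randomForest", "glmnet", "rpart", "gbm")

    # Install only the packages that are not already present
    missing <- packages[!packages %in% installed.packages()[, "Package"]]
    if (length(missing) > 0) {
      install.packages(missing, repos = "https://cloud.r-project.org")
    }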

Exercise

For an exercise, work through the following problems.

  • Exercise 1: Download data from

https://github.com/neelsoumya/teaching_reproducible_science_R/blob/main/metagene_score.csv

and train a classifier to predict yes/no (the flag_yes_no column); this is a binary classification task. Use cross-validation and write your own R code for this task. You can work in groups. Do this in class. A sketch of one possible approach appears after this exercise.

The data is also available here

https://github.com/neelsoumya/practical_supervised_machine_learning/blob/main/metagene_score.csv
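A minimal sketch of one possible approach is below, using caret with 10-fold cross-validation and logistic regression. The file location and the use of all remaining columns as predictors are assumptions; adapt them to the actual data.

    # Sketch of a cross-validated binary classifier
    # (assumes metagene_score.csv is in the working directory)
    library(caret)

    df <- read.csv("metagene_score.csv")

    # The outcome must be a factor for classification; flag_yes_no comes from the exercise
    df$flag_yes_no <- as.factor(df$flag_yes_no)

    # 10-fold cross-validation
    ctrl <- trainControl(method = "cv", number = 10)

    # Logistic regression using all other columns as predictors (an assumption about the data);
    # caret picks a binomial family automatically for a two-class factor outcome
    fit <- train(flag_yes_no ~ ., data = df, method = "glm", trControl = ctrl)

    print(fit)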

  • Exercise 2: Download data from

https://archive.ics.uci.edu/dataset/2/adult

and train a classifier to predict whether income is >50K or <=50K (a binary classification task). Use cross-validation and write your own R code for this task. You can work in groups. Do this in class. A sketch of one possible approach appears after this exercise.

Challenge: Use cross-validation to select a few important features.

The data is also available in the adult folder here

https://github.com/neelsoumya/practical_supervised_machine_learning/tree/main/adult
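A minimal sketch of one possible approach is below. It assumes the raw comma-separated adult.data file with no header and the standard UCI column names; check the files in the adult folder and adjust the path and names as needed.

    # Sketch for the Adult income task (file name and column names are assumptions)
    library(caret)

    cols <- c("age", "workclass", "fnlwgt", "education", "education_num",
              "marital_status", "occupation", "relationship", "race", "sex",
              "capital_gain", "capital_loss", "hours_per_week", "native_country", "income")

    adult <- read.csv("adult/adult.data", header = FALSE, col.names = cols,
                      strip.white = TRUE, na.strings = "?")
    adult <- na.omit(adult)
    adult$income <- as.factor(adult$income)

    # 10-fold cross-validation with a simple logistic regression
    ctrl <- trainControl(method = "cv", number = 10)
    fit <- train(income ~ ., data = adult, method = "glm", trControl = ctrl)

    print(fit)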

Challenge exercise

How would you select the features that go into the logistic regression model?

Think of a brute-force approach.

Can you think of a more sophisticated approach?

For an advanced exercise, use the glmnet package in R.

See

https://glmnet.stanford.edu/articles/glmnet.html#logistic-regression-family-binomial
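As a sketch of the glmnet route, the code below fits a cross-validated lasso-penalised logistic regression on the Adult data; features with non-zero coefficients at the selected lambda are the ones the penalty keeps. The file name and column names are the same assumptions as in the Exercise 2 sketch.

    # Sketch: penalised logistic regression with glmnet (file and column names are assumptions)
    library(glmnet)

    cols <- c("age", "workclass", "fnlwgt", "education", "education_num",
              "marital_status", "occupation", "relationship", "race", "sex",
              "capital_gain", "capital_loss", "hours_per_week", "native_country", "income")
    adult <- na.omit(read.csv("adult/adult.data", header = FALSE, col.names = cols,
                              strip.white = TRUE, na.strings = "?"))

    # model.matrix expands factors into dummy variables; drop the intercept column
    x <- model.matrix(income ~ ., data = adult)[, -1]
    y <- as.factor(adult$income)

    # Cross-validated lasso (alpha = 1) logistic regression
    cvfit <- cv.glmnet(x, y, family = "binomial", alpha = 1)

    # Coefficients at the cross-validated lambda; zero entries are dropped features
    coef(cvfit, s = "lambda.min")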

Need more of a challenge? See the caret programs in the additional_code folder.

  • Exercise 3: Download the data from

https://github.com/neelsoumya/practical_supervised_machine_learning/blob/main/diabetes.csv

  • Remember to visualize the data and normalize the features.

  • Build a random forest model to predict the diabetes outcome (0/1).

  • Plot the out-of-bag (OOB) error as a function of the number of trees (a sketch of one way to do this follows this list).
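A minimal sketch for this exercise is below, using the randomForest package. The outcome column name Outcome (as in the widely used Pima Indians diabetes CSV) is an assumption, so check the header of diabetes.csv first.

    # Sketch: random forest on diabetes.csv (outcome column name "Outcome" is an assumption)
    library(randomForest)

    diabetes <- read.csv("diabetes.csv")
    diabetes$Outcome <- as.factor(diabetes$Outcome)

    set.seed(42)
    rf_fit <- randomForest(Outcome ~ ., data = diabetes, ntree = 500)

    # err.rate[, "OOB"] holds the out-of-bag error after each additional tree
    plot(rf_fit$err.rate[, "OOB"], type = "l",
         xlab = "Number of trees", ylab = "OOB error")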

Resources

Free PDF of book and R code

More practical tutorials and R code

Acknowledgements

I thank Dr. Bajuna Salehe for useful discussions and feedback.

All material is taken from the following resources:

Contact

Soumya Banerjee

sb2333@cam.ac.uk
