In biomedical research we often wish to classify data into two or more groups (e.g., healthy and diseased) based on a variety of measurement variables. But how do we determine whether the model we have selected is good?
In this BioData Club workshop, instructor Crista Moreno will discuss the mathematics of logistic regression for binary classification modeling and how to prevent the harms of overfitting with cross validation in R. Through hands-on exercises, attendees will learn the following concepts and data science skills:
- Logistic regression (logit function, probability, binary classification)
- Overfitting (adding parameters and high dimensional spaces)
- Cross validation (cross validation error)
- R (R Markdown, R packages magrittr, dplyr, ggplot2, tidyr, corrplot, caret, rgl)
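
For orientation, the two central quantities behind these topics can be written out. This is the standard textbook formulation, not necessarily the notation used in the workshop; the symbols $p$ (probability of the positive class), $x_j$ (measurement variables), $\beta_j$ (coefficients), $k$ (number of folds), and $\mathrm{Err}_i$ (misclassification rate on the $i$-th held-out fold) are assumed here for illustration.

$$
\operatorname{logit}(p) = \ln\frac{p}{1-p} = \beta_0 + \beta_1 x_1 + \dots + \beta_m x_m,
\qquad
p = \frac{1}{1 + e^{-(\beta_0 + \beta_1 x_1 + \dots + \beta_m x_m)}}
$$

$$
\mathrm{CV}_{(k)} = \frac{1}{k} \sum_{i=1}^{k} \mathrm{Err}_i
$$

Adding parameters typically lowers the error on the data used for fitting, while the cross validation error is measured on data held out from fitting, which is why it can reveal overfitting.
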
Anyone interested in building a classification model for biomedical data is encouraged to use these materials! Prior experience with R and RStudio, along with a basic knowledge of classification modeling and mathematical functions, will be helpful but is not required.