GitHub - LuisFerTR/Weight-Lifting-Exercise-Quality: Johns Hopkins Coursera Practical Machine Learning Course Prediction Assignment Writeup

title

author

date

output

Weight Lifting Exercise Quality

Luis Talavera

August 9th 2022

html_document

keep_md
true

Summary

The goal of this project is create a model to predict the quality of how an exercise is performed using data from accelerometers on the belt, forearm, arm, and dumbbell of 6 participants. They were asked to perform barbell lifts correctly and incorrectly in 5 different ways. More information is available from the website here: http://web.archive.org/web/20161224072740/http:/groupware.les.inf.puc-rio.br/har (see the section on the Weight Lifting Exercise Dataset).

The data

The data used in this report was recorded by accelerometers on the belt, forearm, arm, and dumbbell of 6 participants. Participants were asked to perform one set of 10 repetitions of the Unilateral Dumbbell Biceps Curl in five different fashions: exactly according to the specification (Class A), throwing the elbows to the front (Class B), lifting the dumbbell only halfway (Class C), lowering the dumbbell only halfway (Class D) and throwing the hips to the front (Class E). Class A corresponds to the specified execution of the exercise, while the other 4 classes correspond to common mistakes.

Load and clean data

library(readr)

url_train <- "https://d396qusza40orc.cloudfront.net/predmachlearn/pml-training.csv"
url_test <- "https://d396qusza40orc.cloudfront.net/predmachlearn/pml-testing.csv"

pml_training <- read_csv(url_train, na=c("NA","#DIV/0!",""))
pml_testing <- read_csv(url_test, na=c("NA","#DIV/0!",""))

dim(pml_training)

[1] 19622   160

For the feature selection we will first remove the near zero variance variables since they are considered to have less predictive power, then remove the variables that are mostly NA and finally the identification variables.

library(caret)

# Remove variables with Nearly Zero Variance
nzv <- nearZeroVar(pml_training, saveMetrics = TRUE)
pml_training <- pml_training[, nzv$nzv==FALSE]
pml_testing <- pml_testing[, nzv$nzv==FALSE]

# Remove variables that are mostly NA
pml_training <- Filter(function(x) mean(is.na(x)) < 0.6, pml_training)
pml_testing <- Filter(function(x) mean(is.na(x)) < 0.6, pml_testing)

# Remove identification variables
pml_training <- pml_training[-(1:5)]
pml_testing <- pml_testing[-(1:5)]

# Categorical variable
pml_training$classe <- factor(pml_training$classe)

The model

The model will be built using the random forest method and the data will be divided in a ratio of 70-30 to perform training and testing.

Cross validation is done with K = 5.

set.seed(1234)
trainIndex <- createDataPartition(y=pml_training$classe, times=1, 
                                  p=0.7, list=FALSE)

pmltrain <- pml_training[trainIndex, ]
pmltest <- pml_training[-trainIndex, ]

train.names <- colnames(pml_training[, -ncol(pml_training)])
pml_testing <- pml_testing[train.names]

set.seed(1234)
fitControl <- trainControl(method = "cv",
                           number = 5,
                           allowParallel = TRUE)

fit <- train(classe ~ ., method="rf", data=pmltrain, trControl = fitControl)
fit$finalModel


Call:
 randomForest(x = x, y = y, mtry = min(param$mtry, ncol(x))) 
               Type of random forest: classification
                     Number of trees: 500
No. of variables tried at each split: 28

        OOB estimate of  error rate: 0.25%
Confusion matrix:
     A    B    C    D    E  class.error
A 3738    0    0    0    1 0.0002674512
B    7 2533    3    1    0 0.0043238994
C    0    5 2286    2    0 0.0030527693
D    0    0    7 2148    0 0.0032482599
E    0    1    0    6 2410 0.0028961523

Prediction with Random Forest

Now, we will use our model to predict test data and plot a confusion matrix

predictFit <- predict(fit, newdata = pmltest)
cm <- confusionMatrix(predictFit, pmltest$classe)
cm

Confusion Matrix and Statistics

          Reference
Prediction    A    B    C    D    E
         A 1841    2    0    0    0
         B    0 1250    5    0    0
         C    0    1 1124    6    0
         D    0    0    0 1054    1
         E    0    0    0    1 1189

Overall Statistics
                                         
               Accuracy : 0.9975         
                 95% CI : (0.996, 0.9986)
    No Information Rate : 0.2844         
    P-Value [Acc > NIR] : < 2.2e-16      
                                         
                  Kappa : 0.9969         
                                         
 Mcnemar's Test P-Value : NA             

Statistics by Class:

                     Class: A Class: B Class: C Class: D Class: E
Sensitivity            1.0000   0.9976   0.9956   0.9934   0.9992
Specificity            0.9996   0.9990   0.9987   0.9998   0.9998
Pos Pred Value         0.9989   0.9960   0.9938   0.9991   0.9992
Neg Pred Value         1.0000   0.9994   0.9991   0.9987   0.9998
Prevalence             0.2844   0.1935   0.1744   0.1639   0.1838
Detection Rate         0.2844   0.1931   0.1736   0.1628   0.1837
Detection Prevalence   0.2847   0.1939   0.1747   0.1630   0.1838
Balanced Accuracy      0.9998   0.9983   0.9971   0.9966   0.9995

Bibliography

Velloso, E.; Bulling, A.; Gellersen, H.; Ugulino, W.; Fuks, H. Qualitative Activity Recognition of Weight Lifting Exercises. Proceedings of 4th International Conference in Cooperation with SIGCHI (Augmented Human '13) . Stuttgart, Germany: ACM SIGCHI, 2013.

Name		Name	Last commit message	Last commit date
Latest commit History 9 Commits
data		data
index_cache/html		index_cache/html
index_files/figure-html		index_files/figure-html
rsconnect/documents/index.Rmd/rpubs.com/rpubs		rsconnect/documents/index.Rmd/rpubs.com/rpubs
.gitignore		.gitignore
README.md		README.md
_config.yml		_config.yml
index.Rmd		index.Rmd
index.html		index.html
index.md		index.md
weight-lifting-quality.Rproj		weight-lifting-quality.Rproj

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

Repository files navigation

Summary

The data

Load and clean data

The model

Prediction with Random Forest

Bibliography

About

Releases

Packages

Languages

LuisFerTR/Weight-Lifting-Exercise-Quality

Folders and files

Latest commit

History

Repository files navigation

Summary

The data

Load and clean data

The model

Prediction with Random Forest

Bibliography

About

Resources

Stars

Watchers

Forks

Releases

Packages 0

Languages

Packages