Getting and Cleaning Data Course Project

Overview

This repo contains code to complete the course project for class 3 in Coursera's Data Science specialization: Getting and Cleaning Data.

Code

There is one relevant file, run_analysis.R that runs several analyses (detailed in CodeBook.md) and outputs tidy data file tidy.txt

Assumptions

This code makes no assumptions! (beyond that a somewhat recent version of dplyr is installed)

The dataset is downloaded from https://d396qusza40orc.cloudfront.net/getdata%2Fprojectfiles%2FUCI%20HAR%20Dataset.zip, unzipped to UCI HAR Dataset and processed as described in the code section above.

Steps

############################################################################################

1. Merges the training and the test sets to create one data set.

############################################################################################

Merges train and test data, including: * Variabes * Subject number * Activity index

Adds variable names to columns using map function.

############################################################################################

2. Extracts only the measurements on the mean and standard deviation for each measurement.

############################################################################################

Renames the variables of interest mean() and std() to recognizale values using gsub.

Uses make.names to rename column names so they're syntactically valid.

Filters out all but subject, activity, mean and standard deviation columns using dplyr.

############################################################################################

3. Uses descriptive activity names to name the activities in the data set.

############################################################################################

Reads in activty mapping information.

Uses sapply and map function defined above to rename activities.

############################################################################################

4. Appropriately labels the data set with descriptive variable names.

############################################################################################

Uses a series of gsub operations to expand abbreviated variable names.

############################################################################################

5. From the data set in step 4, creates a second, independent tidy data set with the

average of each variable for each activity and each subject.

############################################################################################

Users dplyr's pipe, group_by and summarize_each functions to provide mean of each variable.

Writes frame to tidy.txt

Name		Name	Last commit message	Last commit date
Latest commit History 9 Commits
.gitignore		.gitignore
CodeBook.md		CodeBook.md
README.md		README.md
run_analysis.R		run_analysis.R
tidy.txt		tidy.txt

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

Repository files navigation

Getting and Cleaning Data Course Project

Overview

Code

Assumptions

Steps

1. Merges the training and the test sets to create one data set.

2. Extracts only the measurements on the mean and standard deviation for each measurement.

3. Uses descriptive activity names to name the activities in the data set.

4. Appropriately labels the data set with descriptive variable names.

5. From the data set in step 4, creates a second, independent tidy data set with the

average of each variable for each activity and each subject.

About

Releases

Packages

Languages

adetch/coursera-getting-and-cleaning-data

Folders and files

Latest commit

History

Repository files navigation

Getting and Cleaning Data Course Project

Overview

Code

Assumptions

Steps

1. Merges the training and the test sets to create one data set.

2. Extracts only the measurements on the mean and standard deviation for each measurement.

3. Uses descriptive activity names to name the activities in the data set.

4. Appropriately labels the data set with descriptive variable names.

5. From the data set in step 4, creates a second, independent tidy data set with the

average of each variable for each activity and each subject.

About

Resources

Stars

Watchers

Forks

Releases

Packages 0

Languages

Packages