README file for Getting and Cleaning Data course project

Overview

This repository contains code for cleaning a dataset as required for the course project for "Getting and Cleaning Data".

Download the raw data, then use the run_analysis.R script in this repository to clean it and produce the tidy dataset. After sourcing the script in R, run it with

run_analysis()

The code cleans the raw dataset and writes the tidy data to a single text file called "tidy_data.txt"; a copy of this file is also included in this repository. The file contains 66 variables from the raw dataset, each averaged over every combination of subject and activity.
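As a rough illustration of how the tidy file could be read back into R, the sketch below writes a small stand-in file and loads it again. It assumes the file has a header row and is space-separated (the defaults for write.table); the column names and values are made up for the example and are not taken from the real "tidy_data.txt".

```r
# Stand-in for tidy_data.txt: made-up values and hypothetical column names.
demo <- data.frame(subject = c(1, 1),
                   activity = c("SITTING", "WALKING"),
                   timeBodyAccel_mean_X = c(0.26, 0.28))

# Write to a temporary file so the example never touches the real output.
path <- tempfile(fileext = ".txt")
write.table(demo, path, row.names = FALSE)  # space-separated by default

# Read the file back, treating the first line as column names.
tidy <- read.table(path, header = TRUE)
str(tidy)
```

For the real file, the same call would presumably be read.table("tidy_data.txt", header = TRUE), provided the header assumption holds.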

There is a code book in this repository called "CodeBook.md" which contains a description of the raw and tidy data, the cleaning procedure, and descriptions of the variables.

Description of data cleaning code (run_analysis.R)

1. Load the activity labels and descriptions from "activity_labels.txt" in the raw dataset. This file maps each activity ID number to an activity description.
2. Load the variable names from "features.txt" in the raw dataset.
3. Load the data files for the training and test datasets ("train/X_train.txt" and "test/X_test.txt") and assign them to data tables.
4. Load the activity IDs for each observation in the training and test datasets ("train/y_train.txt" and "test/y_test.txt").
5. Load the subject IDs for each observation in the training and test datasets ("train/subject_train.txt" and "test/subject_test.txt").
6. Assign column names to each data table using the variable names from step 2.
7. Keep only the variables that are mean or standard deviation measurements.
8. Add the activity IDs from step 4 as a new variable in each data table.
9. Add the subject IDs from step 5 as a new variable in each data table.
10. Bind the training and test data tables into one data table.
11. Use a left_join to add activity descriptions to the data table based on the activity IDs.
12. Remove the activity IDs, since they are no longer needed once the descriptions are attached.
13. Group observations by subject ID and activity.
14. Calculate the mean of each measurement in each group.
15. Clean up the variable names using the following rules:
    - Replace "t" and "f" prefixes with "time" and "freq", respectively.
    - Replace erroneous occurrences of "BodyBody" with "Body".
    - Replace "-" by "_".
    - Replace "Acc" by "Accel".
    - Remove occurrences of "()".

    These steps were taken to improve variable readability and to remove characters that are problematic in variable names.
16. Save the new tidy data in a space-separated text file.
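
The steps above can be sketched on toy data as follows. This is a minimal illustration, not the contents of run_analysis.R: the data values, the four feature names, the selection pattern, and the helper clean_names() are all assumptions made for the example. It also uses base R (match() and aggregate()) in place of the left_join and grouping calls mentioned above, so that it runs without extra packages.

```r
# Toy stand-ins for the raw files; every value below is made up.
activity_labels <- data.frame(activity_id = 1:2,
                              activity = c("WALKING", "SITTING"))
features <- c("tBodyAcc-mean()-X", "tBodyAcc-std()-X",
              "fBodyBodyGyroMag-mean()", "fBodyAcc-meanFreq()-X")
x <- data.frame(c(1, 3, 5, 7), c(2, 4, 6, 8),
                c(10, 20, 30, 40), c(0, 0, 0, 0))

# Assign column names from the feature list.
names(x) <- features

# Keep only mean() and std() measurements (assumed pattern; excludes meanFreq()).
keep <- grepl("mean\\(\\)|std\\(\\)", features)
x <- x[, keep, drop = FALSE]

# Attach activity IDs and subject IDs to each observation.
x$activity_id <- c(1, 1, 2, 2)
x$subject <- c(1, 1, 1, 1)

# Look up activity descriptions by ID (base R stand-in for a left join),
# then drop the IDs.
x$activity <- activity_labels$activity[match(x$activity_id,
                                             activity_labels$activity_id)]
x$activity_id <- NULL

# Average every measurement within each subject/activity group.
measure_cols <- setdiff(names(x), c("subject", "activity"))
tidy <- aggregate(x[measure_cols],
                  by = list(subject = x$subject, activity = x$activity),
                  FUN = mean)

# Hypothetical helper applying the renaming rules listed above.
clean_names <- function(nm) {
  nm <- sub("^t", "time", nm)                       # "t" prefix -> "time"
  nm <- sub("^f", "freq", nm)                       # "f" prefix -> "freq"
  nm <- gsub("BodyBody", "Body", nm, fixed = TRUE)  # fix doubled "Body"
  nm <- gsub("Acc", "Accel", nm, fixed = TRUE)
  nm <- gsub("()", "", nm, fixed = TRUE)
  nm <- gsub("-", "_", nm, fixed = TRUE)
  nm
}
names(tidy)[-(1:2)] <- clean_names(names(tidy)[-(1:2)])

# Save as a space-separated text file (temporary path for the example).
out_path <- tempfile(fileext = ".txt")
write.table(tidy, out_path, row.names = FALSE)
```

In this sketch, clean_names() is applied only to the measurement columns, so the "subject" and "activity" columns keep their names unchanged.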

tprestegard/getting-and-cleaning-data-course-project

Folders and files

Latest commit

History

Repository files navigation

README file for Getting and Cleaning Data course project

Overview

Description of data cleaning code (run_analysis.R)

About

Resources

Stars

Watchers

Forks

Releases

Packages 0

Languages

Packages