GitHub - rdm0991/Getting-And-Cleaning-Data_Assignment: Repo for 'Getting And Cleaning Data' Course Assignment

Getting-And-Cleaning-Data_Assignment

Repo for 'Getting And Cleaning Data' Course Assignment

run_analysis.R

The script does the following cleanup:

The training and the test sets are merged to create one data set.
Only the measurements on the mean and standard deviation for each measurement are extracted.
The activities in the data set are named using descriptive activity names.
The data set is labeled with descriptive variable names.
An independent tidy data set is created with the average of each variable for each activity and each subject.

Analysis

In R, source("run_analysis.R") initiates the analysis process and the output is as shown below:

[run_analysis.R] Getting and Cleaning Data Project
[run_analysis.R] Analysis started -- Sun May 15 12:36:50 PM 2016
[run_analysis.R] Reading test datasets. Sun May 15 12:36:50 PM 2016
[run_analysis.R] Getting dataset: Data/test
[run_analysis.R] reading features...
[run_analysis.R] activities...
[run_analysis.R] reading subjects...
[run_analysis.R] Finished Reading test datasets. Sun May 15 12:36:54 PM 2016
[run_analysis.R] Reading train datasets. Sun May 15 12:36:54 PM 2016
[run_analysis.R] Getting dataset: Data/train
[run_analysis.R] reading features...
[run_analysis.R] reading activities...
[run_analysis.R] reading subjects...
[run_analysis.R] Finished Reading train datasets. Sun May 15 12:37:05 PM 2016
[run_analysis.R] Joining datasets. Sun May 15 12:37:05 PM 2016
[run_analysis.R] Melting to include IDs. Sun May 15 12:37:05 PM 2016
[run_analysis.R] Dcasting to get summarized data for subject-Activity. Sun May 15 12:37:05 PM 2016
[run_analysis.R] Cleaned and summarized data file created - Data/cleanedOutput.txt Sun May 15 12:37:06 PM 2016

Process

For both the test and train datasets, produce an interim dataset:

 i. Extract the mean and standard deviation features (listed in CodeBook.md, section 'Extracted Features'). This is the values table.
 ii. Get the list of activities.
 iii. Put the activity labels (not numbers) into the values table.
 iv. Get the list of subjects.
 v. Put the subject IDs into the values table.
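Steps i to v can be sketched on in-memory toy data as follows. This is a minimal illustration, not the code in run_analysis.R: the feature names, activity labels, values, and the `mean()`/`std()` name pattern are all assumptions made for the example.

```r
# Toy stand-ins for the contents of one folder (hypothetical values).
features <- c("tBodyAcc-mean()-X", "tBodyAcc-std()-X", "tBodyAcc-max()-X")
x <- data.frame(c(0.28, 0.27, 0.29), c(-0.99, -0.98, -0.97), c(-0.93, -0.94, -0.92))
names(x) <- features
y <- c(5, 4, 5)        # activity numbers, one per row of x
subject <- c(2, 2, 4)  # subject IDs, one per row of x
activity_labels <- c("WALKING", "WALKING_UPSTAIRS", "WALKING_DOWNSTAIRS",
                     "SITTING", "STANDING", "LAYING")  # assumed label set

# i. Keep only the mean and standard deviation features (assumed name pattern).
keep <- grepl("mean\\(\\)|std\\(\\)", features)
values <- x[, keep, drop = FALSE]

# ii-iii. Put activity labels (not numbers) into the values table.
values$activity <- factor(y, levels = seq_along(activity_labels),
                          labels = activity_labels)

# iv-v. Put the subject IDs into the values table.
values$subject <- subject
```

Because the activity and subject vectors have one entry per row of the feature table, they can be attached as columns without any join.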

Join the test and train intermediate datasets.
Melt the joined table so that each measured value is on its own row, identified by subject, activity, and feature name.
Recast the melted table, keying on subject/activity pairs and applying the mean function to the values of each feature within each pair. This is the clean dataset.
Write the clean dataset to disk.
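The join, melt, recast, and write steps can be sketched as below. The run log mentions melting and dcasting, so this sketch assumes the reshape2 package; the toy tables, column names, and the temporary output path are hypothetical.

```r
library(reshape2)  # assumption: supplies melt() and dcast()

# Toy interim tables (hypothetical values), one per folder.
test <- data.frame(subject = c(2, 2), activity = c("WALKING", "WALKING"),
                   feat_mean = c(1, 3), feat_std = c(0.2, 0.4))
train <- data.frame(subject = c(1, 1), activity = c("SITTING", "SITTING"),
                    feat_mean = c(10, 20), feat_std = c(2, 4))

# Join the test and train interim datasets.
joined <- rbind(test, train)

# Melt: one row per (subject, activity, feature, value).
molten <- melt(joined, id.vars = c("subject", "activity"))

# Recast: one row per subject/activity pair, each feature averaged.
clean <- dcast(molten, subject + activity ~ variable, fun.aggregate = mean)

# Write the clean dataset to disk (a temporary path for this sketch).
out <- file.path(tempdir(), "cleanedOutput.txt")
write.table(clean, out, row.names = FALSE)
```

Melting first means a single `dcast` call can average every feature column at once, without naming the feature columns individually.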

Cleaned Data

The resulting clean dataset is in Data/cleanedOutput.txt, the path reported in the run log. It contains one row for each subject/activity pair, with columns for subject, activity, and each feature that was a mean or standard deviation in the original dataset.

Note: three files are read from each folder.
X_* holds the feature values, Y_* holds the activity identifier for each row of X_*,
and Subject_* holds the subject identifier for each row of X_*.
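A minimal sketch of reading the three files from one folder follows. The helper `read_folder`, the `.txt` extension, and the exact file-name casing are assumptions for illustration; adjust them to match the files on disk. The demo writes toy files to a temporary folder so that it runs on its own.

```r
# Hypothetical helper: read the three files for one folder ("test" or "train").
read_folder <- function(dir, name) {
  read_one <- function(prefix) {
    read.table(file.path(dir, paste0(prefix, "_", name, ".txt")))
  }
  list(x = read_one("X"), y = read_one("Y"), subject = read_one("Subject"))
}

# Demo on toy files in a temporary folder (hypothetical values).
dir <- file.path(tempdir(), "test")
dir.create(dir, showWarnings = FALSE)
write.table(matrix(c(0.1, 0.2, 0.3, 0.4), nrow = 2),
            file.path(dir, "X_test.txt"), row.names = FALSE, col.names = FALSE)
write.table(c(5, 4), file.path(dir, "Y_test.txt"),
            row.names = FALSE, col.names = FALSE)
write.table(c(2, 2), file.path(dir, "Subject_test.txt"),
            row.names = FALSE, col.names = FALSE)

parts <- read_folder(dir, "test")

# The three files must line up row for row.
stopifnot(nrow(parts$x) == nrow(parts$y),
          nrow(parts$x) == nrow(parts$subject))
```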
