GitHub - laurenfitch/tidydata: This repo is set up for the course project in the Coursera course Getting and Cleaning Data.

title	author	date	output
README.md	Lauren Fitch	Sunday, March 22, 2015	html_document

I started by getting the list of files in the test and train directories.

test_files <- list.files("tidy_data/test", 
                         pattern = "\\.txt$", full.names = TRUE)
training_files <- list.files("tidy_data/train",
                             pattern = "\\.txt$", full.names = TRUE)

Then I read those files into R. Naming the resulting data frames and cleaning the feature labels requires the stringr package.

install.packages("stringr")
library(stringr)

I made names for the data frames that match the original filenames.

test_file_names <- str_sub(test_files, 16, -5)
train_file_names <- str_sub(training_files, 17, -5)

ldf_test <- lapply(test_files, read.table)
names(ldf_test) <- test_file_names

ldf_train <- lapply(training_files, read.table)
names(ldf_train) <- train_file_names
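
The fixed offsets passed to str_sub depend on the exact length of the directory prefix. As a sketch of an alternative that does not depend on path length, base R can strip the directory and the extension directly. The paths below are illustrative stand-ins, not output from the actual script.

```r
# Illustrative paths shaped like those returned by list.files(..., full.names = TRUE).
example_paths <- c("tidy_data/test/subject_test.txt",
                   "tidy_data/test/X_test.txt",
                   "tidy_data/test/y_test.txt")

# basename() drops the directory; file_path_sans_ext() drops ".txt".
example_names <- tools::file_path_sans_ext(basename(example_paths))
print(example_names)
```

The same call works unchanged for the train directory, since no character positions are hard-coded.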

Read in features

features <- read.table("tidy_data/features.txt")

Just take the second column

features <- features[ ,2]

Remove special characters

features <- str_replace_all(features, "[-(),]", "")
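
To illustrate what this character class removes, here is a small sketch using base R's gsub, which accepts the same pattern. The sample labels are modeled on entries in features.txt and are assumptions for illustration only.

```r
# Sample labels in the style of features.txt (assumed for illustration).
raw_labels <- c("tBodyAcc-mean()-X", "angle(X,gravityMean)")

# The class [-(),] matches a hyphen, either parenthesis, or a comma.
clean_labels <- gsub("[-(),]", "", raw_labels)
print(clean_labels)
```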

Combine the subject, measurement, and activity-code tables into one data frame for each set, then stack the test and train rows

merge_data_test <- data.frame(ldf_test[1], ldf_test[2], ldf_test[3])
names(merge_data_test) <- c("subject", features, "y")

merge_data_train <- data.frame(ldf_train[1], ldf_train[2], ldf_train[3])
names(merge_data_train) <- c("subject", features, "y")

merge_data <- rbind(merge_data_test, merge_data_train)
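
Selecting list elements by position assumes that list.files returned the files in the order subject, X, y. As a hedged sketch, the same column-wise combination can select each table by name instead. The toy tables and values below are hypothetical stand-ins for the real data.

```r
# Toy stand-ins for the three tables read from one directory (hypothetical values).
ldf_toy <- list(
  subject_test = data.frame(V1 = c(2L, 4L)),
  X_test       = data.frame(V1 = c(0.1, 0.2), V2 = c(0.3, 0.4)),
  y_test       = data.frame(V1 = c(5L, 1L))
)
toy_features <- c("tBodyAccmeanX", "tBodyAccmeanY")

# Pick each table by name so the result does not depend on list order.
merged_toy <- data.frame(ldf_toy[["subject_test"]],
                         ldf_toy[["X_test"]],
                         ldf_toy[["y_test"]])
names(merged_toy) <- c("subject", toy_features, "y")
print(merged_toy)
```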

Next I extracted only the measurements on the mean and standard deviation for each measurement. I looked for either "mean" or "std" in the variable name. The match is case-sensitive, and I did not remove any variables from the resulting list, so every name containing either string is kept.

col_names <- str_detect(names(merge_data), "mean|std")
col_names <- names(merge_data)[col_names]
col_names <- c("subject", col_names, "y")

extract_data <- merge_data[ , col_names]
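
The sketch below shows which kinds of names this pattern keeps, using base R's grepl in place of str_detect. The sample names are assumptions modeled on the cleaned feature labels.

```r
# Sample column names in the cleaned style (assumed for illustration).
toy_names <- c("subject", "tBodyAccmeanX", "tBodyAccstdX",
               "fBodyAccmeanFreqX", "angleXgravityMean", "tBodyAccmaxX", "y")

# Case-sensitive match on "mean" or "std" anywhere in the name.
matched <- toy_names[grepl("mean|std", toy_names)]
print(matched)
```

Names containing "meanFreq" match because they contain "mean", while a name with a capitalized "Mean" does not.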

I added descriptive activity names to the data set by reading in the activity labels and matching each numeric code in y to its label.

activity <- read.table("tidy_data/activity_labels.txt")
names(activity) <- c("id", "name")
extract_data$activity <- activity$name[match(extract_data$y, activity$id)]
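
The lookup relies on match returning, for each code, the row position of that code in the label table. A minimal sketch with a toy label table (the ids and labels here are illustrative, not the full contents of activity_labels.txt):

```r
# Toy label table and codes (hypothetical subset).
toy_labels <- data.frame(id = 1:3,
                         name = c("WALKING", "SITTING", "LAYING"),
                         stringsAsFactors = FALSE)
toy_codes <- c(3L, 1L, 1L, 2L)

# match() gives the row index of each code; indexing name by it yields the labels.
toy_activity <- toy_labels$name[match(toy_codes, toy_labels$id)]
print(toy_activity)
```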

I labeled the data set with descriptive variable names. The first 80 names are kept as they are, and the y column is renamed to activity_code.

names(extract_data) <- c(names(extract_data)[1:80], 
                         "activity_code", "activity")

From the labeled data set above, I created a second, independent tidy data set with the average of each variable for each activity and each subject.

This step requires the dplyr package.

install.packages("dplyr")
library(dplyr)

Group the data by subject and activity

who <- group_by(extract_data, subject, activity)
tidy <- summarise_each(who, funs(mean))
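
summarise_each and funs are marked as deprecated in later dplyr releases. As a sketch of the same grouped average without dplyr, base R's aggregate can compute the mean of every remaining column per subject and activity. The toy data frame below is hypothetical, and this is an alternative technique, not the method used in run_analysis.R.

```r
# Toy data: two subjects, one activity each, two measurement columns (hypothetical values).
toy <- data.frame(
  subject       = c(1, 1, 2, 2),
  activity      = c("WALKING", "WALKING", "SITTING", "SITTING"),
  tBodyAccmeanX = c(0.2, 0.4, 0.1, 0.3),
  tBodyAccstdX  = c(1.0, 3.0, 2.0, 4.0),
  stringsAsFactors = FALSE
)

# "." on the left stands for every column not named on the right-hand side.
toy_means <- aggregate(. ~ subject + activity, data = toy, FUN = mean)
print(toy_means)
```

The result has one row per subject and activity pair, with the grouping columns first and the averaged measurements after them.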

Remove the activity code column

tidy <- tidy[ ,1:81]

Output the tidy data set

write.table(tidy, "tidy.txt", row.names = FALSE)
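
A file written this way can be read back with read.table and header = TRUE. The round trip below is a sketch on a toy data frame written to a temporary file; the values and the temporary path are assumptions for illustration.

```r
# Toy tidy table (hypothetical values).
toy_tidy <- data.frame(subject = c(1L, 2L),
                       activity = c("WALKING", "SITTING"),
                       tBodyAccmeanX = c(0.25, 0.5),
                       stringsAsFactors = FALSE)

# Write without row names, then read back using the first line as the header.
out_path <- tempfile(fileext = ".txt")
write.table(toy_tidy, out_path, row.names = FALSE)
read_back <- read.table(out_path, header = TRUE, stringsAsFactors = FALSE)
print(read_back)
```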
