---
title: "Reproducible Research: Peer Assessment 1"
output:
  html_document:
    keep_md: true
---

Loading and preprocessing the data

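# Read the raw activity data (activity.csv must be in the working directory)
dat <- read.csv("activity.csv")
# Keep only the rows without missing values
dat.complete <- dat[complete.cases(dat), ]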

What is the mean total number of steps taken per day?

# Total number of steps taken on each day
totals <- tapply(dat.complete$steps, dat.complete$date, sum)
# Drop days with no observations left (tapply gives NA for empty factor levels)
totals <- totals[!is.na(totals)]
mn <- mean(totals)
md <- median(totals)
hist(totals, xlab = "Daily number of taken steps", ylab = "Number of days",
    main = "Histogram of total daily number of taken steps")

Figure: histogram of the total daily number of taken steps.

The mean total number of steps taken per day is $1.0766 \times 10^4$ and the median is 10765.
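As a cross-check, the same daily totals can be computed with base R's `aggregate()`. Its formula method uses `na.action = na.omit` by default, so rows with a missing step count are dropped before summing, which matches the complete-cases approach above. This is a minimal sketch on a tiny hypothetical data frame (`toy`, a stand-in for `activity.csv`); `dailyTotals` is likewise an illustrative name.

```r
# Hypothetical toy data standing in for activity.csv:
# 2012-10-01 is entirely missing, the other three days are complete.
toy <- data.frame(
  steps    = c(NA, NA, 10, 20, 0, 50, 40, 60),
  date     = rep(c("2012-10-01", "2012-10-02", "2012-10-03", "2012-10-04"), each = 2),
  interval = rep(c(0, 5), times = 4)
)

# The formula method drops NA rows (na.action = na.omit) before summing,
# so the fully missing day does not appear in the result at all.
dailyTotals <- aggregate(steps ~ date, data = toy, FUN = sum)
print(dailyTotals)

mean(dailyTotals$steps)    # mean of the daily totals
median(dailyTotals$steps)  # median of the daily totals
```

Here the three remaining daily totals are 30, 50 and 100, giving a mean of 60 and a median of 50.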

What is the average daily activity pattern?

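# Mean number of steps in each 5-minute interval, averaged over days
meanInterval <- tapply(dat.complete$steps, dat.complete$interval, mean)
meanInterval <- meanInterval[!is.na(meanInterval)]
# Identifier of the interval with the highest average
maxInterval <- names(which.max(meanInterval))
# Interval identifiers are stored as names; convert them to numbers for the x-axis
plot(as.numeric(names(meanInterval)), meanInterval, type = "l", xlab = "Daily interval",
    ylab = "Mean number of steps", main = "Mean across days of number of steps taken in every interval")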

Figure: mean number of steps taken in each 5-minute interval, averaged across days.

On average across all days in the dataset, the 5-minute interval with the maximum number of steps is 835, the interval starting at 08:35.
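The interval identifiers in this dataset encode the start time of each interval as HHMM (0, 5, ..., 55, 100, 105, ..., 2355). The small helper below is a sketch under that assumption; `intervalToTime` is a hypothetical name, not part of the analysis above. It turns an identifier into a readable clock time.

```r
# Assumption: interval identifiers encode the start time as HHMM,
# e.g. 0 -> 00:00, 55 -> 00:55, 100 -> 01:00, 835 -> 08:35.
intervalToTime <- function(interval) {
  interval <- as.integer(interval)
  # Hours are the digits above the last two; minutes are the last two digits
  sprintf("%02d:%02d", interval %/% 100L, interval %% 100L)
}

intervalToTime(c(0, 55, 100, 835, 2355))
# "00:00" "00:55" "01:00" "08:35" "23:55"
```

Because the identifiers jump from 55 to 100 at every hour, plotting them directly on a numeric axis leaves small gaps. Converting to minutes since midnight, `(interval %/% 100) * 60 + interval %% 100`, gives an evenly spaced axis instead.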

Imputing missing values

# Total number of rows, and number of rows with at least one missing value
nRows <- nrow(dat)
nNA <- sum(!complete.cases(dat))

The dataset has 2304 rows with missing values (NAs), out of 17568 rows in total.
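To see which columns the missing values come from, `is.na()` can be summed column by column. This sketch uses a tiny hypothetical frame `toy`; on the real data the same calls would be run on `dat`. `naPerColumn` is an illustrative name.

```r
# Hypothetical toy data: only the steps column has missing values
toy <- data.frame(
  steps    = c(NA, 3, NA, 7),
  date     = c("2012-10-01", "2012-10-01", "2012-10-02", "2012-10-02"),
  interval = c(0, 5, 0, 5)
)

# Number of NAs in each column (named numeric vector)
naPerColumn <- colSums(is.na(toy))
print(naPerColumn)

# Number of rows with at least one NA, the quantity reported above
sum(!complete.cases(toy))
```

When `steps` is the only column with NAs, the two counts coincide, so the rows with NAs are exactly the rows whose step count needs imputing.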

The imputation strategy I chose is to replace each missing step count with the mean for that 5-minute interval across all days.
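The interval-mean strategy can also be expressed with base R's `ave()`, which returns, for every row, a summary of that row's group. This is a swapped-in alternative sketched on a tiny hypothetical data frame (`toy`, `intervalMean`, `isMissing` and `imputed` are illustrative names), not the code used in this report. Assuming `steps` is the only column with NAs, the per-interval mean with `na.rm = TRUE` equals the interval mean computed from the complete cases.

```r
# Hypothetical toy data: two intervals (0 and 5) over three days,
# with one missing value in each interval.
toy <- data.frame(
  steps    = c(NA, 4, 2, NA, 6, 8),
  date     = rep(c("2012-10-01", "2012-10-02", "2012-10-03"), each = 2),
  interval = rep(c(0, 5), times = 3)
)

# For every row, the mean of its interval over the non-missing values
intervalMean <- ave(toy$steps, toy$interval,
                    FUN = function(s) mean(s, na.rm = TRUE))

# Replace only the missing step counts
imputed <- toy
isMissing <- is.na(imputed$steps)
imputed$steps[isMissing] <- intervalMean[isMissing]
print(imputed)
```

Interval 0 has observed values 2 and 6 (mean 4) and interval 5 has 4 and 8 (mean 6), so the imputed step counts are 4, 4, 2, 6, 6, 8.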

# Replace each missing step count with the mean for its 5-minute interval,
# looking the mean up by interval identifier (the names of meanInterval)
dat.imputed <- dat
isMissing <- is.na(dat.imputed$steps)
dat.imputed$steps[isMissing] <- meanInterval[as.character(dat.imputed$interval[isMissing])]
# Daily totals after imputation
totalsImputed <- tapply(dat.imputed$steps, dat.imputed$date, sum)
mn2 <- mean(totalsImputed)
md2 <- median(totalsImputed)
hist(totalsImputed, xlab = "Daily number of taken steps", ylab = "Number of days",
    main = "Histogram of total daily number of taken steps with NAs imputed")

Figure: histogram of the total daily number of taken steps, with NAs imputed.

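# Compare the statistics before and after imputation; all.equal() avoids
# reporting a difference that is only floating-point rounding
mnString <- if (isTRUE(all.equal(mn2, mn))) "the same as" else "different from"
mdString <- if (isTRUE(all.equal(md2, md))) "the same as" else "different from"
# Mean squared difference of the daily totals, over the days that had a total before imputation
delta <- totals - totalsImputed[names(totals)]
delta <- mean(delta^2)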

After imputation, the mean total number of steps taken per day is $1.0766 \times 10^4$ and the median is $1.0766 \times 10^4$. The mean is the same as before imputation, while the median is different: it moved from 10765 to $1.0766 \times 10^4$, the same value as the mean to the precision shown. The mean squared difference between the daily totals before and after imputation, over the days that already had a total, is 0, so imputation left those days unchanged.
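Why the mean stays put and the median lands on it can be sketched as follows, assuming each day is either fully observed or fully missing. That assumption is consistent with the zero mean squared difference and with $2304 = 8 \times 288$ missing rows, since a day has 288 five-minute intervals. Let $s_{d,i}$ be the steps on day $d$ in interval $i$, $D$ the set of $n$ complete days, $I$ the set of intervals, $T_d = \sum_{i \in I} s_{d,i}$ the total for day $d$, and $\bar{T}$ the mean of those totals. Each missing day receives the interval means $\bar{s}_i$, so its total is

$$
\sum_{i \in I} \bar{s}_i
= \sum_{i \in I} \frac{1}{n}\sum_{d \in D} s_{d,i}
= \frac{1}{n}\sum_{d \in D}\sum_{i \in I} s_{d,i}
= \frac{1}{n}\sum_{d \in D} T_d
= \bar{T}.
$$

With $m$ imputed days (here $m = 8$), the new mean is

$$
\bar{T}_{\text{imputed}} = \frac{n\bar{T} + m\bar{T}}{n + m} = \bar{T},
$$

and the median becomes $\bar{T}$ whenever the block of $m$ identical values covers the middle position of the sorted totals, which is what the reported median reflects.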

Are there differences in activity patterns between weekdays and weekends?

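# Classify dates as "weekday" or "weekend". The wday field of POSIXlt
# (0 = Sunday, 6 = Saturday) does not depend on the locale, unlike weekdays(),
# whose day names would not match "Saturday"/"Sunday" in a non-English locale
typeOfDay <- function(date) {
    wday <- as.POSIXlt(date)$wday
    ifelse(wday %in% c(0, 6), "weekend", "weekday")
}
dat.imputed$weekday <- factor(typeOfDay(as.Date(dat.imputed$date)))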

# Mean steps per interval, computed separately for weekdays and weekends
dat.weekday <- dat.imputed[dat.imputed$weekday == "weekday", ]
dat.weekend <- dat.imputed[dat.imputed$weekday == "weekend", ]
meanInterval.weekday <- tapply(dat.weekday$steps, dat.weekday$interval, mean)
meanInterval.weekend <- tapply(dat.weekend$steps, dat.weekend$interval, mean)

# Stack both profiles into one long data frame for plotting
meanInterval2 <- rbind(
    data.frame(interval = as.numeric(names(meanInterval.weekend)),
               steps = meanInterval.weekend, typeofday = "weekend"),
    data.frame(interval = as.numeric(names(meanInterval.weekday)),
               steps = meanInterval.weekday, typeofday = "weekday"))

library(ggplot2)
# One line plot per type of day, stacked in two rows
ggplot(meanInterval2, aes(interval, steps)) + geom_line() +
    facet_wrap(~typeofday, nrow = 2) + ylab("Number of steps\n") + xlab("\nInterval")

Figure: mean number of steps per 5-minute interval, in separate panels for weekdays and weekends.
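The split, average and stack steps above can be condensed into a single `aggregate()` call grouped by interval and type of day. This is a sketch on a tiny hypothetical data frame (`toy`, `wday` and `byType` are illustrative names), not the code used for the figure. It classifies days with the locale-independent `wday` field of `POSIXlt`. In this toy data, 2012-10-05 is a Friday, 2012-10-06 a Saturday and 2012-10-07 a Sunday.

```r
# Hypothetical toy data: one weekday (Friday) and two weekend days,
# two intervals per day
toy <- data.frame(
  steps    = c(10, 20, 30, 40, 50, 60),
  date     = rep(c("2012-10-05", "2012-10-06", "2012-10-07"), each = 2),
  interval = rep(c(0, 5), times = 3)
)

# wday is 0 for Sunday and 6 for Saturday, regardless of locale
wday <- as.POSIXlt(as.Date(toy$date))$wday
toy$typeofday <- factor(ifelse(wday %in% c(0, 6), "weekend", "weekday"),
                        levels = c("weekday", "weekend"))

# Mean steps for every combination of interval and type of day
byType <- aggregate(steps ~ interval + typeofday, data = toy, FUN = mean)
print(byType)
```

The weekday means are 10 and 20 for intervals 0 and 5, and the weekend means are 40 and 50. `byType` has the same columns (`interval`, `steps`, `typeofday`) as `meanInterval2`, so the same `ggplot()` call with `facet_wrap(~typeofday, nrow = 2)` could plot it.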