# Reproducible Research: Peer Assessment 1
## Loading and preprocessing of the data
The data are provided as a comma-separated values (CSV) file, so the R function `read.csv` was used to read them into a variable called *activity*. No preprocessing of the data was needed. The code used for loading is shown below:
```{r loadData, echo = TRUE}
activity <- read.csv("activity.csv")
```
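As an optional variation on the loading step, column types can be declared while reading. The sketch below is only an illustration on made-up CSV text: the three column names match those used in this report, but the values, the `toy.*` names, and the assumption that dates are stored as `yyyy-mm-dd` are hypothetical.

```{r loadSketch, echo = TRUE}
# Made-up CSV text with the same three columns as the data (values are hypothetical)
toy.csv <- "steps,date,interval
NA,2012-10-01,0
5,2012-10-01,5
12,2012-10-02,0"
# colClasses converts each column on read; "Date" assumes the yyyy-mm-dd format
toy.loaded <- read.csv(text = toy.csv, colClasses = c("integer", "Date", "integer"))
# inspect the resulting column types
str(toy.loaded)
```

Declaring the types up front would make a later call to `as.Date` unnecessary, at the cost of failing early if a column does not match the declared type.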
## Mean and median of total number of steps taken per day
The mean of the total number of steps taken per day was `r round(mean(aggregate(steps ~ date, data = activity, FUN = sum)$steps), 2)` and the median was `r median(aggregate(steps ~ date, data = activity, FUN = sum)$steps)`. Missing values were ignored in these calculations. A histogram of the daily totals was also drawn to show how they are distributed. See the R code below.
```{r histogram1, echo = TRUE, fig.cap = "Activity patterns"}
# load the lattice library if not yet loaded
library(lattice)
# total number of steps taken each day; rows with missing steps are dropped
daily.steps <- aggregate(steps ~ date, data = activity, FUN = sum)
# Draw a histogram of the total number of steps taken each day
histogram(~ steps, data = daily.steps, xlab = "Number of steps per day")
# mean and median of total number of steps taken per day
mean(daily.steps$steps)
median(daily.steps$steps)
```
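When steps are totalled per day, the way missing values are handled affects the result. The sketch below contrasts two base R options on a made-up data frame; the `toy.*` names and all values are hypothetical and are not part of the analysis.

```{r totalsSketch, echo = TRUE}
# Made-up data: the first day has no recorded steps at all
toy.steps <- data.frame(
  steps    = c(NA, NA, 10, 20, 5, NA),
  date     = rep(c("2012-10-01", "2012-10-02", "2012-10-03"), each = 2),
  interval = rep(c(0, 5), times = 3)
)
# Formula interface: rows with NA are dropped first, so a day without data disappears
toy.totals <- aggregate(steps ~ date, data = toy.steps, FUN = sum)
# tapply with na.rm = TRUE keeps every day, but a day without data gets a total of 0
toy.totals.zero <- tapply(toy.steps$steps, toy.steps$date, sum, na.rm = TRUE)
toy.totals
toy.totals.zero
```

Counting a day without data as zero steps pulls the mean and median of the daily totals downwards, so dropping such days is usually the safer choice before imputation.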
## Average daily activity pattern
To show the average daily activity pattern, a time series plot of the 5-minute interval (x-axis) against the average number of steps taken, averaged across all days (y-axis), was generated.
```{r timeseries, echo = TRUE, fig.cap = "Variations of steps taken with time"}
# load the lattice library if not yet loaded
library(lattice)
# average number of steps in each 5-minute interval across all days (missing values dropped)
interval.means <- aggregate(steps ~ interval, data = activity, FUN = mean)
xyplot(steps ~ interval, data = interval.means, type = "l", xlab = "Interval", ylab = "Average number of steps")
```
## Five-minute interval with maximum number of steps
The five-minute interval with the maximum number of steps was interval `r activity$interval[which(activity$steps == max(activity$steps, na.rm = TRUE))]`. The R code below shows how this was identified.
```{r masStep, echo=TRUE}
activity$interval[which(activity$steps == max(activity$steps, na.rm = TRUE))]
```
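The code above finds the interval that contains the single largest observation. If the quantity of interest were instead the interval with the highest average across all days, the computation would differ. The following is a minimal sketch of that variant on made-up data; the `toy.*` names and values are hypothetical.

```{r maxAvgSketch, echo = TRUE}
# Made-up data: three days, three intervals per day
toy.pattern <- data.frame(
  steps    = c(0, 40, 10, NA, 60, 20, 0, 50, 90),
  date     = rep(c("2012-10-01", "2012-10-02", "2012-10-03"), each = 3),
  interval = rep(c(0, 5, 10), times = 3)
)
# average steps per interval across days (rows with NA are dropped)
toy.avg <- aggregate(steps ~ interval, data = toy.pattern, FUN = mean)
# interval with the highest average
toy.avg$interval[which.max(toy.avg$steps)]
# interval holding the single largest observation, for comparison
toy.pattern$interval[which.max(toy.pattern$steps)]
```

In this toy example the two answers differ, because one unusually active day dominates the single largest observation but not the average.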
## Imputing missing values
It was thought that missing values in the dataset could introduce bias into some calculations or summaries of the data. This was addressed by imputing the missing values. The total number of rows with missing values was `r sum(!complete.cases(activity))`. After a new dataset was created from the original, inspection showed that the missing values were present only in the *steps* variable.
```{r imputing, echo=TRUE}
# store the number of rows of data that have missing values
num.missing <- sum(!complete.cases(activity))
# Create a new dataset
new.activity <- activity
# Impute the missing values with the mean number of steps
new.activity$steps[is.na(new.activity$steps)] <- mean(activity$steps, na.rm = TRUE)
```
These were imputed with the mean of the observed number of steps (see R code above). After imputation, the mean of the total number of steps taken per day was `r round(mean(aggregate(steps ~ date, data = new.activity, FUN = sum)$steps), 2)` and the median was `r round(median(aggregate(steps ~ date, data = new.activity, FUN = sum)$steps), 2)`. Because every missing value was replaced with the overall mean, the mean number of steps per interval is unchanged, which makes sense logically. A histogram of the total number of steps per day in the imputed data was drawn to show how the totals are distributed after imputation (see R code below).
```{r histogram2, echo = TRUE, fig.cap = "Activity patterns of imputed data"}
# load the lattice library if not yet loaded
library(lattice)
# total number of steps taken each day for the new data
new.daily.steps <- aggregate(steps ~ date, data = new.activity, FUN = sum)
# Draw histogram of the total number of steps taken each day for the new data
histogram(~ steps, data = new.daily.steps, xlab = "Number of steps per day")
# mean and median total number of steps taken per day for the new data
mean(new.daily.steps$steps)
median(new.daily.steps$steps)
```
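The general effect of mean imputation on summary statistics can be illustrated on a small made-up vector. This is only a sketch: the `toy.*` names and values are hypothetical and are not taken from the data.

```{r imputeSketch, echo = TRUE}
# Made-up observations with two missing values
toy.x <- c(0, 0, 0, NA, 12, 48, NA)
# replace each missing value with the mean of the observed values
toy.filled <- toy.x
toy.filled[is.na(toy.filled)] <- mean(toy.x, na.rm = TRUE)
# the mean is unchanged by construction
mean(toy.x, na.rm = TRUE)
mean(toy.filled)
# the median is not guaranteed to stay the same
median(toy.x, na.rm = TRUE)
median(toy.filled)
```

Replacing missing values with the mean always preserves the mean of that variable, but it adds mass at the centre of the distribution, so the median and the shape of the histogram can shift.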
## Differences in activity patterns between weekdays and weekends
A panel plot comparing the average number of steps taken per 5-minute interval on weekdays and weekends was produced from the imputed data (see R code below for details of how the plot was created). The two panels can be compared to see how the activity pattern differs between weekdays and weekends.
```{r panel, echo = TRUE, fig.cap = "Activity Patterns: weekdays and weekends"}
# load the lattice library if not yet loaded
library(lattice)
# convert the "date" variable in new.activity from type character to type Date
date <- as.Date(new.activity$date)
# classify each date as "weekend" or "weekday"
# (weekdays() returns day names in the current locale; English names are assumed here)
weekday.class <- ifelse(weekdays(date) %in% c("Saturday", "Sunday"), "weekend", "weekday")
# add a new variable, "weekday.class", as a factor, to the new.activity dataset
new.activity$weekday.class <- factor(weekday.class)
# average number of steps per interval within each class
class.means <- aggregate(steps ~ interval + weekday.class, data = new.activity, FUN = mean)
# generate a panel plot
xyplot(steps ~ interval | weekday.class, data = class.means, type = "l", xlab = "Interval", ylab = "Average number of steps", layout = c(1, 2))
```
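Because `weekdays()` returns day names in the current locale, a comparison against English names can silently fail on a machine with a different language setting. A locale-independent alternative, sketched here on a few made-up dates (the `toy.*` names are hypothetical), uses the numeric day of the week from `as.POSIXlt`, where 0 is Sunday and 6 is Saturday.

```{r weekendSketch, echo = TRUE}
# Made-up dates: a Friday, a Saturday, a Sunday and a Monday
toy.dates <- as.Date(c("2012-10-05", "2012-10-06", "2012-10-07", "2012-10-08"))
# numeric day of the week: 0 = Sunday, ..., 6 = Saturday
toy.wday <- as.POSIXlt(toy.dates)$wday
# classify without relying on day names
toy.class <- factor(ifelse(toy.wday %in% c(0, 6), "weekend", "weekday"))
data.frame(date = toy.dates, class = toy.class)
```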