---
output:
  html_document:
    keep_md: yes
    self_contained: no
---
# Reproducible Research: Peer Assessment 1
## Loading and preprocessing the data
```{r}
# Read the CSV straight from the zip archive without extracting it first
data <- read.csv(unz("activity.zip", "activity.csv"),
                 na.strings = "NA", header = TRUE,
                 colClasses = c("numeric", "Date", "numeric"))
str(data)
```
## What is mean total number of steps taken per day?
The histogram below shows the total number of steps taken per day:
```{r}
# Total steps per day; the formula interface omits rows with NA steps
hist_data <- aggregate(steps ~ date, data, sum, na.rm = TRUE)
library(ggplot2)
ggplot(hist_data, aes(steps)) + geom_histogram(binwidth = 1000) +
  xlab("Number of steps") + ylab("Count")
```
**The mean and median of steps taken per day are shown below:**
```{r}
mean(hist_data$steps)
median(hist_data$steps)
```
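One detail behind these totals: with the formula interface, `aggregate()` drops rows containing NA before summing (its default `na.action` is `na.omit`), so a day with no recorded steps is left out of the result instead of being counted as zero. A minimal sketch on a hypothetical two-day data frame (`toy_days` and its values are made up for illustration, not taken from the activity data):

```{r}
# Hypothetical data: the first day has only missing values
toy_days <- data.frame(
  date  = as.Date(c("2012-10-01", "2012-10-01", "2012-10-02", "2012-10-02")),
  steps = c(NA, NA, 10, 20)
)
# Formula interface: NA rows are removed first, so 2012-10-01 disappears
toy_by_formula <- aggregate(steps ~ date, toy_days, sum)
# tapply with na.rm = TRUE keeps the all-NA day with a total of 0
toy_by_tapply <- tapply(toy_days$steps, toy_days$date, sum, na.rm = TRUE)
nrow(toy_by_formula)   # 1
length(toy_by_tapply)  # 2
```

The choice matters for the mean and median: zero-total days would pull both downward, whereas omitting them leaves only days with at least one measurement.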
## What is the average daily activity pattern?
Here is a time series plot of the average number of steps taken in each
5-minute interval, averaged across all days:
```{r}
time_serie_mean <- aggregate(steps ~ interval, data, mean, na.rm = TRUE)
ggplot(time_serie_mean, aes(interval, steps)) + geom_line() +
  xlab("5-minute interval") + ylab("Mean Steps in 5 min Interval")
```
**The 5-minute interval with the most steps on average is:
`r time_serie_mean[which.max(time_serie_mean$steps), c("interval")]`.**
## Imputing missing values
```{r}
# Count the rows that contain at least one NA
uncomplete_rows <- sum(!complete.cases(data))
```
The number of rows with NA values is: `r uncomplete_rows`

Each NA is replaced with the mean of its 5-minute interval across the whole
data set, and the result is stored in `data2`:
```{r}
data2 <- data
# For each incomplete row, look up the mean of its interval
na_rows <- !complete.cases(data2)
data2$steps[na_rows] <- time_serie_mean$steps[
  match(data2$interval[na_rows], time_serie_mean$interval)]
str(data2)
```
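The lookup used for the imputation can be seen more easily on a small case. The sketch below is only an illustration: `toy_imp` and `toy_means` are hypothetical names with made-up values, and `match()` is used here to find, for each missing row, the position of its interval in the table of per-interval means.

```{r}
# Hypothetical data: two days, two intervals, one missing value
toy_imp <- data.frame(
  interval = c(0, 5, 0, 5),
  steps    = c(2, NA, 4, 6)
)
# Per-interval means computed from the observed values only
toy_means <- aggregate(steps ~ interval, toy_imp, mean)
# Replace each NA with the mean of the same interval
toy_na <- is.na(toy_imp$steps)
toy_imp$steps[toy_na] <- toy_means$steps[
  match(toy_imp$interval[toy_na], toy_means$interval)]
toy_imp$steps  # 2 6 4 6
```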
The histogram below shows the total number of steps taken per day
(after the NA correction):
```{r}
hist_data2 <- aggregate(steps ~ date, data2, sum, na.rm = TRUE)
library(ggplot2)
ggplot(hist_data2, aes(steps)) + geom_histogram(binwidth = 1000) +
  xlab("Number of steps") + ylab("Count")
```
**The mean and median of steps taken per day are shown below (NA correction
applied):**
```{r}
mean(hist_data2$steps)
median(hist_data2$steps)
```
## Are there differences in activity patterns between weekdays and weekends?
The differences are shown in the plot below:
```{r}
# Use the numeric day of the week (0 = Sunday, 6 = Saturday) so the result
# does not depend on the session locale, unlike the names from weekdays()
weekend <- factor(as.POSIXlt(data2$date)$wday %in% c(0, 6),
                  levels = c("FALSE", "TRUE"), labels = c("weekday", "weekend"))
data2 <- cbind(data2, weekend)
time_serie_mean2 <- aggregate(steps ~ interval + weekend, data2, mean)
ggplot(time_serie_mean2, aes(x = interval, y = steps)) + geom_line() +
  facet_wrap(~ weekend, ncol = 1) +
  xlab("5-minute interval") + ylab("Mean Steps in 5 min Interval")
```
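The weekday/weekend split is only as reliable as the way days are identified: `weekdays()` returns day names in the language of the session locale, so a comparison against fixed names matches only under that locale. A minimal sketch of a locale-independent check, assuming a few hypothetical dates (`toy_dates` is made up for illustration) and using the numeric `wday` field of `as.POSIXlt()`:

```{r}
# Hypothetical dates: Friday through Monday
toy_dates <- as.Date(c("2012-10-05", "2012-10-06", "2012-10-07", "2012-10-08"))
# wday counts from 0 (Sunday) to 6 (Saturday), regardless of locale
toy_is_weekend <- as.POSIXlt(toy_dates)$wday %in% c(0, 6)
toy_day_type <- factor(toy_is_weekend, levels = c(FALSE, TRUE),
                       labels = c("weekday", "weekend"))
as.character(toy_day_type)  # "weekday" "weekend" "weekend" "weekday"
```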