PA1_template.Rmd

---
output:
  html_document:
    keep_md: yes
    self_contained: no
---
# Reproducible Research: Peer Assessment 1


## Loading and preprocessing the data
```{r}
data <- read.csv(unz("activity.zip", "activity.csv"), na.strings = "NA", header = TRUE, colClasses = c("numeric", "Date", "numeric" ))
str(data)
```

## What is mean total number of steps taken per day?
Follows an histogram representing the total number of steps perfomed in one day:
```{r}
hist_data <- aggregate(steps ~ date , data, sum, na.rm = TRUE) 

library(ggplot2)
qplot(hist_data$steps, binwidth = 1000) +
      xlab("Number of steps") + ylab("Count")
```  

**The mean and meadian of steps taken per day is shown below:**
```{r}
mean(hist_data$steps)
median(hist_data$steps)
```

## What is the average daily activity pattern?
Here a time series plot of the 5-minute interval and the average number of steps
taken, averaged across all days:
```{r}
time_serie_mean <- aggregate(steps ~ interval , data, mean, na.rm = TRUE) 
ggplot(time_serie_mean, aes(interval, steps)) + geom_line() +
      xlab("5-minute interval") + ylab("Mean Steps in 5 min Interval")
```

**5-minute Interval with the most steps in average is: 
`r time_serie_mean[which.max(time_serie_mean$steps), c("interval")]`.**

## Imputing missing values
```{r}
uncomplete_rows <- sum(complete.cases(data) == FALSE)
```
The number of row with NA is: `r uncomplete_rows`

NAs will be set equal to the mean of that 5-minute interval across the whole 
data set and stored in data2:
```{r}
data2 <- data
data2[complete.cases(data2) == FALSE, c("steps")] <- time_serie_mean[sapply(data2[complete.cases(data2) == FALSE, c("interval")], function(x) which(time_serie_mean$interval==x)), c("steps")]
str(data2)
```
  
Follows an histogram representing the total number of steps perfomed in one day 
(after the NA correction):
```{r}
hist_data2 <- aggregate(steps ~ date , data2, sum, na.rm = TRUE) 

library(ggplot2)
qplot(hist_data2$steps, binwidth = 1000) +
      xlab("Number of steps") + ylab("Count")
```  

**The mean and meadian of steps taken per day is shown below (NA correction
applied:**
```{r}
mean(hist_data2$steps)
median(hist_data2$steps)
```


## Are there differences in activity patterns between weekdays and weekends?
Differences shown in the plot below:
```{r}
weekend <- factor(weekdays(data2$date) == "domenica" | 
                      weekdays(data2$date) == "sabato", 
                  levels = c("FALSE", "TRUE"), labels = c("weekday", "weekend"))
data2 <- cbind(data2, weekend)
time_serie_mean2 <- aggregate(steps ~ interval + weekend , data2, mean) 
ggplot(time_serie_mean2, aes(x=interval, y=steps)) + geom_line() + 
    facet_wrap( ~ weekend, ncol=1) +
      xlab("5-minute interval") + ylab("Mean Steps in 5 min Interval")
```