---
output: github_document
---
<!-- Note: edit the .Rmd file not the .md file -->
## 1.1 R and RStudio
### 1.1a
1. Get into groups of 2 and, in those groups:
1. Create a new RStudio project
1. Create a new blank R script
1. Identify and interact with each of the 4 panels in RStudio
1. Get help on the plot function with `?plot`
1. Create a plot using the `plot()` function
1. Find and install a new package on a topic of your choice with `Tools > Install Packages` (requires internet)
1. Attach the package using `library()`
1. Find and install a new package with `install.packages()`
1. In your source panel write code that creates vector objects `x` and `y` and plots them with `plot(x, y)` to create something that looks like this:
<!-- (is it reproducible?) -->
```{r}
# hint: create a vector object of the numbers 1, 2, 3 and 6 and call it x:
x = c(1, 2, 3, 6)
```
```{r, echo=FALSE}
y = x^2
plot(x, y)
```
- Bonus: find out exactly what R version you are using (tip: use a search engine!)
- Bonus: use R to find out how many minutes you've been alive for. Feel free to use an invented age. Tip: try both the 'base' function `as.POSIXct()` and the 'tidyverse' function `lubridate::ymd_hm()`; you may also need to search online for this.
```{r, eval=FALSE, echo=FALSE}
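# three ways to create a date-time object
# (note: ymd_hm() defaults to UTC, the base functions use the local time zone)
date_of_birth1 = as.POSIXct("1985-12-20 12:00")
date_of_birth2 = ISOdatetime(
  year = 1985,
  month = 12,
  day = 20,
  hour = 12,
  min = 0,
  sec = 0
)
date_of_birth3 = lubridate::ymd_hm("1985-12-20 12:00")
current_time = Sys.time()
# subtracting date-times gives a difftime object, here measured in days
diff_days = current_time - date_of_birth1
# the value is now in minutes, although the printed units still say days
diff_days * 24 * 60
# or ask for minutes directly
difftime(current_time, date_of_birth1, units = "mins")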
```
```{r, echo=FALSE, eval=FALSE}
devtools::install_github("ITSLeeds/stats19")
stats19::dl_stats19_2017_ac()
ac17 = stats19::read_stats19_2005_2014_ac(data_dir = "dftRoadSafetyData_Accidents_2017", filename = "Acc.csv")
ac17 = stats19::format_stats19_2016_ac(ac17)
ac17
View(ac17)
ac_wy = ac17[ac17$Police_Force == "West Yorkshire", ]
ac_wy = dplyr::filter(ac17, Police_Force == "West Yorkshire")
library(tidyverse)
ac_wy = ac17 %>%
  filter(Police_Force == "West Yorkshire")
plot(ac_wy$Longitude, ac_wy$Latitude)
```
### 1.1b R classes
1. What class is each of these objects?
```{r}
x = 1:6
y = sqrt(x)
z = y + 0.1
z[3] = "hello"
```
```{r, echo=FALSE, eval=FALSE}
class(x)
class(y)
class(z)
typeof(x)
is.vector(x)
data.frame(x, y, z)
m = cbind(x, y, z)
class(m)
typeof(m)
```
1. Create a data frame that contains variables `x`, `y` and `z` and write it out as a `.csv` file.
- Bonus: create a matrix composed of the `x`, `y` and `z` variables. What type does it have?
1. Download and read in the `ac_wy.csv` dataset using `read_csv()`. Hint: the following commands may help you download it:
```{r, eval=FALSE}
f = "https://github.com/ITSLeeds/highways-course/releases/download/0.2/ac_wy.csv"
download.file(url = f, destfile = "ac_wy.csv")
```
1. How many rows are in the `ac_wy` data frame?
1. How many lists are in the `ac_wy` data frame?
```{r, echo=FALSE}
# to be continued...
```
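The data frame steps above can be sketched on toy data before trying them on `x`, `y`, `z` and `ac_wy`. This is a minimal base R sketch, not the model answer: the vectors `a`, `b` and `d` and the temporary file are invented for illustration, and it uses `read.csv()` rather than `read_csv()` so that no extra packages are needed.

```r
# Toy vectors (hypothetical values, not the course data)
a = 1:3
b = c(0.5, 1.5, 2.5)
d = c("low", "mid", "high")

# Combine the vectors into a data frame
df = data.frame(a, b, d)

# Write it to a temporary .csv file and read it back in
f_tmp = tempfile(fileext = ".csv")
write.csv(df, f_tmp, row.names = FALSE)
df2 = read.csv(f_tmp)

nrow(df2)          # number of rows
ncol(df2)          # number of columns
is.list(df2)       # a data frame is stored as a list of columns
sapply(df2, class) # class of each column
```

The same `nrow()`, `ncol()` and `sapply()` calls should work on any data frame, including one read in with `readr::read_csv()`.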
## 2 Stats refresher and packages
1. Discuss in groups: what kind of statistical analysis do you do, exploratory or hypothesis testing?
1. Use `sessionInfo()` to find out which packages are currently attached in your R session.
- How many are there?
- Run the command `devtools::session_info()`. What's different about the result?
1. Attach the tidyverse package. What does each of the messages mean?
```{r, echo=FALSE}
library(tidyverse)
```
1. How many packages are now attached?
1. Restart your R session and load some **tidyverse** packages individually. Start with **readr**, **dplyr** and **ggplot2**.
1. Run `ggplot(data = mpg)`. What do you see?
1. What does the `drv` variable describe? Read the help for `?mpg` to find out.
1. Make a scatterplot of `hwy` vs `cyl`.
1. Create a barplot showing the number and proportion of crashes in the `ac_wy` dataset on different types of roads using **ggplot2**:
- Roads with different speed limits (absolute counts and proportions)
- Different road types (A roads, B roads etc)
```{r, echo=FALSE, message=FALSE}
ac_wy = readr::read_csv("ac_wy.csv")
ggplot(ac_wy, aes(Speed_limit)) + geom_bar()
# ggplot(ac_wy, aes(Speed_limit)) +
#   geom_bar(mapping = aes(y = after_stat(prop) * 100)) +
#   ylab("Percentage of crashes")
# ggplot(ac_wy, aes(`1st_Road_Class`)) +
#   geom_bar()
# speed_table = table(ac_wy$`1st_Road_Class`)
# speed_proportional = speed_table / nrow(ac_wy) * 100
# speed_table_df = as.data.frame(speed_proportional)
# ggplot(speed_table_df, aes(Var1, Freq)) +
#   geom_bar(stat = "identity")
```
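Before plotting, it can help to check the counts and proportions as plain numbers. The sketch below uses base R on an invented vector of speed limits (`speed_toy` is hypothetical, not taken from `ac_wy`). It is only a sanity check to run alongside the **ggplot2** barplot, not a replacement for it.

```r
# Invented speed limits for eight crashes (not the course data)
speed_toy = c(30, 30, 40, 60, 30, 40, 30, 30)

# Absolute counts per speed limit
counts = table(speed_toy)
counts

# Proportions of all crashes, which sum to 1
props = prop.table(counts)
props

# The same values as percentages
props * 100
```

Applied to a real column such as `ac_wy$Speed_limit`, the heights of the bars should match these tables.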
## 3 Spatial data
1. Practical: [Sections 3.2 to 3.2.2](https://geocompr.robinlovelace.net/attr.html#vector-attribute-manipulation) of the handouts
1. Work through exercises 1 to 3 in the handouts
1. How many states:
- Contain the letter E (hint: `?grepl`)?
- Start with A (hint: search for "regex starts with")?
1. Plot all the 'E' states in red and plot the 'A' states with a thick border.
```{r, message=FALSE, echo=FALSE}
library(sf)
library(spData)
library(dplyr)
us_e = us_states %>%
  filter(grepl(pattern = "e", x = NAME, ignore.case = TRUE))
us_a = us_states %>%
  filter(grepl(pattern = "^A", x = NAME))
plot(us_states$geometry)
plot(us_e$geometry, col = "red", add = TRUE)
plot(us_a$geometry, lwd = 5, add = TRUE)
```
- Bonus: use the `world` dataset to find the 3 smallest and 3 largest countries in the world by population and by area, and plot them on a single map.
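The pattern-matching hints for the states exercise can be tried on a plain character vector first. This is a minimal sketch assuming an invented vector `places` (hypothetical, not the `us_states` data); the same `grepl()` calls can then be used inside `filter()`.

```r
# A small invented vector of names (not the full us_states data)
places = c("Alabama", "Texas", "Oregon", "Arizona", "Ohio")

# TRUE where the name contains the letter e, in upper or lower case
has_e = grepl(pattern = "e", x = places, ignore.case = TRUE)

# TRUE where the name starts with a capital A (^ anchors the start)
starts_a = grepl(pattern = "^A", x = places)

sum(has_e)       # how many names contain an e
places[starts_a] # which names start with A
```

Because `grepl()` returns a logical vector, `sum()` counts the matches and square brackets subset them.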
## 4 Spatial data II and roadworks
1. Practical: work through sections [3.2.3 to 3.2.4](https://geocompr.robinlovelace.net/attr.html#vector-attribute-joining) of the handouts
1. [Exercises](https://geocompr.robinlovelace.net/attr.html#exercises-1): 4 to 6 onwards
1. Identify the states that grew most and least from 2010 to 2015 and plot them.
```{r, message=FALSE, echo=FALSE}
library(tmap)
us_states$change = us_states$total_pop_15 - us_states$total_pop_10
us_states$growth = NA
top3 = tail(order(us_states$change), 3)
bot3 = head(order(us_states$change), 3)
us_states$growth[top3] = "Most"
us_states$growth[bot3] = "Least"
tm_shape(us_states) +
  tm_polygons("growth")
```
1. Create an object that represents the boundary of the USA.
- Bonus: create a random distribution of 1000 points across the United States and subset those that are in Texas. Plot the results to check that your code works. Does it look something like this?
```{r, message=FALSE, echo=FALSE}
# boundary of the USA: dissolve all state polygons into one geometry
us_union = st_union(us_states)
# option 1: random coordinates within the bounding box (some fall outside the USA)
xran = runif(n = 1000, min = st_bbox(us_states)[1], max = st_bbox(us_states)[3])
yran = runif(n = 1000, min = st_bbox(us_states)[2], max = st_bbox(us_states)[4])
# option 2: sample points inside the state polygons
mran_sample = st_sample(us_states, 1000)
mran = data.frame(xran, yran, n = 1:1000)
crs_us = st_crs(us_states)
mran_sf = st_as_sf(mran, coords = c("xran", "yran"), crs = crs_us)
# st_crs(mran_sample)
# overwrite the option 1 points with the option 2 sample
mran_sf = st_sf(geometry = mran_sample)
tex = us_states %>%
  filter(NAME == "Texas")
# for each point, find whether it lies within Texas
sel_tex = st_within(mran_sf, tex)
sel_tex_binary = lengths(sel_tex) > 0
mran_sf$in_texas = sel_tex_binary
tm_shape(us_union) +
  tm_polygons() +
  tm_shape(mran_sf) +
  tm_dots(col = "in_texas", size = 0.2)
```
- Advanced option 1: Section [4.2 - Spatial operations on vector data](https://geocompr.robinlovelace.net/spatial-operations.html#spatial-vec) of Geocomputation with R
- Advanced option 2: Install the **roadworksUK** package and identify which MSOA in Ashford had the highest number of gas-related roadworks in the `htdd_ashford` dataset.
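For the exercise on the states that grew most and least, one way to find the rows with the largest and smallest values is to combine `order()` with `head()` and `tail()`. The sketch below shows the idea on an invented named vector (`change_toy` is hypothetical); it is one possible approach, not the only one.

```r
# Invented population changes for five areas (not real data)
change_toy = c(a = 5, b = -2, c = 10, d = 0, e = 7)

# order() returns the positions of the values from smallest to largest
ranked = order(change_toy)

# The last positions hold the largest values, the first the smallest
most = tail(ranked, 2)
least = head(ranked, 2)

names(change_toy)[most]  # areas with the largest change
names(change_toy)[least] # areas with the smallest change
```

The resulting positions can be used to index the rows of a data frame, for example to label them before plotting.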