---
title: 'Data Analysis 3: Week 7'
author: "Alexey Bessudnov"
date: "1 March 2019"
output: github_document
---
```{r setup, include=FALSE}
knitr::opts_chunk$set(echo = TRUE)
knitr::opts_chunk$set(message = FALSE)
knitr::opts_chunk$set(warning = FALSE)
knitr::opts_chunk$set(cache = TRUE)
```
Plan for today:

1. Machine learning workshop at the RSS.
2. Exercises on data visualisation.
3. Homework for next week: data structures.

Exercises:

1. Open the data from the wave 8 youth questionnaire.
2. Today we will work with the BMI variable (`h_ypbmi_dv`) and visualise the distribution of BMI by sex, age, and ethnic group.

```{r}
library(tidyverse)

# read the wave 8 youth questionnaire data (tab-separated file)
youth8 <- read_tsv("data/UKDA-6614-tab/tab/ukhls_w8/h_youth.tab")

# summary of the original BMI variable
# (base R equivalent: summary(youth8$h_ypbmi_dv))
youth8 %>% pull(h_ypbmi_dv) %>% summary()

# -9 is a missing value code: recode it to NA
youth8 <- youth8 %>%
  mutate(bmi = recode(h_ypbmi_dv, `-9` = NA_real_))
youth8 %>% pull(bmi) %>% summary()

# simple histogram; the red line marks BMI = 30
youth8 %>%
  ggplot(aes(x = bmi)) +
  geom_histogram(bins = 50) +
  geom_vline(xintercept = 30, colour = "red") +
  xlab("Body mass index") +
  ylab("Number of observations")

# BBC style
# note: bbc_style() blanks out axis titles, so these labels are not shown
library(bbplot)
youth8 %>%
  ggplot(aes(x = bmi)) +
  geom_histogram(bins = 50) +
  geom_vline(xintercept = 30, colour = "red") +
  ylab("Number of observations") +
  bbc_style() +
  xlab("Body mass index")
```

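The chunk above sets only the code -9 to missing. Understanding Society data files use several negative values as missing-value codes, so it is worth checking `summary()` for a negative minimum. Below is a minimal sketch that recodes them all in one step. It assumes that every negative value of the BMI variable is a missing-value code, so check the variable documentation before relying on this. The tibble `toy` is hypothetical, made-up data, not taken from the survey.

```{r}
library(dplyr)

# hypothetical toy data: BMI values mixed with negative missing-value codes
toy <- tibble(h_ypbmi_dv = c(18.5, -9, 22.1, -8, 25.0, 30.2))

toy <- toy %>%
  # assumption: any negative value is a missing-value code, so set it to NA
  mutate(bmi = if_else(h_ypbmi_dv < 0, NA_real_, h_ypbmi_dv))

summary(toy$bmi)
# number of values that were set to missing
sum(is.na(toy$bmi))
```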
BMI by sex.

Boxplot:

```{r}
# recode sex: 1 = male, 2 = female, any other value becomes NA
youth8 <- youth8 %>%
  mutate(sex = ifelse(h_sex_dv == 2, "female",
                      ifelse(h_sex_dv == 1, "male", NA)))
# check the new variable against the original codes
youth8 %>% count(h_sex_dv, sex)

youth8 %>%
  ggplot(aes(x = sex, y = bmi)) +
  geom_boxplot() +
  # this changes the boxplots from vertical to horizontal
  coord_flip()
```

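The nested `ifelse()` above can also be written with `dplyr::case_when()`, which stays readable when a variable has many codes. Here is a sketch on a hypothetical toy tibble. It uses the coding from the chunk above (1 = male, 2 = female) and treats every other value, here -9, as missing.

```{r}
library(dplyr)

# hypothetical toy data with sex codes as in h_sex_dv
toy <- tibble(h_sex_dv = c(1, 2, 2, -9, 1))

toy <- toy %>%
  mutate(sex = case_when(
    h_sex_dv == 1 ~ "male",
    h_sex_dv == 2 ~ "female",
    TRUE ~ NA_character_  # any other code becomes missing
  ))

# cross-check the new variable against the original codes
toy %>% count(h_sex_dv, sex)
```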
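Distribution of BMI by sex: histograms and density plots.

```{r}
# side-by-side histogram bars for each sex (default of 30 bins)
youth8 %>%
  ggplot(aes(x = bmi, fill = sex)) +
  geom_histogram(position = "dodge")

# overlapping semi-transparent histograms
youth8 %>%
  ggplot(aes(x = bmi, fill = sex)) +
  geom_histogram(bins = 50, position = "identity", alpha = 0.5)

# density curves, outlined by sex
youth8 %>%
  ggplot(aes(x = bmi, colour = sex)) +
  geom_density()

# filled density curves (opaque, so one curve can hide the other)
youth8 %>%
  ggplot(aes(x = bmi, fill = sex)) +
  geom_density()

youth8 %>%
  ggplot(aes(x = bmi, fill = sex)) +
  geom_density() +
  # manually setting the colours
  scale_fill_manual(values = c("purple", "yellow"))
```
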
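When groups differ in size, histograms of counts are hard to compare. One option is to put the histogram on the density scale with `after_stat(density)`, so that each group's bars have a total area of 1 and can be overlaid with `geom_density()`. The sketch below uses simulated toy data; the group sizes and BMI values are invented. It assumes ggplot2 3.3.0 or later, where `after_stat()` was introduced.

```{r}
library(ggplot2)

# hypothetical toy data: simulated BMI for two groups of unequal size
set.seed(1)
toy <- data.frame(
  bmi = c(rnorm(300, mean = 20, sd = 3), rnorm(100, mean = 21, sd = 3)),
  sex = rep(c("female", "male"), times = c(300, 100))
)

# after_stat(density) rescales each group's bars to a total area of 1,
# which puts groups of different sizes on a comparable scale
p <- ggplot(toy, aes(x = bmi, fill = sex)) +
  geom_histogram(aes(y = after_stat(density)), bins = 30,
                 position = "identity", alpha = 0.5) +
  geom_density(alpha = 0.2) +
  ylab("Density")
p
```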
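Mean BMI by group: bar plot and dot plots.

```{r}
# bar plot of mean BMI by sex
# (geom_col() is a shorthand for geom_bar(stat = "identity"))
youth8 %>%
  group_by(sex) %>%
  summarise(
    meanBMI = mean(bmi, na.rm = TRUE)
  ) %>%
  ggplot(aes(x = sex, y = meanBMI, fill = sex)) +
  geom_bar(stat = "identity")

# the same means as a dot plot, with the axis starting at 0
youth8 %>%
  group_by(sex) %>%
  summarise(
    meanBMI = mean(bmi, na.rm = TRUE)
  ) %>%
  ggplot(aes(x = sex, y = meanBMI)) +
  geom_point() +
  ylim(0, 25) +
  coord_flip()

# mean BMI by region (h_gor_dv, shown as numeric codes),
# with regions sorted by their mean BMI
youth8 %>%
  group_by(h_gor_dv) %>%
  summarise(
    meanBMI = mean(bmi, na.rm = TRUE)
  ) %>%
  ggplot(aes(x = reorder(as.factor(h_gor_dv), meanBMI), y = meanBMI)) +
  geom_point() +
  coord_flip()
```
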
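A bar or dot for the mean alone does not show how precisely each mean is estimated. The sketch below adds an approximate 95% confidence interval with `geom_pointrange()`. The interval is the mean ± 1.96 standard errors, a normal approximation that is rough for small groups. The toy tibble is hypothetical and stands in for `youth8`; the `NA` plays the role of a recoded missing value.

```{r}
library(dplyr)
library(ggplot2)

# hypothetical toy data
toy <- tibble(
  sex = c("female", "female", "female", "male", "male", "male"),
  bmi = c(18, 20, 22, 19, 21, NA)
)

# mean, standard error and approximate 95% interval for each group
toy_means <- toy %>%
  group_by(sex) %>%
  summarise(
    n = sum(!is.na(bmi)),
    meanBMI = mean(bmi, na.rm = TRUE),
    se = sd(bmi, na.rm = TRUE) / sqrt(n)
  ) %>%
  mutate(lower = meanBMI - 1.96 * se,
         upper = meanBMI + 1.96 * se)

toy_means %>%
  ggplot(aes(x = sex, y = meanBMI, ymin = lower, ymax = upper)) +
  geom_pointrange() +
  coord_flip()
```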
Faceted chart.

```{r}
# one histogram panel per sex
youth8 %>%
  ggplot(aes(x = bmi)) +
  geom_histogram(bins = 50) +
  geom_vline(xintercept = 30, colour = "red") +
  xlab("Body mass index") +
  ylab("Number of observations") +
  facet_wrap(~ sex)
```
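The exercise also asks for BMI by age. The sketch below uses simulated toy data to show two views: mean BMI by age and sex as lines, and a `facet_grid()` of histograms with one row per sex and one column per age. The column name `age`, the age range 10 to 15 and the upward trend in BMI are all assumptions made for illustration. Check the youth file's documentation for the actual age variable.

```{r}
library(dplyr)
library(ggplot2)

# hypothetical toy data: simulated BMI for youths aged 10 to 15
set.seed(2)
age <- rep(10:15, each = 40)
toy <- tibble(
  age = age,
  sex = rep(c("female", "male"), times = 120),
  # assumed upward trend in BMI with age, for illustration only
  bmi = rnorm(length(age), mean = 16 + 0.6 * (age - 10), sd = 2.5)
)

# mean BMI for every combination of age and sex
by_age <- toy %>%
  group_by(age, sex) %>%
  summarise(meanBMI = mean(bmi), .groups = "drop")

by_age %>%
  ggplot(aes(x = age, y = meanBMI, colour = sex)) +
  geom_line() +
  geom_point() +
  xlab("Age") +
  ylab("Mean body mass index")

# histograms in a grid: rows by sex, columns by age
toy %>%
  ggplot(aes(x = bmi)) +
  geom_histogram(bins = 20) +
  facet_grid(sex ~ age)
```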