-
Notifications
You must be signed in to change notification settings - Fork 5
/
Copy pathlab_1.Rmd
199 lines (130 loc) · 5.61 KB
/
lab_1.Rmd
1
2
3
4
5
6
7
8
9
10
11
12
13
14
15
16
17
18
19
20
21
22
23
24
25
26
27
28
29
30
31
32
33
34
35
36
37
38
39
40
41
42
43
44
45
46
47
48
49
50
51
52
53
54
55
56
57
58
59
60
61
62
63
64
65
66
67
68
69
70
71
72
73
74
75
76
77
78
79
80
81
82
83
84
85
86
87
88
89
90
91
92
93
94
95
96
97
98
99
100
101
102
103
104
105
106
107
108
109
110
111
112
113
114
115
116
117
118
119
120
121
122
123
124
125
126
127
128
129
130
131
132
133
134
135
136
137
138
139
140
141
142
143
144
145
146
147
148
149
150
151
152
153
154
155
156
157
158
159
160
161
162
163
164
165
166
167
168
169
170
171
172
173
174
175
176
177
178
179
180
181
182
183
184
185
186
187
188
189
190
191
192
193
194
195
196
197
198
199
---
title: "Lab 1"
author: "YOUR NAME"
date: "30 September 2022"
output: pdf_document
---
```{r setup, include=FALSE}
knitr::opts_chunk$set(echo = TRUE)
library(tidyverse)
library(scales)
```
## Goals for today
For the first lab, we will reverse engineer some plots using a relatively small, historical dataset. This should help you practice the ``grammar of graphics'' as well as the process of downloading, completing, and knitting your assignments. This lab is based on the structure of the first assignment.
## Load and look at the data
METHOD 1
+ STEP 0: Installing packages in R (tidyverse, haven, readxl, rmarkdown)
+ STEP 1: Call the libraries you need
```{r, message=FALSE, warning=FALSE}
library(tidyverse) # csv, tsv, fwf
library(haven) # sas(SAS), sav(SPSS), dta(Stata)
library(readxl) # xlsx, xls(Excel)
```
+ STEP 2: Set your file path
```{r, message = FALSE, echo=TRUE}
setwd("C:/Users/cuadr/OneDrive/Escritorio/UCHICAGO/7. Intro QMSC 2022/Lab/2022/1")
# To reset your path:
# require("knitr")
# opts_knit$set(root.dir = "C:/Users/cuadr/OneDrive/Escritorio/UCHICAGO/7. Intro QMSC 2022/Lab/2022/1")
```
+ STEP 3: Upload your data set in R
```{r, message = FALSE}
midwest <- read_dta("midwest.dta")
male_female <- read.delim("male_female_counts.txt")
college <- read.csv("College.csv")
world_wealth_inequality <- read_excel("world_wealth_inequality.xlsx")
````
+ Now check your data!
```{r}
View(midwest)
View(male_female)
View(college)
View(world_wealth_inequality)
```
METHOD 2
```{r, message = FALSE, echo = FALSE}
#set path to retrieve data from Git repository
data_path <- "https://raw.github.com/UChicago-pol-methods/IntroQSS-F22/main/data/"
```
```{r, message = FALSE, echo = FALSE}
#retrieve data and save as dataframe, specify col types
df <- read_csv(str_c(data_path, "uk_coal_tables.csv"),
col_types = "cddddddd",
na = "N/A")
```
Don't worry if you cannot interpret the code above. If you can, excellent! All you need to know is that our data is now in the environment and saved as an object called ``df`` (for ``dataframe'').
Before we worry about visualizing the data, let's look at what it contains. Run the chunk below to view the column headings and first several rows of data.
```{r, message = FALSE}
head(df) #show col names and several rows of data
```
**What are the variables in the data set?**
YOUR ANSWER HERE
**What does each row represent? (i.e. what is the \textit{unit of observation}?)**
YOUR ANSWER HERE
**Notice that the first several rows of ``Proportion_Imported_From_UK'' are empty. Why might this data be missing? (hint: look at column 1) **
YOUR ANSWER HERE
## First plot
For the first plot, most of the code has been filled in for you. Finish the code to recreate this image.
Incomplete code:
plot_1 <- ggplot(data = df,
mapping = aes(x = Year,
y = ______,
linetype = ____)) +
geom_line() +
labs(x = "Year",
y = "Proportion of Consumed Coal Imported from UK",
____ = "Dependence on UK Coal Over Time")
plot_1
```{r, message = FALSE}
#YOUR CODE HERE
```
## Second plot
For the second plot, more of the code is missing. See if you can recreate the image. (hint, the line being fitted is a linear model)
Incomplete code:
plot_2 <- ggplot(data = __,
mapping = aes(x = _____,
y = _____)) +
geom_point() +
geom_smooth(method = "__",
____ = "red") +
labs(__ = "Tons Produced",
__ = "Tons Exported",
title = "Coal Production and Exports")
plot_2
```{r, message = FALSE}
#YOUR CODE HERE
```
**What relationship does this plot represent?**
YOUR ANSWER HERE
**Does the overall trend (as represented by the line) fit your expectations?**
YOUR ANSWER HERE
**Why do you think there is a cluster of points in the bottom right corner?**
YOUR ANSWER HERE
**What about the points in the lower left?**
YOUR ANSWER HERE
## Third Plot
Since we suspect that the clustering is the result of different countries exporting different proportions of the coal they produce, we can alter the code above slightly to color the points by country and fit a separate line for each country. Complete the code to produce this plot.
Incomplete code:
plot_3 <- ggplot(_____ = ____,
_____ = ______(x = ______,
y = _______,
_______ = Country)) +
______() +
geom_smooth(method = "____") +
labs(x = "_____",
y = "______",
title = "_______")
plot_3
```{r, message = FALSE}
#YOUR CODE HERE
```
**What issues do you see with this plot?**
YOUR ANSWER HERE
**How might we address this concern with a different visualization?**
YOUR ANSWER HERE
## Fourth Plot
None of the code is provided for you. Think about what you can use from the previous plot and what has changed.
```{r, message = FALSE}
#YOUR CODE HERE
```
*While colors can be a powerful visualization tool, line type, point shape, and other visual elements can make your work more accessible to people who are colorblind. Alternatively, there are packages outside the tidyverse (e.g. ``ggthemes``) which contain colorblind friendly palettes or allow you to manually define a palette using several different color reference systems.*