---
title: 'CSSS 512: Lab 1'
output: beamer_presentation
date: "2018-3-30"
subtitle: Logistics & R Refresher
---
# Agenda
1. **Logistics**
    + Labs, Office Hours, Homework
    + Goals and Expectations
    + $\textsf{R}$, RStudio, $\textsf{R}$ Markdown, \LaTeX
\newline
2. **Time Series Data in R**
    + [Unemployment in Maine](http://www.maine.gov/labor/cwri/cps.html)
    + [Global Temperature](https://crudata.uea.ac.uk/cru/data/temperature/)
    + [Electricity, Beer, and Chocolate Production](http://www.abs.gov.au/)
\newline
3. **Panel Data in R**
    + Democracy and Income
    + Data wrangling
# Logistics
1. **Lab Sessions**: Fri, 1:00-2:20pm in Savery 117
    + Applies lecture material to worked examples; clarifies and extends lecture material; Q & A on homework and lectures
    + Materials will be available on the [**course website**](http://faculty.washington.edu/cadolph/?page=24)
2. **Office Hours**: Tues, 3:00-4:20pm in Smith 220
    + Available for troubleshooting and specific questions about homework and lecture materials
3. **Homework**: 3-4 assignments, due roughly every 2 weeks
    + Ideally, done in $\textsf{R}$ (e.g. via RStudio) with the write-up in \LaTeX
    + Using RStudio with $\textsf{R}$ Markdown is an easy way to do this
    + Useful packages: `tseries`, `forecast`, `lmtest`, `urca`, `quantmod`, etc.
# Logistics
4. When this course is over, you should be able to do the following (and more):
    + Identify and understand time series dynamics: seasonality, deterministic trends, moving average processes, autoregressive processes
    + Distinguish between stationary and nonstationary time series, perform unit root tests, fit ARMA and ARIMA models, and use cross-validation for model assessment
    + Analyze multiple continuous time series using vector autoregression, perform cointegration tests, and estimate error correction models for cointegrated time series
    + Distinguish among random effects, fixed effects, and mixed effects models and decide when each is appropriate
    + Understand Nickell bias and address it with an instrumental variable approach estimated by GMM
    + Perform multiple imputation and in-sample simulations for panel data
\newline
# Logistics
5. The course moves fast: you should be comfortable doing the following for the homework assignments and project
    + tidying and transforming data, especially time series and panel data
    + importing and exporting data sets
    + generating plots of your data and results
    + writing basic functions and loops for repeated procedures
* Fortunately, for those of you new to $\textsf{R}$, there are many resources to get you up to speed
    + Cowpertwait and Metcalfe (2009) - download via UW library
    + [Zuur et al. (2009)](https://canvas.uw.edu/courses/1064065/files)
    + [Wickham and Grolemund (2017)](http://r4ds.had.co.nz/)
# Logistics
6. Please make sure that you have both R and RStudio installed on your computer (RStudio is an interface to R, so R must be installed separately)
7. If you would like to learn how to use \LaTeX, this is a great opportunity to do so
    + An easy way to get started is to use R Markdown within RStudio
    + Make sure you have a TeX distribution installed, which you can find [here](https://www.latex-project.org/get/#distributions)
    + Make sure you have R Markdown installed using `install.packages("rmarkdown")`
    + Now in RStudio, choose `File` $\rightarrow$ `New File` $\rightarrow$ `R Markdown`
# Logistics
8. Using R Markdown
    + Choose whether to compile your document as a PDF or HTML file and give it a title
    + You will then be given a template
    + Embed your code within \begin{verbatim}```{r}\end{verbatim} and \begin{verbatim}```\end{verbatim} and write up your text outside these code chunks
    + Then press `Knit` to produce a PDF or HTML document with your code, R output, and text nicely formatted
    + Please try to complete your homework this way
# Questions
# Time Series Data - Unemployment in Maine
\scriptsize
```{r}
# Acquire the data
# Monthly unemployment in Maine from January 1996 to August 2006
www <- "http://students.washington.edu/dhyoo/Maine.dat"
Maine.month <- read.table(www, header = TRUE)
# Attach the object and check its class
attach(Maine.month)
class(Maine.month)
# First six rows of the monthly unemployment data
head(Maine.month)
```
# Time Series Data - Unemployment in Maine
\scriptsize
```{r}
# Create a time series object
help(ts)
Maine.month.ts <- ts(Maine.month$unemploy, start = c(1996, 1), frequency = 12)
Maine.month.ts
```
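# Time Series Data - Unemployment in Maine
\scriptsize
A small sketch of how `start` and `frequency` determine the time index of a `ts` object. The series is made-up toy data (`toy.ts` is a hypothetical name), not the Maine figures.

```{r}
# 30 made-up monthly values beginning in January 1996
toy.ts <- ts(1:30, start = c(1996, 1), frequency = 12)
start(toy.ts)       # year and month of the first observation
end(toy.ts)         # 30 months later: June 1998
frequency(toy.ts)   # 12 observations per unit of time (one year)
head(time(toy.ts))  # months expressed as fractions of a year
```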
# Time Series Data - Unemployment in Maine
\scriptsize
```{r}
# Find the mean unemployment per year
Maine.annual.ts <- aggregate(Maine.month.ts)/12
Maine.annual.ts
```
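# Time Series Data - Unemployment in Maine
\scriptsize
By default `aggregate()` sums the observations within each year, which is why the code above divides by 12. A minimal check on toy data (hypothetical `toy.ts`) that passing `FUN = mean` gives the same annual means:

```{r}
# Two years of made-up monthly values: 1-12 and 13-24
toy.ts <- ts(1:24, start = c(1996, 1), frequency = 12)
annual.sum <- aggregate(toy.ts)               # default FUN = sum: 78, 222
annual.mean <- aggregate(toy.ts, FUN = mean)  # 6.5, 18.5
all.equal(annual.sum / 12, annual.mean)
```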
# Time Series Data - Unemployment in Maine
\tiny
```{r}
# Plot the time series. Intuitively, how would you describe the pattern of unemployment?
layout(1:2)
plot(Maine.month.ts, ylab="unemployed (%)")
plot(Maine.annual.ts, ylab="unemployed (%)")
```
# Time Series Data - Unemployment in Maine
\scriptsize
```{r}
# Find unemployment rates for February and August
# frequency = 1 keeps one observation per year, starting from the given month
Maine.Feb <- window(Maine.month.ts, start = c(1996, 2), frequency = 1)
Maine.Aug <- window(Maine.month.ts, start = c(1996, 8), frequency = 1)
# Find ratio of mean unemployment in Feb and August versus grand mean
Feb.ratio <- mean(Maine.Feb) / mean(Maine.month.ts)
Aug.ratio <- mean(Maine.Aug) / mean(Maine.month.ts)
Maine.Feb
Feb.ratio
Aug.ratio
```
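# Time Series Data - Unemployment in Maine
\scriptsize
Rather than calling `window()` once per month, `cycle()` labels each observation with its position in the seasonal cycle (1-12 for monthly data), so `tapply()` can compute every month's ratio to the grand mean at once. A sketch on made-up data (`toy.ts`, `toy.Feb`, and `month.ratio` are hypothetical names):

```{r}
# Three years of made-up monthly values; each month's value equals its number
toy.ts <- ts(rep(1:12, times = 3), start = c(1996, 1), frequency = 12)
# window() with frequency = 1 keeps one value per year: the three Februaries
toy.Feb <- window(toy.ts, start = c(1996, 2), frequency = 1)
toy.Feb
# Mean of each calendar month divided by the grand mean (6.5)
month.ratio <- tapply(toy.ts, cycle(toy.ts), mean) / mean(toy.ts)
round(month.ratio, 2)
```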
# Time Series Data - Global Temperature
\scriptsize
```{r}
# Acquire the data
www <- "http://students.washington.edu/dhyoo/global.dat"
# Average global temperature from Univ. East Anglia and UK Met Office
# Monthly from January 1856 to December 2005
Global <- scan(www)
```
1. Create a time series object using the data that starts in Jan 1856 and ends in Dec 2005 with monthly observations.
\newline
2. Find the mean temperature for each year and save in a new time series object.
\newline
3. Plot the two objects.
\newline
4. Observe global temperature from 1970 to 2005 using the window function and plot.
# Time Series Data - Global Temperature
\scriptsize
```{r}
# Create a time series object
Global.ts <- ts(Global, start = c(1856, 1), end = c(2005, 12), frequency = 12)
head(Global.ts)
# Find the mean temperature for each year
Global.annual <- aggregate(Global.ts, FUN = mean)
head(Global.annual)
```
# Time Series Data - Global Temperature
\scriptsize
```{r}
# Plot the time series.
# How would you describe the pattern in global temperature?
plot(Global.ts)
plot(Global.annual)
```
# Time Series Data - Global Temperature
\scriptsize
```{r}
plot(Global.annual)
```
# Time Series Data - Global Temperature
\scriptsize
```{r}
# Observe between 1970 and 2005 only
New.series <- window(Global.ts, start=c(1970, 1), end=c(2005, 12))
# Extract the time index, with each month as a fraction of the year
# (e.g. February 1970 is 1970 + 1/12)
New.time <- time(New.series)
```
# Time Series Data - Global Temperature
\tiny
```{r}
# How would you describe this pattern?
plot(New.series); abline(reg=lm(New.series ~ New.time))
```
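# Time Series Data - Global Temperature
\scriptsize
The line drawn by `abline()` is a least-squares fit of temperature on fractional time, so its slope is the average change per year. A sketch on simulated data with a known trend of 0.02 units per year (`sim.ts`, `sim.time`, and the trend value are made up for illustration, not taken from the temperature series):

```{r}
set.seed(512)
# 36 years of simulated monthly data: linear trend plus noise
n <- 36 * 12
sim.ts <- ts(0.02 * (0:(n - 1)) / 12 + rnorm(n, sd = 0.1),
             start = c(1970, 1), frequency = 12)
sim.time <- time(sim.ts)
trend.fit <- lm(sim.ts ~ sim.time)
coef(trend.fit)[["sim.time"]]       # estimated change per year, close to 0.02
10 * coef(trend.fit)[["sim.time"]]  # the same slope expressed per decade
```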
# Multiple Time Series - Electricity, Beer, Chocolate Production
\scriptsize
```{r}
# Acquire the data
www <- "http://students.washington.edu/dhyoo/cbe.dat"
# Electricity (millions of kWh), beer (Ml), and chocolate production (tonnes)
# in Australia from January 1958 to December 1990
# from the Australian Bureau of Statistics
CBE <- read.table(www, header = TRUE)
CBE[1:4,]
class(CBE)
```
# Multiple Time Series - Electricity, Beer, Chocolate Production
\scriptsize
```{r}
# Create separate time series objects for each
Elec.ts <- ts(CBE[, 3], start = 1958, freq = 12)
Beer.ts <- ts(CBE[, 2], start = 1958, freq = 12)
Choc.ts <- ts(CBE[, 1], start = 1958, freq = 12)
```
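# Multiple Time Series - Electricity, Beer, Chocolate Production
\scriptsize
An alternative is to convert all columns at once into a single multivariate `ts` object and pull series out by column name. A sketch on made-up data; the column names `choc`, `beer`, and `elec` are an assumption about the layout of `cbe.dat`, inferred from the column indices used above:

```{r}
set.seed(1958)
# 24 months of made-up values in three named columns
toy <- data.frame(choc = rnorm(24), beer = rnorm(24), elec = rnorm(24))
toy.mts <- ts(toy, start = 1958, frequency = 12)
class(toy.mts)
# Extracting a column keeps the time attributes
toy.elec <- toy.mts[, "elec"]
end(toy.elec)  # December 1959
```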
# Multiple Time Series - Electricity, Beer, Chocolate Production
\tiny
```{r}
plot(cbind(Elec.ts, Beer.ts, Choc.ts))
```
# Panel Data - Democracy and Income
\scriptsize
```{r}
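library(tidyverse)  # attaches dplyr and ggplot2, among others
# Knitting runs code in the folder containing this .Rmd; otherwise set the
# working directory to the folder holding Lab1data.csv, e.g.
# setwd("/Users/danielyoo/CSSS-POLS-512/Labs")
# Democracy and income data from 174 countries from 2000 to 2010
data <- read.csv("Lab1data.csv", header = TRUE)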
```
# Panel Data - Democracy and Income
\scriptsize
```{r}
head(unique(data$country))                     # first six of the 174 countries
head(tapply(data$country, data$Year, length))  # observations per year
head(tapply(data$Year, data$country, length))  # observations per country
```
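# Panel Data - Democracy and Income
\scriptsize
If every country has an observation in every year, the panel is balanced. A sketch of that check on a made-up three-country panel (`panel`, `obs.per.country`, and the country list are hypothetical) in which one country-year is missing:

```{r}
# Three countries observed 2000-2002, then drop Chile's 2000 observation
panel <- expand.grid(country = c("Chile", "Kenya", "Norway"),
                     Year = 2000:2002, stringsAsFactors = FALSE)
panel <- panel[!(panel$country == "Chile" & panel$Year == 2000), ]
obs.per.country <- table(panel$country)
obs.per.country
# Balanced only if each country has as many observations as there are years
all(obs.per.country == length(unique(panel$Year)))
```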
# Panel Data - Democracy and Income
\tiny
```{r}
p <- ggplot(data = na.omit(data),
            aes(x = Year, y = GDP.per.capita.PPP.current.international,
                group = country, color = country))
# One line per country; the legend is suppressed because there are too many countries
p + geom_line(alpha = 0.5) + guides(color = "none")
```
# Panel Data - Democracy and Income
\emph{Some wrangling exercises}:
1. Subset the data frame to show only country name and GDP per capita
2. Sort the rows of the data frame in ascending order of polity score
3. Show only values of GDP per capita for South Africa from 2002 to 2008
4. Create a new variable that takes the first letter of the country and attaches it to the year of observation
5. Find the mean of GDP per capita for each year of observation
# Panel Data - Democracy and Income
\tiny
```{r}
library(tidyverse)
head(select(data, country, GDP.per.capita.PPP.current.international))  # dplyr
head(data[, c(1, 3)])  # base R by column position; check names(data) first
head(data.frame(data$country, data$GDP.per.capita.PPP.current.international))
```
# Panel Data - Democracy and Income
\scriptsize
```{r}
head(arrange(data, polity2))
head(data[order(data$polity2),])
```
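# Panel Data - Democracy and Income
\scriptsize
Both `arrange()` and `order()` place missing values last by default, so any rows with a missing polity score end up at the bottom rather than being dropped. A toy sketch (made-up scores; `scores` is a hypothetical name):

```{r}
scores <- data.frame(country = c("A", "B", "C", "D"),
                     polity2 = c(5, NA, -3, 10))
scores[order(scores$polity2), ]                     # ascending, NA last
scores[order(scores$polity2, decreasing = TRUE), ]  # descending, NA still last
```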
# Panel Data - Democracy and Income
\tiny
```{r}
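filter(data, country == "South Africa", Year >= 2002, Year <= 2008) %>%
  select(country, Year, GDP.per.capita.PPP.current.international)
subset(data, country == "South Africa" & Year >= 2002 & Year <= 2008,
       select = c(country, Year, GDP.per.capita.PPP.current.international))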
```
# Panel Data - Democracy and Income
\tiny
```{r}
head(mutate(data, id = paste0(substr(country, 1, 1), Year)))
```
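# Panel Data - Democracy and Income
\scriptsize
As on the earlier wrangling slides, there is a base-R counterpart to the `dplyr` call. A sketch on a made-up data frame (`toy` and `id` are hypothetical names); `substr()` extracts the first character and `paste0()` joins the pieces without a separator:

```{r}
toy <- data.frame(country = c("South Africa", "Chile", "Kenya"),
                  Year = c(2002, 2005, 2010))
toy$id <- paste0(substr(toy$country, 1, 1), toy$Year)
toy
```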
# Panel Data - Democracy and Income
\scriptsize
```{r}
data %>%
  group_by(Year) %>%
  summarize(mean.gdp = mean(GDP.per.capita.PPP.current.international, na.rm = TRUE))
```
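# Panel Data - Democracy and Income
\scriptsize
The same group means can be computed in base R. A sketch on a made-up data frame (`toy` and `gdp` are hypothetical names); the formula interface of `aggregate()` drops rows with missing values by default, which matches `na.rm = TRUE` when averaging a single variable:

```{r}
toy <- data.frame(Year = c(2000, 2000, 2001, 2001),
                  gdp = c(1000, 3000, 5000, NA))
aggregate(gdp ~ Year, data = toy, FUN = mean)   # one row per year
tapply(toy$gdp, toy$Year, mean, na.rm = TRUE)   # named vector of year means
```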