Lab1.Rmd

---
title: 'CSSS 512: Lab 1'
output:
  beamer_presentation
date: "2018-3-30"
subtitle: Logistics & R Refresher
---

# Agenda

1. **Logistics**
    + Labs, Office Hours, Homeworks
    + Goals and Expectations
    + $\textsf{R}$, $\textsf{R}$ Studio, $\textsf{R}$ Markdown, \LaTeX 
\newline
2. **Time Series Data in R**
    + [Unemployment in Maine](http://www.maine.gov/labor/cwri/cps.html)
    + [Global Temperature](https://crudata.uea.ac.uk/cru/data/temperature/)
    + [Electricity, Beer, and Chocolate Production](http://www.abs.gov.au/)
\newline
3. **Panel Data in R**
    + Democracy and Income
    + Data wrangling

# Logistics

1. **Lab Sessions**: Fri, 1:00-2:20pm in Savery 117
    + Covers application of material from lecture using examples; clarification and extention of lecture material; Q & A for homeworks and lectures
    + Materials will be available on the [**course website**](http://faculty.washington.edu/cadolph/?page=24) 

2. **Office Hours**: Tues, 3:00-4:20pm in Smith 220
    + Available for trouble shooting and specific questions about homework and lecture materials
    
3. **Homeworks**: 3-4 due every 2 weeks or so
    + Ideally, done using $\textsf{R}$ or $\textsf{R}$ Studio with write up in \LaTeX 
    + Using $\textsf{R}$ Studio with $\textsf{R}$ Markdown is an easy way to do this 
    + Many packages: `tseries`, `forecast`, `lmtest`, `urca`, `quantmod`, etc.

# Logistics

4. When this course is over, you should be able to do the following (and more):
    + Identify and understand time series dynamics: seasonality, deterministic trends, moving average processes, autoregressive processes
    + Distinguish between stationary and nonstationary time series, perform unit root tests, fit ARMA and ARIMA models, use cross validation for model assessment
    + Analyze multiple continuous time series using vector autoregression, perform cointegration tests, and estimate error correction models for cointegrated time series
    + Distinguish between random effects, fixed effects, and mixed effects and decide when each of these are appropriate
    + Understand Nickell bias and use an instrumental variable approach with GMM to address the issue 
    + Perform multiple imputation and in-sample simulations for panel data
    
$\newline$

# Logistics

5. The course moves fast: you should be comfortable doing the following for the homework assignments and project
    + tidying and transforming data, especially time series and panel data
    + importing and exporting data sets 
    + generating plots of your data and results
    + writing basic functions and loops for repeated procedures

* Fortunately, for those of you new to $\textsf{R}$, there are many resources to get you up to speed
    + Cowpertwait and Metcalfe (2009) - download via UW library
    + [Zuur et al. (2009)](https://canvas.uw.edu/courses/1064065/files)
    + [Wickham and Groleman (2017)](http://r4ds.had.co.nz/)


# Logistics
6. Please make sure that you have R or R Studio installed on your computer

7. If you would like to learn how to use \LaTeX, this is a great opportunity to do so
    + An easy way to get introduced to this is to use R Markdown within R Studio
    + Make sure you have TeX installed, which you can find [here](https://www.latex-project.org/get/#distributions)
    + Make sure you have R Markdown installed using `install.packages("rmarkdown")`
    + Now in R Studio, choose `File` $\rightarrow$ `New File` $\rightarrow$ `R Markdown`
    
# Logistics

8. Using R Markdown
    + Choose to compile your document as a PDF or HTML file and give it a title
    + Now you will be given a template
    + Embed your code within \begin{verbatim}```{r}\end{verbatim} and \begin{verbatim}```\end{verbatim} and write up your text outside
    + Then press `Knit` and it will produce a PDF or HTML document with your code, R output, and text nicely formatted 
    + Please try to complete your homeworks in this way

# Questions

# Time Series Data - Unemployment in Maine
\scriptsize
```{r}
# Acquire the data
#	Monthly unemployment in Maine from January 1996 to August 2006
www <- "http://students.washington.edu/dhyoo/Maine.dat"
Maine.month <- read.table(www, header = TRUE)

# Attach the object and check its class
attach(Maine.month)
class(Maine.month)

#Monthly unemployment data 
head(Maine.month)

```


# Time Series Data - Unemployment in Maine
\scriptsize
```{r}
# Create a time series object
help(ts)
Maine.month.ts <- ts(unemploy, start = c(1996, 1), freq = 12)
Maine.month.ts

```

# Time Series Data - Unemployment in Maine
\scriptsize
```{r}
# Find the mean unemployment per year
Maine.annual.ts <- aggregate(Maine.month.ts)/12
Maine.annual.ts
```


# Time Series Data - Unemployment in Maine
\tiny
```{r}
# Plot the time series. Intuitively, how would you describe the pattern of unemployment?
layout(1:2)
plot(Maine.month.ts, ylab="unemployed (%)")
plot(Maine.annual.ts, ylab="unemployed (%)")
```


# Time Series Data - Unemployment in Maine
\scriptsize
```{r}
# Find unmployment rates for Feburary and August
Maine.Feb <- window(Maine.month.ts, start = c(1996,2), freq = TRUE)
Maine.Aug <- window(Maine.month.ts, start = c(1996,8), freq = TRUE)
# Find ratio of mean unemployment in Feb and August versus grand mean
Feb.ratio <- mean(Maine.Feb) / mean(Maine.month.ts)
Aug.ratio <- mean(Maine.Aug) / mean(Maine.month.ts)

Maine.Feb

Feb.ratio
Aug.ratio


```


# Time Series Data - Global Temperature
\scriptsize
```{r}
# Acquire the data
www <- "http://students.washington.edu/dhyoo/global.dat"
# Average global temperature from Univ. East Anglia and UK Met Office
# Monthly from January 1856 to December 2005
Global <- scan(www)

```

1. Create a time series object using the data that starts in Jan 1856 and ends in Dec 2005 with monthly observations.
\newline

2. Find the mean temperature for each year and save in a new time series object.
\newline

3. Plot the two objects.
\newline

4. Observe global temperature from 1970 to 2005 using the window function and plot.

# Time Series Data - Global Temperature
\scriptsize
```{r}
# Create a time series object
Global.ts <- ts(Global, st = c(1856, 1), end = c(2005, 12), fr = 12)
head(Global.ts)

# Find the mean temperature for each year
Global.annual <- aggregate(Global.ts, FUN = mean)
head(Global.annual)
```


# Time Series Data - Global Temperature
\scriptsize
```{r}
# Plot the time series. 
# How would you describe the pattern in global temperature?
plot(Global.ts)
plot(Global.annual)

```


# Time Series Data - Global Temperature
\scriptsize
```{r}

plot(Global.annual)

```


# Time Series Data - Global Temperature
\scriptsize
```{r}
# Observe between 1970 and 2005 only
New.series <- window(Global.ts, start=c(1970, 1), end=c(2005, 12))

# Express each month fractionally
New.time <- time(New.series)

```

# Time Series Data - Global Temperature
\tiny
```{r}
# How would you describe this pattern?
plot(New.series); abline(reg=lm(New.series ~ New.time))

```


# Multiple Time Series - Electricty, Beer, Chocolate Production
\scriptsize
```{r}
# Acquire the data
www <- "http://students.washington.edu/dhyoo/cbe.dat"
#	Electricity (millions of kWh), beer (Ml), and chocolate production (tonnes) 
# in Australia from January 1958 to December 1990 
# from the Australian Bureau of Statistics

CBE <- read.table(www, header=T)

CBE[1:4,]

class(CBE)


```


# Multiple Time Series - Electricty, Beer, Chocolate Production
\scriptsize
```{r}
# Create separate time series objects for each
Elec.ts <- ts(CBE[, 3], start = 1958, freq = 12)

Beer.ts <- ts(CBE[, 2], start = 1958, freq = 12)

Choc.ts <- ts(CBE[, 1], start = 1958, freq = 12)

```


# Multiple Time Series - Electricty, Beer, Chocolate Production
\tiny
```{r}
plot(cbind(Elec.ts, Beer.ts, Choc.ts))


```

# Panel Data - Democracy and Income
\scriptsize
```{r}

library(foreign)
library(tidyverse)
library(ggplot2)

setwd("/Users/danielyoo/CSSS-POLS-512/Labs")
data<-read.csv("Lab1data.csv", header=T)  
#Democracy and income data from 174 countries from 2000 to 2010


```


# Panel Data - Democracy and Income
\scriptsize
```{r}

head(unique(data$country)) # observations on 174 countries
head(tapply(data$country, data$Year, length))
head(tapply(data$Year, data$country, length))


```

# Panel Data - Democracy and Income
\tiny
```{r}
p <- ggplot(data = na.omit(data), aes(x = Year, y = GDP.per.capita.PPP.current.international, 
                                      group=country, color=country))
p + geom_line(alpha=0.5) + guides(color=FALSE) 
```


# Panel Data - Democracy and Income

\emph{Some wrangling exercises}:

1. Subset the data frame to show only country name and GDP per capita

2. Rearrange the columns of the data frame ascending by polity score

3. Show only values of GDP per capita for South Africa from 2002 to 2008

4. Create a new variable that takes the first letter of the country and attaches it to the year of observation

5. Find the mean of GDP per capita for each year of observation


# Panel Data - Democracy and Income
\tiny
```{r}

library(tidyverse)
head(select(data, country, GDP.per.capita.PPP.current.international))
head(data[, c(1,3)])
head(data.frame(data$country, data$GDP.per.capita.PPP.current.international))


```


# Panel Data - Democracy and Income
\scriptsize
```{r}

head(arrange(data, polity2))
head(data[order(data$polity2),])

```


# Panel Data - Democracy and Income
\tiny
```{r}

head(filter(data, country==c("South Africa"), Year>=2002 & Year<=2008))
head(subset(data, data$country==c("South Africa") & data$Year>=2002 & Year<=2008))


```


# Panel Data - Democracy and Income
\tiny
```{r}

head(mutate(data, paste(substring(data$country, 1, 1), data$Year, sep="")))


```


# Panel Data - Democracy and Income
\scriptsize
```{r}

data%>%
  group_by(Year)%>%
  summarize(mean(GDP.per.capita.PPP.current.international, na.rm=T)
  )

```