---
title: "In-class Lab 3"
author: "ECON 4223 (Prof. Tyler Ransom, U of Oklahoma)"
date: "January 27, 2022"
output:
  pdf_document: default
  html_document:
    df_print: paged
  word_document: default
bibliography: biblio.bib
---
```{r setup, include=FALSE}
knitr::opts_chunk$set(echo = TRUE, results = 'hide', fig.keep = 'none')
```
The purpose of this in-class lab is to practice running regressions, computing regression formulas, visualizing the Sample Regression Function, using non-linear transformations, and interpreting coefficients. You may complete this lab as a group, but please turn in separate copies for each group member. To get credit, upload your .R script to the appropriate place on Canvas.
## For starters
Open up a new R script (named `ICL3_XYZ.R`, where `XYZ` are your initials) and add the usual "preamble" to the top:
```{r message=FALSE, warning=FALSE, paged.print=FALSE}
library(tidyverse)
library(modelsummary)
library(broom)
library(wooldridge)
```
For this lab, let's use data on school expenditures and math test pass rates from Michigan. This is located in the `meap93` data set in the `wooldridge` package. Each observation is a school district in Michigan.
```{r}
df <- as_tibble(meap93)
```
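Before running any regressions, you may (optionally) want a quick look at the two variables we'll use; `glimpse()` is loaded with the tidyverse:
```{r}
# Optional: a quick look at the data
glimpse(df)
summary(df$math10)  # math test pass rate
summary(df$expend)  # expenditures per pupil
```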
## The Relationship between Expenditures and Math Test Pass Rates
Estimate the following regression model:
\[
math10 = \beta_0 + \beta_1 expend + u
\]
The code to do so is:
```{r}
est <- lm(math10 ~ expend, data=df)
tidy(est)
glance(est)
```
You should get a coefficient of `0.00246` on `expend`. Interpret this coefficient. (You can type the interpretation as a comment in your .R script.) Is this number small, given the units that `math10` and `expend` are in?
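One quick (optional) way to gauge the magnitude is to scale the coefficient by a $1,000 increase in per-pupil spending, assuming `expend` is measured in dollars per pupil (check `?meap93`):
```{r}
# Optional sanity check: predicted change in math10 (percentage points)
# from spending $1,000 more per pupil
b1 <- coef(est)["expend"]
b1 * 1000
```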
## Regression Coefficients "By Hand"
Verify that the regression coefficients in `est` are the same as the formulas from the book:
$$
\hat{\beta}_0 = \overline{math10} - \hat{\beta}_1 \overline{expend}, \\
\hat{\beta}_1 = \frac{\widehat{cov}(math10,expend)}{\widehat{var}(expend)}
$$
You can do this by typing:
```{r}
beta1 <- cov(df$math10,df$expend)/var(df$expend)
beta0 <- mean(df$math10)-beta1*mean(df$expend)
```
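To confirm these match the output of `lm()`, you can (optionally) compare them directly:
```{r}
# Optional check: the by-hand values should equal lm()'s estimates
# (up to floating-point rounding)
coef(est)
c(beta0, beta1)
```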
## Visualizing Regression Estimates
Often, it's helpful to visualize the estimated regression model. @wooldridge calls this the "Sample Regression Function." We can do this with the following code:
```{r}
ggplot(df, aes(expend, math10)) +
  geom_point() +
  geom_smooth(method = 'lm')
```
## Nonlinear transformations
Let's consider a modified version of our model, where now we use *log* expenditures instead of expenditures. Why might we want to use log expenditures? Likely because we think that each additional dollar spent *doesn't* have an equal effect on pass rates. That is, additional dollars spent likely have diminishing effects on pass rates. (See also: the Law of Diminishing Marginal Returns)
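To see this formally: in the model with log expenditures (estimated below), the effect of one additional dollar of spending is
$$
\frac{\partial\, math10}{\partial\, expend} = \frac{\beta_1}{expend},
$$
which shrinks as `expend` grows.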
Create the log expenditures variable using `mutate()`:
```{r}
df <- df %>% mutate(logexpend = log(expend))
```
Now estimate your model again and re-do the visualization (showing both functional forms together):
```{r}
est <- lm(math10 ~ logexpend, data=df)
tidy(est)
glance(est)
modelsummary(est)
ggplot(df, aes(expend, math10)) +
  geom_point() +
  stat_smooth(method = "lm", col = "red",  se = FALSE, formula = y ~ log(x)) +
  stat_smooth(method = "lm", col = "blue", se = FALSE)
```
What is the interpretation of $\beta_1$ in this new model? (Add it as a comment in your R script)
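As a (purely optional) numerical illustration of the log-level interpretation, you can compute the predicted change in `math10` from a 10% increase in spending:
```{r}
# Optional illustration: predicted change in math10 from a 10% increase in expend
b1_log <- coef(est)["logexpend"]
b1_log * log(1.10)   # roughly (b1_log/100) * 10 for small percentage changes
```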
## Standard Errors and Regression Output
Finally, we can look at the standard errors, t-statistics, and p-values associated with our regression parameters $\beta_0$ and $\beta_1$. The `p.value` reported in `tidy(est)` for the slope corresponds to a test of the following hypotheses:
\[
H_0: \beta_1 = 0, \\
H_a: \beta_1 \neq 0
\]
Does increased school spending significantly increase the math test pass rate?
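If you want to pull out just the slope's test statistics (an optional shortcut), `tidy()` returns them as a data frame that you can filter:
```{r}
# Optional: extract the slope's estimate, standard error, t-statistic, and p-value
tidy(est) %>%
  filter(term == "logexpend") %>%
  select(estimate, std.error, statistic, p.value)
```
Keep in mind that this `p.value` is for the two-sided test; since the question above is about an *increase*, a one-sided p-value would be half of it (when the estimated slope is positive).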
## Computing standard errors by hand
If you have extra time, try computing the standard errors by hand, according to the formulas in the textbook. To do so, we need to compute the following quantities: `sig` (the estimated standard deviation of $u$), `n` (our regression's sample size), `SSTx` ($n-1$ times the sample variance of `logexpend`), and `sumx2` (the sum of the squares of `logexpend`):
```{r}
n <- dim(df)[1]
sig <- sqrt(sum(est$residuals^2) / (n - 2)) # or, more simply, glance(est)$sigma
SSTx <- (n - 1) * var(df$logexpend)
sumx2 <- sum(df$logexpend^2)
```
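In the textbook's notation (writing $x$ for `logexpend` and $\hat{\sigma}$ for `sig`), the formulas we are about to implement are
$$
\mathrm{se}(\hat{\beta}_0) = \sqrt{\frac{\hat{\sigma}^2 \, n^{-1} \sum_{i=1}^n x_i^2}{SST_x}}, \qquad
\mathrm{se}(\hat{\beta}_1) = \sqrt{\frac{\hat{\sigma}^2}{SST_x}},
$$
where $SST_x = \sum_{i=1}^n (x_i - \bar{x})^2$.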
The standard error of the intercept is computed with the following formula:
```{r}
sqrt((sig^2*(1/n)*sumx2)/SSTx)
```
And the standard error of the slope coefficient (`logexpend` in this case) is:
```{r}
sqrt(sig^2/SSTx)
```
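As a final (optional) check, these two by-hand values should match the `std.error` column reported by `tidy(est)`:
```{r}
# Optional check: compare the by-hand standard errors with tidy()'s std.error column
tidy(est)$std.error
c(sqrt((sig^2*(1/n)*sumx2)/SSTx), sqrt(sig^2/SSTx))
```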
# References