
Commit 207caac

Merge pull request #8 from sarahzeller/chapter12-13
Add chapter 12/13
2 parents 9c02d1d + 07dc6c9 commit 207caac

19 files changed: +549 -22 lines changed

12_opening-the-toolbox.Rmd

+8 -5
@@ -1,13 +1,16 @@
 # Opening the Toolbox

- **Learning objectives:**
+ ## Methods that we'll be checking out

- - THESE ARE NICE TO HAVE BUT NOT ABSOLUTELY NECESSARY
+ - regression as baseline for everything
+ - focus on "template" research designs: same DAG applies to many settings

- ## SLIDE 1 {-}
+ ## Structure of upcoming chapters
+
+ - How does it work?
+ - How is it performed?
+ - How the pros do it (non-exhaustive): caveats and extensions

- - ADD SLIDES AS SECTIONS (`##`).
- - TRY TO KEEP THEM RELATIVELY SLIDE-LIKE; THESE ARE NOTES, NOT THE BOOK ITSELF.

 ## Meeting Videos {-}


13_regression.Rmd

+343 -8
@@ -2,23 +2,358 @@
 
 **Learning objectives:**

- - THESE ARE NICE TO HAVE BUT NOT ABSOLUTELY NECESSARY
+ - review of basic regressions
+ - how to incorporate non-continuous variables into OLS
+ - how to incorporate non-linear relationships into OLS

- ## SLIDE 1 {-}
+ ## Basics

- - ADD SLIDES AS SECTIONS (`##`).
- - TRY TO KEEP THEM RELATIVELY SLIDE-LIKE; THESE ARE NOTES, NOT THE BOOK ITSELF.
+ > Regression is the most common way in which we fit a line to explain variation.

- ## Meeting Videos {-}
+ - also used for causal effects: closing back doors (controlling)
+ - use the values of one variable ($X$) to *predict* the values of another ($Y$)
+ - one way: fit a line that describes the relationship
+ - interpretation of coefficient: slope
+ - plugging in a value of $X$, we get a prediction: $\hat{Y}$
+ - difference between $Y$ and $\hat{Y}$ is the residual
+ - we can make the line curvy by adding polynomials (e.g. $\beta_1 X + \beta_2 X^2$)

- ### Cohort 1 {-}
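A minimal sketch in base R of fitting a line, reading off the slope, and getting predictions and residuals (simulated data; variable names are made up for illustration):

```r
set.seed(1)
# simulated data where a one-unit change in X shifts Y by 2
d <- data.frame(X = rnorm(200))
d$Y <- 1 + 2 * d$X + rnorm(200)

m <- lm(Y ~ X, data = d)   # fit the line
coef(m)["X"]               # slope: change in Y per one-unit change in X
d$Y_hat <- predict(m)      # predictions (Y-hat)
d$resid <- resid(m)        # residuals: Y minus Y-hat

# make the line curvy by adding a squared term
m_curvy <- lm(Y ~ X + I(X^2), data = d)
```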
+ ## Error terms
+
+ > There's going to be a difference between the line that we fit and the observation we get.
+
+ - **Error**: theoretical; the difference between the actual outcome and the prediction we'd make if we had infinite observations to estimate our prediction (true best-fit line)
+ - **Residual**: observed; the difference between the actual outcome and the prediction
+
+ ![](images/ch13_regression/statisticaladjustment-errorres-1.png)
+
+ - the error contains everything that causes $Y$ that is not included in the model
+ - if our model is $Y = \beta_0 + \beta_1 X + \epsilon$, then what's included in $\epsilon$?
+
+ ![](images/ch13_regression/statisticaladjustment-epsilondag-1.png)
+
+ ## Regression assumptions
+
+ **Exogeneity assumption**: in a regression context, the assumption that the variables in the model (or perhaps just our treatment variable) are uncorrelated with the error term
+
+ - basically the same conditions as for identifying a causal effect
+ - if something is still in the error term and is correlated with $X$, we haven't closed that path
+ - another way to say this: $X$ is correlated with $\epsilon$ and so is endogenous
+ - endogeneity problem: bias -> the estimate gives the wrong answer on average
+ - here: omitted variable bias (see the sketch below)
+
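A small simulation (made-up variables) of the endogeneity problem: a variable `W` causes both `X` and `Y` but is left in the error term, so the coefficient on `X` is biased; controlling for `W` closes the back door.

```r
set.seed(2)
n <- 5000
W <- rnorm(n)                    # confounder
X <- 0.5 * W + rnorm(n)          # treatment, partly caused by W
Y <- 1 * X + 2 * W + rnorm(n)    # outcome; the true effect of X is 1

coef(lm(Y ~ X))["X"]       # biased: W sits in the error term and is correlated with X
coef(lm(Y ~ X + W))["X"]   # roughly 1: the back door through W is closed
```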
+ ## Sampling variation
+
+ > Regression coefficients also follow a normal distribution
+
+ standard error of the sampling distribution: $\sqrt{\frac{\sigma^2}{var(X) n}}$
+
+ Only three things can change it:
+
+ - shrink the standard deviation of the error term $\sigma$
+ - pick an $X$ that varies a lot
+ - pick a larger sample, i.e. increase $n$
+
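A quick check of that formula against what `lm()` reports, on simulated data (the formula uses the true $\sigma$ and matches the reported standard error up to a small finite-sample factor):

```r
set.seed(3)
n <- 1000
X <- rnorm(n, sd = 2)                # an X that varies a lot shrinks the SE
Y <- 1 + 0.5 * X + rnorm(n, sd = 1)  # error standard deviation sigma = 1

m <- lm(Y ~ X)
summary(m)$coefficients["X", "Std. Error"]   # SE reported by lm()
sqrt(1^2 / (var(X) * n))                     # textbook formula, approximately equal
```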
+ ## Hypothesis testing in OLS
+
+ The author strongly dislikes this, since the choice of rejection value is arbitrary and sharp.
+
+ 1. Pick a theoretical distribution
+ 2. Estimate $\beta_1$ using OLS in observed data: $\hat{\beta_1}$
+ 3. Use that theoretical distribution to see how unlikely it would be to get $\hat{\beta_1}$
+ 4. If it's super unlikely, that initial value is probably wrong
+
+ Alternative: hypothesis testing
+
+ 1. Pick a null hypothesis (typically $\beta_1 = 0$)
+ 2. Pick a rejection value $\alpha$
+ 3. Check the probability against the rejection value
+ 4. Possibly reject the null: we think it's unlikely that the value is 0.
+
+ ![](images/ch13_regression/statisticaladjustment-theoreticaldist-1.png)
+
+ - Type I error rate ("false positive rate"): rejecting something that's true
+ - Type II error rate ("false negative rate"): not rejecting something that's false
+ - p-value: twice the tail percentile (2-sided test)
+ - t-statistic: $\frac{\hat{\beta_1}}{se(\hat{\beta_1})}$, to use with the standard normal distribution
+
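A sketch of computing the t-statistic and two-sided p-value by hand and comparing with what `summary()` reports (simulated data; `summary()` uses the t distribution rather than the normal approximation):

```r
set.seed(4)
d <- data.frame(X = rnorm(200))
d$Y <- 0.3 * d$X + rnorm(200)
m <- lm(Y ~ X, data = d)

b  <- coef(summary(m))["X", "Estimate"]
se <- coef(summary(m))["X", "Std. Error"]

t_stat <- b / se                    # t-statistic
p_val  <- 2 * pnorm(-abs(t_stat))   # doubled tail probability (normal approximation)

coef(summary(m))["X", c("t value", "Pr(>|t|)")]  # compare with summary()
```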
+ ## Mantras about hypothesis testing
+
+ 1. An estimate not being statistically significant doesn't mean it's wrong.
+ 2. Don't change your results to get something significant.
+ 3. A significance test isn't the last word on a result.
+ 4. Significant $\neq$ meaningful
+
+ ## Regression tables
+
+ ![](images/ch13_regression/regression-table.PNG)
+
+ - each column represents a different regression
+ - parentheses: usually standard errors (sometimes t-statistics)
+ - significance stars -> p-values (author isn't a fan)
+ - below: descriptions of the analysis / measures of model quality
+   - adjusted $R^2$ considers the number of variables
+   - $F$-statistic: null hypothesis that all coefficients in the model are zero
+   - RMSE: estimate of the standard deviation of the error term
+
+ > What can we do with all of these model-quality measures? Take a quick look, but in general don't be too concerned about these.
+
+ > If you don't care about most of the causes of your dependent variable and are pretty sure you've included the variables in your model necessary to identify your treatment, then $R^2$ is of little importance.
+
+ ### Interpretation
+
+ > A one-unit change in ...
+
+ ### Controls
+
+ > The idea is that by including a control for Year, we are removing the part explained by Year, and can proceed as though the remaining estimates are comparing two inspections that effectively have the same year.
+
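One way (among several) to build a table like this in R is the `modelsummary` package; a sketch with two made-up models, one with Year controls:

```r
library(modelsummary)

set.seed(5)
d <- data.frame(X = rnorm(300), Year = sample(2015:2020, 300, replace = TRUE))
d$Y <- 1 + 0.5 * d$X + 0.1 * d$Year + rnorm(300)

m1 <- lm(Y ~ X, data = d)
m2 <- lm(Y ~ X + factor(Year), data = d)   # adds Year controls

# one column per regression; standard errors in parentheses; stars optional
modelsummary(list("(1)" = m1, "(2)" = m2), stars = TRUE)
```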
+ ## Subscripts in regression equations
+
+ $Y_i = \beta_0 + \beta_1 X_i + \epsilon_i$
+
+ - $i$ -- what index the data varies across
+
+ ## DAG to Regression
+
+ ![](images/ch13_regression/Flowchart.png)
+
+ ## Getting fancier
+
+ Change the variables that go into our model:
+
+ - binary/discrete
+ - polynomials
+ - variable transformations
+ - interactions
+
+ How to determine what to keep?
+
+ - LASSO
+
+ ## Binary/discrete variables
+
+ - always leave one category out
+ - interpretation: the coefficient is the difference in the dependent variable between this category and the left-out one
+ - significance: joint F-test (see the sketch below)
+
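A sketch with a made-up categorical variable: `factor()` makes R leave one category out automatically, and a joint F-test compares the model with and without the whole set of category dummies.

```r
set.seed(6)
d <- data.frame(X = rnorm(300),
                group = sample(c("A", "B", "C"), 300, replace = TRUE))
d$Y <- 0.5 * d$X + ifelse(d$group == "B", 1, 0) + rnorm(300)

m <- lm(Y ~ X + factor(group), data = d)   # "A" becomes the left-out reference
coef(m)                                    # group coefficients: differences relative to "A"

# joint F-test: are all the group coefficients zero?
anova(lm(Y ~ X, data = d), m)
```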
+ ## Polynomials
+
+ ![](images/ch13_regression/statisticaladjustment-parabolic-1.png)
+
+ > A polynomial is when you have the same variable in an equation as itself, as well as powers of itself
+
+ $\beta_1 X + \beta_2 X^2 + \beta_3 X^3$
+
+ - fit non-straight lines
+ - must always be interpreted together (marginal effects), taking the derivative
+
+ Choosing the right number of polynomial terms:
+
+ ![](images/ch13_regression/statisticaladjustment-tendeg-1.png)
+
+ - gets harder to interpret as you add more
+ - often, higher-order polynomial terms don't really do anything
+ - problem of overfitting
+ - almost never want to go beyond cubic
+ - approach 1: check out the Y ~ X plot
+ - approach 2: check out the residuals ~ X plot
+ - approach 3: LASSO
+
+ ![](images/ch13_regression/statisticaladjustment-resids-1.png)
+
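A sketch of fitting a squared term and interpreting the coefficients together via the derivative: the marginal effect is $\beta_1 + 2\beta_2 X$, so it depends on where you evaluate it (simulated data).

```r
set.seed(7)
d <- data.frame(X = runif(500, 0, 10))
d$Y <- 2 * d$X - 0.15 * d$X^2 + rnorm(500)

m <- lm(Y ~ X + I(X^2), data = d)
b <- coef(m)

# marginal effect of X at a few values: dY/dX = b1 + 2 * b2 * X
marg <- function(x) b["X"] + 2 * b["I(X^2)"] * x
marg(c(2, 5, 8))
```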
+ ## Variable transformation
+
+ Two reasons to transform:
+
+ 1. give the variable statistical properties that play more nicely with regressions (e.g. skew)
+    - **but** it's fine for an outlier to affect the OLS slope
+ 2. get a linear relationship between variables
+    - $Wealth = InitialWealth \times e^{10 \times InterestRate}$
+    - $\ln(Wealth) = \ln(InitialWealth) + 10 \times InterestRate$
+
+ ### Options
+
+ - log (can't handle zeros)
+ - $\log(x+1)$
+ - square root
+ - asinh: $\ln(x + \sqrt{x^2 + 1})$ -> similar to log
+ - winsorizing: cap extreme values
+ - standardizing: $(X - mean(X)) / sd(X)$
+
+ ### Interpretation of log
+
+ - for coefficients close to 0 (up to about .1): basically a $\beta_1 \times 100\%$ change
+
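A sketch of these options on a made-up skewed variable (the winsorizing cutoff and the `+ 1` are arbitrary choices):

```r
set.seed(8)
x <- rlnorm(1000)                      # right-skewed, strictly positive

x_log   <- log(x)                      # fails if x contains zeros
x_log1  <- log(x + 1)                  # common workaround for zeros
x_sqrt  <- sqrt(x)
x_asinh <- asinh(x)                    # log(x + sqrt(x^2 + 1)); works at zero
x_wins  <- pmin(x, quantile(x, .99))   # winsorize the top 1%
x_std   <- (x - mean(x)) / sd(x)       # standardize; scale() does the same
```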
+ ## Interaction terms
+
+ Always include both un-interacted terms as well:
+
+ > otherwise, the coefficient on $X \times Z$ accounts for not only the interaction between the two, but also the direct effect of $Z$ itself.
+
+ ![](images/ch13_regression/table-interaction.PNG)
+
+ - interpretation with marginal effects (using the derivative; see the sketch below)
+ - think very carefully about why you are including a given interaction term
+ - trying interactions all willy-nilly tends to lead to false positives
+ - be skeptical of an effect that isn't there at all for the whole sample
+ - make sure you believe your story before you check the data
+
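A sketch of an interaction and its marginal effects with made-up variables: with `Y ~ X * Z`, the effect of `X` is $\beta_X + \beta_{XZ} Z$, so it has to be evaluated at particular values of `Z`.

```r
set.seed(9)
d <- data.frame(X = rnorm(400), Z = rbinom(400, 1, 0.5))
d$Y <- 1 + 0.5 * d$X + 0.3 * d$Z + 0.8 * d$X * d$Z + rnorm(400)

m <- lm(Y ~ X * Z, data = d)   # includes X, Z, and X:Z
b <- coef(m)

b["X"]              # effect of X when Z = 0
b["X"] + b["X:Z"]   # effect of X when Z = 1
```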
+ ## Nonlinear regressions
+
+ - not accounting for non-linearity messes up results: range, slope
+
+ ![](images/ch13_regression/statisticaladjustment-olslogit-1.png)
+
+ - usually tailored to the dependent variable
+ - binary dependent variables: often OLS nonetheless, but then called a *linear probability model* (LPM)
+ - one way: *generalized linear model* (GLM): $Y = F(\beta_0 + \beta_1 X)$, where $F$ is the link function and the part inside it is the index
+
+ ### Good link functions
+
+ - take any value from $-\infty$ to $\infty$
+ - output values between 0 and 1
+ - input increases -> output increases
+ - popular functions: logit, probit
+
+ ### Interpretation
+
+ - use marginal effects
+ - $\frac{\partial Pr(Y = 1)}{\partial X} = \beta_1 Pr(Y = 1) (1 - Pr(Y = 1))$
+ - but this changes with every $X$
+ - recommendation against the marginal effect at the mean
+ - instead: average marginal effect (see the sketch below)
+
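A sketch of a logit via `glm()` with the average marginal effect computed by hand, averaging $\hat{\beta_1} \, Pr(Y=1)(1 - Pr(Y=1))$ over the sample rather than plugging in the mean (simulated data):

```r
set.seed(10)
d <- data.frame(X = rnorm(1000))
d$Y <- rbinom(1000, 1, plogis(-0.5 + 1 * d$X))

m <- glm(Y ~ X, data = d, family = binomial(link = "logit"))

p   <- predict(m, type = "response")      # Pr(Y = 1) for each observation
ame <- mean(coef(m)["X"] * p * (1 - p))   # average marginal effect of X
ame
```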
+ ## Standard errors
+
+ - standard errors can be messed up in many ways because assumptions are violated
+ - we can account for this, though
+ - correlated errors change the sampling distribution: the mean is swingier, larger standard deviation
+
+ ### Assumptions
+
+ 1. the error term $\epsilon$ is normally distributed -> OLS is mostly fine even when this fails
+ 2. the error term is independent and identically distributed (iid)
+
+ - autocorrelation: temporal/spatial
+ - heteroskedasticity
+ - we have to figure out how this assumption fails
+
+ ### Fixes (mostly sandwich estimators)
+
+ - heteroskedasticity: Huber-White
+ - auto-correlation: HAC, e.g. Newey-West
+ - geographic correlation: Conley spatial standard errors
+ - hierarchical structure: clustered standard errors, e.g. Liang-Zeger
+   - right level of clustering: treatment level / domain knowledge
+   - only works well for a large number of clusters ($>50$); fix: wild cluster bootstrap standard errors
+ - bootstrapped standard errors
+
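A sketch with the `sandwich` and `lmtest` packages (one common route; `fixest` and others offer the same adjustments), on made-up data with errors correlated within an `id` cluster:

```r
library(sandwich)
library(lmtest)

set.seed(11)
d <- data.frame(id = rep(1:60, each = 10), X = rnorm(600))
d$Y <- 0.5 * d$X + rnorm(60)[d$id] + rnorm(600)   # cluster-level noise within id

m <- lm(Y ~ X, data = d)

coeftest(m, vcov = vcovHC(m, type = "HC1"))     # heteroskedasticity-robust (Huber-White)
coeftest(m, vcov = vcovCL(m, cluster = ~ id))   # clustered by id (Liang-Zeger style)
```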
+ ### Bootstrapping
+
+ 1. start with a data set with $N$ observations
+ 2. randomly sample $N$ observations from it (with replacement)
+ 3. estimate the statistic
+ 4. repeat many times (a couple of thousand)
+ 5. look at the distribution of estimates
+
+ - can be used for any statistic
+ - needs large samples
+ - doesn't perform well with extreme value distributions
+ - doesn't do well with autocorrelation
+
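A sketch of that loop in base R (the `boot` package wraps the same idea), on simulated data:

```r
set.seed(12)
d <- data.frame(X = rnorm(500))
d$Y <- 0.5 * d$X + rnorm(500)

B <- 2000
boot_est <- replicate(B, {
  i <- sample(nrow(d), replace = TRUE)    # resample N rows with replacement
  coef(lm(Y ~ X, data = d[i, ]))["X"]     # re-estimate the statistic
})

sd(boot_est)                              # bootstrap standard error
quantile(boot_est, c(.025, .975))         # percentile confidence interval
```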
+ ## Sample Weights
+
+ - weights are used to correct for sampling bias
+ - procedure: weighted least squares
+
+ **Surveys**
+
+ - weights often provided
+ - weight by the inverse of the probability of being included
+
+ **Aggregated data**
+
+ - some groups are larger than others
+ - variation differs with that
+ - frequency weighting vs. inverse variance weighting: estimates are the same, standard errors differ
+ - frequency weighting
+   - if observations are exactly the same, just repeated
+   - "a collection of independent, completely identical observations"
+   - procedure: just replicate each observation weight-times
+ - inverse variance weighting
+   - aggregated data
+   - each aggregate observation is weighted by the number of observations it contains
+   - other application: meta-analysis
+
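A sketch of weighted least squares with `lm()`, assuming made-up survey data that already ships with a sampling weight column `w`:

```r
set.seed(13)
# made-up survey data with a provided inverse-probability weight w
d <- data.frame(X = rnorm(300), w = runif(300, 0.5, 2))
d$Y <- 0.5 * d$X + rnorm(300)

m_w <- lm(Y ~ X, data = d, weights = w)   # weighted least squares
summary(m_w)
```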
+ ## Collinearity
+
+ > A tempting thought when you have multiple measures of the same concept is that including them all in the model will in some way let them complement each other, or add up to an effect. But in reality it just forces each of them to show the effect of the variation that each measure has that's unrelated to the other measures.
+
+ - happens e.g. when including variables measuring the same latent variable
+ - super highly correlated variables drive standard errors upwards
+
+ **Addressing this**
+
+ - dimension reduction: e.g. latent factor analysis, PCA
+ - variance inflation factor
+   - $VIF_j = \frac{1}{1-R^2_j}$
+   - exclude variable if $VIF > 10$
+
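A sketch of computing the variance inflation factor by hand, regressing each regressor on the others (`car::vif()` does the same on a fitted model); the correlated pair is made up:

```r
set.seed(14)
n  <- 500
x1 <- rnorm(n)
x2 <- x1 + rnorm(n, sd = 0.2)     # nearly the same measure as x1
x3 <- rnorm(n)

vif <- function(x, others) 1 / (1 - summary(lm(x ~ others))$r.squared)
vif(x1, cbind(x2, x3))            # large
vif(x2, cbind(x1, x3))            # large: consider dropping or combining
vif(x3, cbind(x1, x2))            # near 1
```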
+ ## Measurement error
+
+ - often happens when using proxies
+ - other possibility: actual measurement error
+ - $X = X^* + \epsilon$
+
+ **Classical measurement error**
+
+ - error term is unrelated to the latent variable
+ - leads to attenuation ($\hat{\beta_1}$ closer to 0 than the true $\beta_1$)
+ - just treat the estimate as a lower bound
+ - if the measurement error is in $Y$: no problem, just more stuff in the error term
+
+ **Non-classical measurement error**
+
+ - error term is related to the latent variable
+ - e.g. self-reported exercising
+ - real issue!
+ - addressing this:
+   - Deming regression / Total Least Squares
+   - GMM
+   - instrumental variables: use one measurement as an instrument for the other measurement
+
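A small simulation (made-up numbers) of classical measurement error attenuating the slope toward zero:

```r
set.seed(15)
n      <- 10000
X_star <- rnorm(n)                 # the latent variable we actually care about
Y      <- 1 * X_star + rnorm(n)    # true effect is 1
X_obs  <- X_star + rnorm(n)        # proxy measured with classical error

coef(lm(Y ~ X_star))["X_star"]     # about 1
coef(lm(Y ~ X_obs))["X_obs"]       # attenuated, about 0.5 here; treat as a lower bound
```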
+ ## Penalized regression
+
+ - dropping some controls
+ - $\operatorname{argmin}_\beta \left\{ \sum (Y - \hat{Y})^2 + \lambda F(\beta) \right\}$
+ - minimize the sum of squared residuals AND make the $\beta$ function small
+ - implementation: LASSO, ridge regression, elastic net regression (LASSO + ridge)
+ - throw out variables that LASSO thinks are unimportant
+ - watch out: standardize all variables
+ - choose $\lambda$ as you want; a higher value -> toss out more variables
+
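A sketch with the `glmnet` package (one common implementation; `alpha = 1` is LASSO, `alpha = 0` is ridge); it standardizes variables by default and can pick $\lambda$ by cross-validation. The data here are simulated.

```r
library(glmnet)

set.seed(16)
n <- 500
X <- matrix(rnorm(n * 20), n, 20)          # 20 candidate controls
Y <- 2 * X[, 1] - 1 * X[, 2] + rnorm(n)    # only the first two matter

cv <- cv.glmnet(X, Y, alpha = 1)           # LASSO with cross-validated lambda
coef(cv, s = "lambda.min")                 # coefficients shrunk exactly to zero get dropped
```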
+ ## Meeting Videos {.unnumbered}
+
+ ### Cohort 1 {.unnumbered}
 
 `r knitr::include_url("https://www.youtube.com/embed/URL")`
 
 <details>
- <summary> Meeting chat log </summary>
 
- ```
+ <summary>Meeting chat log</summary>
+
+ ```
 LOG
 ```
+
 </details>
