Differences in F-scores with Stata when using clustering #4

Open
visez opened this issue Mar 6, 2020 · 2 comments

visez commented Mar 6, 2020

The largest discrepancies between Stata's output and econtools seem to arise when the clustering option is used. On my dataset, econtools replicates Stata exactly for the command:

areg y X, absorb(alpha)

However, the t and F values differ for the command

areg y X, absorb(alpha) cluster(alpha)

on the same dataset.

Owner

dmsul commented Mar 11, 2020

Thanks for the heads-up! It's probably a degrees-of-freedom issue. I'll look into it when I get the chance.

vikjam commented Dec 11, 2020

Hi! Thanks for writing econtools. To make the discrepancy easier to pin down, I've written a .do file and a .py file (below) that show the differing output from areg, xtreg, reghdfe, and econtools. I also show how to reconcile the differences, though a small gap with econtools remains.

The key difference appears to be the finite-sample (degrees-of-freedom) adjustment. In particular, a discrepancy arises when the clusters are nested within the fixed effects. This is discussed in the reghdfe FAQ.

xtreg and reghdfe give identical results in this case, and econtools comes very close to both. As I understand it, the xtreg/reghdfe/econtools results are preferable to areg's when clusters are nested within the fixed effects.
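
For reference, the two conversions applied in the .do file in this comment can be written as scaling relations. Here N is e(N_full), G is e(N_clust), and K is e(rank), all as returned by reghdfe. The \hat V notation is introduced only for this note, and the econtools relation is an empirical match to its printed output, not something confirmed against the econtools source.

```latex
\begin{align*}
\hat V_{\text{reghdfe}}   &= \hat V_{\text{areg}} \cdot \frac{N - K - G}{N - K - 1}, \\
\hat V_{\text{econtools}} &= \hat V_{\text{areg}} \cdot \frac{N - K - G}{N - K}, \\
\frac{\hat V_{\text{reghdfe}}}{\hat V_{\text{econtools}}} &= \frac{N - K}{N - K - 1}.
\end{align*}
```

If these relations hold, the remaining gap between reghdfe and econtools is a single degree of freedom in the denominator, which shrinks as N grows.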

econtools_example.do

use "https://www.kellogg.northwestern.edu/faculty/petersen/htm/papers/se/test_data.dta"
save "test_data.dta", replace

replace x = x / 100

*------*
* areg *
*------*
areg y x, absorb(firmid) vce(cluster firmid)
matrix define areg_V = e(V)

*-------*
* xtreg *
*-------*
xtset firmid
xtreg y x, fe vce(cluster firmid)
matrix define xtreg_V = e(V)

xtreg y x, fe vce(cluster firmid) dfadj
matrix define xtreg_dfadj_V = e(V)

*---------*
* reghdfe *
*---------*
reghdfe y x, absorb(firmid) vce(cluster firmid)
matrix define reghdfe_V = e(V)
local G = `e(N_clust)'
local N = `e(N_full)'
local K = `e(rank)'

matrix areg_to_reghdfe_V = areg_V * (`N' - `K' - `G') / (`N' - `K' - 1)

*-----------*
* econtools *
*-----------*
matrix areg_to_econtools_V = areg_V * (`N' - `K' - `G') / (`N' - `K') 

* areg
matrix list areg_V
* symmetric areg_V[2,2]
*                 x       _cons
*     x   100950.97
* _cons  -.05422367   2.913e-08

* xtreg
matrix list xtreg_V
* symmetric xtreg_V[2,2]
*                 x       _cons
*     x   90872.034
* _cons  -.04880999   2.622e-08

* xtreg with dfadj => areg
matrix list xtreg_dfadj_V
* symmetric xtreg_dfadj_V[2,2]
*                 x       _cons
*     x   100950.97
* _cons  -.05422367   2.913e-08

* reghdfe
matrix list reghdfe_V
* symmetric reghdfe_V[2,2]
*                 x       _cons
*     x   90872.034
* _cons  -.04880999   2.622e-08

* convert areg to reghdfe
matrix list areg_to_reghdfe_V
* symmetric areg_to_reghdfe_V[2,2]
*                 x       _cons
*     x   90872.034
* _cons  -.04880999   2.622e-08

* convert areg to econtools
matrix list areg_to_econtools_V
* symmetric areg_to_econtools_V[2,2]
*                 x       _cons
*     x   90853.856
* _cons  -.04880022   2.621e-08

econtools_example.py

import econtools
import econtools.metrics as mt

# Read Stata .dta file
test_data = econtools.read("test_data.dta")
# Rescale x to match `replace x = x / 100` in the .do file
test_data["x"] /= 100

# Estimate OLS regression with fixed-effects and clustered s.e.'s
result = mt.reg(test_data, "y", "x", fe_name="firmid", cluster="firmid")

print(result.vce)
#             x
# x  90853.85922
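
The arithmetic behind the two conversions can be checked without Stata or econtools. The sketch below is plain Python and uses the variances of the x coefficient printed in this comment. The values N = 5000, G = 500, and K = 1 are assumptions: the thread never prints e(N_full), e(N_clust), or e(rank), and these numbers were chosen because they reproduce the reported matrices. The function and constant names are made up for this illustration.

```python
# Variances of the x coefficient, copied from the output shown in this thread.
AREG_VAR_X = 100950.97
REGHDFE_VAR_X = 90872.034
ECONTOOLS_VAR_X = 90853.85922

# ASSUMPTION: these three values are not printed anywhere in the thread.
# They are guesses for e(N_full), e(N_clust), and e(rank) that happen to
# reproduce the reported numbers.
N_OBS = 5000
N_CLUSTERS = 500
RANK = 1


def areg_to_reghdfe(var, n, k, g):
    """Rescale an areg variance as in the .do file's areg_to_reghdfe_V line."""
    return var * (n - k - g) / (n - k - 1)


def areg_to_econtools(var, n, k, g):
    """Rescale an areg variance as in the .do file's areg_to_econtools_V line."""
    return var * (n - k - g) / (n - k)


def main():
    as_reghdfe = areg_to_reghdfe(AREG_VAR_X, N_OBS, RANK, N_CLUSTERS)
    as_econtools = areg_to_econtools(AREG_VAR_X, N_OBS, RANK, N_CLUSTERS)
    print(f"areg rescaled to reghdfe:   {as_reghdfe:.3f} (reported {REGHDFE_VAR_X})")
    print(f"areg rescaled to econtools: {as_econtools:.3f} (reported {ECONTOOLS_VAR_X})")
    # The leftover gap between reghdfe and econtools is one degree of freedom.
    print(f"reported ratio: {REGHDFE_VAR_X / ECONTOOLS_VAR_X:.8f}")
    print(f"(N - K) / (N - K - 1): {(N_OBS - RANK) / (N_OBS - RANK - 1):.8f}")


if __name__ == "__main__":
    main()
```

Because the areg variance is only printed to eight significant digits, the rescaled values should be expected to agree with the reported ones to about two decimal places, not exactly.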
