Differences in F-scores with Stata when using clustering #4
Comments
Thanks for the heads up! It's probably a degrees of freedom issue. I'll look into it when I get the chance.
Hi! Thanks for writing this. The key difference appears to be the finite-sample adjustment. In particular, a discrepancy arises when the clusters are nested within the fixed effects. This is discussed in the
econtools_example.do
use "https://www.kellogg.northwestern.edu/faculty/petersen/htm/papers/se/test_data.dta"
save "test_data.dta", replace
replace x = x / 100
*------*
* areg *
*------*
areg y x, absorb(firmid) vce(cluster firmid)
matrix define areg_V = e(V)
*-------*
* xtreg *
*-------*
xtset firmid
xtreg y x, fe vce(cluster firmid)
matrix define xtreg_V = e(V)
xtreg y x, fe vce(cluster firmid) dfadj
matrix define xtreg_dfadj_V = e(V)
*---------*
* reghdfe *
*---------*
reghdfe y x, absorb(firmid) vce(cluster firmid)
matrix define reghdfe_V = e(V)
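* Number of clusters, observations, and rank, as stored by reghdfe
local G = `e(N_clust)'
local N = `e(N_full)'
local K = `e(rank)'
* Rescale areg's variance to match reghdfe's finite-sample adjustment
matrix areg_to_reghdfe_V = areg_V * (`N' - `K' - `G') / (`N' - `K' - 1)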
*-----------*
* econtools *
*-----------*
* Same rescaling, but with (N - K) in the denominator instead of (N - K - 1)
matrix areg_to_econtools_V = areg_V * (`N' - `K' - `G') / (`N' - `K')
* areg
matrix list areg_V
* symmetric areg_V[2,2]
*                 x       _cons
*     x   100950.97
* _cons  -.05422367   2.913e-08
* xtreg
matrix list xtreg_V
* symmetric xtreg_V[2,2]
*                 x       _cons
*     x   90872.034
* _cons  -.04880999   2.622e-08
* xtreg with dfadj => areg
matrix list xtreg_dfadj_V
* symmetric xtreg_dfadj_V[2,2]
*                 x       _cons
*     x   100950.97
* _cons  -.05422367   2.913e-08
* reghdfe
matrix list reghdfe_V
* symmetric reghdfe_V[2,2]
*                 x       _cons
*     x   90872.034
* _cons  -.04880999   2.622e-08
* convert areg to reghdfe
matrix list areg_to_reghdfe_V
* symmetric areg_to_reghdfe_V[2,2]
*                 x       _cons
*     x   90872.034
* _cons  -.04880999   2.622e-08
* convert areg to econtools
matrix list areg_to_econtools_V
* symmetric areg_to_econtools_V[2,2]
*                 x       _cons
*     x   90853.856
* _cons  -.04880022   2.621e-08
econtools_example.py
import pandas as pd
import econtools
import econtools.metrics as mt
# Read Stata .dta file
test_data = econtools.read("test_data.dta")
test_data["x"] *= 1 / 100
# Estimate OLS regression with fixed-effects and clustered s.e.'s
result = mt.reg(test_data, "y", "x", fe_name="firmid", cluster="firmid")
print(result.vce)
#              x
# x  90853.85922
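The rescaling used in the `matrix areg_to_*_V` lines can also be checked by hand. Here is a minimal sketch, assuming N = 5000 observations, G = 500 clusters and K = 1; these values are not printed anywhere in this thread, they are only what I would expect `e(N_full)`, `e(N_clust)` and `e(rank)` to return for this dataset. The input is the rounded `areg` variance of `x` from the Stata output, and `rescale` is just an illustrative helper.

```python
# Assumed values (not printed in the thread): observations, clusters, e(rank).
N, G, K = 5000, 500, 1

# Variance of x reported by areg (and by xtreg with dfadj), rounded as printed.
areg_var_x = 100950.97


def rescale(var, numerator_df, denominator_df):
    """Multiply a variance by a ratio of degrees-of-freedom terms."""
    return var * numerator_df / denominator_df


# Same factors as the `matrix areg_to_*_V` lines in the Stata script.
to_reghdfe = rescale(areg_var_x, N - K - G, N - K - 1)
to_econtools = rescale(areg_var_x, N - K - G, N - K)

print(f"areg -> reghdfe / xtreg: {to_reghdfe:.3f}")
print(f"areg -> econtools:       {to_econtools:.3f}")
```

Under those assumptions the two results agree with the printed 90872.034 and 90853.856 up to the rounding of the `areg` input, so the remaining gap between reghdfe and econtools looks like it comes down to the `- 1` in the denominator.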
It seems that the largest discrepancies between the Stata outputs and econtools arise when the clustering option is used. On my dataset, I can replicate Stata's output exactly for the command:
areg y X, absorb(alpha)
However, differences emerge in t and F values for the line
areg y X, absorb(alpha) cluster(alpha)
on the same dataset.