Inference using fixest's aggregate command #33
Comments
This is great, thanks Jake. The problem with
library(etwfe)
data("mpdta", package="did")
mpdta$emp = exp(mpdta$lemp)
pmod = etwfe(
fml = emp ~ lpop,
tvar = year,
gvar = first.treat,
data = mpdta,
vcov = ~countyreal,
family = "poisson"
)
#> The variables '.Dtreat:first.treat::2006:year::2004', '.Dtreat:first.treat::2006:year::2005' and eight others have been removed because of collinearity (see $collin.var).
emfx(pmod)
#>
#> Term Contrast .Dtreat Estimate Std. Error z Pr(>|z|)
#> .Dtreat mean(TRUE) - mean(FALSE) TRUE -28.6 18.4 -1.55 0.12
#> 2.5 % 97.5 %
#> -64.6 7.49
#>
#> Columns: term, contrast, .Dtreat, estimate, std.error, statistic, p.value, conf.low, conf.high, predicted, predicted_hi, predicted_lo
aggregate(pmod, c("ATT"="^.Dtreat:first.treat::[0-9]{4}:year::[0-9]{4}$"))
#> Estimate Std. Error t value Pr(>|t|)
#> ATT -0.05247523 0.01206096 -4.350834 1.356206e-05
Created on 2023-04-26 with reprex v2.0.2
But it's definitely an attractive option for big linear models. Let me think about some internal code logic. |
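A note on why the two numbers above differ so much: as far as I can tell, aggregate() averages the model coefficients, which for a Poisson model live on the log (link) scale, whereas emfx() reports a difference in predicted outcomes on the response scale. Below is a minimal base-R sketch of that gap, using simulated data and a plain glm() rather than etwfe. The variable names and data-generating values are made up for illustration.

```r
# Simulated data (hypothetical): binary treatment d, one covariate x
set.seed(1)
n = 500
dat = data.frame(d = rbinom(n, 1, 0.5), x = rnorm(n))
dat$y = rpois(n, exp(1 + 0.3 * dat$x - 0.1 * dat$d))

fit = glm(y ~ d + x, family = poisson(), data = dat)

# Link-scale effect: the raw coefficient (a log-point change)
b_link = coef(fit)[["d"]]

# Response-scale effect: average difference in predicted counts
p1 = predict(fit, newdata = transform(dat, d = 1), type = "response")
p0 = predict(fit, newdata = transform(dat, d = 0), type = "response")
ame = mean(p1 - p0)

# The two are linked by a transformation, not equal to each other
c(link = b_link, response = ame, implied = mean(p0) * (exp(b_link) - 1))
```

Here b_link is roughly a proportional effect while ame is in units of y, so the two are only comparable after a transformation like the one in the last line.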
Hi both - one other big advantage of moving to |
Hi @jtorcasso, here's a first attempt at wild cluster bootstrap and etwfe interoperability via
library(devtools)
install_github("https://github.com/s3alfisc/fwildclusterboot/tree/etwfe-support")
# this should install kyle's fork of fixest, if not, do it manually
#install_github("https://github.com/kylebutts/fixest/tree/sparse-matrix")
library(etwfe)
library(fwildclusterboot)
data("mpdta", package="did")
mod = etwfe(
fml = lemp ~ lpop,
tvar = year,
gvar = first.treat,
data = mpdta,
vcov = ~countyreal,
ssc = fixest::ssc(adj = FALSE, cluster.adj = FALSE)
)
emfx(mod)
# Term Contrast .Dtreat Estimate Std. Error z Pr(>|z|) S 2.5 %
# .Dtreat mean(TRUE) - mean(FALSE) TRUE -0.0506 0.0124 -4.08 <0.001 14.4 -0.075
# 97.5 %
# -0.0263
aggregate(
mod, c("ATT"="^.Dtreat:first.treat::[0-9]{4}:year::[0-9]{4}$")
)
# Estimate Std. Error t value Pr(>|t|)
# ATT -0.05062703 0.0124121 -4.078845 5.267553e-05
boot_aggregate(
mod,
B = 999,
agg = c("ATT"="^.Dtreat:first.treat::[0-9]{4}:year::[0-9]{4}$"),
clustid = ~countyreal,
ssc = boot_ssc(adj = FALSE, cluster.adj = FALSE)
)
# Run the wild bootstrap: this might take some time...(but
# hopefully not too much time =) ).
# |======================================================| 100%
# Estimate t value Pr(>|t|) [0.025% 0.975%]
# [1,] -0.05062703 -4.078845 0.001001001 -0.07420141 -0.02634166
# Warning message:
# Matrix inversion failure: Using a generalized inverse instead. Check the produced
# t-statistic, does it match the one of your regression package (under the same
# small sample correction)? If yes, this is likely not something to worry about.
Note that the non-bootstrapped t-statistics from
Best, Alex |
Hi, I am following up on this thread to see if there are any updates or recommendations for aggregating event-study estimates more efficiently. I'm working with several datasets, one of which has nearly 50 million observations and 35 periods, and the current marginaleffects setup with emfx takes so long that it's impossible to generate meaningful summary results. (This is driven by the SE estimation.) I need SEs for plots, so turning off vcov isn't really an option. Any help or suggestions would be greatly appreciated! |
@poetlarsen Bootstrap is your friend on this one with |
Or just using "aggregate" as described above should also work, unless you run a Poisson regression? |
+1 to @kylebutts and @s3alfisc's comments. OTOH, and apart from the non-linear family issue, the other tricky thing with the
Summarising, I think we can use
## estimate your model
mod = etwfe(...)
## Simple ATT
# emfx(mod) ## for comparison
aggregate(mod, c("ATT" = "^\\.Dtreat(?:(?!_dm$).)*$"))
# For more complex ATTs we'll need the input gvar and tvar
gvar = attr(mod, "etwfe")[["gvar"]]
tvar = attr(mod, "etwfe")[["tvar"]]
## Group ATTs
# emfx(mod, "group") ## for comparison
aggregate(
mod,
paste0("(", gvar, "::[[:digit:]]+)(?:(?!_dm$).)*$")
)
## Calendar ATTs
# emfx(mod, "calendar") ## for comparison
aggregate(
mod,
paste0("(", tvar, "::[[:digit:]]+)((?:(?!_dm$).)*$)")
)
## Event ATTs (??)
# emfx(mod, "event") ## for comparison
## ??
# aggregate(
# mod,
# paste0("(", gvar, ".*", tvar, ")((?:(?!_dm$).)*$)")
# )
Considering all of this, @poetlarsen, it might be worth exploring whether one of the other "modern" estimators is better suited to your use case. The fastest option for that many observations is almost certainly going to be sunab, although I suspect that did2s will give a good account of itself too. |
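To see what the patterns above would pick up, here is a small base-R check on a hand-written vector of coefficient names. The names are hypothetical: they mimic the ones in the collinearity message earlier in the thread, and the "_dm" suffix on covariate interactions is assumed from the regexes themselves. Note that perl = TRUE is required, because the negative lookahead (?!_dm$) is not supported by R's default regex engine.

```r
# Hypothetical coefficient names (assumed naming; not taken from a fitted model)
cf = c(
  ".Dtreat:first.treat::2004:year::2004",
  ".Dtreat:first.treat::2004:year::2005",
  ".Dtreat:first.treat::2006:year::2006",
  ".Dtreat:first.treat::2004:year::2004:lpop_dm",
  "lpop"
)
gvar = "first.treat"
tvar = "year"

# Simple ATT: every .Dtreat term that does not end in "_dm"
att_pat = "^\\.Dtreat(?:(?!_dm$).)*$"
is_att = grepl(att_pat, cf, perl = TRUE)

# Group ATTs: the first capture group gives the cohort label to aggregate over
grp_pat = paste0("(", gvar, "::[[:digit:]]+)(?:(?!_dm$).)*$")
keep = grepl(grp_pat, cf, perl = TRUE)
groups = sub(paste0(".*", grp_pat, ".*"), "\\1", cf[keep], perl = TRUE)

# Calendar ATTs: same idea, keyed on the time variable
cal_pat = paste0("(", tvar, "::[[:digit:]]+)((?:(?!_dm$).)*$)")
periods = sub(paste0(".*", cal_pat, ".*"), "\\1", cf[keep], perl = TRUE)

data.frame(term = cf[keep], group = groups, period = periods)
```

Running the same check on names(coef(mod)) from a real model should show quickly whether a pattern is catching the demeaned-covariate terms by mistake.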
Fixest's aggregate command may be an efficient alternative for inference in cases when marginaleffects takes a long time. Consider the MWE below: emfx returns an estimate of -0.05062703 with std. error of 0.01249979, and aggregate returns the same exact values in this case.