Speed up emfx #19
Speeds up emfx considerably, while results remain the same
emfx becomes way faster for large data sets if we first collapse the data
Super, thanks Frederic. I've been too busy to give this my full attention, but I'm hoping to get around to it later this week. Speeding up the
Excited to take a proper look.
Dear Grant, Best, Frederic
@frederickluser We've got some conflicts that need to be resolved due to the compatibility updates with marginaleffects 0.9.0 (#20). Feel free to have a crack at these, otherwise I'll try as soon as I get a chance. Goal is to submit to CRAN by the end of the day (Pacific Time).
Hi @frederickluser I've resolved most of the conflicts, but there are a bunch of test failures on the Consider the
> simple_known[, c("estimate", "xvar")]
      estimate xvar
1 -0.004358929    1
2 -0.077394771    2
3 -0.121365807    3
> as.data.frame(emfx(x3, xvar = "xvar"))[, c("estimate", "xvar")] |> dplyr::arrange(xvar)
     estimate xvar
1 -0.01221530    1
2 -0.06043789    2
3 -0.09087307    3
Do you know why we're getting different estimates here? I'm using the same seed as the original file ( Can you please check against some known output on your side. I'm worried that the test results are changing through these different PRs...
Hey Grant. I messed up the check somehow, sorry. The problem was that the The difference you've shown before is just the marginal difference between collapsing and using the full data.
Excellent. Thanks for fixing and for this nice pull request!
Heads-up that I've tweaked this functionality a bit (e.g. in ea3bb3e). Key user-level changes are:
Hey Grant,

I have now added the option collapse_data to emfx, which will significantly improve running time for large data sets. If collapse_data = T, then the data will be collapsed using data.table before computing marginal effects and then weighted by group-cohort sizes in the marginaleffects command itself. The results are (almost) identical to the case without collapsing. If ivar is not NULL in etwfe, collapsing is not possible and a warning message appears.

I updated the tinytests and the checks produced no errors. Do you think this would be a sensible merge? You're welcome to request changes.
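
To illustrate the idea of collapsing and then weighting by cell size, here is a minimal sketch. It is not the code in this PR: it uses base R's aggregate() instead of data.table to stay dependency-free, and the toy data and the names dat, g, t, effect and n are all hypothetical. It only shows that an average taken over collapsed cells, weighted by cell size, matches the average over the full data.

```r
# Hypothetical toy data: one row per observation, with a group-cohort
# identifier (g), a period (t) and some per-row quantity to average (effect).
set.seed(1)
dat <- data.frame(
  g = rep(c(1, 2), times = c(600, 400)),
  t = sample(1:3, 1000, replace = TRUE)
)
dat$effect <- 0.1 * dat$g + 0.05 * dat$t + rnorm(1000, sd = 0.01)

# Average over the full (uncollapsed) data.
full_avg <- mean(dat$effect)

# Collapse to one row per (g, t) cell: cell mean plus cell size.
cell_means <- aggregate(effect ~ g + t, data = dat, FUN = mean)
cell_sizes <- aggregate(n ~ g + t, data = transform(dat, n = 1), FUN = sum)
collapsed  <- merge(cell_means, cell_sizes, by = c("g", "t"))

# Weighting each cell by its size recovers the full-data average.
collapsed_avg <- weighted.mean(collapsed$effect, w = collapsed$n)

all.equal(full_avg, collapsed_avg)
```

The collapsed table has one row per cell rather than one per observation, which is where the speed-up would come from. Dropping the weights would give a simple mean of cell means, which generally differs from the full-data average when cells are unequal in size.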
Best, Frederic