-
Notifications
You must be signed in to change notification settings - Fork 46
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Allow missings
in the modelmatrix, necessary to use lead
/lag
in @formula
#109
Conversation
Arg, thanks for spotting the issue. I remember raising this issue a while ago (https://discourse.julialang.org/t/psa-breaking-changes-in-statsmodels-v0-6-0-terms-2-0-son-of-terms/25626/4?u=matthieu). |
In the meantime, it would be awesome if you could do a pull request that returns a better error explaining the problem. |
EDIT: I see now, that my point was wrong. One can first filter by complete cases of |
Wouldn't you think that this is a better temporary solution than a better error message? It would be nice to be able to use |
I don't think it is worth it if it makes all regressions that do not use lag slower (or unfeasible). Moreover, lag does not even work with timed variable, or within groups (for now), so I'm not sure it's that useful for the kind of things people do with the package. |
I added a commit so that regressions without lags should not be affected. It's true that the lead and lag functions in StatsModels are not very powerful yet. I extended the lag function for groups in JuliaStats/StatsModels.jl#182, where it was suggested to put it in a separate package for now: https://github.com/greimel/GroupedTemporalTerms.jl Does this address all of your concerns, or are there more? |
src/fit.jl
Outdated
esample2 = trues(size(y)) | ||
|
||
if size(Xexo, 2) > 0 | ||
esample2 .&= completecases(DataFrame(Xexo)) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I think it would be better to do
for c in eachcol(X)
esample2 .&= .!ismissing.(c)
end
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
addressed in new commit
src/fit.jl
Outdated
all(isfinite, Xexo) || throw("Some observations for the exogeneous variables are infinite") | ||
|
||
Xexo = modelmatrix(formula_schema, subdf) | ||
esample2 = trues(size(y)) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
could you add a comment to the issue number
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
addressed in new commit, please let me know if the comments are overkill now
If you want it merged, could you answer the comments and add tests? |
This version passes a test comparing hand-lagged to auto-lagged regression (with fixed effects, but no weights, no clustering, no iv) on private data. I will push a comprehensive test soon. Would you accept a test dependency on https://github.com/greimel/GroupedTemporalTerms, or should I use non-grouped lags? |
I added tests, tests pass locally This is ready for review. If desired I can force push three logical commits. I needed this for debugging. |
I realized that I missed one case. The case where the df contains missings and the formula produces missings cannot be handled yet. This because the missings in the df are removed before the the lag is applied. I made the |
I actually need this case for my research. So I am gonna try to handle this case as well. You might want to wait with merging this until then if you wish. I am about to push some commits that make the analogous changes to the |
Thanks! |
Reverted this for the time being out of concern with memory allocation. Sorry for prematurely approving the PR. |
… `lead`/`lag` in `@formula` (FixedEffects#109)"" This reverts commit f68e456.
… `lead`/`lag` in `@formula` (FixedEffects#109)"" This reverts commit f68e456.
… `lead`/`lag` in `@formula` (FixedEffects#109)"" This reverts commit f68e456.
… `lead`/`lag` in `@formula` (FixedEffects#109)"" This reverts commit f68e456. merged
With
lead
/lag
in a@formula
the model matrix contains missings. I adjusted the code so that the output ofmodelmatrix
is checked for missings and and observations with missings are removed.MWE (can be added as test)
This becomes more useful when defining
groupedlag
for panel data, as I did here: JuliaStats/StatsModels.jl#182This is an unsolved issue also for GLM.jl: JuliaStats/GLM.jl#366