Add keyword arg to modelmatrix; define momentmatrix #16

gragusa · 2022-06-15T17:18:28Z

`modelmatrix` now has a keyword argument `weighted=false`, which is useful for dealing with weighted models.
Add `momentmatrix`: this function is intended to return the matrix of estimating equations; for instance, for a linear model it should return `u .* X`, where `u` is the vector of residuals and `X` is the model matrix.
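
A minimal sketch of the linear-model case described above, using plain arrays rather than a `RegressionModel`. The helper name `ols_momentmatrix` and the data are made up for illustration; they are not part of StatsAPI.

```julia
using LinearAlgebra

# Hypothetical helper, not a StatsAPI function: moment matrix of an OLS fit.
# Row i is u_i * x_i', the contribution of observation i to the estimating equations.
function ols_momentmatrix(X::AbstractMatrix, y::AbstractVector)
    beta = X \ y          # least-squares coefficients
    u = y - X * beta      # residuals
    return u .* X         # scale each row of X by its residual
end

X = [1.0 1.0; 1.0 2.0; 1.0 3.0; 1.0 4.0]
y = [1.1, 1.9, 3.2, 3.8]
M = ols_momentmatrix(X, y)

# At the OLS solution the estimating equations hold, so the columns of M
# sum to zero up to floating-point error (X'u = 0).
column_sums = vec(sum(M; dims=1))
```

The result has one row per observation and one column per coefficient, which is the shape a sandwich covariance estimator would consume.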

nalimilan · 2022-06-16T20:50:43Z

test/regressionmodel.jl

@@ -6,13 +6,32 @@ using StatsAPI: RegressionModel, crossmodelmatrix
 struct MyRegressionModel <: RegressionModel
 end

+struct ItsRegressionModel <: RegressionModel
+    wts


Suggested change

-    wts
+    wts::AbstractVector

Co-authored-by: Milan Bouchet-Valat <nalimilan@club.fr>

codecov-commenter · 2022-06-16T22:42:53Z

Codecov Report

Attention: Patch coverage is 75.00000% with 1 line in your changes missing coverage. Please review.

Project coverage is 97.43%. Comparing base (20b38e1) to head (93f8742).
Report is 9 commits behind head on main.

Files with missing lines	Patch %	Lines
src/regressionmodel.jl	75.00%	1 Missing ⚠️

Additional details and impacted files

@@             Coverage Diff             @@
##              main      #16      +/-   ##
===========================================
- Coverage   100.00%   97.43%   -2.57%     
===========================================
  Files            3        2       -1     
  Lines           37       39       +2     
===========================================
+ Hits            37       38       +1     
- Misses           0        1       +1

nalimilan · 2022-06-17T07:35:37Z

src/regressionmodel.jl


-Return the model matrix (a.k.a. the design matrix).
+Return the model matrix (a.k.a. the design matrix) or, if `weighted=true` the weighted 
+model matrix, i.e. `X' * sqrt.(W)`, where `X` is the model matrix and


Why transpose X? It sounds weird to change the orientation of the result depending on whether it's weighted or not.

My bad... I will fix it.

mschauer · 2022-06-17T07:46:07Z

src/regressionmodel.jl


-Return the model matrix (a.k.a. the design matrix).
+Return the model matrix (a.k.a. the design matrix) or, if `weighted=true` the weighted 


Suggested change

-Return the model matrix (a.k.a. the design matrix) or, if `weighted=true` the weighted
+Return the model matrix (design matrix) or, if `weighted=true` the weighted

mschauer · 2022-06-17T07:46:38Z

src/regressionmodel.jl


-Return `X'X` where `X` is the model matrix of `model`.
+Return `X'X` where `X` is the model matrix of `model` or, if `weighted=true`, `X'WX`, 
+where `W` is the diagonal matrix whose elements are the model weights. 


Can we define weights?

We could add a link to the `weights(::StatisticalModel)` method. Indeed there can be confusion between prior weights and working weights (though these terms can also confuse casual users).

How exactly do I add a link to `weights(::StatisticalModel)`? Is there a way to link docs from different packages?

It's in the same package, so I think something like `[model weights](@ref weights(::StatisticalModel))` should work. Better test it, though, by building the StatsBase docs (`julia docs/make.jl`) using the updated StatsAPI.

Co-authored-by: Moritz Schauer <moritzschauer@web.de>

gragusa · 2022-06-21T14:55:00Z

The only other thing we should probably do is allow `modelmatrix` to take another keyword argument, e.g., `droppcollinearcols`, to return only the columns corresponding to non-NaN coefficients in a pivoted Cholesky factorization.

We could do that next, after I drop a bomb PR against GLM. (The GLM PR is waiting for this PR to get merged.)

nalimilan · 2022-06-22T19:35:52Z

Let's tackle this separately. :-)

I'd rather review the GLM PR before merging this one; having the implementation is usually a good way to check that the API is the right one.

gragusa · 2022-09-10T15:46:58Z

I think it would be helpful to think about the API for dealing with rank-deficient models. For instance, `modelmatrix` returns all the columns, even those related to coefficients that cannot be estimated. I think this is fine as a default, but it would be useful to have a keyword argument to return only the columns corresponding to the estimable coefficients, and a mechanism to identify the indices of these columns. I have hacked this second part into this PR.
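
One way to identify such columns, sketched on a plain matrix. This uses column-pivoted QR from the `LinearAlgebra` standard library instead of the pivoted Cholesky mentioned earlier in the thread; the helper name `estimable_columns` and the tolerance are assumptions for illustration, not what this PR or GLM implements.

```julia
using LinearAlgebra

# Hypothetical helper (not a StatsAPI function): indices of a maximal set of
# linearly independent columns of X, found with column-pivoted QR.
function estimable_columns(X::AbstractMatrix; tol::Real=1e-8)
    F = qr(X, ColumnNorm())              # column-pivoted QR (Julia 1.7 or later)
    r = count(>(tol), abs.(diag(F.R)))   # numerical rank from the diagonal of R
    return sort(F.p[1:r])                # original indices of the retained columns
end

# Column 2 is exactly twice column 3, so only two columns are independent.
X = [1.0 2.0 1.0;
     1.0 4.0 2.0;
     1.0 6.0 3.0;
     1.0 8.0 4.0]

cols = estimable_columns(X)
X_reduced = X[:, cols]   # full-column-rank submatrix
```

Which of two collinear columns is kept depends on the pivoting order, so an API built on this idea would need to document that choice.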

nalimilan · 2022-10-18T20:50:25Z

src/regressionmodel.jl

+    residuals(model::RegressionModel; weighted::Bool=false)
+
+Return the residuals of the model or, if `weighted=true`, the residuals multiplied by
+the square root of the [model weights](@ref weights(::StatisticalModel)). 


Where does the square root come from exactly? Doesn't that assume a particular definition of residuals (i.e. using L2-norm rather than e.g. L1-norm)?

Well, this is tricky. `modelmatrix` also multiplies the entries of X by the square root of the weights. Why?

Think about the linear model. With weights, the cross model matrix is $X'WX$. To obtain it, we can now do `modelmatrix(lm1; weighted=true)' * modelmatrix(lm1; weighted=true)`.

Notice that this is consistent with R; see, e.g., the function `weighted.residuals` in the stats package.

With weights of any kind, the weighted model is
$$\sqrt{w_i}y_i = \sqrt{w_i}x_i \beta + \sqrt{w_i}u_i.$$ So the understanding is that weighting the individual constituents of the model ($y$, $x$, $u$) amounts to multiplying each by $\sqrt{w_i}$.
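
A small numerical check of this argument on made-up data, using plain arrays in place of `modelmatrix(...; weighted=true)`. The variable names are illustrative only.

```julia
using LinearAlgebra

X = [1.0 0.5; 1.0 1.5; 1.0 2.5; 1.0 4.0]
y = [0.9, 2.1, 2.9, 4.2]
w = [1.0, 2.0, 0.5, 3.0]          # made-up positive weights

sw = sqrt.(w)
Xw = sw .* X                      # rows of X scaled by sqrt(w_i)
yw = sw .* y                      # response scaled the same way

# X'WX computed directly and from the square-root-scaled matrix.
cross_direct = X' * Diagonal(w) * X
cross_scaled = Xw' * Xw

# Weighted least squares from the normal equations and from OLS on scaled data.
beta_wls    = (X' * Diagonal(w) * X) \ (X' * Diagonal(w) * y)
beta_scaled = Xw \ yw
```

Both pairs agree up to floating-point error, which is the convenience the square-root convention buys for the linear model.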

Yeah, another tricky point. My understanding is that for residuals, the square root comes from the fact that deviance residuals themselves are defined as the square root of quantities which are partitions of the deviance. Right?

Note the R docstring for `weighted.residuals` says "Weighted residuals are based on the deviance residuals", which are only one kind of residual. Actually, in R `residuals` also returns weighted residuals, except for response residuals, which are always unweighted. Maybe to be completely accurate we could say "for deviance and Pearson residuals...", so that packages are free to use different definitions (or throw an error) if needed?

I think what @nalimilan says is that the assumption here (and in your change of modelmatrix) is that for all kinds of weights the weighted model matrix is X * sqrt.(W). Is it always true for FrequencyWeights, AnalyticWeights and ProbabilityWeights? x-ref: JuliaStats/GLM.jl#487
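
For frequency weights, at least the cross-product part of this question can be checked directly: scaling rows by the square root of integer counts gives the same `X'X` as physically repeating the rows. The sketch below uses made-up data and plain arrays; it says nothing about standard errors or about analytic or probability weights.

```julia
using LinearAlgebra

X = [1.0 2.0; 1.0 3.0; 1.0 5.0]
freq = [2, 1, 3]                  # made-up frequency weights (integer counts)

# Expand the data by repeating row i freq[i] times.
rows = reduce(vcat, [fill(i, freq[i]) for i in eachindex(freq)])
X_expanded = X[rows, :]

# Square-root-weighted model matrix on the original (unexpanded) rows.
Xw = sqrt.(freq) .* X

cross_expanded = X_expanded' * X_expanded
cross_scaled   = Xw' * Xw
```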

@bkamins weighted residuals and the weighted model matrix do not exist in statistics. They are only useful from a coding point of view: they make it easier to write neater code.

I have always defined these quantities as multiplied by $\sqrt{w_i}$, as it is much more convenient. Same thing for R, which silently returns square-root-weighted residuals. Other packages, notably FixedEffectModels.jl, do that too.

@nalimilan what you propose makes sense. I will add more context to the docs.

Even if these don't exist in statistics, the question can be phrased as "are there situations where the returned value is useful, even when you don't know the kind of weights used". I think the answer is yes, but it's tricky, so... R base only supports analytic weights so it's not a great reference.

gragusa · 2022-11-17T21:54:55Z

Something that would be very useful and that I would like to add to this PR is `invloglikhessian`, the inverse of the Hessian of the log-likelihood. The name is not great, but `invloglikelihoodhessian` is a monster, and I don't like `invloglikhess` (but I would do anything to have this merged).

With this method defined (whose implementation for GLM is part of JuliaStats/GLM.jl#487) CovarianceMatrices.jl could drop the GLM dependency, and implementation of all the covariances could be done via the API.
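
A minimal sketch of how such a covariance could be assembled from the two quantities discussed here, for ordinary least squares only. It uses plain arrays rather than any StatsAPI method: `bread` stands in for what an inverse-Hessian accessor might return (up to scale) and `M` for the moment matrix. These names, and the choice of the HC0 sandwich estimator, are assumptions for illustration and not the CovarianceMatrices.jl implementation.

```julia
using LinearAlgebra

X = [1.0 0.5; 1.0 1.5; 1.0 2.5; 1.0 4.0; 1.0 5.0]
y = [1.0, 2.3, 2.8, 4.5, 4.9]

beta = X \ y                      # OLS coefficients
u = y - X * beta                  # residuals

bread = inv(X' * X)               # stand-in for the inverse Hessian (up to scale)
M = u .* X                        # stand-in for the moment matrix
meat = M' * M                     # sum over observations of u_i^2 * x_i * x_i'

# HC0 heteroskedasticity-robust covariance of the coefficients.
vcov_hc0 = bread * meat * bread
```

Because only `bread` and `M` are needed, a covariance package written this way would not have to know which modeling package produced them.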

nalimilan · 2022-11-27T15:22:25Z

Sure. Would it make sense to call it invhessian? Do other modeling packages define this function currently?

gragusa · 2022-12-20T17:04:09Z

> Sure. Would it make sense to call it invhessian? Do other modeling packages define this function currently?

I don't think so. The R package dealing with robust variances uses meat and bread which is not very clear (but in that case does not matter since R is not composable and methods can be extended only by the package itself).

We already have the inverse of the Hessian of the log-likelihood in GLM: it is `invchol`. For linear models, the inverse of the Hessian of the normal log-likelihood is (up to sign and scale) $(X'X)^{-1}$. But for general likelihoods this is not the case.

Now, invhessian is a little too generic, but I could live with it.
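
For reference, a short derivation of the linear-model case mentioned above, assuming Gaussian errors with known variance $\sigma^2$ (the symbols $\ell$, $\sigma^2$ and $n$ are introduced here for illustration and do not appear in the thread):

$$
\begin{aligned}
\ell(\beta) &= -\frac{n}{2}\log(2\pi\sigma^2) - \frac{1}{2\sigma^2}(y - X\beta)'(y - X\beta), \\
\frac{\partial \ell}{\partial \beta} &= \frac{1}{\sigma^2}X'(y - X\beta) = \frac{1}{\sigma^2}X'u, \\
\frac{\partial^2 \ell}{\partial \beta\,\partial \beta'} &= -\frac{1}{\sigma^2}X'X,
\qquad
\left(-\frac{\partial^2 \ell}{\partial \beta\,\partial \beta'}\right)^{-1} = \sigma^2 (X'X)^{-1}.
\end{aligned}
$$

So the inverse Hessian matches $(X'X)^{-1}$ only up to sign and the scale factor $\sigma^2$, and the score $X'u/\sigma^2$ is the column sum of the moment matrix up to the same factor.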

nalimilan · 2022-12-24T18:07:48Z

I don't have a strong preference, but at least for consistency I think we should spell "likelihood" in full if we use that term. Luckily autocompletion will almost always work. ;-)

gragusa added 2 commits June 10, 2022 20:53

WIP

e04ad3f

Fix doc

6f6a160

nalimilan reviewed Jun 16, 2022

WIP

0625c3c

nalimilan reviewed Jun 16, 2022

gragusa and others added 7 commits June 17, 2022 00:34

Update src/regressionmodel.jl

e514db0

Co-authored-by: Milan Bouchet-Valat <nalimilan@club.fr>

Update src/regressionmodel.jl

bfe8ac6

Co-authored-by: Milan Bouchet-Valat <nalimilan@club.fr>

Update src/statisticalmodel.jl

af65888

Co-authored-by: Milan Bouchet-Valat <nalimilan@club.fr>

Change type name for clarity

11505ff

Update src/regressionmodel.jl

de0c6ae

Co-authored-by: Milan Bouchet-Valat <nalimilan@club.fr>

Don't test momentmatrix

c36352b

Merge branch 'main' of https://github.com/gragusa/StatsAPI.jl

14ccc70

Don't test momentmatrix

9a7b2ab

nalimilan reviewed Jun 17, 2022

mschauer reviewed Jun 17, 2022

mschauer reviewed Jun 17, 2022

gragusa and others added 3 commits June 17, 2022 14:47

Update src/regressionmodel.jl

48333e5

Co-authored-by: Moritz Schauer <moritzschauer@web.de>

Docs fixes

e927f72

Improve docs

ab928fa

gragusa mentioned this pull request Jun 21, 2022

Taking Weights Seriously JuliaStats/GLM.jl#485

Closed

gragusa mentioned this pull request Jul 15, 2022

Taking weighting seriously JuliaStats/GLM.jl#487

Open

3 tasks

nalimilan reviewed Oct 18, 2022

gragusa and others added 2 commits October 19, 2022 11:16

Update src/regressionmodel.jl to reflect reviewer suggestions

2858ba0

Co-authored-by: Milan Bouchet-Valat <nalimilan@club.fr>

Update src/statisticalmodel.jl

06206ad

Co-authored-by: Milan Bouchet-Valat <nalimilan@club.fr>

gragusa added 2 commits November 17, 2022 09:47

Merge branch 'JuliaStats:main' into main

7fa73cb

Merge branch 'main' of https://github.com/gragusa/StatsAPI.jl

c3460ae

gragusa changed the title ~~Add keyword arg to modelmatrix; define momentfunction~~ Add keyword arg to modelmatrix; define momentmatrix Nov 17, 2022

Merge branch 'JuliaStats:main' into main

93f8742
