QR decomposition method in GLM. (#507)
* PowerLink refers to a class of techniques that use a power function

* added a few more test cases

* Fixing doctests for PowerLink

* Making subheading smaller than heading.

* Rounding off a test case result to 2 digits

Test was failing in Julia Nightly as follows:

1) GLM.linkinv(InverseLink(), 10) was 0.01 while GLM.linkinv(PowerLink(-1), 10) was 0.010000000000000002
2) GLM.linkinv(InverseSquareLink(), 10) was -10.01 while GLM.linkinv(PowerLink(-2), 10) was -0.010000000000000002

Rounding off to 2 digits should solve this. 

Note: These tests were passing in other versions of Julia.

* adding Optim dependency to docs

* No need to have Optim as a dependency for the entire package. Only the docs need it

* Rounding off GLM.linkinv(PowerLink(-1), 10) and GLM.linkinv(PowerLink(-2), 10) to 2 digits

* Rounding off GLM.mueta(PowerLink(-1), 10) to make the test work on Julia Nightly

* Using inexact equality comparison instead of rounding.

* Apply suggestions from code review

Correcting order of PowerLink, ProbitLink, SqrtLink.

Co-authored-by: Milan Bouchet-Valat <nalimilan@club.fr>

* Apply suggestions from code review

Using ≈ instead of isapprox.

Co-authored-by: Milan Bouchet-Valat <nalimilan@club.fr>

* Apply suggestions from code review

Using alphabetical order in references.

Co-authored-by: Milan Bouchet-Valat <nalimilan@club.fr>

* Update docs/src/examples.md

Writing the example description in plain text.

Co-authored-by: Milan Bouchet-Valat <nalimilan@club.fr>

* Update docs/src/examples.md

Removing extra space to be consistent with the style.

Co-authored-by: Milan Bouchet-Valat <nalimilan@club.fr>

* Adding the missing comma.

* Update src/glmtools.jl

Using a better variable name for 1 by lambda.

Co-authored-by: Milan Bouchet-Valat <nalimilan@club.fr>

* Update src/glmtools.jl

Using inline function.

Co-authored-by: Milan Bouchet-Valat <nalimilan@club.fr>

* Apply suggestions from code review

Making the doctest code concise.

Co-authored-by: Milan Bouchet-Valat <nalimilan@club.fr>

* Following alphabetical order in references

* Adding suggested changes by nalimilan

* Update test/runtests.jl

Removing the R code.

Co-authored-by: Milan Bouchet-Valat <nalimilan@club.fr>

* Update test/runtests.jl

Removing test of hashes.

Co-authored-by: Milan Bouchet-Valat <nalimilan@club.fr>

* replaced the `isapprox` function by the `≈` symbol wherever possible, added a few more metrics to test, and replaced all `=` by `≈` for real numbers

* Update src/glmfit.jl

Since we are storing the link - the `GLM.Link` function can be defined uniformly for all link functions

Co-authored-by: Milan Bouchet-Valat <nalimilan@club.fr>

* Update src/glmfit.jl

Changing the sentence to make it compact

Co-authored-by: Milan Bouchet-Valat <nalimilan@club.fr>

* updated test cases, added a few more metrics to check, moved the `link` member after `d` (distribution) and changed the corresponding constructor, removed the `updateμ!` method for PowerLink, and modified the general `updateμ!` function

* Rephrased the description of PowerLink

* Update docs/src/examples.md

Adding break line and removing full stop

Co-authored-by: Milan Bouchet-Valat <nalimilan@club.fr>

* Removed `GLM.Link(mm::AbstractGLM) = mm.l`, redefined PowerLink and updated test cases

* update test case

* Update src/glmtools.jl

The suggestion is committed.

Co-authored-by: Milan Bouchet-Valat <nalimilan@club.fr>

* changed the `inverselink` function for PowerLink as suggested in PR

* Add files via upload

NIST datasets for testing linear models

* added qrpred

* use qrpred instead of cholpred

* removed tests which use chol

* added qr decomp with weights

* added dropcollinear for qr decomp

* added pivot cholesky for pivot qr

* added dof for qr

* added chol for qr

* updated test cases related to QR method

* added float conversion

* fixed test

* updated QR decompositions

* removed redundant functions

* updating documentation

* fixed bug

* refactored DensePredQR constructor

* changed DensePredQR constructor to take Abstract types

* removed unused functions

* removed unused functions

* updated doc string

* conditional pivoting in QR

* conditional QR pivoting

* updated example of linear model with QR decomposition

* removed nasty.csv file and commented code

* Update src/GLM.jl

Co-authored-by: Milan Bouchet-Valat <nalimilan@club.fr>

* Update src/linpred.jl

Co-authored-by: Milan Bouchet-Valat <nalimilan@club.fr>

* Update src/linpred.jl

Co-authored-by: Milan Bouchet-Valat <nalimilan@club.fr>

* Update src/lm.jl

Co-authored-by: Milan Bouchet-Valat <nalimilan@club.fr>

* Update src/linpred.jl

Co-authored-by: Milan Bouchet-Valat <nalimilan@club.fr>

* Update src/linpred.jl

Co-authored-by: Milan Bouchet-Valat <nalimilan@club.fr>

* Update src/linpred.jl

Co-authored-by: Milan Bouchet-Valat <nalimilan@club.fr>

* Update src/linpred.jl

Co-authored-by: Milan Bouchet-Valat <nalimilan@club.fr>

* some changes suggested by @nalimilan

* Changed pivoted_qr to pivoted_qr!

* Changed `:stable` to `:qr` and `:fast` to `:cholesky`

* Changed the predict function in lm for the QR method; also added some test cases

* rearranged all tests related to linear models (Cholesky and QR)

* An intermediate commit after merging with master branch

* Without the predict-with-QR test case; need to rewrite the predict function in lm

* updated example.md file

* Added test cases with predictions for QR method

* Removed some comments

* removed comments, replaced `Rsub\I` by `inv(Rsub)`

* Update test/runtests.jl

Co-authored-by: Milan Bouchet-Valat <nalimilan@club.fr>

* Update test/runtests.jl

Co-authored-by: Milan Bouchet-Valat <nalimilan@club.fr>

* Update test/runtests.jl

Co-authored-by: Milan Bouchet-Valat <nalimilan@club.fr>

* Update test/runtests.jl

Co-authored-by: Milan Bouchet-Valat <nalimilan@club.fr>

* Update src/GLM.jl

Co-authored-by: Milan Bouchet-Valat <nalimilan@club.fr>

* Update src/linpred.jl

Co-authored-by: Milan Bouchet-Valat <nalimilan@club.fr>

* Update src/linpred.jl

Co-authored-by: Milan Bouchet-Valat <nalimilan@club.fr>

* Update src/lm.jl

Co-authored-by: Milan Bouchet-Valat <nalimilan@club.fr>

* Update src/lm.jl

Co-authored-by: Milan Bouchet-Valat <nalimilan@club.fr>

* Update src/lm.jl

Co-authored-by: Milan Bouchet-Valat <nalimilan@club.fr>

* Update src/linpred.jl

Co-authored-by: Milan Bouchet-Valat <nalimilan@club.fr>

* Update src/lm.jl

Co-authored-by: Milan Bouchet-Valat <nalimilan@club.fr>

* Update src/linpred.jl

Co-authored-by: Milan Bouchet-Valat <nalimilan@club.fr>

* Update src/linpred.jl

Co-authored-by: Milan Bouchet-Valat <nalimilan@club.fr>

* Update src/linpred.jl

Co-authored-by: Milan Bouchet-Valat <nalimilan@club.fr>

* Update test/runtests.jl

Co-authored-by: Milan Bouchet-Valat <nalimilan@club.fr>

* updated test cases, qrpred method

* removed extra lines in lm file

* updated predict! function in lm in compact form

* Optimizing the performance for full-rank matrices

* changed outer constructor to inner

* Incorporated QR decomposition method in glm related functions

* NegativeBinomial parameter estimation test is failing in a few systems. Trying a different minstepfac value

* NegativeBinomial parameter estimation test is failing in a few systems. Trying another, smaller minstepfac value

* NegativeBinomial parameter estimation test is failing in a few systems

* added one more example to show when QR method works better

* update for doc test failure

* Fixing doc test

* Fixing doc test failure

* Fixing doc test failure

* Fixing for doc test failure

* updated example for :qr method

* Updated example for QR decomposition

* Updated example for QR decomposition

* Updated example for QR decomposition

* Updated  instead of generic error

* Trying to resolve the conflict

* Updated example to show QR performs better

* Update src/GLM.jl

Co-authored-by: Milan Bouchet-Valat <nalimilan@club.fr>

* Update src/linpred.jl

Co-authored-by: Milan Bouchet-Valat <nalimilan@club.fr>

* Updated test cases. Test cases with the Cholesky and QR methods are written in compact form

* Updated test case due to different pivot in different systems

* replaced equality check with egality check in glmfit

* updated example and added doctest

* updated example without doctest

* Avoiding multiple copies of the design matrix

* added one more example in qr vs cholesky

* updated example, remove unused constructor and added one more test case

* updated example to pass doctest

* updated example to pass doctest

* updated example to pass doctest

* updated example to pass doctest

* example

* updated example with rdchem data

* Update docs/src/examples.md

Co-authored-by: Bogumił Kamiński <bkamins@sgh.waw.pl>

---------

Co-authored-by: Mousum <mousum@spotle.ai>
Co-authored-by: ayushpatnaikgit <ayushpatnaik@gmail.com>
Co-authored-by: Milan Bouchet-Valat <nalimilan@club.fr>
Co-authored-by: Ayush Patnaik <u6012645@anu.edu.au>
Co-authored-by: harsharora21 <harsharora1337@gmail.com>
Co-authored-by: Bogumił Kamiński <bkamins@sgh.waw.pl>
7 people authored May 4, 2023
1 parent 9f5276e commit a8016bd
Showing 7 changed files with 679 additions and 237 deletions.
123 changes: 123 additions & 0 deletions docs/src/examples.md
Original file line number Diff line number Diff line change
Expand Up @@ -86,6 +86,129 @@ julia> round.(vcov(ols); digits=5)
0.38889 -0.16667
-0.16667 0.08333
```
By default, the `lm` method uses the Cholesky factorization, which is fast but numerically unstable, especially for ill-conditioned design matrices. The Cholesky method also detects multicollinearity aggressively. You can use the `method` keyword argument to select the more stable QR factorization instead.

```jldoctest
julia> data = DataFrame(X=[1,2,3], Y=[2,4,7]);
julia> ols = lm(@formula(Y ~ X), data; method=:qr)
LinearModel
Y ~ 1 + X
Coefficients:
─────────────────────────────────────────────────────────────────────────
Coef. Std. Error t Pr(>|t|) Lower 95% Upper 95%
─────────────────────────────────────────────────────────────────────────
(Intercept) -0.666667 0.62361 -1.07 0.4788 -8.59038 7.25704
X 2.5 0.288675 8.66 0.0732 -1.16797 6.16797
─────────────────────────────────────────────────────────────────────────
```
The following example shows that the QR decomposition works better for an ill-conditioned design matrix. The model fitted with the QR method is a better fit than the one fitted with the Cholesky method, since its estimated log-likelihood is higher.
Note that the condition number of the design matrix is quite high (≈ 3.52e7).

```
julia> X = [-0.4011512997627107 0.6368622664511552;
-0.0808472925693535 0.12835204623364604;
-0.16931095045225217 0.2687956795496601;
-0.4110745650568839 0.6526163576003452;
-0.4035951747670475 0.6407421349445884;
-0.4649907741370211 0.7382129928076485;
-0.15772708898883683 0.25040532268222715;
-0.38144358562952446 0.6055745630707645;
-0.1012787681395544 0.16078875117643368;
-0.2741403589052255 0.4352214984054432];
julia> y = [4.362866166172215,
0.8792840060172619,
1.8414020451091684,
4.470790758717395,
4.3894454833815395,
5.0571760643993455,
1.7154177874916376,
4.148527704012107,
1.1014936742570425,
2.9815131910316097];
julia> modelqr = lm(X, y; method=:qr)
LinearModel
Coefficients:
────────────────────────────────────────────────────────────────
Coef. Std. Error t Pr(>|t|) Lower 95% Upper 95%
────────────────────────────────────────────────────────────────
x1 5.00389 0.0560164 89.33 <1e-12 4.87472 5.13307
x2 10.0025 0.035284 283.48 <1e-16 9.92109 10.0838
────────────────────────────────────────────────────────────────
julia> modelchol = lm(X, y; method=:cholesky)
LinearModel
Coefficients:
────────────────────────────────────────────────────────────────
Coef. Std. Error t Pr(>|t|) Lower 95% Upper 95%
────────────────────────────────────────────────────────────────
x1 5.17647 0.0849184 60.96 <1e-11 4.98065 5.37229
x2 10.1112 0.053489 189.03 <1e-15 9.98781 10.2345
────────────────────────────────────────────────────────────────
julia> loglikelihood(modelqr) > loglikelihood(modelchol)
true
```
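A condition number like the one quoted above can be checked directly with the standard library's `cond`. Here is a minimal sketch on a synthetic near-collinear matrix (not the matrix from the example above):

```julia
using LinearAlgebra

# Two nearly parallel columns make the matrix ill-conditioned:
# the second column differs from the first only by a tiny perturbation.
X = [1.0 1.0;
     1.0 1.0 + 1e-8;
     1.0 1.0 - 1e-8]

κ = cond(X)   # ratio of the largest to the smallest singular value
```

Cholesky-based solvers work with `X'X`, which roughly squares the condition number, while QR factorizes `X` directly; that is why the QR route is preferred for matrices like this one.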
Since the Cholesky method with `dropcollinear = true` aggressively detects multicollinearity,
if you ever encounter multicollinearity in any GLM model fitted with Cholesky,
it is worth trying the same model with the QR decomposition.
The following example is taken from "Introductory Econometrics: A Modern Approach, 7e" by Jeffrey M. Wooldridge.
The dataset is used to study the relationship between firm size (often measured by annual sales) and spending on
research and development (R&D).
The example shows that for the given model,
the Cholesky method with `dropcollinear=true` detects multicollinearity in the design matrix
and hence does not estimate all parameters, as opposed to QR.

```jldoctest
julia> y = [9.42190647125244, 2.084805727005, 3.9376676082611, 2.61976027488708, 4.04761934280395, 2.15384602546691,
2.66240668296813, 4.39475727081298, 5.74520826339721, 3.59616208076477, 1.54265284538269, 2.59368276596069,
1.80476510524749, 1.69270837306976, 3.04201245307922, 2.18389105796813, 2.73844122886657, 2.88134002685546,
2.46666669845581, 3.80616021156311, 5.12149810791015, 6.80378007888793, 3.73669862747192, 1.21332454681396,
2.54629635810852, 5.1612901687622, 1.86798071861267, 1.21465551853179, 6.31019830703735, 1.02669405937194,
2.50623273849487, 1.5936255455017];
julia> x = [4570.2001953125, 2830, 596.799987792968, 133.600006103515, 42, 390, 93.9000015258789, 907.900024414062,
19773, 39709, 2936.5, 2513.80004882812, 1124.80004882812, 921.599975585937, 2432.60009765625, 6754,
1066.30004882812, 3199.89990234375, 150, 509.700012207031, 1452.69995117187, 8995, 1212.30004882812,
906.599975585937, 2592, 201.5, 2617.80004882812, 502.200012207031, 2824, 292.200012207031, 7621, 1631.5];
julia> rdchem = DataFrame(rdintens=y, sales=x);
julia> mdl = lm(@formula(rdintens ~ sales + sales^2), rdchem; method=:cholesky)
LinearModel
rdintens ~ 1 + sales + :(sales ^ 2)
Coefficients:
───────────────────────────────────────────────────────────────────────────────────────
Coef. Std. Error t Pr(>|t|) Lower 95% Upper 95%
───────────────────────────────────────────────────────────────────────────────────────
(Intercept) 0.0 NaN NaN NaN NaN NaN
sales 0.000852509 0.000156784 5.44 <1e-05 0.000532313 0.00117271
sales ^ 2 -1.97385e-8 4.56287e-9 -4.33 0.0002 -2.90571e-8 -1.04199e-8
───────────────────────────────────────────────────────────────────────────────────────
julia> mdl = lm(@formula(rdintens ~ sales + sales^2), rdchem; method=:qr)
LinearModel
rdintens ~ 1 + sales + :(sales ^ 2)
Coefficients:
─────────────────────────────────────────────────────────────────────────────────
Coef. Std. Error t Pr(>|t|) Lower 95% Upper 95%
─────────────────────────────────────────────────────────────────────────────────
(Intercept) 2.61251 0.429442 6.08 <1e-05 1.73421 3.49082
sales 0.000300571 0.000139295 2.16 0.0394 1.56805e-5 0.000585462
sales ^ 2 -6.94594e-9 3.72614e-9 -1.86 0.0725 -1.45667e-8 6.7487e-10
─────────────────────────────────────────────────────────────────────────────────
```
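When the Cholesky path drops a redundant column, its coefficient is reported as `0.0` with `NaN` statistics. A small sketch of how one might detect that programmatically, using made-up data (assumes GLM.jl and DataFrames.jl are installed):

```julia
using GLM, DataFrames

# x2 is an exact multiple of x1, so the design matrix is rank deficient.
df = DataFrame(x1 = 1.0:5.0)
df.x2 = 2 .* df.x1
df.y  = [3.9, 5.1, 5.9, 7.2, 7.9]

m = lm(@formula(y ~ x1 + x2), df)       # Cholesky with dropcollinear=true
dropped = findall(isnan, stderror(m))   # indices of redundant coefficients
```

If `dropped` is nonempty, refitting the same model with `method=:qr` is a reasonable next step.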


## Probit regression
```jldoctest
Expand Down
10 changes: 10 additions & 0 deletions src/GLM.jl
Expand Up @@ -91,13 +91,23 @@ module GLM
pivoted_cholesky!(A; kwargs...) = cholesky!(A, RowMaximum(); kwargs...)
end

@static if VERSION < v"1.7.0"
pivoted_qr!(A; kwargs...) = qr!(A, Val(true); kwargs...)
else
pivoted_qr!(A; kwargs...) = qr!(A, ColumnNorm(); kwargs...)
end

const COMMON_FIT_KWARGS_DOCS = """
- `dropcollinear::Bool=true`: Controls whether or not a model matrix
    that is less than full rank is accepted.
If `true` (the default) the coefficient for redundant linearly dependent columns is
`0.0` and all associated statistics are set to `NaN`.
Typically from a set of linearly-dependent columns the last ones are identified as redundant
(however, the exact selection of columns identified as redundant is not guaranteed).
- `method::Symbol=:cholesky`: Controls which decomposition method to use.
If `method=:cholesky` (the default), then the `Cholesky` decomposition method will be used.
If `method=:qr`, then the `QR` decomposition method (which is more stable
but slower) will be used.
- `wts::Vector=similar(y,0)`: Prior frequency (a.k.a. case) weights of observations.
Such weights are equivalent to repeating each observation a number of times equal
to its weight. Do note that this interpretation gives equal point estimates but
Expand Down
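The `pivoted_qr!` shim above selects the column-pivoted factorization, whose `R` factor pushes negligible diagonal entries to the end; that ordering is what makes numerical rank detection possible. A quick illustration with the Julia ≥ 1.7 spelling, on a deliberately rank-deficient matrix:

```julia
using LinearAlgebra

# Rank-1 matrix: every column is a multiple of the first.
A = [1.0 2.0 3.0;
     2.0 4.0 6.0;
     3.0 6.0 9.0]

F = qr(A, ColumnNorm())                 # column-pivoted QR
r = count(>(1e-8), abs.(diag(F.R)))     # numerical rank from R's diagonal
```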
28 changes: 24 additions & 4 deletions src/glmfit.jl
Expand Up @@ -286,7 +286,8 @@ function nulldeviance(m::GeneralizedLinearModel)
X = fill(1.0, length(y), hasint ? 1 : 0)
nullm = fit(GeneralizedLinearModel,
X, y, d, r.link; wts=wts, offset=offset,
dropcollinear=isa(m.pp.chol, CholeskyPivoted),
dropcollinear=ispivoted(m.pp),
method=decomposition_method(m.pp),
maxiter=m.maxiter, minstepfac=m.minstepfac,
atol=m.atol, rtol=m.rtol)
dev = deviance(nullm)
Expand Down Expand Up @@ -341,7 +342,8 @@ function nullloglikelihood(m::GeneralizedLinearModel)
X = fill(1.0, length(y), hasint ? 1 : 0)
nullm = fit(GeneralizedLinearModel,
X, y, d, r.link; wts=wts, offset=offset,
dropcollinear=isa(m.pp.chol, CholeskyPivoted),
dropcollinear=ispivoted(m.pp),
method=decomposition_method(m.pp),
maxiter=m.maxiter, minstepfac=m.minstepfac,
atol=m.atol, rtol=m.rtol)
ll = loglikelihood(nullm)
Expand Down Expand Up @@ -559,6 +561,7 @@ function fit(::Type{M},
d::UnivariateDistribution,
l::Link = canonicallink(d);
dropcollinear::Bool = true,
method::Symbol = :cholesky,
dofit::Union{Bool, Nothing} = nothing,
wts::AbstractVector{<:Real} = similar(y, 0),
offset::AbstractVector{<:Real} = similar(y, 0),
Expand All @@ -575,7 +578,15 @@ function fit(::Type{M},
end

rr = GlmResp(y, d, l, offset, wts)
res = M(rr, cholpred(X, dropcollinear), nothing, false)

if method === :cholesky
res = M(rr, cholpred(X, dropcollinear), nothing, false)
elseif method === :qr
res = M(rr, qrpred(X, dropcollinear), nothing, false)
else
throw(ArgumentError("The only supported values for keyword argument `method` are `:cholesky` and `:qr`."))
end

return dofit ? fit!(res; fitargs...) : res
end

Expand All @@ -594,6 +605,7 @@ function fit(::Type{M},
offset::Union{AbstractVector, Nothing} = nothing,
wts::Union{AbstractVector, Nothing} = nothing,
dropcollinear::Bool = true,
method::Symbol = :cholesky,
dofit::Union{Bool, Nothing} = nothing,
contrasts::AbstractDict{Symbol}=Dict{Symbol,Any}(),
fitargs...) where {M<:AbstractGLM}
Expand All @@ -613,7 +625,15 @@ function fit(::Type{M},
off = offset === nothing ? similar(y, 0) : offset
wts = wts === nothing ? similar(y, 0) : wts
rr = GlmResp(y, d, l, off, wts)
res = M(rr, cholpred(X, dropcollinear), f, false)

if method === :cholesky
res = M(rr, cholpred(X, dropcollinear), f, false)
elseif method === :qr
res = M(rr, qrpred(X, dropcollinear), f, false)
else
throw(ArgumentError("The only supported values for keyword argument `method` are `:cholesky` and `:qr`."))
end

return dofit ? fit!(res; fitargs...) : res
end

Expand Down
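With the dispatch above, the same `method` keyword now reaches `glm`. A hedged sketch of the intended call pattern, on toy data (assumes GLM.jl with this patch and DataFrames.jl installed):

```julia
using GLM, DataFrames

df = DataFrame(x = 1.0:10.0,
               y = [1, 0, 2, 1, 3, 2, 4, 5, 4, 6])

# Poisson regression solved through the new QR path; any value other
# than :cholesky or :qr raises an ArgumentError, per the dispatch above.
m = glm(@formula(y ~ x), df, Poisson(); method = :qr)
coef(m)   # intercept and slope on the log scale
```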
