Taking weighting seriously #487

Open

wants to merge 95 commits into base: master

Changes from 71 commits · 95 commits
1754cbd
WIP
gragusa Jun 10, 2022
1d778a5
WIP
gragusa Jun 15, 2022
12121a3
WIP
gragusa Jun 15, 2022
4363ba4
Taking weights seriously
gragusa Jun 17, 2022
ca702dc
WIP
gragusa Jun 18, 2022
e2b2d12
Taking weights seriously
gragusa Jun 21, 2022
bc8709a
Merge branch 'master' of https://github.com/JuliaStats/GLM.jl into Ju…
gragusa Jun 21, 2022
84cd990
Add depwarn for passing wts with Vector
gragusa Jun 22, 2022
cbc329f
Cosmetic changes
gragusa Jun 22, 2022
23d67f5
WIP
gragusa Jun 23, 2022
f4d90a9
Fix loglik for weighted models
gragusa Jul 4, 2022
6b7d95c
Fix remaining issues
gragusa Jul 15, 2022
c236b82
Final commit
gragusa Jul 15, 2022
d4bd0c2
Merge branch 'master'
gragusa Jul 15, 2022
8bdfb55
Fix merge
gragusa Jul 15, 2022
3eb2ca4
Fix nulldeviance
gragusa Jul 16, 2022
63c8358
Bypass crossmodelmatrix from StatsAPI
gragusa Jul 16, 2022
e93a919
Delete momentmatrix.jl
gragusa Jul 16, 2022
7bb0959
Delete scratch.jl
gragusa Jul 16, 2022
ded17a8
Delete settings.json
gragusa Jul 16, 2022
3346774
AbstractWeights are required to be real
gragusa Sep 5, 2022
7376e78
Update src/glmfit.jl
gragusa Sep 5, 2022
a738268
Apply suggestions from code review
gragusa Sep 5, 2022
c9459e7
Merge pull request #2 from JuliaStats/master
gragusa Sep 5, 2022
6af3ca5
Throw error if GlmResp weights are not AbstractWeights
gragusa Sep 5, 2022
0ded1d4
Addressing review comments
gragusa Sep 5, 2022
d923e48
Reexport aweights, pweights, fweights
gragusa Sep 5, 2022
84f27d1
Fixed remaining issues with null loglikelihood
gragusa Sep 6, 2022
8804dc1
Fix nullloglikelihood tests
gragusa Sep 6, 2022
7f3aa36
Do not dispatch on Weights but use if
gragusa Sep 6, 2022
f67a8e0
Do not dispatch on Weights use if
gragusa Sep 6, 2022
23a3e87
Fix inferred test
gragusa Sep 6, 2022
5481284
Use if instead of dispatching on Weights
gragusa Sep 6, 2022
d12222e
Add doc for weights and fix output
gragusa Sep 7, 2022
a17e812
Fix docs failures
gragusa Sep 7, 2022
58dec0c
Fix pweights stderror even for rank deficient design
gragusa Sep 7, 2022
a6f5c66
Add test for pweights stderror
gragusa Sep 7, 2022
92ddb1e
Export UnitWeights
gragusa Sep 7, 2022
0c61fff
Fix documentation
gragusa Sep 7, 2022
8b0e8e1
Make cooksdistance work with rank deficient design
gragusa Sep 7, 2022
f609f06
Test cooksdistance with rank deficient design
gragusa Sep 7, 2022
23f3d03
Fix CholeskyPivoted signature in docs
gragusa Sep 8, 2022
2749b84
Make nancolidx v1.0 and v1.1 friendly
gragusa Sep 8, 2022
82e472b
Fix signatures
gragusa Sep 9, 2022
2d6aaed
Correct implementation of momentmatrix
gragusa Sep 9, 2022
dbc9ae9
Test moment matrix
gragusa Sep 9, 2022
e0d9cdf
Apply suggestions from code review
gragusa Sep 23, 2022
46e8f92
Incorporate suggestions of reviewer
gragusa Sep 23, 2022
6df401b
Deals with review comments
gragusa Sep 24, 2022
ca15eb8
Small fix
gragusa Sep 24, 2022
0c18ae9
Small fix
gragusa Sep 25, 2022
54d68d1
Apply suggestions from code review
gragusa Oct 3, 2022
422a8cd
Merge branch 'master' into JuliaStats-master
gragusa Oct 3, 2022
d6d4e6b
Fix vcov dispatch for vcov
gragusa Oct 3, 2022
b457d74
Fix dispatch of _vcov
gragusa Oct 3, 2022
b087679
Revert changes
gragusa Oct 3, 2022
a44e137
Update src/glmfit.jl
gragusa Oct 3, 2022
11db2c4
Fix weighted keyword in modelmatrix
gragusa Oct 3, 2022
b649d4f
perf in nulldeviance for unweighted models
gragusa Oct 3, 2022
170148c
Merge branch 'JuliaStats-master' of github.com:gragusa/GLM.jl into Ju…
gragusa Oct 3, 2022
29c43cb
Fixed std error for probability weights
gragusa Oct 19, 2022
279e533
Getting there (& switch Analytics to Importance)
gragusa Oct 20, 2022
afb145e
.= instead of copy!
gragusa Oct 20, 2022
2cead0a
Remove comments
gragusa Oct 20, 2022
a1ec49f
up
gragusa Oct 20, 2022
97bf28d
Speedup cooksdistance
gragusa Oct 23, 2022
9ce2d89
Revert back to AnalyticWeights
gragusa Oct 24, 2022
9bddf63
Add extensive tests for AnalyticWeights
gragusa Oct 24, 2022
3fe045a
Add extensive tests for AnalyticWeights
gragusa Oct 24, 2022
852e307
Delete scratch.jl
gragusa Oct 25, 2022
d1ba3e5
Delete analytic_weights.jl
gragusa Oct 25, 2022
831f280
Follow reviewer suggestions [Batch 1]
gragusa Nov 15, 2022
b00dc16
Follow reviewer's suggestions [Batch 2]
gragusa Nov 15, 2022
0825324
probability weights vcov uses momentmatrix
gragusa Nov 15, 2022
48d15fb
Fix ProbabilityWeights vcov and tests
gragusa Nov 16, 2022
3338eab
Use leverage from StatsAPI
gragusa Nov 17, 2022
c27c749
Merge branch 'master' into JuliaStats-master
gragusa Nov 17, 2022
970e26e
Rebase against master
gragusa Nov 17, 2022
8832e9d
Fix test
gragusa Nov 17, 2022
9eb2390
Merge remote-tracking branch 'origin/master' into JuliaStats-master
gragusa Dec 20, 2022
587c129
Test on 1.6
gragusa Dec 20, 2022
fa63a9a
Address reviewer comments
gragusa Dec 29, 2022
807731a
Merge branch 'master' of github.com:JuliaStats/GLM.jl into JuliaStats…
gragusa Jun 16, 2023
72996fc
Merge branch 'master' into JuliaStats-master
andreasnoack Nov 19, 2024
1ee383a
Merge remote-tracking branch 'upstream/master' into JuliaStats-master
gragusa Nov 19, 2024
ba52ce9
Merge from origin
gragusa Nov 19, 2024
5e790df
Fix broken test of dof_residual
gragusa Nov 19, 2024
50c1a96
Fix testing issues
gragusa Nov 19, 2024
c4f7959
Fix docs
gragusa Nov 19, 2024
d2b5cb0
Added tests for ftest. They throw for pweights
gragusa Nov 25, 2024
cd165d7
Make ftest throw if a model weighted by pweights is passed
gragusa Nov 25, 2024
606a419
Fix how loglikelihood throws for pweights weighted models
gragusa Nov 25, 2024
a1a1e10
Merge branch 'master' of github.com:JuliaStats/GLM.jl into JuliaStats…
gragusa Nov 25, 2024
5d948de
Remove StatsPlots dependency.
gragusa Nov 25, 2024
4fb18df
Fix weighting with :qr method.
gragusa Nov 25, 2024
24 changes: 13 additions & 11 deletions docs/src/api.md
@@ -2,7 +2,7 @@

```@meta
DocTestSetup = quote
using CategoricalArrays, DataFrames, Distributions, GLM, RDatasets
using CategoricalArrays, DataFrames, Distributions, GLM, RDatasets, StableRNGs
end
```

@@ -22,33 +22,35 @@ GLM.ModResp

The most general approach to fitting a model is with the `fit` function, as in
```jldoctest
julia> using Random
julia> using GLM, StableRNGs

julia> fit(LinearModel, hcat(ones(10), 1:10), randn(MersenneTwister(12321), 10))
LinearModel{GLM.LmResp{Vector{Float64}}, GLM.DensePredChol{Float64, LinearAlgebra.CholeskyPivoted{Float64, Matrix{Float64}, Vector{Int64}}}}:

julia> fit(LinearModel, hcat(ones(10), 1:10), randn(StableRNG(12321), 10))
@bkamins (Contributor) commented on Nov 9, 2022:

If we are changing this, I would recommend not using randomly generated data in the docs. Maybe use, e.g., dataset III from Anscombe's quartet? (Then users who know it will immediately see visually that the result is correct.)
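
A minimal sketch of that suggestion, assuming the `anscombe` table shipped with RDatasets exposes dataset III as columns `X3`/`Y3` (the column names are an assumption; the exact doctest output would come from an actual run):

```julia
using GLM, RDatasets, DataFrames

# Anscombe's quartet; dataset III is the X3/Y3 pair (assumed column names)
anscombe = dataset("datasets", "anscombe")

# All four Anscombe datasets were constructed so that OLS gives
# approximately intercept 3.00 and slope 0.50
ols3 = lm(@formula(Y3 ~ X3), anscombe)
coef(ols3)
```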

LinearModel{GLM.LmResp{Vector{Float64}, UnitWeights{Int64}}, GLM.DensePredChol{Float64, LinearAlgebra.CholeskyPivoted{Float64, Matrix{Float64}, Vector{Int64}}, UnitWeights{Int64}}}:

Coefficients:
────────────────────────────────────────────────────────────────
Coef. Std. Error t Pr(>|t|) Lower 95% Upper 95%
────────────────────────────────────────────────────────────────
x1 0.717436 0.775175 0.93 0.3818 -1.07012 2.50499
x2 -0.152062 0.124931 -1.22 0.2582 -0.440153 0.136029
x1 0.361896 0.69896 0.52 0.6186 -1.24991 1.9737
x2 -0.012125 0.112648 -0.11 0.9169 -0.271891 0.247641
────────────────────────────────────────────────────────────────
```

This model can also be fit as
```jldoctest
julia> using Random
julia> using GLM, StableRNGs


julia> lm(hcat(ones(10), 1:10), randn(MersenneTwister(12321), 10))
LinearModel{GLM.LmResp{Vector{Float64}}, GLM.DensePredChol{Float64, LinearAlgebra.CholeskyPivoted{Float64, Matrix{Float64}, Vector{Int64}}}}:
julia> lm(hcat(ones(10), 1:10), randn(StableRNG(12321), 10))
LinearModel{GLM.LmResp{Vector{Float64}, UnitWeights{Int64}}, GLM.DensePredChol{Float64, LinearAlgebra.CholeskyPivoted{Float64, Matrix{Float64}, Vector{Int64}}, UnitWeights{Int64}}}:

Coefficients:
────────────────────────────────────────────────────────────────
Coef. Std. Error t Pr(>|t|) Lower 95% Upper 95%
────────────────────────────────────────────────────────────────
x1 0.717436 0.775175 0.93 0.3818 -1.07012 2.50499
x2 -0.152062 0.124931 -1.22 0.2582 -0.440153 0.136029
x1 0.361896 0.69896 0.52 0.6186 -1.24991 1.9737
x2 -0.012125 0.112648 -0.11 0.9169 -0.271891 0.247641
Contributor comment:

Then I would add a weighted lm, putting a lower weight on observation 10 in dataset III (an outlier), to show how the results change.

Of course these are soft suggestions, but they would show the use of the things we implement here.
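
A sketch of that second suggestion under this PR's conventions: weights are passed with the `wts` keyword as `AbstractWeights` (the PR re-exports `aweights`, `pweights`, and `fweights` and deprecates plain `Vector` weights). The outlier index follows the comment above, and the column names are assumed as in the previous sketch:

```julia
using GLM, RDatasets, DataFrames, StatsBase

anscombe = dataset("datasets", "anscombe")

# Down-weight the observation the reviewer flags as the outlier
w = ones(nrow(anscombe))
w[10] = 0.1

# Analytic weights via the PR's AbstractWeights interface;
# compare coef(wols) with the unweighted fit to see the change
wols = lm(@formula(Y3 ~ X3), anscombe; wts = aweights(w))
coef(wols)
```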

────────────────────────────────────────────────────────────────
```

62 changes: 22 additions & 40 deletions docs/src/examples.md
@@ -12,15 +12,16 @@ julia> using DataFrames, GLM, StatsBase

julia> data = DataFrame(X=[1,2,3], Y=[2,4,7])
3×2 DataFrame
Row │ X Y
│ Int64 Int64
Row │ X Y
Contributor comment:

Trailing whitespace probably should be stripped. Do we have doctests enabled?

│ Int64 Int64
─────┼──────────────
1 │ 1 2
2 │ 2 4
3 │ 3 7

julia> ols = lm(@formula(Y ~ X), data)
StatsModels.TableRegressionModel{LinearModel{GLM.LmResp{Vector{Float64}}, GLM.DensePredChol{Float64, LinearAlgebra.CholeskyPivoted{Float64, Matrix{Float64}, Vector{Int64}}}}, Matrix{Float64}}

StatsModels.TableRegressionModel{LinearModel{GLM.LmResp{Vector{Float64}, UnitWeights{Int64}}, GLM.DensePredChol{Float64, LinearAlgebra.CholeskyPivoted{Float64, Matrix{Float64}, Vector{Int64}}, UnitWeights{Int64}}}, Matrix{Float64}}

Y ~ 1 + X

@@ -61,7 +62,7 @@ julia> dof(ols)
3

julia> dof_residual(ols)
1.0
1

julia> round(aic(ols); digits=5)
5.84252
@@ -91,15 +92,15 @@ julia> round.(vcov(ols); digits=5)
```jldoctest
julia> data = DataFrame(X=[1,2,2], Y=[1,0,1])
3×2 DataFrame
Row │ X Y
│ Int64 Int64
Row │ X Y
│ Int64 Int64
─────┼──────────────
1 │ 1 1
2 │ 2 0
3 │ 2 1

julia> probit = glm(@formula(Y ~ X), data, Binomial(), ProbitLink())
StatsModels.TableRegressionModel{GeneralizedLinearModel{GLM.GlmResp{Vector{Float64}, Binomial{Float64}, ProbitLink}, GLM.DensePredChol{Float64, LinearAlgebra.Cholesky{Float64, Matrix{Float64}}}}, Matrix{Float64}}
StatsModels.TableRegressionModel{GeneralizedLinearModel{GLM.GlmResp{Vector{Float64}, Binomial{Float64}, ProbitLink, UnitWeights{Int64}}, GLM.DensePredChol{Float64, LinearAlgebra.Cholesky{Float64, Matrix{Float64}}, UnitWeights{Int64}}}, Matrix{Float64}}

Y ~ 1 + X

@@ -140,7 +141,7 @@ julia> quine = dataset("MASS", "quine")
131 rows omitted

julia> nbrmodel = glm(@formula(Days ~ Eth+Sex+Age+Lrn), quine, NegativeBinomial(2.0), LogLink())
StatsModels.TableRegressionModel{GeneralizedLinearModel{GLM.GlmResp{Vector{Float64}, NegativeBinomial{Float64}, LogLink}, GLM.DensePredChol{Float64, LinearAlgebra.Cholesky{Float64, Matrix{Float64}}}}, Matrix{Float64}}
StatsModels.TableRegressionModel{GeneralizedLinearModel{GLM.GlmResp{Vector{Float64}, NegativeBinomial{Float64}, LogLink, UnitWeights{Int64}}, GLM.DensePredChol{Float64, LinearAlgebra.Cholesky{Float64, Matrix{Float64}}, UnitWeights{Int64}}}, Matrix{Float64}}

Days ~ 1 + Eth + Sex + Age + Lrn

@@ -158,7 +159,7 @@ Lrn: SL 0.296768 0.185934 1.60 0.1105 -0.0676559 0.661191
────────────────────────────────────────────────────────────────────────────

julia> nbrmodel = negbin(@formula(Days ~ Eth+Sex+Age+Lrn), quine, LogLink())
StatsModels.TableRegressionModel{GeneralizedLinearModel{GLM.GlmResp{Vector{Float64}, NegativeBinomial{Float64}, LogLink}, GLM.DensePredChol{Float64, LinearAlgebra.Cholesky{Float64, Matrix{Float64}}}}, Matrix{Float64}}
StatsModels.TableRegressionModel{GeneralizedLinearModel{GLM.GlmResp{Vector{Float64}, NegativeBinomial{Float64}, LogLink, UnitWeights{Int64}}, GLM.DensePredChol{Float64, LinearAlgebra.Cholesky{Float64, Matrix{Float64}}, UnitWeights{Int64}}}, Matrix{Float64}}

Days ~ 1 + Eth + Sex + Age + Lrn

@@ -196,8 +197,8 @@ julia> using GLM, RDatasets

julia> form = dataset("datasets", "Formaldehyde")
6×2 DataFrame
Row │ Carb OptDen
│ Float64 Float64
Row │ Carb OptDen
│ Float64 Float64
─────┼──────────────────
1 │ 0.1 0.086
2 │ 0.3 0.269
@@ -207,7 +208,8 @@ julia> form = dataset("datasets", "Formaldehyde")
6 │ 0.9 0.782

julia> lm1 = fit(LinearModel, @formula(OptDen ~ Carb), form)
StatsModels.TableRegressionModel{LinearModel{GLM.LmResp{Vector{Float64}}, GLM.DensePredChol{Float64, LinearAlgebra.CholeskyPivoted{Float64, Matrix{Float64}, Vector{Int64}}}}, Matrix{Float64}}

StatsModels.TableRegressionModel{LinearModel{GLM.LmResp{Vector{Float64}, UnitWeights{Int64}}, GLM.DensePredChol{Float64, LinearAlgebra.CholeskyPivoted{Float64, Matrix{Float64}, Vector{Int64}}, UnitWeights{Int64}}}, Matrix{Float64}}

OptDen ~ 1 + Carb

@@ -256,7 +258,8 @@ julia> LifeCycleSavings = dataset("datasets", "LifeCycleSavings")
35 rows omitted

julia> fm2 = fit(LinearModel, @formula(SR ~ Pop15 + Pop75 + DPI + DDPI), LifeCycleSavings)
StatsModels.TableRegressionModel{LinearModel{GLM.LmResp{Vector{Float64}}, GLM.DensePredChol{Float64, LinearAlgebra.CholeskyPivoted{Float64, Matrix{Float64}, Vector{Int64}}}}, Matrix{Float64}}

StatsModels.TableRegressionModel{LinearModel{GLM.LmResp{Vector{Float64}, UnitWeights{Int64}}, GLM.DensePredChol{Float64, LinearAlgebra.CholeskyPivoted{Float64, Matrix{Float64}, Vector{Int64}}, UnitWeights{Int64}}}, Matrix{Float64}}

SR ~ 1 + Pop15 + Pop75 + DPI + DDPI

@@ -350,8 +353,8 @@ julia> dobson = DataFrame(Counts = [18.,17,15,20,10,21,25,13,13],
Outcome = categorical([1,2,3,1,2,3,1,2,3]),
Treatment = categorical([1,1,1,2,2,2,3,3,3]))
9×3 DataFrame
Row │ Counts Outcome Treatment
│ Float64 Cat… Cat…
Row │ Counts Outcome Treatment
│ Float64 Cat… Cat…
─────┼─────────────────────────────
1 │ 18.0 1 1
2 │ 17.0 2 1
@@ -364,7 +367,7 @@ julia> dobson = DataFrame(Counts = [18.,17,15,20,10,21,25,13,13],
9 │ 13.0 3 3

julia> gm1 = fit(GeneralizedLinearModel, @formula(Counts ~ Outcome + Treatment), dobson, Poisson())
StatsModels.TableRegressionModel{GeneralizedLinearModel{GLM.GlmResp{Vector{Float64}, Poisson{Float64}, LogLink}, GLM.DensePredChol{Float64, LinearAlgebra.Cholesky{Float64, Matrix{Float64}}}}, Matrix{Float64}}
StatsModels.TableRegressionModel{GeneralizedLinearModel{GLM.GlmResp{Vector{Float64}, Poisson{Float64}, LogLink, UnitWeights{Int64}}, GLM.DensePredChol{Float64, LinearAlgebra.Cholesky{Float64, Matrix{Float64}}, UnitWeights{Int64}}}, Matrix{Float64}}

Counts ~ 1 + Outcome + Treatment

@@ -390,29 +393,8 @@ In this example, we choose the best model from a set of λs, based on minimum BIC
```jldoctest
julia> using GLM, RDatasets, StatsBase, DataFrames, Optim

julia> trees = DataFrame(dataset("datasets", "trees"))
31×3 DataFrame
Row │ Girth Height Volume
│ Float64 Int64 Float64
─────┼──────────────────────────
1 │ 8.3 70 10.3
2 │ 8.6 65 10.3
3 │ 8.8 63 10.2
4 │ 10.5 72 16.4
5 │ 10.7 81 18.8
6 │ 10.8 83 19.7
7 │ 11.0 66 15.6
8 │ 11.0 75 18.2
⋮ │ ⋮ ⋮ ⋮
25 │ 16.3 77 42.6
26 │ 17.3 81 55.4
27 │ 17.5 82 55.7
28 │ 17.9 80 58.3
29 │ 18.0 80 51.5
30 │ 18.0 80 51.0
31 │ 20.6 87 77.0
16 rows omitted

julia> trees = DataFrame(dataset("datasets", "trees"));

julia> bic_glm(λ) = bic(glm(@formula(Volume ~ Height + Girth), trees, Normal(), PowerLink(λ)));

julia> optimal_bic = optimize(bic_glm, -1.0, 1.0);
@@ -421,7 +403,7 @@ julia> round(optimal_bic.minimizer, digits = 5) # Optimal λ
0.40935

julia> glm(@formula(Volume ~ Height + Girth), trees, Normal(), PowerLink(optimal_bic.minimizer)) # Best model
StatsModels.TableRegressionModel{GeneralizedLinearModel{GLM.GlmResp{Vector{Float64}, Normal{Float64}, PowerLink}, GLM.DensePredChol{Float64, LinearAlgebra.Cholesky{Float64, Matrix{Float64}}}}, Matrix{Float64}}
StatsModels.TableRegressionModel{GeneralizedLinearModel{GLM.GlmResp{Vector{Float64}, Normal{Float64}, PowerLink, UnitWeights{Int64}}, GLM.DensePredChol{Float64, LinearAlgebra.Cholesky{Float64, Matrix{Float64}}, UnitWeights{Int64}}}, Matrix{Float64}}

Volume ~ 1 + Height + Girth
