Taking weighting seriously #487
base: master
Changes from 71 commits
@@ -2,7 +2,7 @@
 ```@meta
 DocTestSetup = quote
-    using CategoricalArrays, DataFrames, Distributions, GLM, RDatasets
+    using CategoricalArrays, DataFrames, Distributions, GLM, RDatasets, StableRNGs
 end
 ```

@@ -22,33 +22,35 @@ GLM.ModResp
 The most general approach to fitting a model is with the `fit` function, as in
 ```jldoctest
-julia> using Random
+julia> using GLM, StableRNGs

-julia> fit(LinearModel, hcat(ones(10), 1:10), randn(MersenneTwister(12321), 10))
-LinearModel{GLM.LmResp{Vector{Float64}}, GLM.DensePredChol{Float64, LinearAlgebra.CholeskyPivoted{Float64, Matrix{Float64}, Vector{Int64}}}}:
+julia> fit(LinearModel, hcat(ones(10), 1:10), randn(StableRNG(12321), 10))
+LinearModel{GLM.LmResp{Vector{Float64}, UnitWeights{Int64}}, GLM.DensePredChol{Float64, LinearAlgebra.CholeskyPivoted{Float64, Matrix{Float64}, Vector{Int64}}, UnitWeights{Int64}}}:

 Coefficients:
 ────────────────────────────────────────────────────────────────
         Coef.  Std. Error      t  Pr(>|t|)  Lower 95%  Upper 95%
 ────────────────────────────────────────────────────────────────
-x1   0.717436    0.775175   0.93    0.3818  -1.07012    2.50499
-x2  -0.152062    0.124931  -1.22    0.2582  -0.440153   0.136029
+x1   0.361896    0.69896    0.52    0.6186  -1.24991    1.9737
+x2  -0.012125    0.112648  -0.11    0.9169  -0.271891   0.247641
 ────────────────────────────────────────────────────────────────
 ```

 This model can also be fit as
 ```jldoctest
-julia> using Random
+julia> using GLM, StableRNGs

-julia> lm(hcat(ones(10), 1:10), randn(MersenneTwister(12321), 10))
-LinearModel{GLM.LmResp{Vector{Float64}}, GLM.DensePredChol{Float64, LinearAlgebra.CholeskyPivoted{Float64, Matrix{Float64}, Vector{Int64}}}}:
+julia> lm(hcat(ones(10), 1:10), randn(StableRNG(12321), 10))
+LinearModel{GLM.LmResp{Vector{Float64}, UnitWeights{Int64}}, GLM.DensePredChol{Float64, LinearAlgebra.CholeskyPivoted{Float64, Matrix{Float64}, Vector{Int64}}, UnitWeights{Int64}}}:

 Coefficients:
 ────────────────────────────────────────────────────────────────
         Coef.  Std. Error      t  Pr(>|t|)  Lower 95%  Upper 95%
 ────────────────────────────────────────────────────────────────
-x1   0.717436    0.775175   0.93    0.3818  -1.07012    2.50499
-x2  -0.152062    0.124931  -1.22    0.2582  -0.440153   0.136029
+x1   0.361896    0.69896    0.52    0.6186  -1.24991    1.9737
+x2  -0.012125    0.112648  -0.11    0.9169  -0.271891   0.247641
 ────────────────────────────────────────────────────────────────
 ```
> Review comment: Then I would add weighted versions of these examples. Of course these are soft suggestions, but they would show the use of the things that we implement here.
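For illustration only, here is a minimal sketch of what such a weighted example might look like. It is not taken from this PR: it assumes the `wts` keyword accepts StatsBase weight vectors (e.g. `fweights`), which appears to be what this PR proposes, and the exact call signature may differ.

```julia
# Hypothetical sketch of a weighted fit, not part of the PR diff.
# Assumes `wts` accepts a StatsBase.AbstractWeights vector (FrequencyWeights here).
using GLM, StableRNGs, StatsBase

X = hcat(ones(10), 1:10)
y = randn(StableRNG(12321), 10)

m_unweighted = lm(X, y)                        # defaults to UnitWeights
m_weighted   = lm(X, y; wts = fweights(1:10))  # row i counted i times

coef(m_unweighted), coef(m_weighted)
```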
@@ -12,15 +12,16 @@ julia> using DataFrames, GLM, StatsBase
 julia> data = DataFrame(X=[1,2,3], Y=[2,4,7])
 3×2 DataFrame
- Row │ X      Y
-     │ Int64  Int64
+ Row │ X      Y
> Review comment: trailing whitespace probably should be stripped.
+     │ Int64  Int64
 ─────┼──────────────
    1 │     1      2
    2 │     2      4
    3 │     3      7

 julia> ols = lm(@formula(Y ~ X), data)
-StatsModels.TableRegressionModel{LinearModel{GLM.LmResp{Vector{Float64}}, GLM.DensePredChol{Float64, LinearAlgebra.CholeskyPivoted{Float64, Matrix{Float64}, Vector{Int64}}}}, Matrix{Float64}}
+StatsModels.TableRegressionModel{LinearModel{GLM.LmResp{Vector{Float64}, UnitWeights{Int64}}, GLM.DensePredChol{Float64, LinearAlgebra.CholeskyPivoted{Float64, Matrix{Float64}, Vector{Int64}}, UnitWeights{Int64}}}, Matrix{Float64}}

 Y ~ 1 + X
@@ -61,7 +62,7 @@ julia> dof(ols)
 3

 julia> dof_residual(ols)
-1.0
+1
> nalimilan marked this conversation as resolved.
 julia> round(aic(ols); digits=5)
 5.84252

@@ -91,15 +92,15 @@ julia> round.(vcov(ols); digits=5)
 ```jldoctest
 julia> data = DataFrame(X=[1,2,2], Y=[1,0,1])
 3×2 DataFrame
- Row │ X      Y
-     │ Int64  Int64
+ Row │ X      Y
+     │ Int64  Int64
 ─────┼──────────────
    1 │     1      1
    2 │     2      0
    3 │     2      1

 julia> probit = glm(@formula(Y ~ X), data, Binomial(), ProbitLink())
-StatsModels.TableRegressionModel{GeneralizedLinearModel{GLM.GlmResp{Vector{Float64}, Binomial{Float64}, ProbitLink}, GLM.DensePredChol{Float64, LinearAlgebra.Cholesky{Float64, Matrix{Float64}}}}, Matrix{Float64}}
+StatsModels.TableRegressionModel{GeneralizedLinearModel{GLM.GlmResp{Vector{Float64}, Binomial{Float64}, ProbitLink, UnitWeights{Int64}}, GLM.DensePredChol{Float64, LinearAlgebra.Cholesky{Float64, Matrix{Float64}}, UnitWeights{Int64}}}, Matrix{Float64}}

 Y ~ 1 + X
@@ -140,7 +141,7 @@ julia> quine = dataset("MASS", "quine")
 131 rows omitted

 julia> nbrmodel = glm(@formula(Days ~ Eth+Sex+Age+Lrn), quine, NegativeBinomial(2.0), LogLink())
-StatsModels.TableRegressionModel{GeneralizedLinearModel{GLM.GlmResp{Vector{Float64}, NegativeBinomial{Float64}, LogLink}, GLM.DensePredChol{Float64, LinearAlgebra.Cholesky{Float64, Matrix{Float64}}}}, Matrix{Float64}}
+StatsModels.TableRegressionModel{GeneralizedLinearModel{GLM.GlmResp{Vector{Float64}, NegativeBinomial{Float64}, LogLink, UnitWeights{Int64}}, GLM.DensePredChol{Float64, LinearAlgebra.Cholesky{Float64, Matrix{Float64}}, UnitWeights{Int64}}}, Matrix{Float64}}

 Days ~ 1 + Eth + Sex + Age + Lrn

@@ -158,7 +159,7 @@ Lrn: SL 0.296768 0.185934 1.60 0.1105 -0.0676559 0.661191
 ────────────────────────────────────────────────────────────────────────────

 julia> nbrmodel = negbin(@formula(Days ~ Eth+Sex+Age+Lrn), quine, LogLink())
-StatsModels.TableRegressionModel{GeneralizedLinearModel{GLM.GlmResp{Vector{Float64}, NegativeBinomial{Float64}, LogLink}, GLM.DensePredChol{Float64, LinearAlgebra.Cholesky{Float64, Matrix{Float64}}}}, Matrix{Float64}}
+StatsModels.TableRegressionModel{GeneralizedLinearModel{GLM.GlmResp{Vector{Float64}, NegativeBinomial{Float64}, LogLink, UnitWeights{Int64}}, GLM.DensePredChol{Float64, LinearAlgebra.Cholesky{Float64, Matrix{Float64}}, UnitWeights{Int64}}}, Matrix{Float64}}

 Days ~ 1 + Eth + Sex + Age + Lrn

@@ -196,8 +197,8 @@ julia> using GLM, RDatasets
 julia> form = dataset("datasets", "Formaldehyde")
 6×2 DataFrame
- Row │ Carb     OptDen
-     │ Float64  Float64
+ Row │ Carb     OptDen
+     │ Float64  Float64
 ─────┼──────────────────
    1 │     0.1    0.086
    2 │     0.3    0.269
@@ -207,7 +208,8 @@ julia> form = dataset("datasets", "Formaldehyde")
    6 │     0.9    0.782

 julia> lm1 = fit(LinearModel, @formula(OptDen ~ Carb), form)
-StatsModels.TableRegressionModel{LinearModel{GLM.LmResp{Vector{Float64}}, GLM.DensePredChol{Float64, LinearAlgebra.CholeskyPivoted{Float64, Matrix{Float64}, Vector{Int64}}}}, Matrix{Float64}}
+StatsModels.TableRegressionModel{LinearModel{GLM.LmResp{Vector{Float64}, UnitWeights{Int64}}, GLM.DensePredChol{Float64, LinearAlgebra.CholeskyPivoted{Float64, Matrix{Float64}, Vector{Int64}}, UnitWeights{Int64}}}, Matrix{Float64}}

 OptDen ~ 1 + Carb

@@ -256,7 +258,8 @@ julia> LifeCycleSavings = dataset("datasets", "LifeCycleSavings")
 35 rows omitted

 julia> fm2 = fit(LinearModel, @formula(SR ~ Pop15 + Pop75 + DPI + DDPI), LifeCycleSavings)
-StatsModels.TableRegressionModel{LinearModel{GLM.LmResp{Vector{Float64}}, GLM.DensePredChol{Float64, LinearAlgebra.CholeskyPivoted{Float64, Matrix{Float64}, Vector{Int64}}}}, Matrix{Float64}}
+StatsModels.TableRegressionModel{LinearModel{GLM.LmResp{Vector{Float64}, UnitWeights{Int64}}, GLM.DensePredChol{Float64, LinearAlgebra.CholeskyPivoted{Float64, Matrix{Float64}, Vector{Int64}}, UnitWeights{Int64}}}, Matrix{Float64}}

 SR ~ 1 + Pop15 + Pop75 + DPI + DDPI
@@ -350,8 +353,8 @@ julia> dobson = DataFrame(Counts = [18.,17,15,20,10,21,25,13,13],
                           Outcome = categorical([1,2,3,1,2,3,1,2,3]),
                           Treatment = categorical([1,1,1,2,2,2,3,3,3]))
 9×3 DataFrame
- Row │ Counts   Outcome  Treatment
-     │ Float64  Cat…     Cat…
+ Row │ Counts   Outcome  Treatment
+     │ Float64  Cat…     Cat…
 ─────┼─────────────────────────────
    1 │    18.0  1        1
    2 │    17.0  2        1

@@ -364,7 +367,7 @@ julia> dobson = DataFrame(Counts = [18.,17,15,20,10,21,25,13,13],
    9 │    13.0  3        3

 julia> gm1 = fit(GeneralizedLinearModel, @formula(Counts ~ Outcome + Treatment), dobson, Poisson())
-StatsModels.TableRegressionModel{GeneralizedLinearModel{GLM.GlmResp{Vector{Float64}, Poisson{Float64}, LogLink}, GLM.DensePredChol{Float64, LinearAlgebra.Cholesky{Float64, Matrix{Float64}}}}, Matrix{Float64}}
+StatsModels.TableRegressionModel{GeneralizedLinearModel{GLM.GlmResp{Vector{Float64}, Poisson{Float64}, LogLink, UnitWeights{Int64}}, GLM.DensePredChol{Float64, LinearAlgebra.Cholesky{Float64, Matrix{Float64}}, UnitWeights{Int64}}}, Matrix{Float64}}

 Counts ~ 1 + Outcome + Treatment
@@ -390,29 +393,8 @@ In this example, we choose the best model from a set of λs, based on minimum BIC
 ```jldoctest
 julia> using GLM, RDatasets, StatsBase, DataFrames, Optim

-julia> trees = DataFrame(dataset("datasets", "trees"))
-31×3 DataFrame
- Row │ Girth    Height  Volume
-     │ Float64  Int64   Float64
-─────┼──────────────────────────
-   1 │     8.3      70     10.3
-   2 │     8.6      65     10.3
-   3 │     8.8      63     10.2
-   4 │    10.5      72     16.4
-   5 │    10.7      81     18.8
-   6 │    10.8      83     19.7
-   7 │    11.0      66     15.6
-   8 │    11.0      75     18.2
-  ⋮  │    ⋮       ⋮        ⋮
-  25 │    16.3      77     42.6
-  26 │    17.3      81     55.4
-  27 │    17.5      82     55.7
-  28 │    17.9      80     58.3
-  29 │    18.0      80     51.5
-  30 │    18.0      80     51.0
-  31 │    20.6      87     77.0
-                16 rows omitted
+julia> trees = DataFrame(dataset("datasets", "trees"));

 julia> bic_glm(λ) = bic(glm(@formula(Volume ~ Height + Girth), trees, Normal(), PowerLink(λ)));

 julia> optimal_bic = optimize(bic_glm, -1.0, 1.0);
@@ -421,7 +403,7 @@ julia> round(optimal_bic.minimizer, digits = 5) # Optimal λ
 0.40935

 julia> glm(@formula(Volume ~ Height + Girth), trees, Normal(), PowerLink(optimal_bic.minimizer)) # Best model
-StatsModels.TableRegressionModel{GeneralizedLinearModel{GLM.GlmResp{Vector{Float64}, Normal{Float64}, PowerLink}, GLM.DensePredChol{Float64, LinearAlgebra.Cholesky{Float64, Matrix{Float64}}}}, Matrix{Float64}}
+StatsModels.TableRegressionModel{GeneralizedLinearModel{GLM.GlmResp{Vector{Float64}, Normal{Float64}, PowerLink, UnitWeights{Int64}}, GLM.DensePredChol{Float64, LinearAlgebra.Cholesky{Float64, Matrix{Float64}}, UnitWeights{Int64}}}, Matrix{Float64}}

 Volume ~ 1 + Height + Girth
> Review comment: If we are changing this, I would recommend not using randomly generated data in the docs. Maybe e.g. use dataset III from Anscombe's quartet? (Then users who know it will immediately see visually that the result is correct.)
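A rough sketch of what that suggestion might look like, not part of this PR. It assumes RDatasets exposes Anscombe's quartet as `dataset("datasets", "anscombe")` with columns `X3`/`Y3`; the dataset or column names may differ.

```julia
# Sketch of the reviewer's suggestion, not taken from the PR.
# Assumes RDatasets ships Anscombe's quartet under the "datasets" group.
using DataFrames, GLM, RDatasets

anscombe = dataset("datasets", "anscombe")

# Dataset III of the quartet: the fitted line is known to be roughly
# Y3 ≈ 3.0 + 0.5 * X3, so readers can check the result by eye.
ols3 = lm(@formula(Y3 ~ X3), anscombe)
coef(ols3)
```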