Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Calculate F-statistics for Tests of Between-Subjects Effects (Type III, ANOVA) #508

Draft
wants to merge 16 commits into
base: master
Choose a base branch
from

Conversation

PharmCat
Copy link

@PharmCat PharmCat commented Nov 5, 2022

This is draft PR.

This test can be used by users who want ANOVA and can't use F-test for some data (as in my case with crossover designs).

Tests of Between-Subjects Effects included in this PR as it described in SPSS:

F =β' L' (L V L')⁻¹ β L / rank(L)

image

where V - variance-covariance matrix of β

L-matrix description from SPSS docs (SPSS and SAS docs avialible in public access):

For each level combination of the between subjects factors in TABLES, identify the nonmissing
cases with positive caseweights and positive regression weights which are associated with the
current level combination. Suppose the cases are classified by three between-subjects factors:
A, B and C. Now A and B are specified in TABLES and the current level combination is A=1
and B=2. A case in the cell A=1, B=2, and C=3 is associated with the current level combination,
whereas a case in the cell A=1, B=3 and C=3 is not. Compute the average of the design matrix
rows corresponding to these cases.
If an effect contains a covariate, then its parameters which belong to the current level
combination are equal to the mean of the covariate, and are equal to 0 otherwise. Using the above
example, for effect A*X where X is a covariate, the parameter [A=1]*X belongs to the current
level combination where the parameter [A=2]*X does not. If the effect contains a product of
covariates, then the mean of the product is applied.
The result is the l vector for the current between-subjects factor level combination. When none
of the between-subjects effects contain covariates, the vector always forms an estimable function.
Otherwise, a non-estimable function may occur, depending on the data.

For the inverse of (L V L) used pinv because some of V matrices are ill-conditioned.

Because V is Symmetric - I try to calculate it in place and more efficiently.

After calculation F for each effect p values are calculated too.

ccdf(FDist(df[i], dof_residual(obj)), F[i])

where df - is rank(L)

For Intercept factor I think the general mean should be used, but because I can't get the number of efficient levels for InteractionTerm from StatsModel - it is not realized.

Also for InteractionTerm it may be some issues for zero-intercept models because I don't know how to get completed contrast matrix for this.

PrettyTables is used for printing because this is draft PR, it can be removed by custom output.

model used for check:

using  DataFrames, CSV, StatsBase, Plots, GLM, CategoricalArrays

dffinal = CSV.File("df1.csv") |> DataFrame
transform!(dffinal, :period => categorical, renamecols = false)
transform!(dffinal, :id => categorical, renamecols = false)

ols = lm(@formula(logAUC ~ sequence+period+formulation), 
dffinal, contrasts  = Dict(:period => DummyCoding(base = 4),
:sequence => DummyCoding(base = "TRRT"),
:formulation => DummyCoding(base = "T")))
GLM.typeiii(ols)

result:

 ------------- ----- ----------- -------------
         Name    DF           F          Pval 
 ------------- ----- ----------- -------------
  (Intercept)   1.0     14250.2   5.16248e-74
     sequence   1.0     3.37076     0.0712381
       period   3.0   0.0282538      0.993531
  formulation   1.0    0.527962      0.470244
 ------------- ----- ----------- -------------

SPSS result:

image

data used for check:

id,period,sequence,formulation,AUC,logAUC,Cmax,logCmax
1,1,RTTR,R,10671,9.275285060616564,817,6.705639094860003
1,4,RTTR,R,11206,9.32420462812523,1502,7.31455283232408
1,2,RTTR,T,12772,9.455010553834672,1439,7.271703706887368
1,3,RTTR,T,13151,9.484253080341965,1310,7.177782416195197
2,2,TRRT,R,6068,8.710784340468424,1372,7.22402480828583
2,3,TRRT,R,5996,8.698847859222488,1056,6.962243464266207
2,1,TRRT,T,6518,8.782322859397507,1393,7.239214973779806
2,4,TRRT,T,5844,8.67317077287059,1310,7.177782416195197
3,2,TRRT,R,5728,8.653121708640482,1377,7.227662498728654
3,3,TRRT,R,5760,8.658692753689937,1529,7.332369205929062
3,1,TRRT,T,4939,8.504918160540624,1481,7.300472814267799
3,4,TRRT,T,6313,8.750366278367625,781,6.660575149839686
4,1,RTTR,R,7588,8.934323331054905,823,6.71295620067707
4,4,RTTR,R,7654,8.942983665985647,1095,6.9985096422506015
4,2,RTTR,T,8980,9.102755161296246,1133,7.0326242610280065
4,3,RTTR,T,8408,9.036938912556787,1065,6.970730078143525
5,2,TRRT,R,8022,8.989943046329998,1035,6.942156705699469
5,3,TRRT,R,10721,9.27995971385624,1571,7.359467638255621
5,1,TRRT,T,7653,8.942853006809921,709,6.5638555265321274
5,4,TRRT,T,8043,8.99255742690407,1342,7.201916317531627
6,1,RTTR,R,8389,9.034676602846295,1347,7.205635176410364
6,4,RTTR,R,7616,8.938006576471201,1153,7.050122520269059
6,2,RTTR,T,7949,8.980801413573113,691,6.53813982376767
6,3,RTTR,T,7735,8.953510763007166,949,6.855408798609928
7,1,RTTR,R,5161,8.548885638148727,1278,7.15305163493748
7,4,RTTR,R,4764,8.46884293047519,1040,6.946975992135418
7,2,RTTR,T,6601,8.79497643168877,991,6.898714534329988
7,3,RTTR,T,5479,8.608677881538416,1124,7.024649030453636
8,2,TRRT,R,8026,8.990441550826862,1242,7.124478262493424
8,3,TRRT,R,6776,8.821142236331891,1090,6.993932975223189
8,1,TRRT,T,8864,9.089753408987065,1516,7.323830566202317
8,4,TRRT,T,6995,8.852950887099581,1048,6.954638864880987
9,1,RTTR,R,7399,8.90910013492555,1547,7.344072850573066
9,4,RTTR,R,7211,8.88336291691676,1485,7.3031700512368
9,2,RTTR,T,7873,8.971194463184467,1361,7.215975002651466
9,3,RTTR,T,8153,9.006141236662911,1380,7.22983877815125
10,1,RTTR,R,5660,8.641179171197228,1088,6.992096427415888
10,4,RTTR,R,5076,8.532278828834277,796,6.679599185844383
10,2,RTTR,T,4858,8.488382109562117,982,6.889591308354466
10,3,RTTR,T,5347,8.584290934948731,995,6.902742737158593
11,2,TRRT,R,7730,8.952864141581468,908,6.811244378601294
11,3,TRRT,R,8228,9.015298250772847,1183,7.075808863978387
11,1,TRRT,T,8503,9.048174321385792,999,6.906754778648554
11,4,TRRT,T,8032,8.99118884193151,1129,7.029087564149662
12,2,TRRT,R,6007,8.700680734850161,1220,7.106606137727303
12,3,TRRT,R,7737,8.953769294549454,776,6.654152520183219
12,1,TRRT,T,7043,8.85978949474541,679,6.520621127558696
12,4,TRRT,T,6262,8.742254901886351,1258,7.1372784372603855
13,2,TRRT,R,5767,8.659907293615413,869,6.767343125265392
13,3,TRRT,R,5942,8.689801056022553,921,6.825460036255307
13,1,TRRT,T,5701,8.648396877031582,822,6.71174039505618
13,4,TRRT,T,7757,8.956350940490871,947,6.853299093186078
14,2,TRRT,R,7858,8.969287400118406,1451,7.280008252884188
14,3,TRRT,R,7924,8.977651407818442,1389,7.236339342754344
14,1,TRRT,T,8684,9.069237530998182,615,6.421622267806518
14,4,TRRT,T,9219,9.129021850798594,1279,7.153833801578843
15,1,RTTR,R,6937,8.844624683385302,953,6.859614903654202
15,4,RTTR,R,7515,8.924656302187074,1247,7.1284959456800365
15,2,RTTR,T,7905,8.975250749643573,1065,6.970730078143525
15,3,RTTR,T,6550,8.787220328629298,830,6.721425700790643
16,1,RTTR,R,11473,9.347751727799142,1368,7.221105098182496
16,4,RTTR,R,10365,9.24619002486988,1418,7.257002707092073
16,2,RTTR,T,9698,9.179674957665297,1281,7.155396301896734
16,3,RTTR,T,10355,9.245224773829685,1083,6.9874902470009905
18,2,TRRT,R,5120,8.540909718033554,842,6.7357800142423265
18,3,TRRT,R,5420,8.597851094433691,1176,7.069874128458572
18,1,TRRT,T,5210,8.558335134747413,668,6.504288173536645

PS nested factors are not implemented because they are not implemented in StatsModels. I think SS and MS can be calculated too. The main problem - is a stable method to get real numbers of levels for InteractionTerm.

@codecov-commenter
Copy link

codecov-commenter commented Nov 6, 2022

Codecov Report

Base: 88.75% // Head: 89.38% // Increases project coverage by +0.63% 🎉

Coverage data is based on head (ff47219) compared to base (b728917).
Patch coverage: 97.53% of modified lines in pull request are covered.

Additional details and impacted files
@@            Coverage Diff             @@
##           master     #508      +/-   ##
==========================================
+ Coverage   88.75%   89.38%   +0.63%     
==========================================
  Files           8        8              
  Lines        1040     1121      +81     
==========================================
+ Hits          923     1002      +79     
- Misses        117      119       +2     
Impacted Files Coverage Δ
src/ftest.jl 98.80% <97.53%> (-1.20%) ⬇️

Help us with your feedback. Take ten seconds to tell us how you rate us. Have a feature suggestion? Share it here.

☔ View full report at Codecov.
📢 Do you have feedback about the report comment? Let us know in this issue.

@PharmCat
Copy link
Author

PharmCat commented Nov 6, 2022

add test for coverage, reference dataset 1 from Schütz, Helmut & Labes, Detlew & Fuglsang, Anders. (2014). Reference Datasets for 2-Treatment, 2-Sequence, 2-Period Bioequivalence Studies. The AAPS journal. 16. 1292-1297. 10.1208/s12248-014-9661-0.

src/ftest.jl Outdated Show resolved Hide resolved
src/ftest.jl Outdated
L-contrast matrix for `i` fixed effect.
"""
function lcontrast(obj, i::Int)
n = length(obj.mf.f.rhs.terms)
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Much of this function relies very heavily on the internal structures of the types from StatsModels. If any internal structures change in StatsModels (which is typically not considered a breaking change in the Julia world) this function will break. StatsModels provides a variety of functions with methods to access these quantities in a stable way; this function should be refactored to use those.

Copy link
Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Sorry, I can't find public API from StatsModel documentation to deep work with terms. Can you kindly point me for that ways...

src/ftest.jl Outdated Show resolved Hide resolved
@PharmCat
Copy link
Author

Maybe lcontrast function can be simplified with using StatsModels.hypothesis_matrix, but anyway it is not working with InteractionTerm.

@PharmCat
Copy link
Author

PharmCat commented Feb 2, 2023

Hi! Maybe if this PR not appropriate to GLM - make it as additional package?

@bkamins
Copy link
Contributor

bkamins commented Feb 2, 2023

I have not followed the whole discussion, but I think we should not create a separate package. Rather it would be better to think of the best package to place it into.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

4 participants