Oxide example model fit may be an inappropriate model #743

Open
dmbates opened this issue Feb 18, 2024 · 1 comment
dmbates (Collaborator) commented Feb 18, 2024

One of the tests in test/pls.jl uses the :oxide models defined in test/modelcache.jl:

    :oxide => [@formula(Thickness ~ 1 + (1|Lot/Wafer)),
               @formula(Thickness ~ 1 + Source + (1+Source|Lot) + (1+Source|Lot&Wafer))],

The second model is notoriously hard to fit. Different optimizers give very different parameter estimates with similar values of the objective. I think this is because the model is ill-defined: Source is constant within each Lot, so the data carry no within-Lot information about a Source effect.

julia> groupby(DataFrame(MixedModels.dataset(:oxide)), [:Source, :Lot])
GroupedDataFrame with 8 groups based on keys: Source, Lot
First Group (9 rows): Source = "1", Lot = "1"
 Row │ Source  Lot     Wafer   Site    Thickness 
     │ String  String  String  String  Float64   
─────┼───────────────────────────────────────────
   1 │ 1       1       1       1          2006.0
   2 │ 1       1       1       2          1999.0
   3 │ 1       1       1       3          2007.0
   4 │ 1       1       2       1          1980.0
   5 │ 1       1       2       2          1988.0
   6 │ 1       1       2       3          1982.0
   7 │ 1       1       3       1          2000.0
   8 │ 1       1       3       2          1998.0
   9 │ 1       1       3       3          2007.0

Last Group (9 rows): Source = "2", Lot = "8"
 Row │ Source  Lot     Wafer   Site    Thickness 
     │ String  String  String  String  Float64   
─────┼───────────────────────────────────────────
   1 │ 2       8       1       1          1996.0
   2 │ 2       8       1       2          1989.0
   3 │ 2       8       1       3          1996.0
   4 │ 2       8       2       1          1997.0
   5 │ 2       8       2       2          1993.0
   6 │ 2       8       2       3          1996.0
   7 │ 2       8       3       1          1990.0
   8 │ 2       8       3       2          1989.0
   9 │ 2       8       3       3          1992.0

To me this means that you can't expect to fit a random-effects term like (1 + Source|Lot).
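
A minimal sketch of why the term is a problem, using only LinearAlgebra and toy values rather than the oxide data. It assumes the default dummy coding for Source, so a lot's row of the random-effects model matrix is [1, 0] or [1, 1]. The covariance matrices Σa and Σb and the helper lotvar are made up for illustration and are not estimates from any fit.

```julia
using LinearAlgebra

# Toy stand-in for one Lot: 9 observations, all from a single Source.
nobs = 9
Z1 = hcat(ones(nobs), zeros(nobs))  # a lot from the first (base) level of Source
Z8 = hcat(ones(nobs), ones(nobs))   # a lot from the second level of Source

# Each per-lot block of the model matrix for (1 + Source | Lot) has rank 1,
# so a lot informs only one linear combination of its two random effects.
rank_Z1 = rank(Z1)
rank_Z8 = rank(Z8)
println((rank_Z1, rank_Z8))

# Variance contributed by the Lot term for a lot whose model row is z.
lotvar(Σ, z) = dot(z, Σ * z)

z_source1 = [1.0, 0.0]
z_source2 = [1.0, 1.0]

# Two different (hypothetical) covariance matrices for the Lot term ...
Σa = [4.0 0.0; 0.0 5.0]
Σb = [4.0 1.0; 1.0 3.0]

# ... that imply exactly the same lot-level variances for both Sources.
implied_a = (lotvar(Σa, z_source1), lotvar(Σa, z_source2))
implied_b = (lotvar(Σb, z_source1), lotvar(Σb, z_source2))
println(implied_a == implied_b)
```

Under these assumptions the three covariance parameters enter the Lot term only through two quantities, one per Source, which would be consistent with optimizers reaching different estimates at similar objective values.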

Am I confusing myself?

dmbates (Collaborator, Author) commented Feb 18, 2024

If you use a contrast like EffectsCoding() for Source, then the conditional means of the random effects for the term (1 + Source | Lot & Wafer), one column per Lot & Wafer combination, end up being close to multiples of [1, -1] for the first level of Source and [1, 1] for the second level.

julia> first(m2.b)
2×24 Matrix{Float64}:
  3.70686  -5.46891   2.67088  -0.979013  -1.71899  -2.75497  …  -0.997343  -0.39882   -0.0995575  -0.178385  0.56977   -1.67469
 -3.72693   5.49852  -2.68534   0.984313   1.7283    2.76988     -1.00222   -0.400768  -0.100044   -0.179257  0.572553  -1.68288

I'm not sure if that is a consequence of an unstable model or of only having 3 observations for each Lot & Wafer combination.
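
One way to probe this is a small sketch with made-up numbers. It is a simplification that keeps only the Lot & Wafer term and drops the Lot term. All values (Σ, resvar, the residuals) and the helper condmean are hypothetical, not taken from the oxide fit. When every observation in a group shares the same model row z, the conditional mean Σ Z' (Z Σ Z' + resvar I)⁻¹ r is a scalar multiple of Σ z, whatever the residuals are.

```julia
using LinearAlgebra

# Conditional mean of the random effects for one group whose n observations
# all share the same model row z (hypothetical helper, single-term model).
function condmean(Σ, resvar, z, r)
    Z = repeat(z', length(r))              # n × 2 block with identical rows
    return Σ * Z' * ((Z * Σ * Z' + resvar * I) \ r)
end

# 2-D cross product; zero when the two vectors are parallel.
cross2(u, v) = u[1] * v[2] - u[2] * v[1]

Σ = [2.0 0.3; 0.3 2.0]    # made-up covariance with equal variances
resvar = 1.5              # made-up residual variance
r = [4.0, -1.0, 2.5]      # made-up residuals for the 3 sites on one wafer

z_first = [1.0, -1.0]     # EffectsCoding row for the first (base) Source level
z_second = [1.0, 1.0]     # EffectsCoding row for the second Source level

b_first = condmean(Σ, resvar, z_first, r)
b_second = condmean(Σ, resvar, z_second, r)

# Each conditional mean is parallel to Σ * z ...
println(isapprox(cross2(b_first, Σ * z_first), 0; atol = 1e-10))
println(isapprox(cross2(b_second, Σ * z_second), 0; atol = 1e-10))

# ... and, because this Σ has equal variances, also parallel to z itself.
println(isapprox(cross2(b_first, z_first), 0; atol = 1e-10))
println(isapprox(cross2(b_second, z_second), 0; atol = 1e-10))
```

If that simplification carries over, the near-[1, -1] and near-[1, 1] pattern would follow from Source being constant within each Lot & Wafer combination together with roughly equal estimated variances, rather than from having only 3 observations per combination.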
