Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

NLopt error for Bernoulli models with random slopes when NOT using fast=true #220

Closed
hikea opened this issue Nov 8, 2019 · 9 comments · Fixed by #231
Closed

NLopt error for Bernoulli models with random slopes when NOT using fast=true #220

hikea opened this issue Nov 8, 2019 · 9 comments · Fixed by #231

Comments

@hikea
Copy link

hikea commented Nov 8, 2019

Something strange is happening with Bernoulli models. Normal seems to run okay.

Also, unrelated to this issue, but the package cannot handle arrays with missing values out of the box.

using DataFrames, MixedModels, GLM, RDatasets, RData, RCall, Missings, CSV

lexdec = rcopy(R"languageR::lexdec")

fm = fit(MixedModel, @formula(Correct ~ 1 + Class*NativeLanguage*Frequency + Trial + (1+Class + Frequency + Trial|Subject) + (1|Word)), lexdec, Bernoulli())
ERROR: ArgumentError: invalid NLopt arguments: zero step size
Stacktrace:

julia> fm = fit!(GeneralizedLinearMixedModel(@formula(Correct ~ 1 + Class*NativeLanguage*Frequency + (1+Class + Frequency + Trial|Subject) + (1|Word)), lexdec, Bernoulli()))
ERROR: ArgumentError: invalid NLopt arguments: zero step size
Stacktrace:

julia> fm = fit!(GeneralizedLinearMixedModel(@formula(Correct ~ 1 + Class*NativeLanguage*Frequency + (1+Class + Frequency + Trial|Subject) + (1|Word)), lexdec, Bernoulli()), fast=true)
Generalized Linear Mixed Model fit by maximum likelihood (nAGQ = 1)
  Correct ~ 1 + Class + NativeLanguage + Frequency + Class & NativeLanguage + Class & Frequency + NativeLanguage & Frequency + Class & NativeLanguage & Frequency + (1 + Class + Frequency + Trial | Subject) + (1 | Word)
  Distribution: Bernoulli{Float64}
  Link: LogitLink()

  Deviance: 475.6604

Variance components:
             Column        Variance      Std.Dev.    Corr.
 Subject (Intercept)   1.8648165158988 1.365582848
         Class: plant  0.6393674729270 0.799604573 -0.49
         Frequency     0.1061836740609 0.325858365 -0.94  0.73
         Trial         0.0000076274775 0.002761789  0.93 -0.21 -0.75
 Word    (Intercept)   0.7313290831756 0.855177808
 Number of obs: 1659; levels of grouping factors: 21, 79

Fixed-effects parameters:
──────────────────────────────────────────────────────────────────────────────────────────
                                                   Estimate  Std.Error    z value  P(>|z|)
──────────────────────────────────────────────────────────────────────────────────────────
(Intercept)                                       -1.11513    0.839064  -1.32902    0.1838
Class: plant                                      -0.721878   1.14746   -0.629111   0.5293
NativeLanguage: Other                              0.913713   1.03829    0.88002    0.3788
Frequency                                         -0.495801   0.174133  -2.84725    0.0044
Class: plant & NativeLanguage: Other               1.20139    1.33366    0.900816   0.3677
Class: plant & Frequency                          -0.140558   0.274048  -0.512895   0.6080
NativeLanguage: Other & Frequency                 -0.118276   0.221975  -0.532835   0.5941
Class: plant & NativeLanguage: Other & Frequency  -0.263183   0.336882  -0.781232   0.4347
──────────────────────────────────────────────────────────────────────────────────────────

@palday
Copy link
Member

palday commented Nov 9, 2019

Duplicate of #201

@palday palday marked this as a duplicate of #201 Nov 9, 2019
@palday
Copy link
Member

palday commented Nov 9, 2019

Thanks for reporting this -- it's good to have a test case for the boundary issue that isn't dependent on using gamma regression, which is generally less well tested than logistic regression.

@palday
Copy link
Member

palday commented Nov 9, 2019

Also, unrelated to this issue, but the package cannot handle arrays with missing values out of the box.

The development version can handle missing values (by dropping them), but the more general problem is more or less inherited from JuliaStats/StatsModels.jl#17, JuliaStats/StatsModels.jl#145 and JuliaStats/StatsModels.jl#153. Please don't mix unrelated tidbits -- they distract or will be ignored. When we fix the boundary fit problem then this issue will be closed and that extra bit of info will disappear.

@hikea
Copy link
Author

hikea commented Nov 10, 2019

Thank you for the tip on missing - the development version did the trick.

About the model above. I think it is problematic model that should not converge. It should give warnings. This model doesn't converge in R with either glmer or glmmTMB. Both give warnings. On close examination, it looks like the random slope for Trial doesn't account for much variance (see below). rePCA from lme4 can also help to figure this out. So the model is overparametrized.

The model converges in Julia and in R with glmmTMB (but not either glmer) without the slope for Trial.

I remember dmbates objected the idea of giving the user warnings about convergence and potentially model misspecifications, but it seems that in this case it would have helped. Again, the problematic fit should be obvious based on the variance estimate for the random slope for Trial, but Julia ambiguously doesn't converge without fast=true. The user might be under the impression that the model is ok, but there is a problem with MixedModels.jl

> fm <- glmer(Correct ~ 1 + Class*NativeLanguage*Frequency + Trial + (1+Class + Frequency + Trial|Subject) + (1|Word), lexdec, family="binomial")
Warning messages:
1: In checkConv(attr(opt, "derivs"), opt$par, ctrl = control$checkConv,  :
  Model failed to converge with max|grad| = 1.16397 (tol = 0.001, component 1)
2: In checkConv(attr(opt, "derivs"), opt$par, ctrl = control$checkConv,  :
  Model is nearly unidentifiable: large eigenvalue ratio
 - Rescale variables?

> summary(fm)
Generalized linear mixed model fit by maximum likelihood (Laplace
  Approximation) [glmerMod]
 Family: binomial  ( logit )
Formula: Correct ~ 1 + Class * NativeLanguage * Frequency + Trial + (1 +  
    Class + Frequency + Trial | Subject) + (1 | Word)
   Data: lexdec

     AIC      BIC   logLik deviance df.resid 
   513.2    621.5   -236.6    473.2     1639 

Scaled residuals: 
    Min      1Q  Median      3Q     Max 
-1.1224 -0.1865 -0.1193 -0.0731  7.7318 

Random effects:
 Groups  Name        Variance  Std.Dev. Corr             
 Word    (Intercept) 8.690e-01 0.932178                  
 Subject (Intercept) 1.345e+00 1.159747                  
         Classplant  7.804e-01 0.883404 -0.37            
         Frequency   9.266e-02 0.304398 -0.94  0.52      
         Trial       1.435e-05 0.003788  0.87 -0.13 -0.66
Number of obs: 1659, groups:  Word, 79; Subject, 21

Fixed effects:
                                          Estimate Std. Error z value Pr(>|z|)
(Intercept)                              -1.048216   1.423579  -0.736   0.4615
Classplant                               -0.916008   1.954159  -0.469   0.6392
NativeLanguageOther                       0.826676   1.553323   0.532   0.5946
Frequency                                -0.561306   0.295837  -1.897   0.0578
Trial                                    -0.001595   0.003758  -0.424   0.6713
Classplant:NativeLanguageOther            1.621867   2.082790   0.779   0.4362
Classplant:Frequency                     -0.164784   0.469251  -0.351   0.7255
NativeLanguageOther:Frequency            -0.090174   0.332856  -0.271   0.7865
Classplant:NativeLanguageOther:Frequency -0.350099   0.525350  -0.666   0.5051
                                          
(Intercept)                               
Classplant                                
NativeLanguageOther                       
Frequency                                .
Trial                                     
Classplant:NativeLanguageOther            
Classplant:Frequency                      
NativeLanguageOther:Frequency             
Classplant:NativeLanguageOther:Frequency  
---
Signif. codes:  0***0.001**0.01*0.05.0.1 ‘ ’ 1

Correlation of Fixed Effects:
            (Intr) Clsspl NtvLnO Frqncy Trial  Cl:NLO Clss:F NtLO:F
Classplant  -0.593                                                 
NtvLnggOthr -0.544  0.344                                          
Frequency   -0.928  0.607  0.542                                   
Trial       -0.225 -0.010 -0.016 -0.007                            
Clsspln:NLO  0.340 -0.603 -0.617 -0.364  0.032                     
Clssplnt:Fr  0.487 -0.927 -0.292 -0.517 -0.002  0.576              
NtvLnggOt:F  0.525 -0.346 -0.955 -0.571  0.010  0.618  0.303       
Clssp:NLO:F -0.273  0.559  0.485  0.303 -0.031 -0.924 -0.614 -0.502
convergence code: 0
Model failed to converge with max|grad| = 1.16397 (tol = 0.001, component 1)
Model is nearly unidentifiable: large eigenvalue ratio
 - Rescale variables?

> summary(rePCA(fm))
$Word
Importance of components:
                         [,1]
Standard deviation     0.9322
Proportion of Variance 1.0000
Cumulative Proportion  1.0000

$Subject
Importance of components:
                         [,1]   [,2]    [,3]      [,4]
Standard deviation     1.2727 0.7694 0.07979 0.0003709
Proportion of Variance 0.7303 0.2669 0.00287 0.0000000
Cumulative Proportion  0.7303 0.9971 1.00000 1.0000000


> fm2 <- glmmTMB(Correct ~ 1 + Class*NativeLanguage*Frequency + Trial + (1+Class + Frequency + Trial|Subject) + (1|Word), lexdec, family="binomial")
Warning messages:
1: In fitTMB(TMBStruc) :
  Model convergence problem; non-positive-definite Hessian matrix. See vignette('troubleshooting')
2: In fitTMB(TMBStruc) :
  Model convergence problem; false convergence (8). See vignette('troubleshooting')

> summary(fm2)
 Family: binomial  ( logit )
Formula:          
Correct ~ 1 + Class * NativeLanguage * Frequency + Trial + (1 +  
    Class + Frequency + Trial | Subject) + (1 | Word)
Data: lexdec

     AIC      BIC   logLik deviance df.resid 
      NA       NA       NA       NA     1639 

Random effects:

Conditional model:
 Groups  Name        Variance  Std.Dev. Corr              
 Subject (Intercept) 2.415e+00 1.553987                   
         Classplant  8.742e-01 0.934975 -0.46             
         Frequency   1.403e-01 0.374617 -0.97  0.56       
         Trial       1.131e-05 0.003363  0.82 -0.14 -0.64 
 Word    (Intercept) 9.221e-01 0.960246                   
Number of obs: 1659, groups:  Subject, 21; Word, 79

Conditional model:
                                          Estimate Std. Error z value Pr(>|z|)
(Intercept)                              -1.049643   1.460669  -0.719   0.4724
Classplant                               -0.987369   2.013568  -0.490   0.6239
NativeLanguageOther                       0.992664   1.636702   0.606   0.5442
Frequency                                -0.574650   0.306207  -1.877   0.0606
Trial                                    -0.001363   0.003390  -0.402   0.6877
Classplant:NativeLanguageOther            1.424499   2.104225   0.677   0.4984
Classplant:Frequency                     -0.153430   0.480728  -0.319   0.7496
NativeLanguageOther:Frequency            -0.121959   0.349751  -0.349   0.7273
Classplant:NativeLanguageOther:Frequency -0.304610   0.528633  -0.576   0.5645
                                          
(Intercept)                               
Classplant                                
NativeLanguageOther                       
Frequency                                .
Trial                                     
Classplant:NativeLanguageOther            
Classplant:Frequency                      
NativeLanguageOther:Frequency             
Classplant:NativeLanguageOther:Frequency  
---
Signif. codes:  0***0.001**0.01*0.05.0.1 ‘ ’ 1

@palday
Copy link
Member

palday commented Nov 11, 2019

While parameterization issues can cause convergence issues, these are two different sets of problems. In particular, a random effect going to zero is something that lme4 and MixedModels.jl can deal with and that's a feature, not a bug. For such boundary fits, the gradient test isn't informative because the gradient being equal to zero is only a necessary condition for minima in the interior of a bounded parameter space. (There are also issues with the finite-differences approximation used by lme4, but this is a result of lme4 using a gradient-free method for fitting.) The eigenvalue test is more interesting; @dmbates may have a comment on whether or not we we should add it to a list of desirables.

glmmTMB uses a different fitting method (based on the gradient) and runs into a somewhat different convergence issue. A non positive-definite Hessian would also create problems for lme4, but because of the different convergence methods, the intermediate results are different and thus the two packages get stuck in different problem zones. That said, looking at the glmmTMB docs, problems with the Hessian can be the result of a boundary fit.

@hikea
Copy link
Author

hikea commented Nov 11, 2019

Thank you for clarifications!

@dmbates
Copy link
Collaborator

dmbates commented Dec 11, 2019

@hikea I have made changes to the optimization algorithm for GLMMs in the dropmulalphabeta branch of the package. Could you check your example after installing that branch as a development version? That is

using Pkg; Pkg.develop(PackageSpec(name="MixedModels", rev="dropmulalphabeta")

@hikea
Copy link
Author

hikea commented Dec 16, 2019

Can't get it working. Keep getting ERROR: git revision can not be given to develop`` Tried installing the master version, but that didn't help either.

@palday
Copy link
Member

palday commented Dec 17, 2019

@dmbates This example runs on the dropmulalphabeta branch.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Projects
None yet
Development

Successfully merging a pull request may close this issue.

3 participants