Termination at saddle point #705
@andreasnoack @dmbates has actually done a fair number of more involved experiments with using the gradient and it never really paid off. That said, we've recently discussed revisiting this, and evaluating the Hessian efficiently has several uses. In my own experience, BOBYQA rarely fails on well-posed problems, but we do support using other derivative-free algorithms implemented in NLopt. For example, if we change to COBYLA, things work:

```julia
# assumes: using DataFrames, MixedModels, ProgressMeter, Random, StatsBase
# and `saddlepointdata` read from the attached saddlepointdata.csv
N = 500
_rng = Random.seed!(Random.default_rng(), 124)
formula = @formula(log(AUCT) ~ trt + seq + per +
                              (trt + 0 | sub) +
                              zerocorr(trt + 0 | row))
contrasts = Dict(:trt => DummyCoding(), :per => DummyCoding(),
                 :sub => Grouping(), :row => Grouping())
fts = @showprogress map(1:N) do _
    data = transform(saddlepointdata,
                     "AUCT" => ByRow(t -> t + randn(_rng)*1e-12) => "AUCT")
    model = LinearMixedModel(formula, data; contrasts)
    model.optsum.optimizer = :LN_COBYLA
    return fit!(model; REML=true)
end
```

Then `countmap(round.(exp.(getindex.(coef.(fts), 2)), digits=3))` yields

```
Dict{Float64, Int64} with 1 entry:
  0.933 => 500
```
We have five test datasets where we could reproduce the saddle point, so it would be worth running the test script above with those datasets as well. It would be interesting to know whether this is an issue with the implementation of BOBYQA or with the algorithm itself. cc @stevengj.
I'd be curious to read more about this. My expectation would be that using the gradient would be beneficial.
The reason I had the ForwardDiff-compatible deviance code handy is actually that I'm using it for the Satterthwaite approximation for the degrees of freedom. I'd be happy to contribute that here if there is interest.
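For context, a rough sketch of how ForwardDiff-compatible deviance code typically enters a Satterthwaite calculation (the names `obj`, `contrast_variance`, and `η̂` are hypothetical stand-ins, not the code being offered here):

```julia
# Hypothetical sketch of a Satterthwaite degrees-of-freedom computation for a single contrast c.
# η̂: estimated covariance parameters; obj(η): the deviance (-2 log-likelihood) as a function of η;
# contrast_variance(η): c' * vcov(β̂; η) * c, computed with generic (ForwardDiff-friendly) code.
using ForwardDiff, LinearAlgebra

g = ForwardDiff.gradient(contrast_variance, η̂)   # sensitivity of the contrast variance to η
A = 2 * inv(ForwardDiff.hessian(obj, η̂))         # ≈ Var(η̂) from the curvature of the deviance
ν = 2 * contrast_variance(η̂)^2 / dot(g, A * g)   # Satterthwaite degrees of freedom
```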
@andreasnoack definitely! That's been on my secret todo list for a very long time, but other things were always higher priority.
@andreasnoack it's stale relative to
@andreasnoack see also #312
As I mention in #742, I have been looking at switching MixedModels.jl to the PRIMA.jl version of BOBYQA. I will let you know if I am able to apply PRIMA.bobyqa to this problem.
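A minimal sketch of what trying PRIMA's BOBYQA directly on the profiled objective might look like (the PRIMA.jl call signature, including the `xl` keyword for the lower bounds, and the MixedModels helpers `setθ!`/`updateL!`/`objective` are assumptions here, not the eventual integration):

```julia
using MixedModels, PRIMA

# `model` is the LinearMixedModel constructed as in the snippets above.
# Profiled objective as a function of the covariance parameter vector θ.
f(θ) = objective(updateL!(setθ!(model, θ)))

# Optimize over θ subject to the usual lower bounds (keyword name assumed).
θopt, info = PRIMA.bobyqa(f, copy(model.optsum.initial); xl = model.optsum.lowerbd)
```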
I'm facing an issue where a linear mixed model terminates at a saddle point for some datasets. I've tried to search for any prior discussion here but was unable to find anything. The short version is that BOBYQA terminates at a saddle point where Optim's BFGS does not. I also did a brief search for discussions of BOBYQA and saddle points in general but didn't find anything obvious. I'm wondering if it might be worthwhile to switch to a derivative-based optimization algorithm.
Now to the details. I'm working on a bioequivalence testing problem, so I don't have the luxury of choosing the model. The model is stated in SAS form in appendix E of https://www.fda.gov/media/70958/download and, for most datasets, I'm matching the SAS results exactly with the MixedModels formulation below. However, for some datasets (it looks to be associated with missing values), the MixedModels fits end up at saddle points. This dataset is used for the reproducer:
saddlepointdata.csv
In the code below, I make tiny perturbations of the observations. Even though the perturbations are tiny, they are sufficient to make the difference between reaching an extremum and a saddle point (as I'll show further down).
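A sketch of the kind of perturbation loop being described, reconstructed from the COBYLA variant quoted earlier in the thread but with the optimizer left at its default (BOBYQA); the CSV-reading line and the exact fitting call are assumptions:

```julia
# Reconstructed sketch of the perturbation loop (default BOBYQA optimizer);
# mirrors the COBYLA snippet quoted above, so the details are assumed.
using CSV, DataFrames, MixedModels, ProgressMeter, Random, StatsBase

saddlepointdata = CSV.read("saddlepointdata.csv", DataFrame)

N = 500
_rng = Random.seed!(Random.default_rng(), 124)
formula = @formula(log(AUCT) ~ trt + seq + per +
                              (trt + 0 | sub) +
                              zerocorr(trt + 0 | row))
contrasts = Dict(:trt => DummyCoding(), :per => DummyCoding(),
                 :sub => Grouping(), :row => Grouping())
fts = @showprogress map(1:N) do _
    data = transform(saddlepointdata,
                     "AUCT" => ByRow(t -> t + randn(_rng)*1e-12) => "AUCT")
    return fit(MixedModel, formula, data; contrasts, REML=true)
end

# tabulate exp(second coefficient) across the N perturbed fits
countmap(round.(exp.(getindex.(coef.(fts), 2)), digits=3))
```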
I'm focusing on the second coefficient because it's the one of interest in the equivalence testing, and restricting attention to a single coefficient keeps the output less verbose. The result of the code above is
The first of these solutions is an extremum and the second is a saddle point. To see this, we can run
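A minimal sketch of this kind of check, assuming the suspicious fit is available as `m` and assuming MixedModels' `setθ!`, `updateL!`, and `objective` helpers (not necessarily the original snippet):

```julia
# Probe the objective around the converged θ; any nearby value below the
# reported minimum means the reported solution is not a local minimum.
using MixedModels

θ̂ = copy(m.optsum.final)                  # converged covariance parameters
f(θ) = objective(updateL!(setθ!(m, θ)))
f0 = f(θ̂)

ε = 1e-3
for i in eachindex(θ̂), s in (-1, 1)
    θ = copy(θ̂)
    θ[i] += s * ε
    Δ = f(θ) - f0
    Δ < 0 && println("objective decreases by $(-Δ) along coordinate $i (direction $s)")
end
updateL!(setθ!(m, θ̂))                     # restore the converged values
```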
which gives
so we are clearly not at a minimum.
To dig a little deeper, I'm using a version of the objective function that allows for ForwardDiff. Find the code in the folded section below.

ForwardDiff compatible MixedModels code
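As a rough illustration of the kind of ForwardDiff-friendly objective being described, here is a dense-matrix sketch under assumed names (much slower than MixedModels' internal formulation, and not the author's actual code):

```julia
# Dense-matrix sketch of a ForwardDiff-friendly REML objective in (θ, log σ).
# y: response, X: fixed-effects design, Z: random-effects design,
# Λ: model-specific map from the covariance parameters θ to the q × q relative covariance factor.
using LinearAlgebra

function reml_objective(θ, logσ, y, X, Z, Λ)
    σ² = exp(2 * logσ)
    L = Λ(θ)
    V = σ² * (I + Z * L * L' * Z')      # marginal covariance of y
    C = cholesky(Symmetric(V))
    ViX = C \ X                          # V⁻¹ X
    β = (X' * ViX) \ (X' * (C \ y))      # GLS estimate of the fixed effects
    r = y - X * β
    # -2 × REML log-likelihood, up to an additive constant
    return logdet(C) + logdet(cholesky(Symmetric(X' * ViX))) + dot(r, C \ r)
end
```

Here β is profiled out via GLS, but σ is kept as an explicit parameter (through logσ), matching the remark below about σ not being concentrated out.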
Notice that this version of the objective function hasn't concentrated out σ. With this code, I can compute the Hessian, which clearly shows the indefiniteness.

Finally, we can use the ForwardDiff compatible code to fit the model with BFGS using ForwardDiff-based derivatives.
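In terms of the `reml_objective` sketch above, and treating the parameter packing, variable names, and the Optim call as assumptions, the two steps could look roughly like this:

```julia
# Hypothetical sketch: Hessian at the suspicious solution, then a BFGS refit.
# Assumes y, X, Z, Λ are built for this model and x0 = vcat(θ̂, log(σ̂)) comes
# from the BOBYQA solution that is suspected to be a saddle point.
using ForwardDiff, LinearAlgebra, Optim

obj(x) = reml_objective(x[1:end-1], x[end], y, X, Z, Λ)

H = ForwardDiff.hessian(obj, x0)
eigvals(Symmetric(H))            # a negative eigenvalue ⇒ the Hessian is indefinite (saddle point)

res = optimize(obj, x0, BFGS(); autodiff = :forward)
Optim.minimizer(res)
```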
so with BFGS, we never end up at the saddle point (for this dataset). Of course, we don't know if that is the case for other datasets, but I wanted to share this observation to start the conversation. Maybe a gradient-based optimization should be considered. It will of course require some work, since the version I threw together here allocates quite a bit and so can't be used right away, but it might still be useful as a starting point.