
need support for soft feasibility constraints #30

Open
jpritikin opened this issue Feb 9, 2015 · 18 comments

Comments

@jpritikin
Contributor

At least for local gradient-based methods, OpenMx needs support for soft feasibility constraints expressed by returning NaN for the fit. We use these in situations where the constraint is not expected to be active at the MLE. For example, a model may assume a positive definite covariance matrix and return NaN when the matrix is not positive definite. It is probably possible to add an inequality constraint to ensure positive definiteness, but any plausible MLE will be positive definite. Therefore, it is much easier to score such points as NaN.

Another example is when we estimate confidence intervals using a profile likelihood. The fit function when estimating the confidence interval includes a squared distance from a target log likelihood value. Here we use a soft constraint such that large squared distances (>100) are scored as NaN. The distance at the MLE should be zero, so this constraint is never active at a valid MLE.

Our proprietary optimizer, NPSOL, can handle non-linear constraints, but it also knows that we use NaN to express soft constraints. When NPSOL evaluates the fit and gets a NaN, it halves the step size and tries again. If this continues to fail after a few retries, then it gives up. One way to implement similar functionality using NLOPT is to return std::numeric_limits<double>::max(). However, NLOPT takes this fit at face value and it screws up the Hessian approximation (tested with SLSQP).
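For concreteness, here is a minimal sketch of what such a soft-constrained objective might look like on the NLopt side (this is not OpenMx's actual code; buildCovariance and negLogLik are hypothetical placeholders for the model-specific pieces):

```cpp
#include <nlopt.hpp>
#include <Eigen/Dense>
#include <limits>
#include <vector>

// Hypothetical model-specific pieces (placeholders, not OpenMx code):
Eigen::MatrixXd buildCovariance(const std::vector<double> &x); // model-implied covariance
double negLogLik(const std::vector<double> &x);                // fit value at a feasible point

// NLopt C++ objective: score non-positive-definite covariance matrices as NaN
// ("soft" infeasibility).
double softObjective(const std::vector<double> &x, std::vector<double> &grad, void *)
{
    Eigen::MatrixXd cov = buildCovariance(x);
    Eigen::LLT<Eigen::MatrixXd> chol(cov);        // attempt a Cholesky factorization
    if (chol.info() != Eigen::Success) {
        // Not positive definite: NPSOL would halve the step and retry here;
        // stock NLopt algorithms have no comparable reaction to NaN.
        return std::numeric_limits<double>::quiet_NaN();
    }
    if (!grad.empty()) {
        // fill in the (analytic or numerical) gradient at feasible points only
    }
    return negLogLik(x);
}

// usage: nlopt::opt opt(nlopt::LD_SLSQP, nParams);
//        opt.set_min_objective(softObjective, nullptr);
```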

Is there some existing functionality to handle this kind of situation in NLOPT? Or are you willing to accept patches implementing it?

@stevengj
Owner

This will depend a lot on the specific algorithm. Algorithms that are gradient-based (or which internally construct an approximate gradient, like COBYLA) will have a hard time with something like this—they need differentiability, not to mention continuity. Algorithms which don't exploit continuity, like ISRES or DIRECT, could definitely be modified to handle this, but those algorithms tend to converge much more slowly.

In your example of squared distance, the simplest thing would just be to add a distance² ≤ 100 inequality constraint; I don't see why you need NaNs for this.
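A minimal sketch of that alternative, using the NLopt C++ API (profileDistanceSq is a hypothetical helper returning the squared distance from the target log likelihood):

```cpp
#include <nlopt.hpp>
#include <vector>

double profileDistanceSq(const std::vector<double> &x); // hypothetical: (ll(x) - llTarget)^2

// NLopt expects inequality constraints in the form g(x) <= 0.
double distanceConstraint(const std::vector<double> &x, std::vector<double> &grad, void *)
{
    if (!grad.empty()) {
        // supply the constraint gradient here for gradient-based algorithms
    }
    return profileDistanceSq(x) - 100.0;   // enforce distance^2 <= 100
}

// usage: opt.add_inequality_constraint(distanceConstraint, nullptr, 1e-8);
```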

The positive-definiteness constraint is a bit harder to implement, because eigenvalues are not generally differentiable. (SDP solvers handle positive-definiteness constraints, of course, but they require specialized methods.) In that case, couldn't you just as easily write your matrix as A = B' * B and then put your degrees of freedom in B rather than A? That would guarantee a positive semidefinite A implicitly with no loss of generality.
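A sketch of that reparameterization (illustrative names, assuming Eigen is available): optimize over the lower triangle of B and form A = B' * B, which is positive semidefinite by construction.

```cpp
#include <Eigen/Dense>
#include <vector>

// Build A = B' * B from the free parameters, filling the lower triangle of B.
// theta must have at least dim*(dim+1)/2 elements.
Eigen::MatrixXd covFromFreeParams(const std::vector<double> &theta, int dim)
{
    Eigen::MatrixXd B = Eigen::MatrixXd::Zero(dim, dim);
    int k = 0;
    for (int col = 0; col < dim; ++col)
        for (int row = col; row < dim; ++row)
            B(row, col) = theta[k++];
    return B.transpose() * B;   // positive semidefinite for any theta
}
```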

@jpritikin
Contributor Author

Algorithms that are gradient-based (or which internally construct an approximate gradient, like COBYLA) will have a hard time with something like this—they need differentiability, not to mention continuity.

I am not sure about COBYLA, but using NaN to signal a soft constraint is no problem for BFGS or L-BFGS style algorithms. There is no expectation that the gradient can be evaluated close to the feasibility boundary. These soft constraints are not expected to be active near the MLE, so this is not a problem in practice.

I don't see why you need NaNs for this.

Certainly it is possible to come up with real constraints that would have the same effect as our implicit soft constraints. The problem is that OpenMx has offered soft constraints for years. We need to maintain backward compatibility with modelling scripts written by a broad user community of researchers. We would really prefer not to fork NLOPT and make our own version just for this small feature.

@mhunter1

What we really need is some method of telling one of NLOPT's optimizers (e.g. SLSQP, or L-BFGS) that it has wandered into a bad region of the solution space. For example, many of the models OpenMx runs use some variety of covariance matrix which is assumed to be positive definite (PD). We do not currently use constraints to ensure the covariance matrix stays PD. Rather, at various points we attempt a Cholesky decomposition of the covariance matrix and if it fails then we know it is not PD. When the covariance matrix is not PD, we signal NPSOL that the attempted solution vector (i.e. the current values of the estimated parameters) is definitely not right. The signal that works for NPSOL is a fit function value of NaN. When we give NPSOL a fit function value of NaN, it tries a smaller step in the same direction. Is there a similar signal we could send to an NLOPT optimizer?

Joshua has tried NaN, but this seems to kill NLOPT. He's also tried std::numeric_limits<double>::max(), but NLOPT then takes that ridiculous number literally and it screws up the derivative estimates. Thanks for your help on this!
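A minimal sketch of the "smaller step on NaN" behaviour described above, as it might look inside a generic backtracking line search (this is an illustration, not NPSOL's or NLopt's actual code):

```cpp
#include <cmath>
#include <functional>
#include <vector>

using Objective = std::function<double(const std::vector<double> &)>;

// Try x = x0 + alpha*dir; on a NaN fit, halve alpha and retry up to maxRetries times.
bool softLineSearch(const Objective &fit, const std::vector<double> &x0,
                    const std::vector<double> &dir, double alpha,
                    std::vector<double> &xOut, double &fitOut, int maxRetries = 10)
{
    for (int attempt = 0; attempt < maxRetries; ++attempt, alpha *= 0.5) {
        std::vector<double> x(x0);
        for (size_t i = 0; i < x.size(); ++i) x[i] += alpha * dir[i];
        double fx = fit(x);
        if (std::isnan(fx)) continue;   // soft infeasibility: shrink the step
        xOut = x;
        fitOut = fx;
        return true;                    // found a finite (feasible) fit value
    }
    return false;                       // never found a feasible point; give up
}
```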

@jpritikin
Contributor Author

For example, many of the models OpenMx runs use some variety of covariance matrix which is assumed to be positive definite (PD). We do not currently use constraints to ensure the covariance matrix stays PD.

I want to point out that we have no control over the model specification. Many of these models assume the behavior of our old proprietary NPSOL optimizer (a.k.a. e04ucf). Many of these models are published as supplementary online material in academic journals. We are trying to find some open-source optimizer that has equal or better performance compared to NPSOL on the same optimization problems so we can release OpenMx on CRAN.

@stevengj
Owner

It's true that in algorithms that use a line search with backtracking, you can just backtrack when a NaN is encountered. I'd be open to patches of this sort.

@stevengj
Owner

Actually, it looks like some of the algorithms will work okay if you just return Inf when the search strays into an infeasible region.

@jpritikin
Contributor Author

Can you be more specific? Which algorithms did you examine?

@stevengj
Owner

CCSA at least, I think. Also some of the derivative-free methods that don't "in their hearts" construct a derivative, like Sbplx or DIRECT or ISRES or CRS.
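For reference, a sketch of how these algorithms are selected through the NLopt C++ API (the CCSA family is exposed there as CCSAQ/MMA); n stands for the number of free parameters:

```cpp
#include <nlopt.hpp>

void makeOptimizers(unsigned n)
{
    nlopt::opt ccsa(nlopt::LD_CCSAQ, n);     // CCSA (quadratic variant), gradient-based
    nlopt::opt sbplx(nlopt::LN_SBPLX, n);    // Sbplx (Subplex variant), derivative-free
    nlopt::opt direct(nlopt::GN_DIRECT, n);  // DIRECT, global, derivative-free
    nlopt::opt isres(nlopt::GN_ISRES, n);    // ISRES, global, derivative-free
    nlopt::opt crs(nlopt::GN_CRS2_LM, n);    // CRS2 with local mutation, derivative-free
}
```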

@jpritikin
Contributor Author

I'm curious to try this out. I tried with CCSA. Any idea what to try here to get it working?

fit: 1673.69
fit: 1673.69
grad:
-666.478
-930.557
-1789.43
CCSA dual converged in 2 iters to g=-6.77293e+23:
fit: nan
CCSA inner iteration: rho -> 10
CCSA dual converged in 2 iters to g=1673.69:
fit: nan
CCSA inner iteration: rho -> 100
CCSA dual converged in 2 iters to g=1673.69:
fit: nan
CCSA inner iteration: rho -> 1000
CCSA dual converged in 2 iters to g=1673.69:
fit: nan
CCSA inner iteration: rho -> 10000
CCSA dual converged in 2 iters to g=1673.69:
fit: nan
CCSA inner iteration: rho -> 100000
CCSA dual converged in 2 iters to g=1673.69:
fit: nan
CCSA inner iteration: rho -> 1e+06

The diagnostics say "nan" but I return infinity as the fit.

@stevengj
Owner

@jpritikin this will only work if it's not an active constraint at the optimum, I think.

@jpritikin
Contributor Author

this will only work if it's not an active constraint at the optimum, I think.

Of course. We can specify explicit constraints in OpenMx for constraints that might be active at the optimum.

Can I take some point between xprev and xcur if xcur doesn't work?

@stevengj
Owner

It should be enough to just increase rho by a factor of 10 and run another inner iteration.

@jpritikin
Contributor Author

It should be enough to just increase rho by a factor of 10 and run another inner iteration.

As you can see from the output, rho keeps going up and up with no progress. If I let it run, it does 100s of iterations like that.

@stevengj
Owner

That's odd; with a larger rho, it should take smaller steps, from what I recall; I wonder why it is giving the same g? Can you print out where your objective is being evaluated and what it is returning?

@jpritikin
Contributor Author

Looks like it jumps to HUGE_VAL and stays there:

Running Univariate Regression of y on x1
coord: 0.2
0.8
0.8
fit: 1673.69
fit: 1673.69
grad:
-666.478
-930.557
-1789.43
CCSA dual converged in 2 iters to g=-6.77293e+23:
coord: 2e+20
2e+20
2e+20
fit: nan
CCSA inner iteration: rho -> 10
CCSA dual converged in 2 iters to g=1673.69:
coord: 2e+20
2e+20
2e+20
fit: nan
CCSA inner iteration: rho -> 100

@jpritikin
Contributor Author

FYI, I pushed everything to https://github.com/jpritikin/OpenMx/tree/nlopt

After make install, you could run this yourself with:

IMX_OPT_ENGINE=NLOPT R --no-save -f inst/models/passing/IntroSEM-UnivariateStd.R

I copied the necessary nlopt source code directly into the OpenMx project just to ease debugging.

jpritikin added a commit to jpritikin/nlopt that referenced this issue May 12, 2015
…ngj#30)

+ Rewrite convergence monitoring in terms of structures

+ Unconfuse code that returns the converged details

+ Rename 'x' to 'theSpot'. I loathe single letter variable names. X marks
the spot.

+ Preserve starting value if no evaluations were finite

+ Separately track the best minor and major estimate. This change
ensures that we always move to the best point found during the line
search even when the best point is not the last point. We also avoid 1
fit and gradient evaluation by moving the major iteration convergence
check above the switch statement.

+ Treat failure to find a search direction as convergence

+ Only update minor during line search
@jpritikin
Contributor Author

Steven, do you think BOBYQA could be modified to cope with soft feasibility constraints?

jpritikin added a commit to jpritikin/nlopt that referenced this issue Nov 14, 2020
+ Preserve starting value if no evaluations were finite

+ Separately track the best minor and major estimate. This change
  ensures that we always move to the best point found during the line
  search even when the best point is not the last point. We also avoid 1
  fit and gradient evaluation by moving the major iteration convergence
  check above the switch statement.

+ Treat failure to find a search direction as convergence

+ Only update minor during line search

+ Unconfuse code that returns the converged details

+ Rename 'x' to 'theSpot'. I loathe single letter variable names. X marks
  the spot.
@jpritikin
Contributor Author

Yo, anybody home? Can we get my pull request merged?
