
need support for soft feasibility constraints #30

Open
jpritikin opened this issue Feb 9, 2015 · 18 comments

Comments

@jpritikin
Contributor

At least for local gradient-based methods, OpenMx needs support for soft feasibility constraints expressed by returning NaN for the fit. We use these in situations where the constraint is not expected to be active at the MLE. For example, a model may assume a positive definite covariance matrix and return NaN when the matrix is not positive definite. It is probably possible to add an inequality constraint to ensure positive definiteness, but any plausible MLE will be positive definite. Therefore, it is much easier to score such points as NaN.

Another example is when we estimate confidence intervals using a profile likelihood. The fit function when estimating the confidence interval includes a squared distance from a target log likelihood value. Here we use a soft constraint such that large squared distances (>100) are scored as NaN. The distance at the MLE should be zero, so this constraint is never active at a valid MLE.

Our proprietary optimizer, NPSOL, can handle non-linear constraints, but it also knows that we use NaN to express soft constraints. When NPSOL evaluates the fit and gets a NaN, it halves the step size and tries again. If this continues to fail after a few retries, then it gives up. One way to implement similar functionality using NLOPT is to return std::numeric_limits<double>::max(). However, NLOPT takes this fit at face value and it screws up the Hessian approximation (tested with SLSQP).
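For concreteness, here is a minimal sketch of what such a soft-constrained objective might look like on the NLopt side (this is not OpenMx's actual code; buildCovariance and negLogLik are hypothetical placeholders for the model-specific pieces):

```cpp
#include <nlopt.hpp>
#include <Eigen/Dense>
#include <limits>
#include <vector>

// Hypothetical model-specific pieces (placeholders, not OpenMx code):
Eigen::MatrixXd buildCovariance(const std::vector<double> &x); // model-implied covariance
double negLogLik(const std::vector<double> &x);                // fit value at a feasible point

// NLopt C++ objective: score non-positive-definite covariance matrices as NaN
// ("soft" infeasibility).
double softObjective(const std::vector<double> &x, std::vector<double> &grad, void *)
{
    Eigen::MatrixXd cov = buildCovariance(x);
    Eigen::LLT<Eigen::MatrixXd> chol(cov);        // attempt a Cholesky factorization
    if (chol.info() != Eigen::Success) {
        // Not positive definite: NPSOL would halve the step and retry here;
        // stock NLopt algorithms have no comparable reaction to NaN.
        return std::numeric_limits<double>::quiet_NaN();
    }
    if (!grad.empty()) {
        // fill in the (analytic or numerical) gradient at feasible points only
    }
    return negLogLik(x);
}

// usage: nlopt::opt opt(nlopt::LD_SLSQP, nParams);
//        opt.set_min_objective(softObjective, nullptr);
```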

Is there some existing functionality to handle this kind of situation in NLOPT? Or are you willing to accept patches implementing it?

@stevengj
Owner

This will depend a lot on the specific algorithm. Algorithms that are gradient-based (or which internally construct an approximate gradient, like COBYLA) will have a hard time with something like this—they need differentiability, not to mention continuity. Algorithms which don't exploit continuity, like ISRES or DIRECT, could definitely be modified to handle this, but those algorithms tend to converge much more slowly.

In your example of squared distance, the simplest thing would just be to add a distance² ≤ 100 inequality constraint; I don't see why you need NaNs for this.
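A minimal sketch of that alternative, using the NLopt C++ API (profileDistanceSq is a hypothetical helper returning the squared distance from the target log likelihood):

```cpp
#include <nlopt.hpp>
#include <vector>

double profileDistanceSq(const std::vector<double> &x); // hypothetical: (ll(x) - llTarget)^2

// NLopt expects inequality constraints in the form g(x) <= 0.
double distanceConstraint(const std::vector<double> &x, std::vector<double> &grad, void *)
{
    if (!grad.empty()) {
        // supply the constraint gradient here for gradient-based algorithms
    }
    return profileDistanceSq(x) - 100.0;   // enforce distance^2 <= 100
}

// usage: opt.add_inequality_constraint(distanceConstraint, nullptr, 1e-8);
```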

The positive-definiteness constraint is a bit harder to implement, because eigenvalues are not generally differentiable. (SDP solvers handle positive-definiteness constraints, of course, but they require specialized methods.) In that case, couldn't you just as easily write your matrix as A = B' * B and then put your degrees of freedom in B rather than A? That would guarantee a positive semidefinite A implicitly with no loss of generality.
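A sketch of that reparameterization (illustrative names, assuming Eigen is available): optimize over the lower triangle of B and form A = B' * B, which is positive semidefinite by construction.

```cpp
#include <Eigen/Dense>
#include <vector>

// Build A = B' * B from the free parameters, filling the lower triangle of B.
// theta must have at least dim*(dim+1)/2 elements.
Eigen::MatrixXd covFromFreeParams(const std::vector<double> &theta, int dim)
{
    Eigen::MatrixXd B = Eigen::MatrixXd::Zero(dim, dim);
    int k = 0;
    for (int col = 0; col < dim; ++col)
        for (int row = col; row < dim; ++row)
            B(row, col) = theta[k++];
    return B.transpose() * B;   // positive semidefinite for any theta
}
```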

@jpritikin
Contributor Author

Algorithms that are gradient-based (or which internally construct an approximate gradient, like COBYLA) will have a hard time with something like this—they need differentiability, not to mention continuity.

I am not sure about COBYLA, but using NaN to signal a soft constraint is no problem for BFGS or L-BFGS style algorithms. There is no expectation that the gradient can be evaluated close to the feasibility boundary. These soft constraints are not expected to be active near the MLE, so this is not a problem in practice.

I don't see why you need NaNs for this.

Certainly it is possible to come up with real constraints that would have the same effect as our implicit soft constraints. The problem is that OpenMx has offered soft constraints for years. We need to maintain backward compatibility with modelling scripts written by a broad user community of researchers. We would really prefer not to fork NLOPT and make our own version just for this small feature.

@mhunter1

What we really need is some method of telling one of NLOPT's optimizers (e.g. SLSQP, or L-BFGS) that it has wandered into a bad region of the solution space. For example, many of the models OpenMx runs use some variety of covariance matrix which is assumed to be positive definite (PD). We do not currently use constraints to ensure the covariance matrix stays PD. Rather, at various points we attempt a Cholesky decomposition of the covariance matrix and if it fails then we know it is not PD. When the covariance matrix is not PD, we signal NPSOL that the attempted solution vector (i.e. the current values of the estimated parameters) is definitely not right. The signal that works for NPSOL is a fit function value of NaN. When we give NPSOL a fit function value of NaN, it tries a smaller step in the same direction. Is there a similar signal we could send to an NLOPT optimizer?

Joshua has tried NaN, but this seems to kill NLOPT. He's also tried std::numeric_limits<double>::max(), but NLOPT then takes that ridiculous number literally and it screws up the derivative estimates. Thanks for your help on this!
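A minimal sketch of the "smaller step on NaN" behaviour described above, as it might look inside a generic backtracking line search (this is an illustration, not NPSOL's or NLopt's actual code):

```cpp
#include <cmath>
#include <functional>
#include <vector>

using Objective = std::function<double(const std::vector<double> &)>;

// Try x = x0 + alpha*dir; on a NaN fit, halve alpha and retry up to maxRetries times.
bool softLineSearch(const Objective &fit, const std::vector<double> &x0,
                    const std::vector<double> &dir, double alpha,
                    std::vector<double> &xOut, double &fitOut, int maxRetries = 10)
{
    for (int attempt = 0; attempt < maxRetries; ++attempt, alpha *= 0.5) {
        std::vector<double> x(x0);
        for (size_t i = 0; i < x.size(); ++i) x[i] += alpha * dir[i];
        double fx = fit(x);
        if (std::isnan(fx)) continue;   // soft infeasibility: shrink the step
        xOut = x;
        fitOut = fx;
        return true;                    // found a finite (feasible) fit value
    }
    return false;                       // never found a feasible point; give up
}
```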

@jpritikin
Contributor Author

For example, many of the models OpenMx runs use some variety of covariance matrix which is assumed to be positive definite (PD). We do not currently use constraints to ensure the covariance matrix stays PD.

I want to point out that we have no control over the model specification. Many of these models assume the behavior of our old proprietary NPSOL optimizer (a.k.a. e04ucf). Many of these models are published as supplementary online material in academic journals. We are trying to find some open-source optimizer that has equal or better performance compared to NPSOL on the same optimization problems so we can release OpenMx on CRAN.

@stevengj
Owner

It's true that in algorithms that use a line search with backtracking, you can just backtrack when a NaN is encountered. I'd be open to patches of this sort.

@stevengj
Owner

Actually, it looks like some of the algorithms will work okay if you just return Inf when the search strays into an infeasible region.

@jpritikin
Contributor Author

Can you be more specific? Which algorithms did you examine?

@stevengj
Owner

CCSA at least, I think. Also some of the derivative-free methods that don't "in their hearts" construct a derivative, like Sbplx or DIRECT or ISRES or CRS.
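For reference, a sketch of how these algorithms are selected through the NLopt C++ API (the CCSA family is exposed there as CCSAQ/MMA); n stands for the number of free parameters:

```cpp
#include <nlopt.hpp>

void makeOptimizers(unsigned n)
{
    nlopt::opt ccsa(nlopt::LD_CCSAQ, n);     // CCSA (quadratic variant), gradient-based
    nlopt::opt sbplx(nlopt::LN_SBPLX, n);    // Sbplx (Subplex variant), derivative-free
    nlopt::opt direct(nlopt::GN_DIRECT, n);  // DIRECT, global, derivative-free
    nlopt::opt isres(nlopt::GN_ISRES, n);    // ISRES, global, derivative-free
    nlopt::opt crs(nlopt::GN_CRS2_LM, n);    // CRS2 with local mutation, derivative-free
}
```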

@jpritikin
Contributor Author

I'm curious to try this out. I tried with CCSA. Any idea what to try here to get it working?

fit: 1673.69
fit: 1673.69
grad:
-666.478
-930.557
-1789.43
CCSA dual converged in 2 iters to g=-6.77293e+23:
fit: nan
CCSA inner iteration: rho -> 10
CCSA dual converged in 2 iters to g=1673.69:
fit: nan
CCSA inner iteration: rho -> 100
CCSA dual converged in 2 iters to g=1673.69:
fit: nan
CCSA inner iteration: rho -> 1000
CCSA dual converged in 2 iters to g=1673.69:
fit: nan
CCSA inner iteration: rho -> 10000
CCSA dual converged in 2 iters to g=1673.69:
fit: nan
CCSA inner iteration: rho -> 100000
CCSA dual converged in 2 iters to g=1673.69:
fit: nan
CCSA inner iteration: rho -> 1e+06

The diagnostics say "nan" but I return infinity as the fit.

@stevengj
Owner

@jpritikin this will only work if it's not an active constraint at the optimum, I think.

@jpritikin
Contributor Author

this will only work if it's not an active constraint at the optimum, I think.

Of course. We can specify explicit constraints in OpenMx for constraints that might be active at the optimum.

Can I take some point between xprev and xcur if xcur doesn't work?

@stevengj
Owner

It should be enough to just increase rho by a factor of 10 and run another inner iteration.

@jpritikin
Contributor Author

It should be enough to just increase rho by a factor of 10 and run another inner iteration.

As you can see from the output, rho keeps going up and up with no progress. If I let it run, it does 100s of iterations like that.

@stevengj
Owner

That's odd; with a larger rho, it should take smaller steps, from what I recall; I wonder why it is giving the same g? Can you print out where your objective is being evaluated and what it is returning?

@jpritikin
Contributor Author

Looks like it jumps to HUGE_VAL and stays there:

Running Univariate Regression of y on x1
coord: 0.2
0.8
0.8
fit: 1673.69
fit: 1673.69
grad:
-666.478
-930.557
-1789.43
CCSA dual converged in 2 iters to g=-6.77293e+23:
coord: 2e+20
2e+20
2e+20
fit: nan
CCSA inner iteration: rho -> 10
CCSA dual converged in 2 iters to g=1673.69:
coord: 2e+20
2e+20
2e+20
fit: nan
CCSA inner iteration: rho -> 100

@jpritikin
Contributor Author

FYI, I pushed everything to https://github.com/jpritikin/OpenMx/tree/nlopt

After make install, you could run this yourself with:

IMX_OPT_ENGINE=NLOPT R --no-save -f inst/models/passing/IntroSEM-UnivariateStd.R

I copied the necessary nlopt source code directly into the OpenMx project just to ease debugging.

jpritikin added a commit to jpritikin/nlopt that referenced this issue May 12, 2015
…ngj#30)

+ Rewrite convergence monitoring in terms of structures

+ Unconfuse code that returns the converged details

+ Rename 'x' to 'theSpot'. I loathe single letter variable names. X marks
the spot.

+ Preserve starting value if no evaluations were finite

+ Separately track the best minor and major estimate. This change
ensures that we always move to the best point found during the line
search even when the best point is not the last point. We also avoid 1
fit and gradient evaluation by moving the major iteration convergence
check above the switch statement.

+ Treat failure to find a search direction as convergence

+ Only update minor during line search
@jpritikin
Contributor Author

Steven, do you think BOBYQA could be modified to cope with soft feasibility constraints?

jpritikin added a commit to jpritikin/nlopt that referenced this issue Nov 14, 2020
+ Preserve starting value if no evaluations were finite

+ Separately track the best minor and major estimate. This change
  ensures that we always move to the best point found during the line
  search even when the best point is not the last point. We also avoid 1
  fit and gradient evaluation by moving the major iteration convergence
  check above the switch statement.

+ Treat failure to find a search direction as convergence

+ Only update minor during line search

+ Unconfuse code that returns the converged details

+ Rename 'x' to 'theSpot'. I loathe single letter variable names. X marks
  the spot.
@jpritikin
Contributor Author

Yo, anybody home? Can we get my pull request merged?
