
Allow initial-guess methods to solve for lambda #14

Merged: 4 commits merged into mdpetters:master on Jan 30, 2022

Conversation

@jjstickel (Contributor)

I've implemented some changes to solve(Psi, b; kwargs...) so that the user may specify the optimization method used to solve for lambda. I also changed the objective functions to solve for log10(lambda), which I think speeds things up a little and also keeps lambda from mistakenly going negative when an initial-guess solver (rather than a bounded solver) is used. Example usage:

result = RT.solve(Ψ, b, alg=:gcv_svd, method=RT.NelderMead(), λ₀=1e-1)
result = RT.solve(Ψ, b, alg=:gcv_svd, method=RT.Brent(), rel_tol=1e-3)
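
As a rough sketch of what these keywords do internally (assuming Optim.jl's interface and an already set-up problem Ψ with standard-form data b̄; the actual implementation inside solve may differ): the search variable is x = log10(λ), so an initial-guess method starts from log10(λ₀) and λ = 10^x can never go negative.

using RegularizationTools, Optim
# Reparameterize the objective in x = log10(λ); gcv_svd(Ψ, b̄, λ) is the GCV estimate.
obj = x -> gcv_svd(Ψ, b̄, 10^x[1])
res = optimize(obj, [log10(1e-1)], NelderMead())   # initial guess λ₀ = 1e-1
λ_opt = 10^Optim.minimizer(res)[1]                 # map the minimizer back to λ > 0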

Let me know what you think. AFAICT, everything is backward compatible. If you like the approach, I will also do the same for solve(Psi, b, x0; kwargs...), and update docs and tests.

@mdpetters (Owner)

I think this is a great extension. However, I wonder if we should maintain the

function gcv_tr(Ψ::RegularizationProblem, b̄::AbstractVector, λ::AbstractFloat)

function. If we want/need one with log10 initialization, maybe we can just add

 gcv_tr(Ψ::RegularizationProblem, b̄::AbstractVector, log10λ::AbstractFloat) = gcv_tr(Ψ, b̄, 10^log10λ)

which would maintain full backward compatibility. That way the docs won't break either, which is a plus. Do you agree?

@jjstickel (Contributor, Author)

From what I could tell, the only use for gcv_tr() (also gcv_svd() and kappa()), at least internally within RegularizationTools, is to find the optimal regularization parameter. But if you have another use case and want to call gcv_tr(Ψ, b̄, λ), I can certainly keep that function as is and introduce a new function that takes log10(λ).

@mdpetters (Owner)

It's mentioned here to help with diagnostics: https://mdpetters.github.io/RegularizationTools.jl/stable/manual/#Extracting-the-Validation-Function. It's of course not a big deal to change it in the docs. I doubt that anything relies on it.

@jjstickel (Contributor, Author)

> It's mentioned here to help with diagnostics: https://mdpetters.github.io/RegularizationTools.jl/stable/manual/#Extracting-the-Validation-Function. It's of course not a big deal to change it in the docs. I doubt that anything relies on it.

OIC, in that example you calculate and observe the shape of the GCV curve. That could potentially be useful for other problems, even just for illustration. I'll revert those few changes and instead create separate functions that take log10(λ) as the argument.
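
For reference, a minimal sketch of that diagnostic use, assuming the signatures above and the manual's to_standard_form helper (A and b are the design matrix and data from the example):

Ψ  = setupRegularizationProblem(A, 2)      # second-order regularization
b̄  = to_standard_form(Ψ, b)                # data transformed to standard form
λs = exp10.(range(-4, 3, length = 200))    # log10-spaced grid of λ values
V  = [gcv_svd(Ψ, b̄, λ) for λ in λs]        # GCV value at each grid point
# Plotting V against λ on a log x-axis shows the shape of the validation curve.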

@jjstickel changed the title from "WIP: allow initial-guess methods to solve for lambda" to "Allow initial-guess methods to solve for lambda" on Jan 29, 2022
@jjstickel (Contributor, Author)

This is ready for another review.

Note that I did run the tests after these changes, and I am seeing failures for some (but not all) of the tests in test/invert.jl. After spending a bit of time looking into it, I can't find any reason why those should fail when all the other tests, including test/solvers.jl, pass. Are those particular tests highly sensitive to noise in the input conditions? I might also be hitting a version mismatch in my packages, but I checked for that too.

@mdpetters (Owner)

The test problem relates particle size distribution to optical properties and is highly sensitive to noise. Optimizing over log10(λ) instead of λ will give you a different optimum value (see below). I am not sure in which cases convergence is better using log10(λ) instead of λ; for this case it seems slightly less efficient, though that doesn't matter in a practical sense. Note that I have not evaluated which solution is the better reconstruction. I also haven't looked at the shape of the GCV curve; it might just be converging to a different local minimum.

If you have a good mathematical argument that log10 should be used by default, I am open to leaving your code as is; we can update the test values to reflect this. If there isn't a strong argument, then maybe it could be a switch, solve(; optimize_log10 = false), that users can toggle. For now I am agnostic about the default behavior.
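
A rough sketch of what such a switch might look like (a hypothetical helper, not the package's actual solve; the Brent bounds are taken from the search intervals in the output below):

using Optim
# Hypothetical toggle between searching over λ directly and over x = log10(λ).
function optimal_λ(Ψ, b̄; optimize_log10 = false)
    if optimize_log10
        res = optimize(x -> gcv_svd(Ψ, b̄, 10^x), -4.0, 3.0)   # Brent over log10(λ)
        return 10^Optim.minimizer(res)
    else
        res = optimize(λ -> gcv_svd(Ψ, b̄, λ), 1e-4, 1e3)      # Brent over λ
        return Optim.minimizer(res)
    end
end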

Solving the test problem, optimizing over log10(λ) (This PR)

λ1 = @> setupRegularizationProblem(A,2) solve(b, alg=:gcv_svd)
RegularizedSolution([-42.861970060220465, 41.38896718516822, -8.573653938095891, 5.522046816915601, 20.859167079491346, 35.4922110655504, 24.717436645400802, 9.08300045288385], 5.7326209557572465, Results of Optimization Algorithm
 * Algorithm: Brent's Method
 * Search Interval: [-4.000000, 3.000000]
 * Minimizer: 7.583532e-01
 * Minimum: 2.235460e+06
 * Iterations: 16
 * Convergence: max(|x - x_upper|, |x - x_lower|) <= 2*(1.5e-08*|x|+2.2e-16): true
 * Objective Function Calls: 17)

Solving the test problem, optimizing over λ

RegularizedSolution([3.6374824645269612, 3.2553725649550813, 3.490608071270156, 7.471743184367153, 19.29326374545647, 35.65012377616612, 24.738905782214346, 9.078797701339127], 575.5703218708263, Results of Optimization Algorithm
 * Algorithm: Brent's Method
 * Search Interval: [0.000100, 1000.000000]
 * Minimizer: 5.755703e+02
 * Minimum: 2.538607e+06
 * Iterations: 12
 * Convergence: max(|x - x_upper|, |x - x_lower|) <= 2*(1.5e-08*|x|+2.2e-16): true
 * Objective Function Calls: 13)

@jjstickel (Contributor, Author)

Yes, indeed, there are two local minima for your test problem:
GCV_lin.pdf
GCV_semilog.pdf
The first figure uses a linear axis for lambda while the second uses a log10 axis. It is easy to see what is happening: when the optimizer uses linear scaling for lambda, it mistakenly skips over the lower minimum. There might be an option for the Brent method to use higher resolution in its search to help it find this minimum (I'm not sure). With log scaling, the optimizer is able to find the global minimum (10^0.7583532 = 5.73262).

As an engineer, I do not have a rigorous mathematical argument, and I may even be abusing vocabulary, but clearly the GCV response to lambda varies more evenly in log-space. I discovered this when previously working on regularization for data-smoothing applications. A secondary motivation is that lambda stays positive when using unbounded optimization methods (e.g. NelderMead) with log10(lambda). Before I implemented this PR, I noticed I was getting negative values, an artifact of defining the regularization parameter as a squared term. While that is easy to correct (just take the absolute value of the minimizer), it means the solver was traversing negative values, which is unnecessary and inefficient.
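
A toy illustration of that sign artifact (a stand-in objective, not the package's actual GCV function): because the regularization parameter enters only as λ², the objective is even in λ, so an unbounded method can wander into λ < 0 at no extra cost, whereas λ = 10^x is positive for every real x.

g(λ) = (λ^2 - 30.0)^2              # even in λ: the sign is invisible to the solver
@assert g(-5.0) == g(5.0)
h(x) = g(10^x)                     # reparameterized: x ranges over ℝ, but λ = 10^x > 0 always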

@mdpetters (Owner)

I think this is a good argument to make log10 search the default. I will merge and update the test values. If anyone disagrees with this choice, please reopen the discussion.

@mdpetters mdpetters closed this Jan 30, 2022
@mdpetters mdpetters reopened this Jan 30, 2022
@mdpetters mdpetters merged commit da4c7c7 into mdpetters:master Jan 30, 2022
mdpetters added a commit that referenced this pull request Jan 30, 2022