I'm currently trying to solve an optimization problem whose Hessian I know to be sparse, so I have written my analytical Hessian to return a sparse matrix for use in Newton's method. However, despite returning a sparse Hessian, it ends up being represented as a dense matrix. I've determined this because the optimization routine spends a lot of time performing a Cholesky decomposition in update_state!:
function update_state!(d, state::NewtonState, method::Newton)
    # Search direction is always the negative gradient divided by
    # a matrix encoding the absolute values of the curvatures
    # represented by H. It deviates from the usual "add a scaled
    # identity matrix" version of the modified Newton method. More
    # information can be found in the discussion at issue #153.
    T = eltype(state.x)
    if typeof(NLSolversBase.hessian(d)) <: AbstractSparseMatrix
        state.s .= .-(NLSolversBase.hessian(d)\convert(Vector{T}, gradient(d)))
    else
        state.F = cholesky!(Positive, NLSolversBase.hessian(d))
        if typeof(gradient(d)) <: Array
            # is this actually StridedArray?
            ldiv!(state.s, state.F, -gradient(d))
        else
            # not Array, we can't do inplace ldiv
            gv = Vector{T}(undef, length(gradient(d)))
            copyto!(gv, -gradient(d))
            copyto!(state.s, state.F\gv)
        end
    end
    # Determine the distance of movement along the search line
    lssuccess = perform_linesearch!(state, method, d)
    # Update current position: x = x + alpha * s
    @. state.x = state.x + state.alpha * state.s
    lssuccess == false # break on linesearch error
end
A minimal reproducible example (MRE) follows:
using Optim, SparseArrays, LinearAlgebra

# Example problem: minimize f(x) = (x₁ - 1)² + 100(x₂ - x₁²)²
function rosenbrock_f(x)
    return (x[1] - 1)^2 + 100 * (x[2] - x[1]^2)^2
end

function rosenbrock_g!(g, x)
    g[1] = 2 * (x[1] - 1) - 400 * x[1] * (x[2] - x[1]^2)
    g[2] = 200 * (x[2] - x[1]^2)
    return g
end

function rosenbrock_h!(h, x)
    # Return sparse Hessian directly
    return h .= sparse([1, 1, 2, 2],                    # rows
                       [1, 2, 1, 2],                    # cols
                       [2 - 400 * x[2] + 1200 * x[1]^2, # values
                        -400 * x[1],
                        -400 * x[1],
                        200],
                       2, 2)                            # matrix dimensions
end

# Initial point
x0 = [-1.0, 1.0]

# Set up the optimization problem
od = TwiceDifferentiable(rosenbrock_f, rosenbrock_g!, rosenbrock_h!, x0)

# Run the optimization
result = optimize(od, x0, Newton())
This is obviously a toy example, since the Hessian is not actually sparse here, but it still reproduces the behavior I'm seeing: the Hessian ends up represented as dense, so instead of a direct sparse linear solve, a Cholesky decomposition is performed.
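For what it's worth, the branch taken in update_state! can be checked directly by inspecting the type of the cached Hessian on the objective built above. This is just a quick diagnostic sketch; it assumes NLSolversBase.hessian returns the internal cache, as it appears to in the quoted source:

using NLSolversBase, SparseArrays
H = NLSolversBase.hessian(od)        # the Hessian storage held by the TwiceDifferentiable
println(typeof(H))                   # dense Matrix{Float64} here, even though h! returns sparse
println(H isa AbstractSparseMatrix)  # false, so the Cholesky branch is taken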
Yes, reading that test was very helpful; I ended up reading through the NLSolversBase code and came to a similar conclusion. I definitely agree that having this written down somewhere would be good, and I'd be happy to add it to the documentation.
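If I've read the NLSolversBase constructors correctly, the fix is to seed the Hessian cache with a sparse matrix when constructing the TwiceDifferentiable; hessian(d) then stays sparse, and update_state! takes the direct \ branch. A minimal sketch, assuming the positional (F, G, H) seeding arguments work as I understand them:

using Optim, SparseArrays, NLSolversBase

# Seed the cache with the Hessian's sparsity pattern (values are overwritten by h!)
H0 = sparse([1, 1, 2, 2], [1, 2, 1, 2], zeros(4), 2, 2)
od_sparse = TwiceDifferentiable(rosenbrock_f, rosenbrock_g!, rosenbrock_h!,
                                x0, zero(eltype(x0)), similar(x0), H0)
result = optimize(od_sparse, x0, Newton())

Note that rosenbrock_h! must then update h in place without destroying its type; the broadcast h .= sparse(...) in the MRE keeps the cache a SparseMatrixCSC.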
The only other interesting aspect of this problem was that, once I finally figured out how to format the problem properly, convergence behaved in a bizarre way: the optimization would return as successful, but the returned solution would be identical to the initial point. The problem I was working with is strongly convex, so this was quite strange. I got around it by setting stricter convergence criteria via Optim.Options(), which fixed it. For some additional context you might enjoy this Discourse thread.
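For reference, the workaround looked roughly like this; the exact tolerance values were specific to my problem, so treat them as placeholders:

using Optim
opts = Optim.Options(g_tol = 1e-12,    # much stricter gradient-norm tolerance
                     iterations = 1_000)
result = optimize(od, x0, Newton(), opts)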