
User supplied sparse Hessian is converted to dense matrix. #1106

Open
rsenne opened this issue Oct 27, 2024 · 2 comments


rsenne commented Oct 27, 2024

I'm currently trying to solve an optimization problem in which I know the Hessian to be sparse, so I have coded my analytical Hessian to return a sparse matrix for use in Newton's method. Despite returning a sparse Hessian, however, it ends up being represented as a dense matrix. I noticed this because the optimization routine spends a lot of time performing a Cholesky decomposition in update_state!:

function update_state!(d, state::NewtonState, method::Newton)
    # Search direction is always the negative gradient divided by
    # a matrix encoding the absolute values of the curvatures
    # represented by H. It deviates from the usual "add a scaled
    # identity matrix" version of the modified Newton method. More
    # information can be found in the discussion at issue #153.
    T = eltype(state.x)

    if typeof(NLSolversBase.hessian(d)) <: AbstractSparseMatrix
        state.s .= .-(NLSolversBase.hessian(d)\convert(Vector{T}, gradient(d)))
    else
        state.F = cholesky!(Positive, NLSolversBase.hessian(d))
        if typeof(gradient(d)) <: Array
            # is this actually StridedArray?
            ldiv!(state.s, state.F, -gradient(d))
        else
            # not Array, we can't do inplace ldiv
            gv = Vector{T}(undef, length(gradient(d)))
            copyto!(gv, -gradient(d))
            copyto!(state.s, state.F\gv)
        end
    end
    # Determine the distance of movement along the search line
    lssuccess = perform_linesearch!(state, method, d)

    # Update current position # x = x + alpha * s
    @. state.x = state.x + state.alpha * state.s
    lssuccess == false # break on linesearch error
end

A minimal reproducible example (MRE) is as follows:

using Optim, SparseArrays, LinearAlgebra

# Example problem: minimize f(x) = (x₁ - 1)² + 100(x₂ - x₁²)²
function rosenbrock_f(x)
    return (x[1] - 1)^2 + 100 * (x[2] - x[1]^2)^2
end

function rosenbrock_g!(g, x)
    g[1] = 2 * (x[1] - 1) - 400 * x[1] * (x[2] - x[1]^2)
    g[2] = 200 * (x[2] - x[1]^2)
    return g
end

function rosenbrock_h!(h, x)
    # Return sparse Hessian directly
    return h .= sparse([1, 1, 2, 2],  # rows
                 [1, 2, 1, 2],   # cols
                 [2 - 400 * x[2] + 1200 * x[1]^2,  # values
                  -400 * x[1],
                  -400 * x[1],
                  200], 
                 2, 2)  # matrix dimensions
end

# Initial point
x0 = [-1.0, 1.0]

# Set up the optimization problem
od = TwiceDifferentiable(rosenbrock_f, rosenbrock_g!, rosenbrock_h!, x0)

# Run the optimization
result = optimize(od, x0, Newton())

This is obviously a silly example, since the Hessian here is not actually sparse, but it still reproduces the behavior I'm seeing: the Hessian is represented as a dense matrix, so instead of a direct sparse linear solve, a Cholesky decomposition is performed.
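A quick way to confirm this (assuming NLSolversBase is loaded alongside Optim) is to inspect the type of the Hessian cache the objective stores after construction:

using NLSolversBase   # provides hessian(); available as a dependency of Optim

od = TwiceDifferentiable(rosenbrock_f, rosenbrock_g!, rosenbrock_h!, x0)
# The cache allocated by this default constructor is dense, regardless of
# what rosenbrock_h! returns:
typeof(NLSolversBase.hessian(od))   # Matrix{Float64}, not a SparseMatrixCSC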

pkofod (Member) commented Oct 29, 2024

Thank you. This would be because

od = TwiceDifferentiable(rosenbrock_f, rosenbrock_g!, rosenbrock_h!, x0)

does not accurately reflect your Hessian format. Does it help if you read the test here? https://github.com/JuliaNLSolvers/NLSolversBase.jl/blob/master/test/sparse.jl We should write this down somewhere.
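Roughly, what that test does is construct the objective with an explicitly sparse Hessian cache. A minimal sketch along those lines is below; the exact positional argument order (scalar seed, gradient cache, Hessian cache) is my reading of the NLSolversBase constructor, not documented API:

using Optim, SparseArrays

# Sketch: supply a sparse Hessian cache explicitly so the objective stores
# (and Newton() sees) a SparseMatrixCSC instead of a dense Matrix.
x0 = [-1.0, 1.0]
od = TwiceDifferentiable(rosenbrock_f, rosenbrock_g!, rosenbrock_h!,
                         x0, zero(eltype(x0)), similar(x0), spzeros(2, 2))

result = optimize(od, x0, Newton())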

rsenne (Author) commented Oct 29, 2024

Yes, reading that test was very helpful; I ended up reading through the NLSolversBase code and came to a similar conclusion. I definitely agree that having this written down somewhere would be good, and I'd be happy to add it to the documentation.

The only other interesting aspect of this is that, once I figured out how to format the problem properly, convergence behaved in a bizarre way: the optimization would return as successful, but the reported solution would be identical to the initial point. The problem I was working with was strongly convex, so this was pretty strange. I got around it by setting stricter convergence criteria using Optim.Options(), which fixed it (a rough sketch is below). For some additional context you might enjoy this Discourse thread.
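Roughly, the workaround looked like this (the tolerance values are placeholders, not the ones from my actual problem):

# Tighter convergence criteria via Optim.Options; the values below are
# placeholders to illustrate the idea.
opts = Optim.Options(x_tol = 1e-10, f_tol = 1e-10, g_tol = 1e-10)
result = optimize(od, x0, Newton(), opts)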
