
Refactor following theory #57

Closed · 2 tasks done

gdalle opened this issue May 9, 2024 · 9 comments

gdalle commented May 9, 2024


Dumping the code from our meeting

# Placeholder tracer types; the sparsity patterns would be stored as
# sparse binary data, e.g. (::AbstractSparseVector, ::AbstractSparseVector)
struct GlobalGradientTracer end
struct GlobalHessianTracer end
struct LocalGradientTracer end

# Global classification: can ∂max/∂(first argument) ever be nonzero?
firstder_firstarg_nonzero(::typeof(max)) = true
# Local classification at the primal values: ∂max(α, β)/∂α ≠ 0 iff α > β
firstder_firstarg_nonzero(::typeof(max), α, β) = α > β

for op in all_the_ops
    @eval begin
        function $op(t1::T, t2::T) where {T<:GlobalGradientTracer}
            gα = gradient_sparsity(t1)
            gβ = gradient_sparsity(t2)
            a = firstder_firstarg_nonzero($op)
            b = firstder_secondarg_nonzero($op)
            if a && b
                return # tracer with gα ∪ gβ
            elseif a
                return # tracer with gα
            elseif b
                return # tracer with gβ
            else
                return nothing # empty
            end
        end

        function $op(t1::T, t2::T) where {T<:GlobalHessianTracer}
            gα = gradient_sparsity(t1)
            gβ = gradient_sparsity(t2)
            Hα = hessian_sparsity(t1)
            Hβ = hessian_sparsity(t2)
            a = firstder_firstarg_nonzero($op)
            b = firstder_secondarg_nonzero($op)
            c = seconder_firstarg_nonzero($op)
            d = nothing # TODO
            e = nothing # TODO
            # more stuff
        end

        function $op(t1::T, t2::T) where {T<:LocalGradientTracer}
            α = primal(t1)
            β = primal(t2)
            gα = gradient_sparsity(t1)
            gβ = gradient_sparsity(t2)
            a = firstder_firstarg_nonzero($op, α, β)
            b = firstder_secondarg_nonzero($op, α, β)
            if a && b
                return # tracer with gα ∪ gβ
            elseif a
                return # tracer with gα
            elseif b
                return # tracer with gβ
            else
                return nothing # empty
            end
        end
    end
end

adrhill commented May 10, 2024

One key insight from the ongoing refactor in #59 is that we previously used dicts of sets to represent both hessian_sparsity (key => non-empty set) and gradient_sparsity (key => empty set).

Introducing a GlobalHessianTracer with two parametric types

struct GlobalHessianTracer{G,H} <: AbstractHessianTracer
    grad::G     # sparse binary vector representation of non-zero entries in the gradient
    hessian::H  # sparse binary matrix representation of non-zero entries in the Hessian
end

therefore isn't flexible enough to support our previous approach.

gdalle commented May 10, 2024

I'm still not convinced a single dict can fully represent the gradient tracer as well. Aren't there cases where the gradient tracer has more nonzero coordinates than the hessian tracer?

gdalle commented May 10, 2024

Typically, a linear function.
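
For a concrete (hypothetical) illustration, not taken from the package itself: a linear function has a non-empty gradient sparsity pattern but an empty Hessian sparsity pattern.

# f is linear: the gradient pattern is {1, 3},
# but the Hessian is identically zero, so the Hessian pattern is empty.
f(x) = 2x[1] + x[3]

gradient_pattern = Set([1, 3])            # indices i with ∂f/∂xᵢ ≠ 0
hessian_pattern  = Set{Tuple{Int,Int}}()  # no pairs (i, j) with ∂²f/∂xᵢ∂xⱼ ≠ 0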

adrhill commented May 10, 2024

Aren't there cases where the gradient tracer has more nonzero coordinates than the hessian tracer?

Right, those were represented as empty sets. To give you an example:

Dict(
  1 => Set(),        # <-- empty sets hold only gradient information
  3 => Set([4, 5]),
  6 => Set([6]),
)

represents the first-order information (all other $\frac{\partial f}{\partial x_i}=0$)

  • $\frac{\partial f}{\partial x_1} \neq 0$
  • $\frac{\partial f}{\partial x_3} \neq 0$
  • $\frac{\partial f}{\partial x_6} \neq 0$

and the second-order information (except for permutations of the cases below, all other $\frac{\partial^2 f}{\partial x_i \partial x_j} = 0$)

  • $\frac{\partial^2 f}{\partial x_3 \partial x_4} \neq 0$
  • $\frac{\partial^2 f}{\partial x_3 \partial x_5} \neq 0$
  • $\frac{\partial^2 f}{\partial x_6^2} \neq 0$
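
To make the convention explicit, here is a small decoding sketch (illustrative helper names, not the package's API):

# Dict-of-sets pattern from the example above
pattern = Dict(
    1 => Set{Int}(),
    3 => Set([4, 5]),
    6 => Set([6]),
)

# Gradient sparsity: every key i corresponds to ∂f/∂xᵢ ≠ 0
grad_indices(d) = Set(keys(d))

# Hessian sparsity: every pair (i, j) with j in the set stored at key i
hess_indices(d) = Set((i, j) for (i, js) in d for j in js)

grad_indices(pattern)  # Set([1, 3, 6])
hess_indices(pattern)  # Set([(3, 4), (3, 5), (6, 6)])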

gdalle commented May 10, 2024

Right, I had forgotten that part. My take is that we should do the implementation that sticks to the theory, and if we really lose performance we can think about reintroducing this trick. Presumably the Hessian tracing is much more expensive than the gradient tracing anyway, so it doesn't add much cost to carry a gradient tracer around (plus it is necessary if you use the set-of-pairs representation, which I'm still convinced is the right one).
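
A minimal sketch of that set-of-pairs representation, with hypothetical field names (the point being that the gradient pattern is carried alongside the Hessian pairs):

# Hypothetical: Hessian sparsity stored as a set of index pairs (i, j),
# with the gradient sparsity carried along as a separate set of indices.
struct PairsHessianPattern
    grad::Set{Int}             # indices i with ∂f/∂xᵢ ≠ 0
    hess::Set{Tuple{Int,Int}}  # pairs (i, j) with ∂²f/∂xᵢ∂xⱼ ≠ 0
end

# Same information as the dict example above, in this representation:
PairsHessianPattern(Set([1, 3, 6]), Set([(3, 4), (3, 5), (6, 6)]))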

adrhill commented May 10, 2024

This is the option I also tend towards.

An alternative would be to introduce wrapper structs for "first- and second-order information" (sketched below):

  • one for generic, separate first- and second-order information
  • one for this dict-based approach
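
A rough sketch of what those two wrappers might look like (hypothetical names, just to make the alternative concrete):

abstract type AbstractSecondOrderInfo end

# Wrapper 1: generic, separate first- and second-order information
struct SeparateInfo{G,H} <: AbstractSecondOrderInfo
    grad::G     # any gradient pattern representation
    hessian::H  # any Hessian pattern representation
end

# Wrapper 2: the dict-based approach, where the keys carry the gradient
# pattern and the sets stored as values carry the Hessian pattern
struct DictInfo{D<:AbstractDict} <: AbstractSecondOrderInfo
    d::D
end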

gdalle commented May 10, 2024

We don't have time for fancy schmancy stuff. Let's do the clean way that we are sure works, and after NeurIPS we can bikeshed such details.

gdalle commented May 10, 2024

Plus, now that we have CI benchmarks, we'll know if there's a really bad regression!

adrhill commented May 14, 2024

Task 1 was completed in #59. Closing in favor of the more specific #56.
