Refactor following theory #57
One key insight from the ongoing refactor in #59 is that we previously used dicts of sets to represent both first- and second-order information in a single data structure. Introducing a struct

```julia
struct GlobalHessianTracer{G,H} <: AbstractHessianTracer
    grad::G    # sparse binary vector representation of non-zero entries in the gradient
    hessian::H # sparse binary matrix representation of non-zero entries in the Hessian
end
```

therefore isn't flexible enough to support our previous approach.
I'm still not convinced a single dict can fully represent the gradient tracer as well. Aren't there cases where the gradient tracer has more nonzero coordinates than the Hessian tracer?
Typically a linear function: its gradient is nonzero, but its Hessian vanishes identically.
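For instance (an illustrative sketch):

```julia
# Hypothetical example: a linear function has nonzero gradient entries
# but an identically zero Hessian, so its Hessian pattern is empty.
f(x) = 2x[1] + 3x[2] - x[3]
# gradient pattern: {1, 2, 3}
# Hessian pattern:  {} (no second-order interactions)
```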
Right, those were represented as empty sets. To give you an example:

```julia
Dict(
    1 => Set{Int}(), # <-- empty sets hold only gradient information
    3 => Set([4, 5]),
    6 => Set([6]),
)
```

represents the first-order information (all other partial derivatives are zero)

$$\frac{\partial f}{\partial x_1} \neq 0, \quad \frac{\partial f}{\partial x_3} \neq 0, \quad \frac{\partial f}{\partial x_6} \neq 0$$

and the second-order information (except for permutations of the cases below, all other second-order derivatives are zero)

$$\frac{\partial^2 f}{\partial x_3 \partial x_4} \neq 0, \quad \frac{\partial^2 f}{\partial x_3 \partial x_5} \neq 0, \quad \frac{\partial^2 f}{\partial x_6^2} \neq 0$$
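In code, both patterns can be recovered from the single dict. A minimal sketch (helper names are made up):

```julia
# Hypothetical helpers illustrating the encoding: the gradient pattern is the
# key set, the Hessian pattern the key-value pairs (symmetrized, since the
# Hessian is symmetric).
gradient_pattern(d) = Set(keys(d))
hessian_pattern(d) = Set((i, j) for (i, s) in d for j in s) ∪
                     Set((j, i) for (i, s) in d for j in s)

d = Dict(1 => Set{Int}(), 3 => Set([4, 5]), 6 => Set([6]))
gradient_pattern(d)  # Set([1, 3, 6])
hessian_pattern(d)   # Set([(3, 4), (4, 3), (3, 5), (5, 3), (6, 6)])
```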
Right, I had forgotten that part. My take is that we should do the implementation that sticks to the theory, and if we really lose performance we can think about reintroducing this trick. Presumably the Hessian tracing is much more expensive than the gradient tracing anyway, so it doesn't add much cost to carry a gradient tracer around (plus it is necessary if you use the set-of-pairs representation, which I'm still convinced is the right one).
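For reference, that representation could look roughly like this (a sketch, names illustrative):

```julia
# Hedged sketch of the theory-faithful layout: a gradient tracer carried
# alongside a Hessian tracer stored as a set of index pairs.
struct HessianTracerSketch
    grad::Set{Int}             # nonzero gradient indices
    hess::Set{Tuple{Int,Int}}  # nonzero Hessian index pairs
end

t = HessianTracerSketch(Set([1, 3, 6]), Set([(3, 4), (3, 5), (6, 6)]))
```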
This is the option I also tend towards. An alternative would be to introduce wrapper structs for "first- and second-order information":
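For instance (a rough sketch of what such wrappers might look like, names illustrative):

```julia
# Hypothetical wrapper structs; not the actual package types.
struct FirstOrderInfo{G}
    grad::G    # sparse binary vector: nonzero gradient entries
end

struct SecondOrderInfo{G,H}
    first::FirstOrderInfo{G}  # reuse the first-order wrapper
    hessian::H                # sparse binary matrix: nonzero Hessian entries
end
```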
We don't have time for fancy schmancy stuff. Let's do it the clean way that we're sure works, and after NeurIPS we can bikeshed such details.
Plus, now that we have CI benchmarks, we'll know if there's a really bad regression!
Dumping the code from our meeting