Feature: Hessian-vector product function #126

WardBrian · 2023-05-30T14:48:11Z

Closes #125 and completes part of #84.

This adds a function with the following signature to the C API:

int bs_log_density_hessian_vector_product(const bs_model* m, bool propto,
                                          bool jacobian,
                                          const double* theta_unc,
                                          const double* vector, double* val,
                                          double* Hvp, char** error_msg);

When BRIDGESTAN_AD_HESSIAN is defined, this uses stan::math::hessian_vector_product. Otherwise, I currently have it implemented to compute the hessian and then just multiply it by the supplied vector.

In Stan 2.33 we can use stan-dev/math#2914

bob-carpenter · 2023-05-30T15:52:27Z

I would prefer to default to a system that works rather than one that throws errors. A first implementation can build the entire Hessian with finite diffs of gradients, then multiply. A less memory intensive version would generate one row of the Hessian at a time with finite diffs of gradients, then multiply to get a single entry in the result. This is, of course, parallelizable, as would be the Hessian algorithm in AD for that matter.

aseyboldt · 2023-06-02T20:38:03Z

In the case of using finite diffs, why not approximate $H_pv$ by $\frac{\nabla f(p + hv) - \nabla f(p)}{h}$ (after normalizing $v$)? Is there a problem with that I should be aware of?

edit. Or related finite diff schemes like $\frac{\nabla f(p + hv) - \nabla f(p - hv)}{2h}$...

bob-carpenter · 2023-06-02T21:17:02Z

We're doing the central version (second thing you wrote under "related finite diff"). It's implemented as a functional in Stan's C++ here and we just plugged that into BridgeStan:

https://github.com/stan-dev/math/blob/develop/stan/math/rev/functor/finite_diff_hessian_auto.hpp

aseyboldt · 2023-06-02T21:21:41Z

Sorry if I'm missing something here, but doesn't that compute the complete hessian (in 2n grad calls). I was referring to the finite diff version of a hessian vector product (with 2 grad calls, ignoring questions of selecting h).

bob-carpenter · 2023-06-03T15:13:45Z

Nope---I was the one missing the point. Yes, what I pointed to computes the complete Hessian.

I hadn't though about updating the finite diff algorithm to do Hessian-vector products innately. The place to add that would probably be the Stan math lib. It'd be faster (and more accurate?) than what we're doing now for tests, which I'm pretty sure just computes the Hessian and multiplies.

WardBrian · 2023-06-15T18:49:09Z

Exposed this function in each interface. For the moment I've left it as the naive "calculate the hessian via finite differences and multiply" approach when autodiff hessians are unavailable.

bob-carpenter · 2023-06-15T19:47:48Z

I've left it as the naive "calculate the hessian via finite differences and multiply" approach

I think that's OK for a first implementation.

@aseyboldt : do you know how to write the algorithm to do this more efficiently with finite differences over gradients that doesn't involve computing the whole Hessian?

aseyboldt · 2023-06-15T20:33:07Z

I guess pretty much this one:

from scipy import optimize, linalg

# From scipy.optimize.approx_fprime
_epsilon = np.sqrt(np.finfo(float).eps)

def hessian_vector_prod_approx(grad_func, x, v, *, h=_epsilon):
    v_norm = linalg.norm(v)
    v = v / v_norm
    grad_forward = grad_func(x + h*v)
    grad_backward = grad_func(x - h*v)
    return v_norm * (grad_forward - grad_backward) / (2 * h)

The problem is that I really don't know too much about the numerical issues that I'm pretty sure are buried here, and just googling a bit for literature about this I couldn't find too much that looked useful...

roualdes · 2023-06-16T02:51:27Z

Here's a blog post / reference about that algorithm, which claims the error rate for the approximation is O(h^2) for small h: https://justindomke.wordpress.com/2009/01/17/hessian-vector-products/

Isn't Justin Domke at FI right now? Maybe Bob and/or Brian could check with Justin about this.

I can't find a reference on the error for Stan Math's central difference approximation to the Hessian. I guess it too is O(h^2), but I don't know that.

If both methods have O(h^2) error (assuming constant terms are similar enough), then the faster one would be my preference for the default in BridgeStan when BRIDGESTAN_AD_HESSIAN=false.

P.S. The reference given in the Stan Math source code for the central difference approximation is broken. Following bread crumbs from the intent of the broken link ends at the book Numerical Methods for Unconstrained Optimization and Nonlinear Equations Dennis and Schnabel. There might be an error analysis within this book, but I don't have access, electronic or otherwise.

aseyboldt · 2023-06-16T03:00:12Z

Ah, that blog post might be where I first heard about this trick. I couldn't quite remember where I got it from...

bob-carpenter · 2023-06-16T12:46:23Z

Thanks for the algorithm and reference. @justindomke was visiting, but we can still check with him. @bgoodri may also know---I think the central finite diffs rather than one-sided versions was his idea.

The place to code the faster version is in the stan-dev/math repo. It should also be easier to test inside of the math repo. Then we can just include it in BridgeStan. We can also fix the link.

justindomke · 2023-06-16T13:15:58Z

Hi everyone! I have a few things to add that might be helpful.

One is that you can think of a Hessian-vector product as two composed gradient evaluations. Say you want to compute $\nabla^2 f(x) v$, where $x$ and $v$ are vectors and $f: R^n \rightarrow R$. Define $g(x) = \nabla f(x)$ to be the gradient and define $h(x) = g(x) \cdot v$. Then $\nabla^2 f(x) v = \nabla h(x)$.

I don't know if Stan can do 2-nd order derivatives like this, but here's a little demo using JAX:

import jax
from jax import numpy as jnp

def f(x):
    "random nonlinear function"
    return (x[0] * jnp.sum(jnp.sin(x)))**x[1]

def naive_hvp(x,v):
    return jax.hessian(f)(x) @ v

def recursive_hvp(x,v):
    g = jax.grad(f)
    h = lambda x: g(x) @ v
    return jax.grad(h)(x)

x = jnp.array([1.1,2.2,3.3])
v = jnp.array([4.4,5.5,6.6])

produces the output

naive_hvp(x,v)=Array([ -6.506892, -44.439754, -32.04876 ], dtype=float32)
recursive_hvp(x,v)=Array([ -6.5068817, -44.43976  , -32.048767 ], dtype=float32)

If you can do it this way, you probably should. recursive_hvp has the same complexity as f (up to some constant independent of the length of x) and has no finite-different floating point errors.

Second, there's essentially an infinite number of finite difference schemes, with different tradeoffs between computational cost and accuracy. You can go beyond 1-sided or 2-sided differences to "4-sided" or higher. Here's short post from me giving the basic idea. This is for derivatives rather than Hessian-vector products, but the same idea applies. You could modify @aseyboldt 's code above to use any of these.

In practice many people seem to find two-sided differences to be a reasonable compromise. They at least work "most of the time" whereas with one-sided differences will fall over for even slightly challenging problems.

Finally, if you're doing finite differences, don't neglect the problem of choosing the perturbation size. As far as I know, this is hard and there is no general scheme that's always guaranteed to work well. But probably there are heuristics that are better than always using numerical epsilon. I've had good luck using the heuristic $r = \sqrt{\epsilon}(1 + \vert x \vert_\infty) / \vert \nabla f(x) \vert_\infty$ [edit: added missing $/$] (where $r$ is the perturbation size and $\epsilon$ is numerical epsilon) which I first found in this paper.

bob-carpenter · 2023-06-16T13:56:04Z

Thanks for the explanation and refs, @justindomke. Stan has a reasonable Hessian-vector product implemented for everything but our implicit functions. For models using implicit functions, we want to provide finite differences. I opened an issue in the math lib, which is where the implementation should go:

stan-dev/math#2913

When we were discussing finite diffs, someone made exactly the same point about 2 diffs being a good compromise.

WardBrian · 2023-06-16T14:42:19Z

I took a pass at the above in stan-dev/math#2914, more or less a direct translation of what @aseyboldt had above.

For BridgeStan, we have a couple choices in the meantime:

Hold off on the next release of Stan, and use the stan::math implementation assuming one gets merged
Include a copy of it ourselves for now
Use the naive approach of calculating the Hessian and then multiplying when second order autodiff is not available.

bob-carpenter · 2023-06-16T15:42:28Z

I'm OK with any of those choices---they all have pros and cons. I'm leaning slightly toward (2) or (3) just to get the API in place even if we change implementations later.

WardBrian · 2023-06-16T16:49:49Z

I've had good luck using the heuristic $r = \sqrt{\epsilon}(1 + \vert x \vert_\infty) \vert \nabla f(x) \vert_\infty$ (where $r$ is the perturbation size and $\epsilon$ is numerical epsilon) which I first found in this paper.

@justindomke It seems like that paper suggests a different heuristic (namely $r = \frac{\sqrt{\epsilon} ( 1 + \Vert x\Vert)}{\Vert v \Vert} $) in eq 5.2, am I missing the other one somewhere?

roualdes · 2023-06-16T19:08:09Z

I too am OK with any of those choice, lean towards 2. or 3., and I vote 3. just because that's already written up in this PR.

R/R/bridgestan.R

justindomke · 2023-06-16T19:38:37Z

@justindomke It seems like that paper suggests a different heuristic (namely r=ϵ(1+‖x‖)‖v‖) in eq 5.2, am I missing the other one somewhere?

No, I'm sure you're right! I missed a division symbol above (fixed now), and also mine was for gradients. Yours looks good.

WardBrian · 2023-09-11T13:26:20Z

With the release of Stan 2.33 this has been updated to use the finite diff Hessian-vector-product from stan::math if autodiff hessians are unavailable. I believe it's ready for review @roualdes @bob-carpenter

roualdes

A few comments/questions. Mostly nitpicks, I think.

Otherwise, it looks good to me. Thanks for orchestrating yet another multiple repository feature-add.

julia/test/model_tests.jl

roualdes · 2023-09-12T20:45:22Z

src/bridgestan.cpp

+    if (error_msg) {
+      std::stringstream error;
+      error << "log_density_hessian_vector_product() failed with unknown "
+               "exception"


Should there be another chevron here, <<?

It's actually optional, C++ lets you place two string literals next to each other and the result is they are just pasted together (see "Concatenation" on https://en.cppreference.com/w/cpp/language/string_literal).

If you prefer the extra << stylistically I'm happy to add it.

Looks good. I just didn't know this. Thanks.

src/bridgestan.h

src/model_rng.cpp

src/model_rng.hpp

roualdes

Thanks for the fixes.

roualdes · 2023-09-13T23:52:10Z

src/bridgestan.cpp

+    if (error_msg) {
+      std::stringstream error;
+      error << "log_density_hessian_vector_product() failed with unknown "
+               "exception"


Looks good. I just didn't know this. Thanks.

C++ implementation

13685c4

WardBrian added enhancement New feature or request c-api cpp labels May 30, 2023

Python bindings and test

49b08d5

WardBrian added 4 commits June 15, 2023 10:54

R interface work

2d81276

Julia interface

226c6b9

Formatting

2e5e8c6

Rust interface

c0a2b51

WardBrian marked this pull request as ready for review June 15, 2023 18:48

Update docs

70131d9

bob-carpenter mentioned this pull request Jun 16, 2023

Hessian-vector product with central finite differences stan-dev/math#2913

Closed

WardBrian mentioned this pull request Jun 16, 2023

Add rev functor to compute hessian-times-vector by finite differences stan-dev/math#2914

Merged

5 tasks

roualdes reviewed Jun 16, 2023

View reviewed changes

R/R/bridgestan.R Show resolved Hide resolved

R: set NAOK

5875b3a

WardBrian added 5 commits June 27, 2023 09:53

Merge branch 'main' into feature/hessian-vector-products

9ced5f2

Merge branch 'main' into feature/hessian-vector-products

d697db0

Merge branch 'main' into feature/hessian-vector-products

2cc2c9f

Fix typo

1306d03

Merge branch 'main' into feature/hessian-vector-products

dd65cde

WardBrian mentioned this pull request Jul 6, 2023

Tracking: Changes in the next version of Stan #154

Closed

WardBrian added 7 commits July 10, 2023 12:34

Merge branch 'main' into feature/hessian-vector-products

c4a0d92

Rename pointer for consistency

50caabb

Merge branch 'main' into feature/hessian-vector-products

3cc80da

Update doc

43deca6

Merge branch 'main' into feature/hessian-vector-products

88130d6

Switch finite diff method to finite_diff_hessian_times_vector_auto()

354e391

Rust: only test to fp tolerance

5961f96

roualdes approved these changes Sep 12, 2023

View reviewed changes

Update documentation

eaca9f6

WardBrian requested a review from roualdes September 13, 2023 15:33

Fix typo

faf658b

roualdes approved these changes Sep 13, 2023

View reviewed changes

WardBrian merged commit 374d1fe into main Sep 14, 2023

WardBrian deleted the feature/hessian-vector-products branch September 14, 2023 02:05

aseyboldt mentioned this pull request Jun 26, 2024

Add helper to build hessian vector product pymc-devs/pytensor#858

Merged

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

Feature: Hessian-vector product function #126

Feature: Hessian-vector product function #126

WardBrian commented May 30, 2023 •

edited

Loading

bob-carpenter commented May 30, 2023

aseyboldt commented Jun 2, 2023 •

edited

Loading

bob-carpenter commented Jun 2, 2023

aseyboldt commented Jun 2, 2023 •

edited

Loading

bob-carpenter commented Jun 3, 2023

WardBrian commented Jun 15, 2023

bob-carpenter commented Jun 15, 2023

aseyboldt commented Jun 15, 2023

roualdes commented Jun 16, 2023

aseyboldt commented Jun 16, 2023

bob-carpenter commented Jun 16, 2023

justindomke commented Jun 16, 2023 •

edited

Loading

bob-carpenter commented Jun 16, 2023

WardBrian commented Jun 16, 2023

bob-carpenter commented Jun 16, 2023

WardBrian commented Jun 16, 2023

roualdes commented Jun 16, 2023

justindomke commented Jun 16, 2023

WardBrian commented Sep 11, 2023

roualdes left a comment

roualdes Sep 12, 2023

WardBrian Sep 13, 2023

roualdes Sep 13, 2023

roualdes left a comment

roualdes Sep 13, 2023

Feature: Hessian-vector product function #126

Feature: Hessian-vector product function #126

Conversation

WardBrian commented May 30, 2023 • edited Loading

bob-carpenter commented May 30, 2023

aseyboldt commented Jun 2, 2023 • edited Loading

bob-carpenter commented Jun 2, 2023

aseyboldt commented Jun 2, 2023 • edited Loading

bob-carpenter commented Jun 3, 2023

WardBrian commented Jun 15, 2023

bob-carpenter commented Jun 15, 2023

aseyboldt commented Jun 15, 2023

roualdes commented Jun 16, 2023

aseyboldt commented Jun 16, 2023

bob-carpenter commented Jun 16, 2023

justindomke commented Jun 16, 2023 • edited Loading

bob-carpenter commented Jun 16, 2023

WardBrian commented Jun 16, 2023

bob-carpenter commented Jun 16, 2023

WardBrian commented Jun 16, 2023

roualdes commented Jun 16, 2023

justindomke commented Jun 16, 2023

WardBrian commented Sep 11, 2023

roualdes left a comment

Choose a reason for hiding this comment

roualdes Sep 12, 2023

Choose a reason for hiding this comment

WardBrian Sep 13, 2023

Choose a reason for hiding this comment

roualdes Sep 13, 2023

Choose a reason for hiding this comment

roualdes left a comment

Choose a reason for hiding this comment

roualdes Sep 13, 2023

Choose a reason for hiding this comment

WardBrian commented May 30, 2023 •

edited

Loading

aseyboldt commented Jun 2, 2023 •

edited

Loading

aseyboldt commented Jun 2, 2023 •

edited

Loading

justindomke commented Jun 16, 2023 •

edited

Loading