
NUTS step, when selected explicitly, requires the second derivative #2209

Closed
ghost opened this issue May 21, 2017 · 7 comments · Fixed by #2211
Comments

@ghost commented May 21, 2017

The NUTS sampler, when auto-assigned, requires only the first derivative (i.e. it calls the grad() method only for the operation itself).

For example, this code using the Interpolated distribution works:

import numpy as np
import pymc3 as pm

with pm.Model():
    uniform = pm.Interpolated('uniform', np.linspace(0, 1, 100), np.ones(100))
    pm.sample(1000)

However, when I specify the NUTS or HamiltonianMC step explicitly, it fails because the Interpolated distribution doesn't provide second-order derivatives (the variable returned by grad() doesn't have a grad() method):

import numpy as np
import pymc3 as pm

with pm.Model():
    uniform = pm.Interpolated('uniform', np.linspace(0, 1, 100), np.ones(100))
    step = pm.NUTS()
    pm.sample(1000, step=step) # fails

According to the NUTS paper, it should require only first-order derivatives, but maybe I'm missing something.

So my question is: is this a bug, or is it for some reason the expected behavior? If it's the latter, I could add a second-order gradient implementation to the Interpolated distribution that, for example, always returns zeros (see the sketch below), but I don't understand why it is needed in the first place.
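For concreteness, here is a minimal sketch of that zeros workaround using a toy Op; ToyOp and ToyGradOp are hypothetical stand-ins, not the actual SplineWrapper code:

import numpy as np
import theano
import theano.tensor as tt

class ToyOp(theano.Op):
    # Hypothetical stand-in for the spline Op (computes x**2).
    itypes, otypes = [tt.dscalar], [tt.dscalar]

    def perform(self, node, inputs, output_storage):
        output_storage[0][0] = np.asarray(inputs[0] ** 2)

    def grad(self, inputs, output_grads):
        # The first derivative delegates to a second Op, the way
        # SplineWrapper wraps the spline's derivative.
        return [output_grads[0] * ToyGradOp()(inputs[0])]

class ToyGradOp(theano.Op):
    # Hypothetical Op for the first derivative (2*x).
    itypes, otypes = [tt.dscalar], [tt.dscalar]

    def perform(self, node, inputs, output_storage):
        output_storage[0][0] = np.asarray(2.0 * inputs[0])

    def grad(self, inputs, output_grads):
        # The zeros workaround: without this method, taking a second
        # derivative raises; with it, d2y/dx2 is simply (and wrongly) zero.
        return [tt.zeros_like(inputs[0]) * output_grads[0]]

x = tt.dscalar('x')
d2y = tt.grad(tt.grad(ToyOp()(x), x), x)  # works only because of the stub
print(theano.function([x], d2y)(3.0))     # prints 0.0, not the true 2.0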

@ColCarroll (Member)

It also fails with Metropolis -- it looks like some scaling gets set as soon as you initialize the step function (the line above the one marked # fails).

@ghost (Author) commented May 21, 2017

@ColCarroll You are right, it actually fails at the initialization of the step function.

But Metropolis works for me:

import numpy as np
import pymc3 as pm

with pm.Model():
    uniform = pm.Interpolated('uniform', np.linspace(0, 1, 100), np.ones(100))
    step = pm.Metropolis()
    pm.sample(1000, step=step)

doesn't produce any errors. I ran it with Python 3 from the latest Anaconda3 and the master branch of PyMC3.

@ghost (Author) commented May 21, 2017

So I found that the error can be suppressed by implementing the grad() method of the gradient operation so that it throws NotImplementedError.

In this case the NotImplementedError is caught in guess_scaling, which falls back to fixed_hessian, and sampling proceeds.
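As a sketch of that approach (reusing the hypothetical toy Op from my first comment; the real change lands in SplineWrapper via PR #2211):

import numpy as np
import theano
import theano.tensor as tt

class ToyGradOp(theano.Op):
    # Hypothetical first-derivative Op whose own grad() raises instead
    # of returning zeros.
    itypes, otypes = [tt.dscalar], [tt.dscalar]

    def perform(self, node, inputs, output_storage):
        output_storage[0][0] = np.asarray(2.0 * inputs[0])

    def grad(self, inputs, output_grads):
        # guess_scaling catches this NotImplementedError while computing
        # the Hessian diagonal and falls back to fixed_hessian, so the
        # step method can still be constructed.
        raise NotImplementedError('second derivative is not implemented')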

@aseyboldt (Member)

This is because NUTS needs a mass matrix (the scaling argument). If it is not supplied by the user, it is estimated from the diagonal of the Hessian at the initial position. You should usually specify it explicitly (and also set is_cov accordingly) if you assign the step method yourself; see the sketch below.
I'm not sure I like this behaviour, by the way. I don't think there really is a good default, but the identity matrix might be better. The Hessian makes sense for a model with an approximately Gaussian posterior when the initial position is the MAP, but that's about it. It doesn't even have to be positive definite in general.
CC @twiecki @fonnesbeck
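For illustration, a minimal sketch of supplying the scaling explicitly for the model from this issue (assuming a single free variable, so the diagonal has length 1):

import numpy as np
import pymc3 as pm

with pm.Model():
    uniform = pm.Interpolated('uniform', np.linspace(0, 1, 100), np.ones(100))
    # Identity mass matrix, passed as a diagonal and interpreted as a
    # covariance via is_cov=True, so NUTS never needs the Hessian.
    step = pm.NUTS(scaling=np.ones(1), is_cov=True)
    pm.sample(1000, step=step)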

@twiecki (Member) commented May 22, 2017

@aseyboldt Yeah, the Hessian seems to cause more problems than it's worth, so I agree that we should default to an identity matrix that we adapt during tuning using the trace.

@fonnesbeck (Member)

Yes, that makes sense.

ferrine pushed a commit that referenced this issue May 27, 2017
* Make spacing more compliant to PEP8

* Raise NotImplementedError from SplineWrapper grad operation

Fixes #2209

* Instantiate SplineWrapper recursively to simplify the architecture

* Create spline derivatives lazily

* Move grad_op to a separate property

* Add tests for SplineWrapper

* Fix style issues
@ghost (Author) commented May 27, 2017

PR #2211, referenced above, solved the problem by throwing the correct type of exception for unimplemented second derivatives, allowing guess_scaling to fall back correctly to fixed_hessian.

However, @aseyboldt pointed out that it would be a good idea to replace guess_scaling with a fixed identity matrix. I opened PR #2232 for that.
