
NUTS step, when selected explicitly, requires the second derivative #2209

Closed
ghost opened this issue May 21, 2017 · 7 comments · Fixed by #2211
Comments

@ghost commented May 21, 2017

The NUTS sampler, when auto-assigned, requires only the first derivative (i.e. it calls the grad() method only for the operation itself).

For example, this code using the Interpolated distribution works:

import numpy as np
import pymc3 as pm

with pm.Model():
    uniform = pm.Interpolated('uniform', np.linspace(0, 1, 100), np.ones(100))
    pm.sample(1000)

However, when I specify the NUTS or HamiltonianMC step explicitly, it fails because the Interpolated distribution doesn't provide second-order derivatives (the variable returned by grad() doesn't have a grad() method):

import numpy as np
import pymc3 as pm

with pm.Model():
    uniform = pm.Interpolated('uniform', np.linspace(0, 1, 100), np.ones(100))
    step = pm.NUTS()
    pm.sample(1000, step=step) # fails

According to the NUTS paper, it should require only first-order derivatives, but maybe I'm missing something.

So my question is: is this a bug, or is it for some reason the expected behavior? If it's the latter, I could add a second-order gradient implementation to the Interpolated distribution that, for example, always returns zeros (see the sketch below), but I don't understand why it is needed in the first place.
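For concreteness, here is a minimal sketch of that zeros workaround using a toy Op; ToyOp and ToyGradOp are hypothetical stand-ins, not the actual SplineWrapper code:

import numpy as np
import theano
import theano.tensor as tt

class ToyOp(theano.Op):
    # Hypothetical stand-in for the spline Op (computes x**2).
    itypes, otypes = [tt.dscalar], [tt.dscalar]

    def perform(self, node, inputs, output_storage):
        output_storage[0][0] = np.asarray(inputs[0] ** 2)

    def grad(self, inputs, output_grads):
        # The first derivative delegates to a second Op, the way
        # SplineWrapper wraps the spline's derivative.
        return [output_grads[0] * ToyGradOp()(inputs[0])]

class ToyGradOp(theano.Op):
    # Hypothetical Op for the first derivative (2*x).
    itypes, otypes = [tt.dscalar], [tt.dscalar]

    def perform(self, node, inputs, output_storage):
        output_storage[0][0] = np.asarray(2.0 * inputs[0])

    def grad(self, inputs, output_grads):
        # The zeros workaround: without this method, taking a second
        # derivative raises; with it, d2y/dx2 is simply (and wrongly) zero.
        return [tt.zeros_like(inputs[0]) * output_grads[0]]

x = tt.dscalar('x')
d2y = tt.grad(tt.grad(ToyOp()(x), x), x)  # works only because of the stub
print(theano.function([x], d2y)(3.0))     # prints 0.0, not the true 2.0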

@ColCarroll (Member)

It also fails with Metropolis -- it looks like some scaling gets set as soon as you initialize the step function (the line above the one marked # fails).

@ghost (Author) commented May 21, 2017

@ColCarroll You are right, it actually fails at the initialization of the step function.

But Metropolis works for me:

import numpy as np
import pymc3 as pm

with pm.Model():
    uniform = pm.Interpolated('uniform', np.linspace(0, 1, 100), np.ones(100))
    step = pm.Metropolis()
    pm.sample(1000, step=step)

doesn't produce any errors. I ran it with Python 3 from the latest Anaconda3 and the master branch of PyMC3.

@ghost (Author) commented May 21, 2017

So I found that the error can be suppressed by implementing the grad() method of the gradient operation so that it throws NotImplementedError.

In this case the NotImplementedError is caught in guess_scaling, which falls back to fixed_hessian, and sampling proceeds.
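As a sketch of that approach (reusing the hypothetical toy Op from my first comment; the real change lands in SplineWrapper via PR #2211):

import numpy as np
import theano
import theano.tensor as tt

class ToyGradOp(theano.Op):
    # Hypothetical first-derivative Op whose own grad() raises instead
    # of returning zeros.
    itypes, otypes = [tt.dscalar], [tt.dscalar]

    def perform(self, node, inputs, output_storage):
        output_storage[0][0] = np.asarray(2.0 * inputs[0])

    def grad(self, inputs, output_grads):
        # guess_scaling catches this NotImplementedError while computing
        # the Hessian diagonal and falls back to fixed_hessian, so the
        # step method can still be constructed.
        raise NotImplementedError('second derivative is not implemented')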

@aseyboldt (Member)

This is because NUTS needs a mass matrix (the scaling argument). If it is not supplied by the user, it is estimated from the diagonal of the Hessian at the initial position. You should usually specify it explicitly (and also set is_cov accordingly) if you assign the step method yourself; see the sketch below.
I'm not sure I like this behaviour, by the way. I don't think there really is a good default, but the identity matrix might be better. The Hessian makes sense for a model with an approximately Gaussian posterior when the initial position is the MAP, but that's about it. It doesn't even have to be positive definite in general.
CC @twiecki @fonnesbeck
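For illustration, a minimal sketch of supplying the scaling explicitly for the model from this issue (assuming a single free variable, so the diagonal has length 1):

import numpy as np
import pymc3 as pm

with pm.Model():
    uniform = pm.Interpolated('uniform', np.linspace(0, 1, 100), np.ones(100))
    # Identity mass matrix, passed as a diagonal and interpreted as a
    # covariance via is_cov=True, so NUTS never needs the Hessian.
    step = pm.NUTS(scaling=np.ones(1), is_cov=True)
    pm.sample(1000, step=step)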

@twiecki (Member) commented May 22, 2017

@aseyboldt Yeah, the Hessian seems to cause more problems than it's worth, so I agree that we should default to an identity matrix that we adapt during tuning using the trace.

@fonnesbeck (Member)

Yes, that makes sense.

ferrine pushed a commit that referenced this issue May 27, 2017
* Make spacing more compliant to PEP8

* Raise NotImplementedError from SplineWrapper grad operation

Fixes #2209

* Instantiate SplineWrapper recursively to simplify the architecture

* Create spline derivatives lazily

* Move grad_op to a separate property

* Add tests for SplineWrapper

* Fix style issues
@ghost (Author) commented May 27, 2017

PR #2211, referenced above, solved the problem by throwing the correct type of exception for unimplemented second derivatives, allowing guess_scaling to fall back correctly to fixed_hessian.

However, @aseyboldt pointed out that it would be a good idea to replace guess_scaling with a fixed identity matrix. I opened PR #2232 for that.
