
Conversation

@aseyboldt
Member

I'm not really sure about this pull request, as it does not seem to help in any of the models I tried, but I thought it might be interesting enough to put out there. It might also be helpful when someone implements Riemannian MCMC.

Right now we have four options for specifying the global mass matrix in HMC: a diagonal matrix (either as covariance or as precision matrix), a dense covariance or precision matrix, or a sparse covariance matrix in CSR format. This pull request adds another option: defining the covariance matrix through samples X from the posterior. This lets us use a full covariance matrix in high-dimensional settings without having to store the full matrix explicitly. To implement the potential we need matrix-vector products Ax and samples from N(0, A^-1), where A is the covariance matrix. The first we can compute as Ax = X^T(Xx). Sampling from the normal distribution is a bit more difficult; I implemented a Krylov-subspace-based algorithm described in [1]. As high-dimensional sample covariances are usually problematic, I used a simple Stein-type estimator to shrink the sample covariance towards the diagonal matrix containing the variances.
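A minimal sketch of that first ingredient (not the actual code in this PR; the helper name and the fixed shrinkage weight are made up for illustration, whereas the patch uses a Stein-type estimate for the weight):

    import numpy as np

    def sample_cov_matvec(X, shrinkage=0.1):
        # X has shape (s, n): s posterior samples of an n-dimensional parameter vector.
        # Returns a function computing A @ x for the shrunk sample covariance
        # A = (1 - shrinkage) * Cov(X) + shrinkage * diag(Var(X)),
        # without ever forming the dense n x n matrix.
        Xc = X - X.mean(axis=0)
        s = X.shape[0]
        diag_var = Xc.var(axis=0)

        def matvec(x):
            cov_x = Xc.T @ (Xc @ x) / s      # X^T (X x): ~2*s*n multiplications
            return (1 - shrinkage) * cov_x + shrinkage * diag_var * x

        return matvec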

However, this does not seem to help as much as I hoped it would. Possible reasons I could think of are:

  • I tried the wrong models, it just helps in other models :-)
  • The covariance estimates are bad. Maybe something more sophisticated than the Stein estimator would help.
  • Computing Ax in every transition is quite expensive (2 * s * n multiplications). It does parallelize quite well, however (just not on my laptop).
  • Maybe sampling from the normal distribution does not work as well as I think it does. The Lanczos algorithm is not particularly stable, and I do not use reorthogonalization or restarting to address this.

[1] Chow, E., and Y. Saad. “Preconditioned Krylov Subspace Methods for Sampling Multivariate Gaussian Distributions.” SIAM Journal on Scientific Computing 36, no. 2 (January 1, 2014): A588–608. doi:10.1137/130920587.
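For reference, a rough sketch of the Lanczos sampling idea from [1], with the same simplifications mentioned above (no reorthogonalization or restarting); the function name and default number of steps are illustrative only:

    import numpy as np

    def lanczos_precision_sample(matvec, n, m=50, rng=None):
        # Rough sketch: approximate a draw from N(0, A^-1), where the covariance A
        # is only available through matvec(x) = A @ x. Without reorthogonalization
        # the accuracy degrades as m grows.
        rng = np.random.default_rng(rng)
        z = rng.standard_normal(n)
        beta0 = np.linalg.norm(z)

        V = np.zeros((n, m))
        alphas = np.zeros(m)
        betas = np.zeros(m - 1)
        v, v_prev, beta = z / beta0, np.zeros(n), 0.0
        for j in range(m):
            V[:, j] = v
            w = matvec(v) - beta * v_prev
            alphas[j] = v @ w
            w -= alphas[j] * v
            if j < m - 1:
                beta = np.linalg.norm(w)
                betas[j] = beta
                v_prev, v = v, w / beta

        # f(T_m) e_1 with f(t) = 1 / sqrt(t), via the eigendecomposition of T_m
        T = np.diag(alphas) + np.diag(betas, 1) + np.diag(betas, -1)
        eigval, eigvec = np.linalg.eigh(T)
        f_T_e1 = eigvec @ (eigval ** -0.5 * eigvec[0, :])

        # ||z|| * V_m f(T_m) e_1 approximates A^(-1/2) z, i.e. a draw from N(0, A^-1)
        return beta0 * (V @ f_T_e1)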

        eig = 1 / np.sqrt(eig)
    else:
        eig = estimator(eig)
    f_T = eigv @ (eig * np.linalg.solve(eigv, e))
Member

We currently still support Python 2.7.

Member Author

Oops, that happens when you develop in the notebook...

@twiecki
Member

twiecki commented Feb 3, 2017

This is really interesting and it could fit well with this: #1738. I'm going to test this on the hierarchical GLM.

  scaling = guess_scaling(Point(scaling, model=model), model=model, vars=vars)

- n = scaling.shape[0]
+ n = model.dict_to_array(model.test_point).shape[0]
Member

This indeed looks like a bug!

Member

Can you submit this separately?

Member Author

sure

Member Author

Is there a better way to get at the number of dimensions? This seems a bit like a hack to me...

Member

@twiecki Feb 3, 2017

Agreed, I don't think there is though. Perhaps we should push this into Model as a property ndim.
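A hypothetical sketch of what that could look like (not an existing PyMC3 API; it just reuses the dict_to_array / test_point calls from the diff above):

    class Model(object):
        # ... rest of the existing Model class ...

        @property
        def ndim(self):
            # Hypothetical: number of entries in the flattened parameter vector,
            # computed the same way as in the diff above.
            return self.dict_to_array(self.test_point).shape[0]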

    q : Quadpotential
    """
    if isquadpotential(C):
        return C
Member

This is also a good feature regardless of the stuff below (for the separate PR).

@aseyboldt
Member Author

Is the hierarchical model you are talking about the one from the notebook GLM-hierarchical.ipynb?

@aseyboldt
Member Author

Shouldn't a non-centered parametrization work better for that model:

    a_raw = pm.Normal('a_raw', mu=0, sd=1, shape=n_counties)
    a = pm.Deterministic('a', mu_a + sigma_a * a_raw)

    b_raw = pm.Normal('b_raw', mu=0, sd=1, shape=n_counties)
    b = pm.Deterministic('b', mu_b + sigma_b * b_raw)

@twiecki
Member

twiecki commented Feb 4, 2017

@aseyboldt Yep, works way better -- thanks.

What types of models would this sampler work best on? Also, intuitively it's a bit odd to me that precomputing the cov matrix once is slower than this sampling approach.

@aseyboldt
Member Author

aseyboldt commented Feb 4, 2017

Using the dense covariance matrix takes n^2 multiplications per matrix-vector product. Using this approach it takes 2 * s * n, where n is the number of dimensions and s is the number of samples, so it is faster whenever n > 2 * s. In practice I would expect that it takes a somewhat larger n to be worth it.
The model should also contain posterior correlations, or a diagonal covariance matrix will be way faster.

If X is the (n × s) matrix of samples, the normal approach with the full covariance matrix computes A = XX^T once and then Ax in every step; this patch instead computes X(X^T x).
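A quick check of the reordering with made-up sizes, where X holds the samples as columns as described above:

    import numpy as np

    rng = np.random.default_rng(0)
    n, s = 2000, 100                         # hypothetical: n dimensions, s samples
    X = rng.standard_normal((n, s))          # samples as columns of X
    x = rng.standard_normal(n)

    dense = (X @ X.T) @ x                    # forms the n x n matrix: ~n^2 work per product
    factored = X @ (X.T @ x)                 # two thin products: ~2*s*n work per product
    assert np.allclose(dense, factored)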

@aseyboldt
Member Author

Closing this for now. I couldn't find an example where it really helps. If I come across something, I might revisit it.

@aseyboldt aseyboldt closed this Apr 9, 2017