
Conversation

@aseyboldt
Member

I'm not really sure about this pull request, as it does not seem to help in any of the models I tried, but I thought it might be interesting enough to put out there. It might also be helpful when someone implements Riemannian MCMC.

Right now we have four options for specifying the global mass matrix in HMC: a diagonal matrix (either as covariance or as precision matrix), a dense covariance or precision matrix, or a sparse covariance matrix in CSR format. This pull request adds another option: defining the covariance matrix through samples X from the posterior. This lets us use a full covariance matrix in high-dimensional settings without having to store the full matrix explicitly. To implement the potential we need matrix-vector products Ax and samples from N(0, A^-1), where A is the covariance matrix. The first we can compute as Ax = X^T(Xx). Sampling from the normal distribution is a bit more difficult; I implemented a Krylov-subspace-based algorithm described in [1]. As high-dimensional sample covariances are usually problematic, I used a simple Stein-type estimator to shrink the sample covariance towards the diagonal matrix containing the variances.
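A minimal sketch of that first ingredient (not the actual code in this PR; the helper name and the fixed shrinkage weight are made up for illustration, whereas the patch uses a Stein-type estimate for the weight):

    import numpy as np

    def sample_cov_matvec(X, shrinkage=0.1):
        # X has shape (s, n): s posterior samples of an n-dimensional parameter vector.
        # Returns a function computing A @ x for the shrunk sample covariance
        # A = (1 - shrinkage) * Cov(X) + shrinkage * diag(Var(X)),
        # without ever forming the dense n x n matrix.
        Xc = X - X.mean(axis=0)
        s = X.shape[0]
        diag_var = Xc.var(axis=0)

        def matvec(x):
            cov_x = Xc.T @ (Xc @ x) / s      # X^T (X x): ~2*s*n multiplications
            return (1 - shrinkage) * cov_x + shrinkage * diag_var * x

        return matvec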

However, this does not seem to help as much as I hoped it would. Possible reasons I could think of are:

  • I tried the wrong models, it just helps in other models :-)
  • The covariance estimates are bad. Maybe something more sophisticated than the Stein estimator would help.
  • Computing Ax in every transition is quite expensive (2 * s * n multiplications). It does parallelize quite well, however (just not on my laptop).
  • Maybe sampling from the normal distribution does not work as well as I think it does. The Lanczos algorithm is not particularly stable, and I do not use reorthogonalization or restarting to address this.

[1] Chow, E., and Y. Saad. “Preconditioned Krylov Subspace Methods for Sampling Multivariate Gaussian Distributions.” SIAM Journal on Scientific Computing 36, no. 2 (January 1, 2014): A588–608. doi:10.1137/130920587.
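For reference, a rough sketch of the Lanczos sampling idea from [1], with the same simplifications mentioned above (no reorthogonalization or restarting); the function name and default number of steps are illustrative only:

    import numpy as np

    def lanczos_precision_sample(matvec, n, m=50, rng=None):
        # Rough sketch: approximate a draw from N(0, A^-1), where the covariance A
        # is only available through matvec(x) = A @ x. Without reorthogonalization
        # the accuracy degrades as m grows.
        rng = np.random.default_rng(rng)
        z = rng.standard_normal(n)
        beta0 = np.linalg.norm(z)

        V = np.zeros((n, m))
        alphas = np.zeros(m)
        betas = np.zeros(m - 1)
        v, v_prev, beta = z / beta0, np.zeros(n), 0.0
        for j in range(m):
            V[:, j] = v
            w = matvec(v) - beta * v_prev
            alphas[j] = v @ w
            w -= alphas[j] * v
            if j < m - 1:
                beta = np.linalg.norm(w)
                betas[j] = beta
                v_prev, v = v, w / beta

        # f(T_m) e_1 with f(t) = 1 / sqrt(t), via the eigendecomposition of T_m
        T = np.diag(alphas) + np.diag(betas, 1) + np.diag(betas, -1)
        eigval, eigvec = np.linalg.eigh(T)
        f_T_e1 = eigvec @ (eigval ** -0.5 * eigvec[0, :])

        # ||z|| * V_m f(T_m) e_1 approximates A^(-1/2) z, i.e. a draw from N(0, A^-1)
        return beta0 * (V @ f_T_e1)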

        eig = 1 / np.sqrt(eig)
    else:
        eig = estimator(eig)
    f_T = eigv @ (eig * np.linalg.solve(eigv, e))
Member

We currently still support Python 2.7.

Member Author

Oops, that happens when you develop in the notebook...

@twiecki
Member

twiecki commented Feb 3, 2017

This is really interesting and it could fit well with this: #1738. I'm going to test this on the hierarchical GLM.

  scaling = guess_scaling(Point(scaling, model=model), model=model, vars=vars)

- n = scaling.shape[0]
+ n = model.dict_to_array(model.test_point).shape[0]
Member

This indeed looks like a bug!

Member

Can you submit this separately?

Member Author

sure

Member Author

Is there a better way to get at the number of dimensions? This seems a bit like a hack to me...

Member

@twiecki Feb 3, 2017

Agreed, I don't think there is though. Perhaps we should push this into Model as a property ndim.
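A hypothetical sketch of what that could look like (not an existing PyMC3 API; it just reuses the dict_to_array / test_point calls from the diff above):

    class Model(object):
        # ... rest of the existing Model class ...

        @property
        def ndim(self):
            # Hypothetical: number of entries in the flattened parameter vector,
            # computed the same way as in the diff above.
            return self.dict_to_array(self.test_point).shape[0]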

    q : Quadpotential
    """
    if isquadpotential(C):
        return C
Member

This is also a good feature regardless of the stuff below (for the separate PR).

@aseyboldt
Member Author

Is the hierarchical model you are talking about the one from the notebook GLM-hierarchical.ipynb?

@aseyboldt
Member Author

Shouldn't a non-centered parametrization work better for that model:

    a_raw = pm.Normal('a_raw', mu=0, sd=1, shape=n_counties)
    a = pm.Deterministic('a', mu_a + sigma_a * a_raw)

    b_raw = pm.Normal('b_raw', mu=0, sd=1, shape=n_counties)
    b = pm.Deterministic('b', mu_b + sigma_b * b_raw)

@twiecki
Member

twiecki commented Feb 4, 2017

@aseyboldt Yep, works way better -- thanks.

What types of models would this sampler work best on? Also, intuitively it's a bit odd to me that precomputing the cov matrix once is slower than this sampling approach.

@aseyboldt
Member Author

aseyboldt commented Feb 4, 2017

Using the dense covariance matrix takes n^2 multiplications per matrix-vector product. Using this approach it takes 2 * s * n, where n is the number of dimensions and s is the number of samples, so it is faster whenever n > 2 * s. In practice I would expect that it takes a somewhat larger n to be worth it.
The model should also contain posterior correlations, or a diagonal covariance matrix will be way faster.

If X is the (n × s) matrix of samples, the normal approach with the full covariance matrix computes A = XX^T once and then Ax in every step; this patch instead computes X(X^T x).
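A quick check of the reordering with made-up sizes, where X holds the samples as columns as described above:

    import numpy as np

    rng = np.random.default_rng(0)
    n, s = 2000, 100                         # hypothetical: n dimensions, s samples
    X = rng.standard_normal((n, s))          # samples as columns of X
    x = rng.standard_normal(n)

    dense = (X @ X.T) @ x                    # forms the n x n matrix: ~n^2 work per product
    factored = X @ (X.T @ x)                 # two thin products: ~2*s*n work per product
    assert np.allclose(dense, factored)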

@aseyboldt
Member Author

Closing this for now. I couldn't find an example where it really helps. If I come across something, I might revisit it.

@aseyboldt aseyboldt closed this Apr 9, 2017