[WIP] Dense covariance using Krylov methods #1747
Conversation
    eig = 1 / np.sqrt(eig)
else:
    eig = estimator(eig)
f_T = eigv @ (eig * np.linalg.solve(eigv, e))
We currently still support Python 2.7.
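Presumably this refers to the @ matrix-multiplication operator used in the hunk above, which only exists on Python 3.5+; a 2.7-compatible spelling of that line would be something along the lines of:

f_T = eigv.dot(eig * np.linalg.solve(eigv, e))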
Oops, that happens when you develop in the notebook...
This is really interesting, and it could fit well with this: #1738. I'm going to test this on the hierarchical GLM.
scaling = guess_scaling(Point(scaling, model=model), model=model, vars=vars)

n = scaling.shape[0]
n = model.dict_to_array(model.test_point).shape[0]
This indeed looks like a bug!
Can you submit this separately?
sure
Is there a better way to get at the number of dimensions? This seems a bit like a hack to me...
Agreed, I don't think there is, though. Perhaps we should push this into Model as an ndim property.
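A minimal sketch of what such a property could look like on Model (hypothetical, not part of this PR), reusing the dict_to_array / test_point trick from the diff above:

@property
def ndim(self):
    # Number of dimensions of the flattened parameter space.
    return self.dict_to_array(self.test_point).shape[0]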
q : Quadpotential
"""
if isquadpotential(C):
    return C
This is also a good feature regardless of the stuff below (for the separate PR).
Is the hierarchical model you are talking about the one from the notebook GLM-hierarchical.ipynb?
Shouldn't a non-centered parametrization work better for that model:

a = pm.Normal('a_raw', mu=0, sd=1, shape=n_counties)
a = pm.Deterministic("a", mu_a + sigma_a * a)
b = pm.Normal('b_raw', mu=0, sd=1, shape=n_counties)
b = pm.Deterministic("b", mu_b + sigma_b * b)
@aseyboldt Yep, works way better -- thanks. What types of models would this sampler work best on? Also, intuitively it's a bit odd to me that precomputing the cov matrix once is slower than this sampling approach.
Using the cov matrix takes n^2 multiplications per matrix-vector product. Using this, it takes 2 * s * n, where n is the number of dimensions and s is the number of samples, so it is faster if n > 2 * s. In practice I would expect it to need a somewhat larger n before it is worth it. If X is the (n x s) matrix of samples, the normal way using the full covariance matrix computes A = X X.T once and then Ax; this patch computes X (X.T x) instead.
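A rough NumPy illustration of the two matrix-vector product strategies (the toy sizes and variable names are mine, not from the patch):

import numpy as np

n, s = 5000, 200              # n dimensions, s posterior samples
X = np.random.randn(n, s)     # columns are (centered) samples
x = np.random.randn(n)

# Dense approach: form A = X X.T once (n x n), then each product costs ~n^2.
A = X @ X.T
y_dense = A @ x

# Sample-based approach: never form A; each product costs ~2 * s * n.
y_lowrank = X @ (X.T @ x)

assert np.allclose(y_dense, y_lowrank)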
Closing this for now. I couldn't find an example where it really helps. If I come across something, I might revisit it.
I'm not really sure about this pull request, as it does not really seem to help in any of the models I tried, but I still thought it might be interesting enough to put it out there. Also, this might be helpful when someone implements Riemannian MCMC.
Right now we have four options for specifying the global mass matrix in HMC: a diagonal matrix (either as covariance or as precision matrix), a dense covariance or precision matrix, or a sparse covariance matrix in CSR format. This pull request adds an additional option: defining the covariance matrix through samples X from the posterior. This allows us to use a full covariance matrix in high dimensional settings while avoiding having to store the full covariance matrix. To implement the potential we need matrix-vector products Ax and samples from N(0, A^-1), where A is the covariance matrix. We can compute the first using Ax = X^T (X x). Sampling from the normal distribution is a bit more difficult; I implemented a Krylov subspace based algorithm described in [1]. As high dimensional sample covariances are usually problematic, I used a simple Stein-type estimator to shrink the covariance towards the diagonal matrix containing the variances.
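As a rough sketch (not the PR's actual implementation) of two of the ingredients above, the covariance matrix-vector product and a simple shrinkage toward the diagonal could look like the following; the function names and the shrinkage weight alpha are made up for illustration, and the Krylov-based sampling from N(0, A^-1) described in [1] is not shown:

import numpy as np

def cov_matvec(X, x):
    # A @ x with A = X^T X / (s - 1), for X an (s, n) matrix of centered samples,
    # computed as X^T (X x) so that A is never formed explicitly.
    s = X.shape[0]
    return X.T @ (X @ x) / (s - 1)

def shrunk_cov_matvec(X, x, alpha=0.1):
    # Matvec with the shrunk covariance (1 - alpha) * A + alpha * diag(variances).
    variances = X.var(axis=0, ddof=1)
    return (1 - alpha) * cov_matvec(X, x) + alpha * variances * x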
However, this does not seem to help as much as I hoped it would. Possible reasons I could think of are:
[1] Chow, E., and Y. Saad. “Preconditioned Krylov Subspace Methods for Sampling Multivariate Gaussian Distributions.” SIAM Journal on Scientific Computing 36, no. 2 (January 1, 2014): A588–608. doi:10.1137/130920587.