
Add LKJ distribution for cholesky factor of correlation matrix #1336

Closed · sethaxen opened this issue Jun 1, 2021 · 14 comments
@sethaxen (Contributor) commented Jun 1, 2021

Distributions already has an LKJ(d, η) whose support is the set of d × d correlation matrices. More useful for probabilistic programming is the equivalent LKJ distribution whose support is the corresponding Cholesky factor L (or equivalently U), because then the Cholesky factor can be sampled and used directly, e.g. in the multivariate normal density, without needing to compute a Cholesky decomposition.

There are d(d-1)/2 unique elements in the strict lower triangle tril₋(R) of a correlation matrix R = L Lᵀ. Likewise, there are d(d-1)/2 unique elements in the strict lower triangle tril₋(L) of its Cholesky factor L. The map ϕ: tril₋(R) → tril₋(L) is bijective, and the log-determinant of its Jacobian is logdet(J) = \sum_{i=2}^d (i - d) \log(L_{ii}). The resulting log-kernel is adjusted only by -logdet(J) and works out to logkernel(L) = \sum_{i=2}^d (d - i + 2η - 2) \log(L_{ii}); logc0 is unchanged. As a result, we can trivially adapt the code from LKJ to define this distribution.
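
As a rough sketch (a hypothetical helper, not proposed API), the adjusted log-kernel could be computed as:

```julia
using LinearAlgebra

# Hypothetical helper: evaluates the adjusted log-kernel above for a
# lower-triangular Cholesky factor L and shape parameter η.
function lkj_chol_logkernel(L::LowerTriangular, η::Real)
    d = size(L, 1)
    # init covers the trivial d = 1 case, where the sum is empty
    return sum(((d - i + 2η - 2) * log(L[i, i]) for i in 2:d); init=0.0)
end
```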

It would also be useful to have a convenient constructor for MvNormal that could take a mean vector, the cholesky factor of a correlation matrix, and a vector of standard deviations, or alternatively, a mean vector and the cholesky factor of the covariance matrix.

Related work

AltDistributions.jl implements this distribution as LKJL (though it pre-dates LKJ in this package). This distribution is likewise defined in several probabilistic programming languages, notably Stan (see its documentation and implementation).

See also discussions in TuringLang/Turing.jl#1629 (comment) and TuringLang/Bijectors.jl#134 (comment).

@devmotion (Member)

It would also be useful to have a convenient constructor for MvNormal that could take a mean vector, the cholesky factor of a correlation matrix, and a vector of standard deviations, or alternatively, a mean vector and the cholesky factor of the covariance matrix.

Isn't this already possible? MvNormal supports arbitrary AbstractPDMats as covariance matrix and even converts matrices to AbstractPDMat. And PDMats seems to support constructing a PDMat from a Cholesky decomposition: https://github.com/JuliaStats/PDMats.jl/blob/84bdd885a4e9f22a1832e63f54894760c632b312/src/pdmat.jl#L21
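
For example, something like this should already work:

```julia
using Distributions, LinearAlgebra, PDMats

Σ = [2.0 0.5; 0.5 1.0]
C = cholesky(Σ)
d = MvNormal(zeros(2), PDMat(C))  # PDMat accepts the Cholesky directly
```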

@sethaxen (Contributor, Author) commented Jun 1, 2021

Isn't this already possible? MvNormal supports arbitrary AbstractPDMats as covariance matrix and even converts matrices to AbstractPDMat. And PDMats seems to support constructing a PDMat from a Cholesky decomposition: https://github.com/JuliaStats/PDMats.jl/blob/84bdd885a4e9f22a1832e63f54894760c632b312/src/pdmat.jl#L21

Yes, you're right. But right now, if a user has μ::Vector, σ::Vector, and L::LowerTriangular, they need to do something like

```julia
using PDMats
MvNormal(μ, PDMat(Cholesky(σ .* L.data, 'L', 0)))
```

This isn't ideal for several reasons. First, it requires the user to specify args for Cholesky (on Julia 1.7, they can do MvNormal(μ, PDMat(Cholesky(σ .* L))), which is a little better). Second, it requires them to know how the Cholesky factor of the correlation matrix relates to the covariance matrix, which isn't terrible, but it does make this harder to use. Third, depending on the application, it may be more efficient not to multiply σ by L at all.

A new MvNormal constructor could make this simpler, but perhaps a better alternative is a new AbstractPDMat type.
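
As a rough sketch of what such a type could look like (all names invented):

```julia
using LinearAlgebra, PDMats

# Hypothetical: an AbstractPDMat that keeps σ and the Cholesky factor L of the
# correlation matrix separate, representing Σ = (D * L) * (D * L)' with
# D = Diagonal(σ), without ever forming Σ eagerly.
struct ScaledCorrCholPDMat{T<:Real} <: AbstractPDMat{T}
    σ::Vector{T}
    L::LowerTriangular{T,Matrix{T}}
end

Base.size(a::ScaledCorrCholPDMat) = (length(a.σ), length(a.σ))

# The Cholesky factor of the covariance is D * L, so e.g. unwhitening a vector
# is a cheap triangular multiply plus a scaling. A full implementation would
# also need PDMats.whiten, quad, invquad, logdet, etc.
PDMats.unwhiten(a::ScaledCorrCholPDMat, x::AbstractVector) = a.σ .* (a.L * x)
```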

@devmotion (Member)

perhaps a better alternative is a new AbstractPDMat type.

Yes, I think this would be the better approach since it is more general and hence could be used also for other AbstractMvNormal distributions. Additionally, it would not increase the complexity of the MvNormal constructors any further.

One also has to think about the VariateForm for such a distribution (maybe just CholeskyVariate?) since its samples would be neither Real nor Array, unlike the samples of the distributions with ArrayLikeVariate VariateForm (i.e., Univariate, Multivariate, and Matrixvariate) currently implemented in Distributions.
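
For example, roughly (hypothetical, nothing like this is implemented):

```julia
using Distributions

# Hypothetical: a new variate form for distributions whose samples are
# Cholesky objects rather than scalars or arrays.
struct CholeskyVariate <: Distributions.VariateForm end

# An LKJ-on-Cholesky-factors distribution could then be declared as e.g.
# struct LKJCholesky{T<:Real} <: Distribution{CholeskyVariate,Continuous} ... end
```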

@sethaxen (Contributor, Author) commented Jun 1, 2021

Yes, I think this would be the better approach since it is more general and hence could be used also for other AbstractMvNormal distributions. Additionally, it would not increase the complexity of the MvNormal constructors any further.

Yes, these are good points.

One also has to think about the VariateForm for such a distribution (maybe just CholeskyVariate?) since its samples would be neither Real nor Array, unlike the samples of the distributions with ArrayLikeVariate VariateForm (i.e., Univariate, Multivariate, and Matrixvariate) currently implemented in Distributions.

My idea was just to add a distribution whose VariateForm was Matrixvariate, to avoid adding a more complicated VariateForm here. So random samples would be Matrix objects (or perhaps LowerTriangular, and also UpperTriangular if we have an uplo field). It could be useful if random samples were Cholesky objects, but I'm also thinking of how this would be used in PPLs, and that could cause confusion.

@devmotion (Member)

So random samples would be Matrix objects (or perhaps LowerTriangular, and also UpperTriangular if we have an uplo field).

IIRC ArrayLikeVariate is supposed to cover only samples of type Real and Array, so Matrix would be covered but special triangular matrix types would not. IMO it would be a bit unfortunate to use Matrix and not exploit the special matrix structure.

that could cause confusion.

What problems do you have in mind?

@johnczito (Member)

At some point I sketched out some types I called CholeskyDecomposedMatrixDistribution, SVDecomposedMatrixDistribution, etc. They were just wrappers for the MatrixDistribution types that made sure logpdf handled the jacobian and rand returned instances of SVD, Cholesky, whatever. In general rand was totally naive, but it could be overloaded on a distribution-by-distribution basis if you had a way to sample the decomposition directly.

Would that be helpful here? Or is the best solution a completely separate type?
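
Roughly, the shape was something like this (the function names below are just for illustration, not the actual sketched API):

```julia
using Distributions, LinearAlgebra

# Wrap a matrix distribution; samples are Cholesky objects, and logpdf folds
# in the Jacobian of the symmetric-matrix map L ↦ X = LLᵀ.
struct CholeskyDecomposedMatrixDistribution{D<:Distributions.MatrixDistribution}
    dist::D
end

# totally naive fallback: sample the matrix, then factorize it
draw(d::CholeskyDecomposedMatrixDistribution) = cholesky(Symmetric(rand(d.dist)))

function decomposed_logpdf(d::CholeskyDecomposedMatrixDistribution, C::Cholesky)
    L = C.L
    n = size(L, 1)
    # log-Jacobian of X = LLᵀ over symmetric X: n log 2 + Σᵢ (n - i + 1) log L[i,i]
    logJ = n * log(2) + sum((n - i + 1) * log(L[i, i]) for i in 1:n)
    return logpdf(d.dist, L * L') + logJ
end
```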

@sethaxen (Contributor, Author) commented Jun 2, 2021

IIRC ArrayLikeVariate is supposed to cover only samples of type Real and Array, so Matrix would be covered but special triangular matrix types would not.

Oh, that is unfortunate. We should definitely confirm this though, because it would be nice if we could use ArrayLikeVariate.

IMO it would be a bit unfortunate to use Matrix and not exploit the special matrix structure.

Yes, but LowerTriangular uses dense storage anyway, so even if we returned a lower-triangular Matrix, a user could wrap it in LowerTriangular for things like triangular solves.

What problems do you have in mind?

I'm thinking first of a user who comes from e.g. Stan, PyMC3, or one of the other popular PPLs and uses an LKJCholesky distribution assuming it behaves like Stan's equivalent (i.e. is supported on lower-triangular Cholesky factors). They multiply a sample by a vector, but if the sample is a Cholesky, IIRC it multiplies like the correlation matrix, not its Cholesky factor. This could also be jarring because it would be the only distribution whose support is not a scalar or array. One would then need a bijector that maps from a Cholesky to (ideally) a vector, which is even more complicated than mapping from a matrix to a vector, and that is not (yet) supported. These are not major problems, but they are downsides.
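
To illustrate with the current LinearAlgebra types:

```julia
using LinearAlgebra

R = [1.0 0.5; 0.5 1.0]
C = cholesky(R)  # a Cholesky object, not a triangular matrix
x = randn(2)
y = C.L * x      # multiplying by the factor requires extracting it explicitly
```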

And in fact, while writing this, I'm liking more the idea of an LKJCholesky whose samples are Cholesky objects.

At some point I sketched out some types I called CholeskyDecomposedMatrixDistribution, SVDecomposedMatrixDistribution, etc. They were just wrappers for the MatrixDistribution types that made sure logpdf handled the jacobian and rand returned instances of SVD, Cholesky, whatever. In general rand was totally naive, but it could be overloaded on a distribution-by-distribution basis if you had a way to sample the decomposition directly.

Would that be helpful here? Or is the best solution a completely separate type?

Good question. I lean towards a completely separate type for simplicity, unless there are ample use cases for these special wrappers. E.g., the only three uses I know of for CholeskyDecomposedMatrixDistribution would be Wishart, InverseWishart, and LKJ, and I'm not certain how high the demand would be for Cholesky parameterizations of Wishart/InverseWishart.

@johnczito (Member)

I'm liking more the idea of an LKJCholesky whose samples are Cholesky objects.

Second this.

unless there are ample use cases for these special wrappers. e.g. the only three uses I know of...would be Wishart, InverseWishart, and LKJ...

We also have MatrixFDist and MatrixBeta that are random posdef matrices. Not that they necessarily rise to the level of "use cases" because I doubt they're much used at this point.

I lean towards a completely separate type for simplicity

Right. I can see how the wrapper construction might just get in the way in the PPL context. Not that these things are mutually exclusive.

Anyhow, I read but did not really understand the Stan source code for sampling LKJ-Cholesky directly (here and here). The comments say "The implementation is Ben Goodrich's Cholesky factor-based approach to the C-vine method of [LKJ (JMA 2009)]." Is there a write-up of this anyplace, or would we just have to port the Stan code? Does that pose a licensing issue?

@sethaxen (Contributor, Author) commented Jun 2, 2021

I'm liking more the idea of an LKJCholesky whose samples are Cholesky objects.

Second this.

If we go this route, we should definitely have something like a CholeskyVariateForm.

We also have MatrixFDist and MatrixBeta that are random posdef matrices. Not that they necessarily rise to the level of "use cases" because I doubt they're much used at this point.

I lean towards a completely separate type for simplicity

Right. I can see how the wrapper construction might just get in the way in the PPL context. Not that these things are mutually exclusive.

Yeah, I guess my real priority is that there be a simple constructor. But even defining LKJCholesky as an alias for something like CholeskyDistribution{<:LKJ} with a convenient constructor gets us that, so I don't have a major preference here.

Anyhow, I read but did not really understand the Stan source code for sampling LKJ-Cholesky directly (here and here). The comments say "The implementation is Ben Goodrich's Cholesky factor-based approach to the C-vine method of [LKJ (JMA 2009)]." Is there a write-up of this anyplace, or would we just have to port the Stan code? Does that pose a licensing issue?

I think it's just using part of the algorithm in the paper to generate random canonical partial correlations, and then instead of pushing those through a map to a correlation matrix, it pushes them through a map directly to the Cholesky factor. I think that's what their read_corr_L function is doing. I don't know of a write-up anywhere, but I'll look for one. I'd like to understand how it works, so I may implement it from scratch, but I think we could also port it. Stan Math uses a BSD-3 license, which is just slightly more restrictive than MIT (see https://opensource.stackexchange.com/a/582), so I think we would need to license just that ported code under BSD-3.

@devmotion (Member)

Can't it just be based on the paper and the documentation to avoid any licensing issues?

@johnczito (Member)

But even defining LKJCholesky as an alias for something like CholeskyDistribution{<:LKJ} with a convenient constructor gets us that, so I don't have a major preference here.

Good to know.

...it pushes them through a map directly to the cholesky factor. I think that's what their read_corr_L function is doing.

Yeah, that's the bit I don't understand yet. And note that Distributions does not include any of that partial correlation, vine stuff. It currently uses the onion sampler.

@devmotion (Member)

The bijective mapping in Bijectors is based on the partial correlations: https://github.com/TuringLang/Bijectors.jl/blob/master/src/bijectors/corr.jl

It's documented in the docstrings and based on https://mc-stan.org/docs/2_26/reference-manual/correlation-matrix-transform-section.html (IIRC one formula in the Stan docs was incorrect though).
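
For concreteness, a rough sketch of that transform (untested; the function name is invented, and z holds the canonical partial correlations in its strict lower triangle):

```julia
using LinearAlgebra

# Maps canonical partial correlations z[i, j] ∈ (-1, 1) (for j < i) to a
# lower-triangular Cholesky factor whose rows have unit norm, following the
# Stan reference manual section linked above.
function partial_corr_to_chol_L(z::AbstractMatrix)
    d = size(z, 1)
    L = zeros(eltype(z), d, d)
    L[1, 1] = 1
    for i in 2:d
        rem = one(eltype(z))  # squared norm remaining in row i
        for j in 1:(i - 1)
            L[i, j] = z[i, j] * sqrt(rem)
            rem -= L[i, j]^2
        end
        L[i, i] = sqrt(rem)
    end
    return LowerTriangular(L)
end
```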

@johnczito (Member)

It's documented in the docstrings and based on https://mc-stan.org/docs/2_26/reference-manual/correlation-matrix-transform-section.html (IIRC one formula in the Stan docs was incorrect though).

Just what the doctor ordered. Thanks!

@sethaxen (Contributor, Author) commented Jun 3, 2021

Great! I'll try to put together a PR implementing this next week. Also, after conversations with @cscherrer in JuliaMath/MeasureTheory.jl#101 (comment), I think it's worthwhile including an uplo parameter to specify whether the Cholesky factor is stored in the upper or lower part of the Cholesky object, so that a power user can avoid an unnecessary copy. (cholesky always uses uplo='U', but functions that take a Cholesky allow both.) I suspect uplo='L' is more useful for most users.
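
To illustrate the copy (as I understand the current LinearAlgebra behavior):

```julia
using LinearAlgebra

C = cholesky([2.0 1.0; 1.0 2.0])
C.uplo  # 'U': the upper factor is what is stored
C.U     # wraps the stored factors without copying
C.L     # materializes a copy of the transpose, hence the extra copy
```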
