Start documentation of sum_to_zero_vector #818

Merged
merged 8 commits into from
Nov 15, 2024

Conversation

WardBrian
Member

Submission Checklist

  • Builds locally
  • New functions marked with <<{ since VERSION }>>
  • Declare copyright holder and open-source license: see below

Summary

Closes #804.

I think I need some help from @spinkney and @bob-carpenter filling out the documentation of the specific transform; see the two "TODO:" lines in transforms.qmd.

Copyright and Licensing

Please list the copyright holder for the work you are submitting (this will be you or your assignee, such as a university or company):
Simons Foundation

By submitting this pull request, the copyright holder is agreeing to license the submitted work under the following licenses:

@spinkney
Collaborator

I'll try and take a look this week

@bob-carpenter
Member

I can take a shot at this, @spinkney. I think the definition is clear enough.

@bob-carpenter
Member

I have most of this written up and just have to double check. I should be able to push the doc tomorrow (Wednesday). I'm just translating from the code. It's really nice that there's no Jacobian adjustment.

@bob-carpenter
Member

I wrote out the math for the transform. It'd be great if @spinkney could review, though I'm pretty sure it at least matches the code as is (perhaps modulo a +1 or -1 on indexing, but I triple-checked all that, too).

@bob-carpenter bob-carpenter marked this pull request as ready for review October 4, 2024 21:25
@WardBrian
Member Author

@spinkney do you think you will have a chance to look at this?

@spinkney
Collaborator

> @spinkney do you think you will have a chance to look at this?

When is the deadline?

@WardBrian
Member Author

The 11th, unless we end up delaying the release again

@spinkney
Collaborator

spinkney commented Nov 2, 2024

I will either finish this before 11/9 or let you know I won't be able to make the deadline.

@bob-carpenter
Member

It's very minimal and it's all written---just needs someone to verify it's OK.

@spinkney
Collaborator

spinkney commented Nov 6, 2024

> It's very minimal and it's all written---just needs someone to verify it's OK.

I'm on vacation and coming back tomorrow. I'll take a look then.

@spinkney
Collaborator

spinkney commented Nov 9, 2024

I'll have time to look at it this weekend or Monday.

@spinkney
Collaborator

Looking at this later this morning. Sorry about the delay!

Collaborator

@spinkney spinkney left a comment

Added a bunch of details. We may even want to write out the Householder stuff. Adrian wrote about it here pyro-ppl/numpyro#1751 (comment). I took most of the derivation from something I posted on discourse at https://discourse.mc-stan.org/t/new-stan-data-type-zero-sum-vector/26215/11?u=spinkney.

$$
\sum_{k=1}^K x_k = 0.
$$

For the transform, Stan uses the first part of an isometric log ratio
Collaborator

The first thing I think about when reading this is: why don't we just take z as the sum of the first K - 1 elements and then use -z as the final element to construct the vector? I understand this is just the reference manual, but a reference or a mention that the isometric log ratio transform induces a geometry that is easier for HMC to explore is probably worth putting in.

Can we link to the user guide here?

Collaborator

@spinkney spinkney Nov 15, 2024

I might even write this out a bit more. In the old SUGs we had this comment:

> This is using a more sophisticated transform than the previously recommended form of setting the final element of the vector to the negative sum of the previous elements.

The issue with doing that is that it implies a fairly strong correlation among the zero-sum parameters.

We want a matrix $M$ which, when a vector is multiplied by it, returns that vector with its final element replaced by the negative sum of the preceding elements.

$$ \alpha M = \bigg[\alpha_1, \alpha_2, \ldots, \alpha_{K - 1}, -\sum\limits_{i=1}^{K - 1} \alpha_i \bigg] $$

The vector $\alpha$ needs to be of length $K$, so I pad it with an extra 0. An $M$ that satisfies this is a matrix with ones on the diagonal and negative ones in the last column (the $K \times K$ entry can be anything, since $\alpha$ has 0 as its $K$th element).

$$ M = \begin{bmatrix} 1 & 0 & 0 & \dots & 0 & -1\\ 0& 1 & 0 &\dots & 0 & - 1 \\ 0& 0 & 1 & \dots & 0 & - 1 \\ \vdots & & \ddots & & \vdots & \vdots\\ 0 & & \dots & & 0 & 1\\ \end{bmatrix} $$
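
As a quick check of the construction above, here is a minimal pure-Python sketch that builds $M$ for a small $K$ and multiplies a zero-padded row vector by it. The function names `build_m` and `row_times_matrix` and the values in `alpha` are illustrative assumptions, not anything taken from Stan.

```python
def build_m(k):
    """K x K matrix with ones on the diagonal and -1 in the last column of rows 1..K-1."""
    m = [[1.0 if i == j else 0.0 for j in range(k)] for i in range(k)]
    for i in range(k - 1):
        m[i][k - 1] = -1.0
    return m


def row_times_matrix(row, m):
    """Row vector times matrix: (row M)_j = sum_i row_i * M_ij."""
    return [sum(row[i] * m[i][j] for i in range(len(row))) for j in range(len(m[0]))]


# Illustrative values; the last entry is the zero padding described above.
alpha = [0.3, -1.2, 2.0, 0.0]
x = row_times_matrix(alpha, build_m(len(alpha)))

print(x)       # first K-1 entries equal alpha, last entry is their negative sum
print(sum(x))  # zero up to floating-point error
```

The first $K - 1$ entries of `x` are unchanged and the last entry is their negative sum, so the result satisfies the sum-to-zero constraint.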

Let's say that we want to put a standard normal prior on $\alpha$. The distribution of the transformed vector is then

$$ \alpha M \sim MVN(0, MIM^T). $$

This follows from the fact that the variance of a constant matrix $M$ times a random vector $X$ with mean $\mu$ is

$$ \begin{aligned} \mathbb{V}[MX] &= \mathbb{E}[(M(X-\mu))(M(X-\mu))^T] \\ &= \mathbb{E}[M(X-\mu)(X-\mu)^T M^T] \\ &= M\mathbb{E}[(X-\mu)(X-\mu)^T]M^T \\ &= M\mathbb{V}[X]M^T. \end{aligned} $$

The correlation matrix of this becomes

$$ \begin{bmatrix} 1.0000 & 0.5000 & \dots & 0.5000 & -0.7071 \\ 0.5000 & 1.0000 & \dots & 0.5000 & -0.7071 \\ 0.5000 & 0.5000 & \dots &1.0000 & -0.7071 \\ \vdots & \ddots & \ddots & \vdots & \vdots \\ -0.7071 & -0.7071 & \dots & -0.7071 & 1.0000 \\ \end{bmatrix} $$
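
The entries above can be reproduced numerically. This is a minimal pure-Python sketch, assuming $K = 4$ and the covariance $M I M^T = M M^T$ exactly as written in the expression above; the helper names are made up for illustration.

```python
import math


def build_m(k):
    """K x K matrix with ones on the diagonal and -1 in the last column of rows 1..K-1."""
    m = [[1.0 if i == j else 0.0 for j in range(k)] for i in range(k)]
    for i in range(k - 1):
        m[i][k - 1] = -1.0
    return m


def m_times_mt(m):
    """Compute M M^T, i.e. M I M^T with an identity covariance."""
    k = len(m)
    return [[sum(m[i][t] * m[j][t] for t in range(k)) for j in range(k)] for i in range(k)]


def to_correlation(cov):
    """Rescale a covariance matrix to a correlation matrix."""
    k = len(cov)
    return [[cov[i][j] / math.sqrt(cov[i][i] * cov[j][j]) for j in range(k)] for i in range(k)]


corr = to_correlation(m_times_mt(build_m(4)))
for row in corr:
    print([round(v, 4) for v in row])
```

For $K = 4$ the off-diagonal entries among the first three rows are 0.5 and the entries in the last row and column are $-1/\sqrt{2} \approx -0.7071$, matching the matrix above.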

Using the inverse isometric log ratio transform is the same as applying a Householder transform, which induces a diagonal correlation matrix on N - 1 of the values.

Member

I wouldn't say "just a reference manual". We want to get the details of the transformation we're using right. This isn't a research paper, though, so we don't need to say why we're using this, though we can link to a paper or to Adrian's discussion online. External links are challenging to maintain, so use sparingly.

We can mention the link to Householder, but I wouldn't go through this whole discussion in the transforms chapter of the Stan reference manual.

Member

I'm not sure how to update. Do you want to take a stab at just updating it? If not, if you can write here what you think I should add, I'll add it.

Collaborator

@spinkney spinkney Nov 15, 2024

> I'm not sure how to update. Do you want to take a stab at just updating it? If not, if you can write here what you think I should add, I'll add it.

I'm not sure where you want this, in the SUG or in the reference manual? I'll let you decide. Here's what I think should be added somewhere:

The reason we use the isometric log ratio transform is that it induces zero correlation among the transformed elements of the vector. The problem with simply setting the final element of the vector to the negative sum of the previous elements is that this induces strong correlations across the parameters. The transform used in Stan eliminates these correlations by constructing an orthogonal basis and applying it to the zero-sum constraint (see this discussion by Adrian Seyboldt on the NumPyro GitHub repository for more information). Any orthogonal basis can be used; we happen to use the inverse isometric log ratio transform because it is convenient to describe and the transform simplifies to scalar algebra rather than matrix operations.
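
To illustrate the idea of mapping $K - 1$ unconstrained values through an orthonormal basis using only scalar arithmetic, here is a minimal pure-Python sketch. It assumes a Helmert-style basis, where the $n$th basis vector has $n$ equal positive entries followed by a single negative entry; this is an illustrative choice and not necessarily the exact basis or code used in Stan. The function name `sum_to_zero_from_unconstrained` and the values in `y` are hypothetical.

```python
import math


def sum_to_zero_from_unconstrained(y):
    """Map K-1 unconstrained values to K values that sum to zero.

    Uses basis vectors v_n = (1, ..., 1, -n, 0, ..., 0) / sqrt(n * (n + 1)),
    with n leading ones, for n = 1..K-1. These are orthonormal and each sums
    to zero. (Illustrative basis; an assumption, not Stan's implementation.)
    """
    k = len(y) + 1
    x = [0.0] * k
    for n in range(1, k):
        scale = y[n - 1] / math.sqrt(n * (n + 1))
        for i in range(n):
            x[i] += scale      # the n leading entries of v_n
        x[n] -= n * scale      # the single negative entry of v_n
    return x


y = [0.5, -1.0, 2.0]  # illustrative unconstrained values
x = sum_to_zero_from_unconstrained(y)

print(x)
print(sum(x))                                        # zero up to rounding
print(sum(v * v for v in x), sum(v * v for v in y))  # equal up to rounding
```

Because the basis is orthonormal, the map preserves Euclidean length, which is why the two sums of squares agree up to floating-point error.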

Contributor

I think your description of "zero correlation" might be wrong. Think about a vector that sums to zero: if one element becomes larger, then the other elements must become smaller, right? So as long as they sum to zero, they are all negatively correlated.

src/stan-users-guide/regression.qmd (outdated, resolved)
@bob-carpenter
Member

I added a short note and citation in the reference manual and then just used your text in full in the user's guide. I got rid of the simplex suggestion we had, but left the soft-centering.

Collaborator

@spinkney spinkney left a comment

Just the one double space and the rest looks good!

src/reference-manual/types.qmd (outdated, resolved)
@bob-carpenter
Member

@spinkney I think this is then ready for final review. I removed the extra space.

@spinkney
Collaborator

@bob-carpenter lgtm!

@bob-carpenter bob-carpenter merged commit a748a4f into master Nov 15, 2024
@WardBrian WardBrian deleted the zero-sum-vector branch November 15, 2024 21:42
@spinkney
Collaborator

spinkney commented Nov 26, 2024 via email

@bob-carpenter
Member

Also, when discussing correlation, there's the issue of whether we're talking about the N - 1 unconstrained parameters or the N constrained parameters.

@lingium
Contributor

lingium commented Nov 27, 2024

Yeah, the unconstrained parameters are independent, and the constrained parameters are negatively (but equally; I guess @spinkney would also like to mention this) correlated.
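
The distinction between the two parameterizations can be checked numerically. The sketch below assumes independent, unit-variance unconstrained parameters and an illustrative Helmert-style orthonormal basis of the sum-to-zero subspace (an assumption, not necessarily the basis Stan uses); the function names are hypothetical. The covariance of the constrained vector is then the sum of outer products of the basis vectors.

```python
import math


def basis_vector(n, k):
    """n-th orthonormal basis vector (n = 1..k-1) of the sum-to-zero subspace of R^k."""
    norm = math.sqrt(n * (n + 1))
    return [1.0 / norm if i < n else (-n / norm if i == n else 0.0) for i in range(k)]


def constrained_covariance(k):
    """Covariance of x = sum_n y_n v_n when the y_n are independent with unit variance."""
    vs = [basis_vector(n, k) for n in range(1, k)]
    return [[sum(v[i] * v[j] for v in vs) for j in range(k)] for i in range(k)]


k = 4
cov = constrained_covariance(k)
corr_01 = cov[0][1] / math.sqrt(cov[0][0] * cov[1][1])

print([round(c, 4) for c in cov[0]])
print(round(corr_01, 4))
```

For $K = 4$ every variance is $1 - 1/K = 0.75$, every covariance is $-1/K = -0.25$, and every pairwise correlation is $-1/(K - 1) \approx -0.3333$, so the constrained elements are negatively and equally correlated even though the unconstrained ones are independent.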

Successfully merging this pull request may close these issues.

Document sum_to_zero_vector
4 participants