Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Start documentation of sum_to_zero_vector #818

Merged
merged 8 commits into from
Nov 15, 2024
Merged
Show file tree
Hide file tree
Changes from all commits
Commits
File filter

Filter by extension

Filter by extension

Conversations
Failed to load comments.
Loading
Jump to
Jump to file
Failed to load files.
Loading
Diff view
Diff view
26 changes: 26 additions & 0 deletions src/bibtex/all.bib
Original file line number Diff line number Diff line change
Expand Up @@ -1845,3 +1845,29 @@ @article{Timonen+etal:2023:ODE-PSIS
pages = {e614}
}

@article{egozcue+etal:2003,
title={Isometric logratio transformations for compositional data analysis},
author={Egozcue, Juan Jos{\'e} and Pawlowsky-Glahn, Vera and Mateu-Figueras, Gl{\`o}ria and Barcelo-Vidal, Carles},
journal={Mathematical Geology},
volume={35},
number={3},
pages={279--300},
year={2003}
}

@book{filzmoser+etal:2018,
title={Geometrical properties of compositional data},
author={Filzmoser, Peter and Hron, Karel and Templ, Matthias},
booktitle={Applied Compositional Data Analysis: With Worked Examples in R},
pages={35--68},
year={2018},
publisher={Springer}
}

@misc{seyboldt:2024,
author="Seyboldt, Adrian",
title="Add ZeroSumNormal distribution",
note="pyro-ppl GitHub repository issue \#1751",
year = "2024",
url ="https://github.com/pyro-ppl/numpyro/pull/1751#issuecomment-1980569811"
}
15 changes: 8 additions & 7 deletions src/reference-manual/expressions.qmd
Original file line number Diff line number Diff line change
Expand Up @@ -149,9 +149,9 @@ any of the following.

```
int, real, complex, vector, simplex, unit_vector,
ordered, positive_ordered, row_vector, matrix,
cholesky_factor_corr, cholesky_factor_cov,
corr_matrix, cov_matrix, array
sum_to_zero_vector, ordered, positive_ordered,
row_vector, matrix, cholesky_factor_corr,
cholesky_factor_cov, corr_matrix, cov_matrix, array
```

The following built in functions are also reserved and
Expand Down Expand Up @@ -810,9 +810,9 @@ In addition to single integer indexes, as described in
[the language indexing section](#language-indexing.section), Stan supports multiple indexing.
Multiple indexes can be integer arrays of indexes, lower
bounds, upper bounds, lower and upper bounds, or simply shorthand for
all of the indexes. If the upper bound is smaller than the lower bound,
the range is empty (unlike, e.g., in R). The upper bound and lower bound can be
expressions that evaluate to integer. A complete list of index types is
all of the indexes. If the upper bound is smaller than the lower bound,
the range is empty (unlike, e.g., in R). The upper bound and lower bound can be
expressions that evaluate to integer. A complete list of index types is
given in the following table.

##### Indexing Options Table {- #index-types-table}
Expand Down Expand Up @@ -1078,6 +1078,7 @@ the following table shows the mapping from types to their primitive types.
| `vector` | `vector` |
| `simplex` | `vector` |
| `unit_vector` | `vector` |
| `sum_to_zero_vector` | `vector` |
| `ordered` | `vector` |
| `positive_ordered` | `vector` |
| `row_vector` | `row_vector` |
Expand Down Expand Up @@ -1378,7 +1379,7 @@ model {
}
```

Algebraically,
Algebraically,
[the distribution statement](statements.qmd#distribution-statements.section)
in the model could be reduced to

Expand Down
9 changes: 9 additions & 0 deletions src/reference-manual/grammar.txt
Original file line number Diff line number Diff line change
Expand Up @@ -24,6 +24,7 @@

<identifier> ::= IDENTIFIER
| TRUNCATE
| JACOBIAN

<decl_identifier> ::= <identifier>
| <reserved_word>
Expand Down Expand Up @@ -57,10 +58,13 @@
| POSITIVEORDERED
| SIMPLEX
| UNITVECTOR
| SUMTOZERO
| CHOLESKYFACTORCORR
| CHOLESKYFACTORCOV
| CORRMATRIX
| COVMATRIX
| STOCHASTICCOLUMNMATRIX
| STOCHASTICROWMATRIX
| PRINT
| REJECT
| FATAL_ERROR
Expand Down Expand Up @@ -165,11 +169,16 @@
| POSITIVEORDERED LBRACK <expression> RBRACK
| SIMPLEX LBRACK <expression> RBRACK
| UNITVECTOR LBRACK <expression> RBRACK
| SUMTOZERO LBRACK <expression> RBRACK
| CHOLESKYFACTORCORR LBRACK <expression> RBRACK
| CHOLESKYFACTORCOV LBRACK <expression> [COMMA <expression>]
RBRACK
| CORRMATRIX LBRACK <expression> RBRACK
| COVMATRIX LBRACK <expression> RBRACK
| STOCHASTICCOLUMNMATRIX LBRACK <expression> COMMA
<expression> RBRACK
| STOCHASTICROWMATRIX LBRACK <expression> COMMA <expression>
RBRACK

<type_constraint> ::= [LABRACK <range> RABRACK]
| LABRACK <offset_mult> RABRACK
Expand Down
6 changes: 6 additions & 0 deletions src/reference-manual/syntax.qmd
Original file line number Diff line number Diff line change
Expand Up @@ -113,6 +113,7 @@ The raw output is available [here](https://raw.githubusercontent.com/stan-dev/do
```
<identifier> ::= IDENTIFIER
| TRUNCATE
| JACOBIAN

<decl_identifier> ::= <identifier>

Expand Down Expand Up @@ -175,11 +176,16 @@ The raw output is available [here](https://raw.githubusercontent.com/stan-dev/do
| POSITIVEORDERED LBRACK <expression> RBRACK
| SIMPLEX LBRACK <expression> RBRACK
| UNITVECTOR LBRACK <expression> RBRACK
| SUMTOZERO LBRACK <expression> RBRACK
| CHOLESKYFACTORCORR LBRACK <expression> RBRACK
| CHOLESKYFACTORCOV LBRACK <expression> [COMMA <expression>]
RBRACK
| CORRMATRIX LBRACK <expression> RBRACK
| COVMATRIX LBRACK <expression> RBRACK
| STOCHASTICCOLUMNMATRIX LBRACK <expression> COMMA
<expression> RBRACK
| STOCHASTICROWMATRIX LBRACK <expression> COMMA <expression>
RBRACK

<type_constraint> ::= [LABRACK <range> RABRACK]
| LABRACK <offset_mult> RABRACK
Expand Down
87 changes: 85 additions & 2 deletions src/reference-manual/transforms.qmd
Original file line number Diff line number Diff line change
Expand Up @@ -15,7 +15,7 @@ constrained to be ordered, positive ordered, or simplexes. Matrices
may be constrained to be correlation matrices or covariance matrices.
This chapter provides a definition of the transforms used for each
type of variable. For examples of how to declare and define these
variables in a Stan program, see section
variables in a Stan program, see section
[Variable declaration](types.qmd#variable-declaration.section).

Stan converts models to C++ classes which define probability
Expand All @@ -39,7 +39,7 @@ the exact arithmetic:
rounded to the boundary. This may cause unexpected warnings or
errors, if in other parts of the code the boundary value is
invalid. For example, we may observe floating-point value 0 for
a variance parameter that has been declared to be larger than 0.
a variance parameter that has been declared to be larger than 0.
See more about [Floating point Arithmetic in Stan user's guide](../stan-users-guide/floating-point.qmd)).
- CmdStan stores the output to CSV files with 6 significant digits
accuracy by default, but the constraints are checked with 8
Expand Down Expand Up @@ -462,6 +462,89 @@ p_Y(y)
$$



## Zero sum vector

Vectors that are constrained to sum to zero are useful for, among
other things, additive varying effects, such as varying slopes or
intercepts in a regression model (e.g., for income deciles).

A zero sum $K$-vector $x \in \mathbb{R}^K$ satisfies the constraint
$$
\sum_{k=1}^K x_k = 0.
$$

For the transform, Stan uses the first part of an isometric log ratio
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The first thing I think about reading this is why don't we just take z as the sum from i from 1:K-1 and then do 1 - z to construct the vector. I understand this is just the reference manual but a ref or mention that the isometric log ratio transform induces a geometry which is easier for HMC to explore is probably worth putting in.

Can we link to the user guide here?

Copy link
Collaborator

@spinkney spinkney Nov 15, 2024

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I might even write this out a bit more. In the old SUGs we had this comment

This is using a more sophisticated transform than the previously recommended form of setting the final element of the vector to the negative sum of the previous elements.

The issue with doing that is that it implies a fairly strong correlation among the zero-sum parameters.

We want a matrix $M$ which when multiplied by a vector gives the vector plus the negative sum of the elements prior.

$$ \alpha M = \bigg[\alpha_1, \alpha_2, \ldots, \alpha_{K -1}, -\sum_\limits{i=1}^{K - 1} \alpha_i \bigg] $$

The vector $\alpha$ needs to be length K, so I pad it an extra 0. An $M$ that satisfies this is a matrix with ones on the diagonal and negative ones in the last column (the K x K value can be anything since alpha has 0 at the Kth value)

$$ M = \begin{bmatrix} 1 & 0 & 0 & \dots & 0 & -1\\ 0& 1 & 0 &\dots & 0 & - 1 \\ 0& 0 & 1 & \dots & 0 & - 1 \\ \vdots & & \ddots & & \vdots & \vdots\\ 0 & & \dots & & 0 & 1\\ \end{bmatrix} $$

Let's say that we want to put a standard normal prior on alpha. The transform on this standard normal is

$$ \alpha M \sim MVN(0, MIM^T). $$

Using the fact that the variance of a constant matrix $A$ times a random matrix $X$ is

$$ \begin{align} \mathbb{V}[M𝑋] &= \mathbb{E}[(M(𝑋−\mu))(M(𝑋−\mu))^𝑇] \\ &=\mathbb{E}(M(𝑋−\mu)(𝑋−\mu)^TM^T) \\ &=M\mathbb{E}[(𝑋−\mu)(𝑋−\mu)^𝑇]M^𝑇 \\ &=M\mathbb{V}[𝑋]M^𝑇. \end{align} $$

The correlation matrix of this becomes

$$ \begin{bmatrix} 1.0000 & 0.5000 & \dots & 0.5000 & -0.7071 \\ 0.5000 & 1.0000 & \dots & 0.5000 & -0.7071 \\ 0.5000 & 0.5000 & \dots &1.0000 & -0.7071 \\ \vdots & \ddots & \ddots & \vdots & \vdots \\ -0.7071 & -0.7071 & \dots & -0.7071 & 1.0000 \\ \end{bmatrix} $$

Using the inverse isometric log transform is the same as applying a Householder transform and inducing a diagonal correlation matrix on N - 1 of the values.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I wouldn't say "just a reference manual". We want to get the details of the transformation we're using right. This isn't a research paper, though, so we don't need to say why we're using this, though we can link to a paper or to Adrian's discussion online. External links are challenging to maintain, so use sparingly.

We can mention the link to Householder, but I wouldn't go through this whole discussion in the transforms chapter of the Stan reference manual.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I'm not sure how to update. Do you want to take a stab at just updating it? If not, if you can write here what you think I should add, I'll add it.

Copy link
Collaborator

@spinkney spinkney Nov 15, 2024

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I'm not sure how to update. Do you want to take a stab at just updating it? If not, if you can write here what you think I should add, I'll add it.

I'm not sure where you want this, in the SUG or in the reference manual? I'll let you decide. Here's what I think should be added somewhere:

The reason we use the isometric log ratio transform is because it induces zero correlation among the transformed elements of the vector. The problem with simply setting the final element of the vector to the negative sum of the previous elements is that this induces strong correlations across the parameters. The transform used in Stan eliminates these correlations by constructing an orthogonal basis and applying it to the zero-sum-constraint (see this discussion by Adrian Seyboldt on the NumPyro Github repository for more information). Any orthogonal basis can be used, we happen to use the inverse isometric log transform because it is convenient to describe and the transform simplifies to scalar algebra rather than matrix operations.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think your description of "zero correlation" might be wrong. Think about a vector that sum to zero, if one element becomes larger, then other elements must be smaller, right? So as long as they sum to zero, they are all negativly correlated.

transform; see [@egozcue+etal:2003] for the basic definitions and
Chapter 3 of [@filzmoser+etal:2018] for the pivot coordinate version
used here. Stan uses the isometric log ratio transform because it
induces a geometry with zero correlation among the dimensions, making
it easier for HMC to explore than simpler alternatives such as setting
the final element to the negative sum of the first elements; see, e.g.,
[@seyboldt:2024].




### Zero sum transform {-}

The (unconstraining) transform is defined iteratively. Given an $x \in
\mathbb{R}^{N + 1}$ that sums to zero (i.e., $\sum_{n=1}^{N+1} x_n =
0$), the transform proceeds as follows to produce an unconstrained $y
\in \mathbb{R}^N$.

The transform is initialized by setting
$$
S_N = 0
$$
and
$$
y_N = -x_{N + 1} \cdot \frac{\sqrt{N \cdot (N + 1)}{N}}.
$$
The for each $n$ from $N - 1$ down to $1$, let
$$
w_{n + 1} = \frac{y_{n + 1}}{\sqrt{(n + 1) \cdot (n + 2)}},
$$
$$
S_n = S_{n + 1} + w_{n + 1},
$$
and
$$
y_n = (S_n - x_{n + 1}) \cdot \frac{\sqrt{n \cdot (n + 1)}}{n}.
$$

### Zero sum inverse transform {-}

The inverse (constraining) transform follows the isometric logratio tranform.
It maps an unconstrained vector $y \in \mathbb{R}^N$ to a zero-sum vector $x \in
\mathbb{R}^{N + 1}$ such that
$$
\sum_{n=1}^{N + 1} x_n = 0.
$$
The values are defined inductively, starting with
$$
x_1 = \sum_{n=1}^N \frac{y_n}{\sqrt{n \cdot (n + 1)}}
$$
and then setting
$$
x_{n + 1} = \sum_{i = n + 1}^N \frac{\sqrt{y_i}}{\sqrt{i \cdot (i + 1)}}
- n \cdot \frac{y_n}{\sqrt{n \cdot (n + 1)}}.
$$
for $n \in 1{:}N$.

The definition is such that
$$
\sum_{n = 1}^{N + 1} x_n = 0
$$
by construction, because each of the terms added to $x_{n}$ is then
subtracted from $x_{n + 1}$ the number of times it shows up in earlier terms.

### Absolute Jacobian determinant of the zero sum inverse transform {-}

The inverse transform is a linear operation, leading to a constant Jacobian
determinant which is therefore not included.


## Unit simplex {#simplex-transform.section}

Variables constrained to the unit simplex show up in multivariate
Expand Down
29 changes: 27 additions & 2 deletions src/reference-manual/types.qmd
Original file line number Diff line number Diff line change
Expand Up @@ -117,9 +117,10 @@ vector<lower=-1, upper=1>[3] rho;
```

There are also special data types for structured vectors and
matrices. There are four constrained vector data types, `simplex`
matrices. There are five constrained vector data types, `simplex`
for unit simplexes, `unit_vector` for unit-length vectors,
`ordered` for ordered vectors of scalars and
`sum_to_zero_vector` for vectors that sum to zero,
`ordered` for ordered vectors of scalars, and
`positive_ordered` for vectors of positive ordered
scalars. There are specialized matrix data types `corr_matrix`
and `cov_matrix` for correlation matrices (symmetric, positive
Expand Down Expand Up @@ -692,6 +693,26 @@ unit vectors, this is only done up to a statically specified accuracy
threshold $\epsilon$ to account for errors arising from floating-point
imprecision.

### Vectors that sum to zero {-}

A zero-sum vector is constrained such that the
sum of its elements is always $0$. These are sometimes useful
for resolving identifiability issues in regression models.
While the underlying vector has only $N - 1$ degrees of freedom,
zero sum vectors are declared with their full dimensionality.
For instance, `beta` is declared to be a zero-sum $5$-vector (4 DoF) by

```stan
sum_to_zero_vector[5] beta;
```

Zero sum vectors are implemented as vectors and may be assigned to other
vectors and vice-versa. Zero sum vector variables, like other constrained
variables, are validated to ensure that they are indeed unit length; for
zero sum vectors, this is only done up to a statically specified accuracy
threshold $\epsilon$ to account for errors arising from floating-point
imprecision.

### Ordered vectors {-}

An ordered vector type in Stan represents a vector whose entries are
Expand Down Expand Up @@ -1636,6 +1657,8 @@ dimensions `matrix[K, K]` types.
+---------------------+-------------------------+----------------------------------------------+
| | | `unit_vector[N]` |
+---------------------+-------------------------+----------------------------------------------+
| | | `sum_to_zero_vector[N]` |
+---------------------+-------------------------+----------------------------------------------+
| `row_vector` | `row_vector[N]` | `row_vector[N]` |
+---------------------+-------------------------+----------------------------------------------+
| | | `row_vector[N]<lower=L>` |
Expand Down Expand Up @@ -1700,6 +1723,8 @@ dimensions `matrix[K, K]` types.
+---------------------+-------------------------+----------------------------------------------+
| | | `array[M] unit_vector[N]` |
+---------------------+-------------------------+----------------------------------------------+
| | | `array[M] sum_to_zero_vector[N]` |
+---------------------+-------------------------+----------------------------------------------+

<a name="id:constrained-types.figure"></a>

Expand Down
Loading