Skip to content

Commit c738d9a

Browse files
authored
Merge pull request #10 from zoj613/documentation
DOC: Add documentation
2 parents 409d516 + f37b40f commit c738d9a

12 files changed

+699
-15
lines changed

README.md

+171-2
Original file line numberDiff line numberDiff line change
@@ -1,3 +1,172 @@
1-
# Fast Simulation of Hyperplane-Truncated Multivatiate Normal Distributions
1+
# htnorm
22

3-
An implementation of a fast and exact simulation algorithm for a multivariate normal distribution truncated on the intersection of a set of hyperplanes.
3+
This repo provides a C implementation of a fast and exact sampler from a
4+
multivariate normal distribution (MVN) truncated on a hyperplane as described [here][1]
5+
6+
this repo implements the following from the paper:
7+
8+
- efficient Sampling from a MVN truncated on a hyperplane:
9+
10+
![hptrunc](https://latex.codecogs.com/svg.latex?%5Cmathbf%7Bx%7D%20%5Csim%20%5Cmathcal%7BN%7D_%7B%5Cmathcal%7BS%7D%7D%28%5Cmathbf%7B%5Cmu%7D%2C%20%5Cmathbf%7B%5CSigma%7D%29%3B%20%5Chspace%7B2mm%7D%20%5Cmathcal%7BS%7D%20%3D%20%5C%7B%5Cmathbf%7Bx%7D%20%3A%20%5Cmathbf%7BG%7D%5Cmathbf%7Bx%7D%20%3D%20%5Cmathbf%7Br%7D%5C%7D%2C%20%5Cmathbf%7BG%7D%20%5Cin%20%5Cmathcal%7BR%7D%5E%7Bk_2%20%5Ctimes%20k%7D%2C%20rank%28%5Cmathbf%7BG%7D%29%20%3D%20k_2%20%3C%20k)
11+
12+
- efficient sampling from a MVN with a stuctured precision matrix:
13+
14+
![struc](https://latex.codecogs.com/svg.latex?%5Cmathbf%7Bx%7D%20%5Csim%20%5Cmathcal%7BN%7D%5C%5B%5Cmathbf%7B%5Cmu%7D%2C%20%28%5Cmathbf%7BA%7D%20+%20%5Cmathbf%7B%5CPhi%7D%5ET%5Cmathbf%7B%5COmega%7D%5Cmathbf%7B%5CPhi%7D%29%5E%7B-1%7D%5C%5D%3B%20%5Chspace%7B2mm%7D%20%5Cmathbf%7B%5CPhi%7D%20%5Cin%20%5Cmathcal%7BR%7D%5E%7Bn%20%5Ctimes%20p%7D%2C%20%5Cmathbf%7B%5COmega%7D%20%5Cin%20%5Cmathcal%7BR%7D%5E%7Bn%20%5Ctimes%20n%7D%2C%20%5Cmathbf%7BA%7D%20%5Cin%20%5Cmathcal%7BR%7D%5E%7Bp%20%5Ctimes%20p%7D)
15+
16+
- efficent sampling frfom a MVN with a structured precision and mean:
17+
18+
![strucmean](https://latex.codecogs.com/svg.latex?%5Cmathbf%7Bx%7D%20%5Csim%20%5Cmathcal%7BN%7D%5CBig%5C%5B%28%5Cmathbf%7BA%7D%20+%20%5Cmathbf%7B%5CPhi%7D%5ET%5Cmathbf%7B%5COmega%7D%5Cmathbf%7B%5CPhi%7D%29%5E%7B-1%7D%5Cmathbf%7B%5CPhi%7D%5ET%5Cmathbf%7B%5COmega%7D%5Cmathbf%7Bt%7D%2C%20%28%5Cmathbf%7BA%7D%20+%20%5Cmathbf%7B%5CPhi%7D%5ET%5Cmathbf%7B%5COmega%7D%5Cmathbf%7B%5CPhi%7D%29%5E%7B-1%7D%5CBig%5C%5D%3B%20%5Chspace%7B2mm%7D%20%5Cmathbf%7B%5COmega%7D%20%5Cin%20%5Cmathcal%7BR%7D%5E%7Bn%20%5Ctimes%20n%7D%2C%20%5Cmathbf%7BA%7D%20%5Cin%20%5Cmathcal%7BR%7D%5E%7Bp%20%5Ctimes%20p%7D)
19+
20+
The algorithms implemented have the following practical applications:
21+
- Topic models when unknown parameters can be interpreted as fractions.
22+
- Admixture models
23+
- discrete graphical models
24+
- Sampling from posterior distribution of an Intrinsic Conditional Autoregressive prior [icar][8]
25+
- Sampling from posterior conditional distributions of various bayesian regression problems.
26+
27+
28+
## Dependencies
29+
30+
- a C compiler that supports the C99 standard or later
31+
- an installation of BLAS and LAPACK that exposes its C interface via the headers `<cblas.h>` and `<lapacke.h>`
32+
(e.g openBLAS).
33+
34+
35+
## Usage
36+
37+
Building a shared library of `htnorm` can be done with the following:
38+
```bash
39+
$ cd src/
40+
# optionally set path to CBLAS and LAPACKE headers using INCLUDE_DIRS environmental variable
41+
$ export INCLUDE_DIRS="some/path/to/headers"
42+
# optionally set path to BLAS installation shared library
43+
$ export LIBS_DIR="some/path/to/library/"
44+
# optionally set the linker flag for your BLAS installation (e.g -lopenblas)
45+
$ export LIBS=<flag here>
46+
$ make lib
47+
```
48+
Afterwards the shared library will be found in a `lib/` directory of the project root,
49+
and the library can be linked dynamically via `-lhtnorm`.
50+
51+
The puplic API exposes the samplers through the function declarations
52+
- `int htn_hyperplane_truncated_mvn(rng_t* rng, const ht_config_t* conf, double* out)`
53+
- `int htn_structured_precision_mvn(rng_t* rng, const sp_config_t* conf, double* out)`
54+
55+
The details of the parameters are documented in ther header files ["htnorm.h"][4].
56+
57+
Random number generation is done using [PCG64][2] or [Xoroshiro128plus][3] bitgenerators.
58+
The API allows using a custom generator, and the details are documented in the header file
59+
["rng.h"][5].
60+
61+
## Examples
62+
```C
63+
#include "htnorm.h"
64+
65+
int main ()
66+
{
67+
...
68+
69+
// instantiate a random number generator
70+
rng_t* rng = rng_new_pcg64();
71+
ht_config_t config;
72+
config.g = ...; // G matrix
73+
config.gnrow = ...; // number of rows of G
74+
config.gncol = ...; // number of columns of G
75+
cofig.r = ...; // r array
76+
config.mean = ...; // mean array
77+
config.cov = ...; // the convariance matrix
78+
confi.diag = ...; // whether covariance is diagonal
79+
80+
double* samples = ...; // array to store the samples
81+
// now call the sampler
82+
int res_info = htn_hyperplane_truncated_mvn(rng, &config, samples);
83+
84+
// res_info contains a number that indicates whether sampling failed or not.
85+
86+
...
87+
88+
// finally free the RNG pointer at some point
89+
rng_free(rng);
90+
91+
...
92+
return 0;
93+
}
94+
```
95+
96+
## Python API
97+
98+
A high level python interface to the library is also provided. Installing it from
99+
source requires an installation of [poetry][7] and the following shell commands:
100+
101+
```bash
102+
$ git clone https://github.com/zoj613/htnorm.git
103+
$ cd htnorm/
104+
$ poetry install
105+
# add htnorm to python's path
106+
$ export PYTHONPATH=$PWD:$PYTHONPATH
107+
```
108+
Below is an example of how to use htnorm in python to sample from a multivariate
109+
gaussian truncated on the hyperplane ![sumzero](https://latex.codecogs.com/svg.latex?%5Cmathbf%7B1%7D%5ET%5Cmathbf%7Bx%7D%20%3D%200) (i.e. making sure the sampled values sum to zero)
110+
111+
```python
112+
from pyhtnorm import HTNGenerator
113+
import numpy as np
114+
115+
rng = HTNGenerator()
116+
117+
# generate example input
118+
k1 = 1000
119+
k2 = 1
120+
npy_rng = np.random.default_rng()
121+
temp = npy_rng.random((k1, k1))
122+
cov = a @ a.T + np.diag(npy_rng.random(k1))
123+
G = np.ones((k2, k1))
124+
r = np.zeros(k1)
125+
mean = npy_rng.random(k1)
126+
127+
samples = rng.hyperplane_truncated_mvnorm(mean, cov, G, r)
128+
# verify if sampled values sum to zero
129+
print(sum(samples))
130+
131+
# alternatively one can pass an array to store the results in
132+
out = np.empty(k1)
133+
rng.hyperplane_truncated_mvnorm(mean, cov, G, r, out=out)
134+
# verify
135+
print(out.sum())
136+
```
137+
138+
For more details about the parameters of the `HTNGenerator` and its methods,
139+
see the docstrings via python's `help` function.
140+
141+
The python API also exposes the `HTNGenerator` class as a Cython extension type
142+
that can be "cimported" in a cython script.
143+
144+
145+
## TODO
146+
147+
- Add an `R` interface to the library.
148+
149+
150+
## Licensing
151+
152+
`htnorm` is free software made available under the BSD-3 License. For details
153+
see the [LICENSE][6] file.
154+
155+
156+
## References
157+
- Cong, Yulai; Chen, Bo; Zhou, Mingyuan. Fast Simulation of Hyperplane-Truncated
158+
Multivariate Normal Distributions. Bayesian Anal. 12 (2017), no. 4, 1017--1037.
159+
doi:10.1214/17-BA1052.
160+
- Bhattacharya, A., Chakraborty, A., and Mallick, B. K. (2016).
161+
“Fast sampling with Gaussian scale mixture priors in high-dimensional regression.”
162+
Biometrika, 103(4):985.
163+
164+
165+
[1]: https://projecteuclid.org/euclid.ba/1488337478
166+
[2]: https://www.pcg-random.org/
167+
[3]: https://en.wikipedia.org/wiki/Xoroshiro128%2B
168+
[4]: https://github.com/zoj613/htnorm/blob/main/include/htnorm.h
169+
[5]: https://github.com/zoj613/htnorm/blob/main/include/rng.h
170+
[6]: https://github.com/zoj613/htnorm/blob/main/LICENSE
171+
[7]: https://python-poetry.org/docs/pyproject/
172+
[8]: https://www.sciencedirect.com/science/article/abs/pii/S1877584517301600

include/htnorm.h

+69-2
Original file line numberDiff line numberDiff line change
@@ -1,6 +1,27 @@
11
/* Copyright (c) 2020, Zolisa Bleki
22
*
3-
* SPDX-License-Identifier: BSD-3-Clause */
3+
* SPDX-License-Identifier: BSD-3-Clause
4+
*
5+
* Fast Simulation of Hyperplane-Truncated Multivaariate Normal Distributions.
6+
*
7+
* This library implements the algorithms 2, 4 and the one in example 4 of
8+
* Cong, Y., Chen, B., & Zhou, M. (2017).
9+
*
10+
* Algorithm 2 allows fast and exact sampling from a multivariate normal (MVN)
11+
* distribution that is trancated on a hyperplane.
12+
*
13+
* Algorthm 4 allows efficient sampling from a MVN with a structured precision
14+
* matrix.
15+
*
16+
* Algorithm described in example 4 allows efficient sampling from a MVN with
17+
* a structured precision matrix and a structured mean dependent on the
18+
* structure of the precision matrix.
19+
*
20+
* References:
21+
* 1) Cong, Y., Chen, B., & Zhou, M. (2017). Fast simulation of
22+
* hyperplane-truncated multivariate normal distributions. Bayesian Analysis,
23+
* 12(4), 1017-1037.
24+
*/
425
#ifndef HTNORM_HTNORM_H
526
#define HTNORM_HTNORM_H
627

@@ -11,6 +32,7 @@
1132

1233
#define INIT_HT_CONFIG(x) (x) = {.diag = false}
1334

35+
// initialize a pointer to a `ht_config_t` struct.
1436
#define INIT_SP_CONFIG(x) \
1537
(x) = {.struct_mean = false, .a_id = NORMAL, .o_id = NORMAL}
1638

@@ -54,8 +76,53 @@ typedef struct {
5476
bool struct_mean;
5577
} sp_config_t;
5678

57-
79+
/* Sample from a multivariate normal distribution truncated on a hyperplane.
80+
*
81+
* Generate sample x from the distribution: N(mean, cov) truncated on the plane
82+
* {x | Gx = r}, where the rank of G is less than that of the covariance.
83+
*
84+
* Paramaters
85+
* ----------
86+
* rng:
87+
* A pointer to the `rng_t` struct (The random number generator).
88+
* conf:
89+
* A pointer to the input configuration struct `ht_config_t`.
90+
* out:
91+
* An array to store the generated samples.
92+
*
93+
* Returns
94+
* -------
95+
* Zero if the sampling was successful. A positive integer is returned if the
96+
* sampled failed because the covariance is not positive definite or if a
97+
* factorization of the covariance was not successful. A negative integer is
98+
* returned if one of the inputs contains an illegal value (non-numerical/NaN).
99+
*/
58100
int htn_hyperplane_truncated_mvn(rng_t* rng, const ht_config_t* conf, double* out);
101+
102+
103+
/* Sample from a MVN with a structured precision matrix and/or structured mean.
104+
*
105+
* Sample from a MVN: N(mean, (A + phi^T * Omega * phi)^-1) or
106+
* N((A + phi^T * Omega * phi)^-1 * phi^T * t, (A + phi^T * Omega * phi)^-1))
107+
* This algorithm in very efficient and ideal when the dimension of matrix A
108+
* is greater than that of matrix Omega.
109+
*
110+
* Parameters
111+
* ----------
112+
* rng:
113+
* A pointer to the `rng_t` struct (The random number generator).
114+
* conf:
115+
* A pointer to the input configuration struct `sp_config_t`.
116+
* out:
117+
* An array to store the generated samples.
118+
*
119+
* Returns
120+
* -------
121+
* Zero if the sampling was successful. A positive integer is returned if the
122+
* sampled failed because the covariance is not positive definite or if a
123+
* factorization of the covariance was not successful. A negative integer is
124+
* returned if one of the inputs contains an illegal value (non-numerical/NaN).
125+
* */
59126
int htn_structured_precision_mvn(rng_t* rng, const sp_config_t* conf, double* out);
60127

61128
#endif

include/rng.h

+27-1
Original file line numberDiff line numberDiff line change
@@ -1,24 +1,50 @@
11
/* Copyright (c) 2020, Zolisa Bleki
22
*
3-
* SPDX-License-Identifier: BSD-3-Clause */
3+
* SPDX-License-Identifier: BSD-3-Clause
4+
*
5+
* The public API of the underlying random number generator. The `rng_t`
6+
* struct is passed as a parameter to both sampling functions in the htnorm
7+
* header.
8+
*/
49
#ifndef HTNORM_RNG_H
510
#define HTNORM_RNG_H
611

712
#include <stdint.h>
813

14+
/* The bitgenerator interface. Users can use this struct in order to define
15+
* their own bitgenerator, The minimum requirement is to provide a function
16+
* that generate 64bit unsigned integers and doubles in the range (0, 1).
17+
*/
918
typedef struct bitgen {
19+
// the bsse bitgenerator
1020
void* base;
21+
// a function pointer that takes `base` as an input an returns a positive intgger
1122
uint64_t (*next_int)(void* base);
23+
// a function pointer that takes `base` as an input an returns a double in the range (0, 1)
1224
double (*next_double)(void* base);
1325
} rng_t;
1426

27+
// Helper function to deallocate a `rng_t` pointer.
1528
void rng_free(rng_t* rng);
1629

30+
/* The following functions provide an interface to easily generate random
31+
* integers or standard uniform variables using either PCG64 or Xoroshiro128plus
32+
* bit generators.
33+
*
34+
* For more details, see:
35+
* 1) https://www.pcg-random.org/
36+
* 2) https://www.pcg-random.org/posts/visualizing-the-heart-of-some-prngs.html
37+
* 3) https://www.pcg-random.org/posts/bounded-rands.html
38+
*/
1739

40+
// Return a pointer to rng_t based on PCG64 bit generator that is randomly seeded.
1841
rng_t* rng_pcg64_new(void);
42+
// Return a pointer to rng_t based n PCG64 bit generator with a specified seed.
1943
rng_t* rng_pcg64_new_seeded(uint64_t seed);
2044

45+
// Return a pointer to rng_t based on Xoroshiro128plus that is randomly seeded.
2146
rng_t* rng_xrs128p_new(void);
47+
// Return a pointer to rng_t based on Xoroshiro128plus with a specified seed.
2248
rng_t* rng_xrs128p_new_seeded(uint64_t seed);
2349

2450
#endif

0 commit comments

Comments
 (0)