-
Notifications
You must be signed in to change notification settings - Fork 0
/
README.Rmd
142 lines (109 loc) · 5.8 KB
/
README.Rmd
1
2
3
4
5
6
7
8
9
10
11
12
13
14
15
16
17
18
19
20
21
22
23
24
25
26
27
28
29
30
31
32
33
34
35
36
37
38
39
40
41
42
43
44
45
46
47
48
49
50
51
52
53
54
55
56
57
58
59
60
61
62
63
64
65
66
67
68
69
70
71
72
73
74
75
76
77
78
79
80
81
82
83
84
85
86
87
88
89
90
91
92
93
94
95
96
97
98
99
100
101
102
103
104
105
106
107
108
109
110
111
112
113
114
115
116
117
118
119
120
121
122
123
124
125
126
127
128
129
130
131
132
133
134
135
136
137
138
139
140
141
142
---
output:
github_document:
html_preview: false
---
<!-- README.md is generated from README.Rmd. Please edit that file -->
```{r setup, include=FALSE}
knitr::opts_chunk$set(
collapse = TRUE,
comment = "#>",
fig.path = "man/figures/README-",
out.width = "100%"
)
```
# [scaleBVS](https://github.com/albcab/scaleBVS)
Bayesian Variable Selection models provide a natural and coherent framework
to select a subset of explanatory variables in linear regression contexts
(Chipman, George and McCulloch, 2001).
The binary inclusion variables for each regressor typically possess pairwise
and/or negative dependence structures conjectured to be conductive to
successful application of weighted Tempered Gibbs Sampling (Zanella and Roberts, 2019).
This package has been concieved as an implementation of the weighted
Tempered Gibbs Sampling algorithm of Zanella and Roberts (2019) for Bayesian
Variable Selection in linear models. The algorithm returns weighted samples
from the posterior distribution of binary inclusion variables, thus allowing
for fully Bayesian posterior inferences.
The weighted Tempered Gibbs Sampling is applicable to high-dimensional
Bayesian Variable selection models thanks to its favorable mixing properties
and to an efficient computation of the full conditional distribution of the
binary inclusion probabilities. These full conditionals allow for the calculation
of Rao-Blackwellised estimators of the marginal Posterior Inclusion Probabilities
for each regressor, which are useful in quantifying the uncertainties of the true
underlying linear model.
## Installation
Install scaleBVS from [Github](https://github.com/albcab/scaleBVS) with:
```{r, eval = FALSE}
install.packages("devtools")
devtools::install_github("albcab/scaleBVS")
```
You can also track or contribute to the development of scaleBVS at [https://github.com/albcab/scaleBVS](https://github.com/albcab/scaleBVS).
## Features
scaleBVS samples from the posterior distribution of the vector of binary inclusion
variables <img src="https://render.githubusercontent.com/render/math?math=\gamma = (\gamma_1,...,\gamma_p) \in \{1,0\}^p">
for the linear model with p regressors and n observations:
<img src="https://render.githubusercontent.com/render/math?math=\mathbf{Y}|\beta_{\gamma},\gamma,\sigma^2 \sim N(X_{\gamma}\beta_{\gamma},\sigma^2 I_n)">
<img src="https://render.githubusercontent.com/render/math?math=\beta_{\gamma}|\gamma,\sigma^2 \sim N(0,c(X_{\gamma}^TX_{\gamma})^{-1})">
<img src="https://render.githubusercontent.com/render/math?math=p(\sigma^2) \propto \frac{1}{\sigma^2}">
<img src="https://render.githubusercontent.com/render/math?math=\gamma_i|h \overset{iid}{\sim} Bernoulli(h)">
where <img src="https://render.githubusercontent.com/render/math?math=c > 0">.
The variable <img src="https://render.githubusercontent.com/render/math?math=h">
can be either set to a fixed value between 0 and 1 or given a Beta prior
distribution with two fixed hyperparameters, both greater than 0.
In the case that <img src="https://render.githubusercontent.com/render/math?math=X_{\gamma}^TX_{\gamma}">
is singular, the Moore-Penrose inverse (pseudo or generalized inverse) of the matrix is used.
For more information on weighted Tempered Gibbs Sampling, please refer to Zanella and
Roberts (2019).
## Usage
scaleBVS contains one main function `samplingBVS()` used to calculate Posterior
Inclusion Probabilities for any set of p regressors. The function `createSamples()`
computes the matrix of samples and weights given the output of `samplingBVS()`.
Since the model does not include an intercept on the regessors, it is recommended
to standardize both the dependent and independent variables to a zero mean.
```{r}
#Standardizing
X <- t(t(mtcars[,-1]) - colMeans(mtcars[,-1]))
y <- mtcars$mpg - mean(mtcars$mpg)
#SAMPLING
mtcars.output <- scaleBVS::samplingBVS(y, X)
#POSTERIOR INCLUSION PROBABILITIES
names(mtcars.output$PIP) <- names(mtcars[,-1])
print(mtcars.output$PIP)
#SAMPLES
mtcars.samples <- scaleBVS::createSamples(mtcars.output$states)
head(mtcars.samples$samples, 3)
```
## Benchmarks
scaleBVS is very fast and can easily handle high-dimensional problems.
We showcase these characteristics through an illustration using
simulated data. The simulated dataset has p=10000 regressors, simulated
assuming an uncorrelated design, where only the first 10 are included
in the true data generating model.
```{r}
#Setting up the number of observations, parameters and samples
n <- 10000
p <- 10000
n_iter <- 1000
burnin <- 200
#Creating the data
X <- matrix(rnorm(n*p), nrow = n, ncol = p)
beta0 <- matrix(c(2,-3,2,2,-3,3,-2,3,-2,3, rep(0, p-10)), ncol = 1)
beta <- sqrt(log(p)/n)*beta0
y <- X %*% beta + rnorm(n, mean = 0, sd = 1)
X <- t(t(X) - colMeans(X))
y <- y - mean(y)
#TIMING
microbenchmark::microbenchmark(
scaleBVS = scaleBVS::samplingBVS(y, X, n_iter = n_iter, burn_in = burnin),
times = 10
)
#RESULTS CHECK
head(scaleBVS::samplingBVS(y, X, n_iter = n_iter, burn_in = burnin)$PIP, 11)
```
The previous example is run on a Windows server 64-bit operating system with
4GB of RAM and Intel i5 CPU. With more memory `samplingBVS()` can easily
handle even larger problems.
## References
- Chipman, H. A., George, E. I. and McCulloch, R. E. (2001). The Practical Implementation of Bayesian Model Selection, pp. 65–116. Beachwood: Institute of Mathematical Statistics.
- Zanella, G. and Roberts, G. (2019). Scalable importance tempering and Bayesian variable selection. Journal of the Royal Statistical Society: Series B (Statistical Methodology): 489–517. Crossref. Web.
- Zellner, A. (1986). On Assessing Prior Distributions and Bayesian Regression Analysis with g-Prior Distributions. In: Goel, P. and Zellner, A., Eds., Bayesian Inference and Decision Techniques: Essays in Honor of Bruno de Finetti, Elsevier Science Publishers, Inc., New York, 233-243.