\documentclass[hidelinks,twoside,11pt]{article}
% Any additional packages needed should be included after jmlr2e.
% Note that jmlr2e.sty includes epsfig, amssymb, natbib and graphicx,
% and defines many common macros, such as 'proof' and 'example'.
%
% It also sets the bibliographystyle to plainnat; for more information on
% natbib citation styles, see the natbib documentation, a copy of which
% is archived at http://www.jmlr.org/format/natbib.pdf
\usepackage{jmlr2e}
\usepackage{doi}
\usepackage{subcaption}
\usepackage{url}
\usepackage{color}
\usepackage{listings}
\usepackage{amsmath}
%\usepackage{tikz}
%\usetikzlibrary{bayesnet}
% Definitions of handy macros can go here
% Heading arguments are {volume}{year}{pages}{submitted}{published}{author-full-names}
\jmlrheading{?}{2015}{?-?}{?/??}{?/??}{Jaakko Luttinen}
% Short headings should be running head and authors last names
\ShortHeadings{BayesPy: Variational Bayesian Inference in Python}{Luttinen}
\firstpageno{1}
\lstset{ %
  basicstyle=\footnotesize\ttfamily, % font size and family used for the code
  breakatwhitespace=false,           % automatic breaks may happen anywhere
  breaklines=true,                   % sets automatic line breaking
  captionpos=b,                      % sets the caption position to bottom
  escapeinside={\%*}{*)},            % if you want to add LaTeX within your code
  extendedchars=true,                % lets you use non-ASCII characters; for 8-bit encodings only, does not work with UTF-8
  keepspaces=true,                   % keeps spaces in text, useful for keeping indentation of code (possibly needs columns=flexible)
  commentstyle=\color{green},        % comment style
  keywordstyle=\color{blue},         % keyword style
  stringstyle=\color{orange},        % string literal style
  firstnumber=last,                  % continue line numbering from the previous listing
  numbers=left,                      % where to put the line numbers
  numbersep=10pt,                    % how far the line numbers are from the code
  numberstyle=\tiny\color{black},    % the style used for the line numbers
  stepnumber=1,                      % number every line
  rulecolor=\color{black},           % keep the frame color fixed on line breaks within non-black text
  showspaces=false,                  % do not mark spaces
  showstringspaces=false,            % do not underline spaces within strings
  showtabs=false,                    % do not mark tabs
  tabsize=2,                         % sets default tabsize to 2 spaces
  belowcaptionskip=1\baselineskip,
  frame=L,
  xleftmargin=1.3\parindent,
  language=Python
}
\begin{document}
\title{BayesPy: Variational Bayesian Inference in Python}
\author{\name Jaakko Luttinen \email jaakko.luttinen@iki.fi \\
\addr Department of Computer Science\\
Aalto University, Finland}
\editor{?}
\maketitle
\begin{abstract}% <- trailing '%' for backward compatibility of .sty file
BayesPy is an open-source Python software package for performing variational
Bayesian inference. It is based on the variational message passing framework
and supports conjugate exponential family models. By removing the tedious
task of implementing the variational Bayesian update equations, the user can
construct models faster and in a less error-prone way. Simple syntax,
flexible model construction and efficient inference make BayesPy suitable for
both average and expert Bayesian users. It also supports some advanced
methods such as stochastic and collapsed variational inference.
\end{abstract}
\begin{keywords}
variational Bayes, probabilistic programming,
%variational message passing,
Python
\end{keywords}
\section{Introduction}
The Bayesian framework provides a theoretically solid and consistent way to
construct models and perform inference. In practice, however, the inference is
usually analytically intractable and is therefore based on approximation methods
such as variational Bayes (VB), expectation propagation (EP) and Markov chain
Monte Carlo (MCMC) \citep{Bishop:2006}. Deriving and implementing the formulas
for an approximation method is often straightforward but tedious, time-consuming
and error-prone.

BayesPy is a Python 3 package providing tools for constructing conjugate
exponential family models and performing VB inference easily and efficiently.
It is based on the variational message passing (VMP) framework, which defines a
simple message passing protocol \citep{Winn:2005}. This enables the
implementation of small general nodes that can be used as building blocks for
large complex models. BayesPy offers a comprehensive collection of built-in
nodes that can be used to build a wide range of models, as well as a simple
interface for implementing custom nodes. The package is released under the MIT
license.

Several other projects have similar goals for making Bayesian inference easier
and faster to apply. VB inference is available in Bayes Blocks
\citep{Raiko:2007}, VIBES \citep{Bishop:2002} and Infer.NET \citep{Infer.NET}.
Bayes Blocks is an open-source C++/Python package, but it is restricted to
scalar Gaussian nodes and a few deterministic functions. VIBES is an old
software package for Java, released under the revised BSD license, but it is no
longer actively developed. VIBES has been replaced by Infer.NET, which is
partly closed source and licensed for non-commercial use only. Other projects,
such as PyMC \citep{PyMC}, OpenBUGS \citep{OpenBUGS}, Dimple \citep{Dimple} and
Stan \citep{Stan}, mainly support MCMC instead of VB inference. Thus, there is
a need for an open-source and actively maintained VB software package.
\section{Features}
BayesPy can be used to construct conjugate exponential family models. The
documentation provides detailed examples of how to construct a variety of
models, including principal component analysis models, linear state-space
models, mixture models and hidden Markov models. BayesPy has also been used in
two publications about parameter expansion and time-varying dynamics for linear
state-space models \citep{Luttinen:2013,Luttinen:2014}.
% , which show that it is applicable to large complex models requiring efficient
% inference.

Using BayesPy for Bayesian inference consists of four main steps: constructing
the model, providing data, finding the posterior approximation and examining the
results. The user constructs the model from small modular blocks called nodes.
Roughly, each node corresponds to a latent variable, a set of observations or a
deterministic function.
%Some of the nodes may be set as observed by providing
%data.
The inference engine runs the message passing algorithm to obtain the posterior
approximation. The resulting posterior can be examined, for instance, by using
the built-in plotting functions or by printing the parameters of the posterior
distributions.
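
As a compact illustration of these steps, the following minimal sketch infers
the mean and precision of one-dimensional Gaussian data (the syntax follows the
more complete example below; \texttt{GaussianARD} and \texttt{Gamma} are
built-in nodes):
\begin{lstlisting}
import numpy as np
from bayespy.nodes import GaussianARD, Gamma
from bayespy.inference import VB
data = np.random.normal(5, 10, size=(100,))  # artificial data
mu = GaussianARD(0, 1e-6)                # prior for the unknown mean
tau = Gamma(1e-6, 1e-6)                  # prior for the unknown precision
y = GaussianARD(mu, tau, plates=(100,))  # observation model
y.observe(data)                          # provide the data
Q = VB(mu, tau, y)                       # inference engine
Q.update(repeat=20)                      # run VMP for at most 20 iterations
\end{lstlisting}
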
Nodes are the primary building blocks for models in BayesPy. There are two
types of nodes: stochastic and deterministic. Stochastic nodes correspond to
probability distributions and deterministic nodes correspond to functions.
Built-in stochastic nodes include all common exponential family distributions
(e.g., Gaussian, gamma and Dirichlet distributions), a general mixture
distribution and a few complex nodes for dynamic variables (e.g., discrete and
Gaussian Markov chains). Built-in deterministic nodes include a gating node and
a general sum-product node.
%The
%documentation also provides instructions for implementing new nodes in case the
%built-in collection is insufficient.
In case a model cannot be constructed using the built-in nodes, the
documentation provides instructions for implementing new nodes.
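
As an illustration, stochastic and deterministic nodes can be combined to form,
for instance, a linear model by using the general sum-product node (a minimal
sketch, assuming the einsum-style subscript syntax of \texttt{SumMultiply}):
\begin{lstlisting}
from bayespy import nodes
X = nodes.GaussianARD(0, 1, plates=(10,), shape=(3,))  # regressors
B = nodes.GaussianARD(0, 1, shape=(3,))                # weights
F = nodes.SumMultiply('d,d', X, B)  # deterministic dot product
Y = nodes.GaussianARD(F, 1)         # noisy observations of F
\end{lstlisting}
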
BayesPy is designed to be simple enough for non-expert users but flexible and
efficient enough for expert users. One goal is to keep the syntax easy and
intuitive to read and write by making it similar to the mathematical formulation
of the model. Missing values are easy to handle, and the variables can be
monitored by plotting the posterior distributions during learning. BayesPy
also has preliminary support for some advanced VB methods such as stochastic
variational inference \citep{Hoffman:2013}, deterministic annealing
\citep{Katahira:2008}, collapsed inference \citep{Hensman:2012}, Riemannian
conjugate gradient learning \citep{Honkela:2010}, parameter expansions
\citep{Qi:2007} and pattern searches \citep{Honkela:2003}. For developers, the
unit testing framework helps in finding bugs, making changes and implementing
new features in a robust manner.
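
For example, missing values can be handled by marking part of the data as
unobserved (a minimal sketch, assuming the \texttt{mask} keyword argument of
\texttt{observe}, in which \texttt{True} denotes an observed value):
\begin{lstlisting}
import numpy as np
from bayespy.nodes import GaussianARD
y = GaussianARD(0, 1, plates=(10,))
data = np.random.randn(10)
mask = np.arange(10) % 2 == 0  # treat every other value as missing
y.observe(data, mask=mask)     # masked-out entries are ignored
\end{lstlisting}
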
BayesPy can be installed similarly to other Python packages. It requires Python
3 and a few popular packages: NumPy, SciPy, matplotlib and h5py. The latest
release can be installed from the Python Package Index (PyPI) and detailed
instructions can be found in the comprehensive online
documentation\footnote{\url{http://bayespy.org}}. The latest development
version is available on
GitHub\footnote{\url{https://github.com/bayespy/bayespy}}, which is also the
platform used for reporting bugs and making pull requests.
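
For instance, the latest release can typically be installed with pip:
\begin{lstlisting}
pip install bayespy
\end{lstlisting}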
%- numpy, scipy, matplotlib, h5py
%- documentation
%- installation
%- repository
%- clear syntax
%- unit tests
%- parameter expansion
%- monitoring
%- missing values
%- examples: PCA, LSSM, HMM, mixture models
%- used in two publications, \citep{Luttinen:2013, Luttinen:2014}
\section{Example}
This section demonstrates the key steps in using BayesPy. An artificial
Gaussian mixture dataset is created by drawing 500 samples from two
2-dimensional Gaussian distributions, where 200 samples have mean $[2, 2]$ and
300 samples have mean $[0, 0]$:
\begin{lstlisting}
import numpy as np
N = 500; D = 2
data = np.random.randn(N, D)
data[:200,:] += 2*np.ones(D)
\end{lstlisting}
We construct a mixture model for the data and assume that the parameters, the
cluster assignments and the true number of clusters are unknown. The model uses
a maximum of five clusters, but the effective number of clusters will be
determined automatically:
\begin{lstlisting}
K = 5
\end{lstlisting}
The unknown cluster means and precision matrices are given Gaussian and Wishart
prior distributions:
\begin{lstlisting}
from bayespy import nodes
mu = nodes.Gaussian(np.zeros(D), 0.01*np.identity(D), plates=(K,))
Lambda = nodes.Wishart(D, D*np.identity(D), plates=(K,))
\end{lstlisting}
The \texttt{plates} keyword argument is used to define repetitions similarly to
the plate notation in graphical models. The cluster assignments are categorical
variables, and the cluster probabilities are given a Dirichlet prior
distribution:
\begin{lstlisting}
alpha = nodes.Dirichlet(0.01*np.ones(K))
z = nodes.Categorical(alpha, plates=(N,))
\end{lstlisting}
The observations are from a Gaussian mixture distribution:
\begin{lstlisting}
y = nodes.Mixture(z, nodes.Gaussian, mu, Lambda)
\end{lstlisting}
The second argument for the \texttt{Mixture} node defines the type of the
mixture distribution, in this case Gaussian. The variable is marked as observed
by providing the data:
\begin{lstlisting}
y.observe(data)
\end{lstlisting}
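
Written out, the constructed model is a standard Gaussian mixture model, with
$\boldsymbol{\pi}$ denoting the cluster probabilities given by the node
\texttt{alpha}:
\begin{align*}
\boldsymbol{\pi} &\sim \operatorname{Dirichlet}(0.01 \cdot \mathbf{1}_K), &
z_n \mid \boldsymbol{\pi} &\sim \operatorname{Categorical}(\boldsymbol{\pi}), &
\mathbf{y}_n \mid z_n = k &\sim \mathcal{N}(\boldsymbol{\mu}_k, \boldsymbol{\Lambda}_k^{-1}),
\end{align*}
where the Gaussian and Wishart priors on the means $\boldsymbol{\mu}_k$ and the
precision matrices $\boldsymbol{\Lambda}_k$ are defined above.
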
Next, we want to find the posterior approximation for our latent variables. We
create the variational Bayesian inference engine:
\begin{lstlisting}
from bayespy.inference import VB
Q = VB(y, mu, z, Lambda, alpha)
\end{lstlisting}
Before running the VMP algorithm, the symmetry in the model is broken by
a random initialization of the cluster assignments:
\begin{lstlisting}
z.initialize_from_random()
\end{lstlisting}
Without the random initialization, the clusters would not be separated. The VMP
algorithm updates the variables in turn and is run for 200 iterations or until
convergence:
\begin{lstlisting}
Q.update(repeat=200)
\end{lstlisting}
The results can be examined visually by using the \texttt{bayespy.plot} module:
\begin{lstlisting}
import bayespy.plot as bpplt
bpplt.gaussian_mixture_2d(y, alpha=alpha)
bpplt.pyplot.show()
\end{lstlisting}
It is also possible to print the parameters of the approximate posterior
distributions:
\begin{lstlisting}
print(alpha)
\end{lstlisting}
The \texttt{bayespy.plot} module also contains other functions for examining
the distributions visually.
%In addition, module \texttt{bayespy.plot} contains several plotting functions to
%examine the results visually.
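
For instance, a Hinton diagram of the posterior cluster probabilities shows how
many of the five clusters are effectively used (a small sketch, assuming the
\texttt{hinton} plotting function):
\begin{lstlisting}
bpplt.hinton(alpha)  # square sizes show the cluster weights
bpplt.pyplot.show()
\end{lstlisting}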
\section{Comparison with Infer.NET}
This section provides a brief comparison with Infer.NET because it is another
active project implementing the variational message passing algorithm, whereas
the other related active projects implement MCMC. The main advantages of
Infer.NET over BayesPy are its support for non-conjugate models (e.g., logistic
regression) and other inference engines (EP and Gibbs sampling). On the other
hand, BayesPy is open-source software and supports several advanced VB methods
(e.g., stochastic variational inference and collapsed inference).

The speed of the packages was compared by using two widely used models: a
Gaussian mixture model (GMM) and principal component analysis
(PCA).\footnote{The scripts for running the experiments are available as
supplementary material.} Both models were run on small and large artificial
datasets. For GMM, the small model used 10 clusters for 200 observations with 2
dimensions, and the large model used 40 clusters for 2000 observations with 10
dimensions. For PCA, the small model used a 10-dimensional latent space for 500
observations with 20 dimensions, and the large model used a 40-dimensional
latent space for 2000 observations with 100 dimensions. For the PCA model,
Infer.NET was run both with a fully factorizing approximation and with the same
approximation as BayesPy, which does not factorize with respect to the
latent-space dimensions.
%Infer.NET used a fully factorizing approximation for the PCA model,
%whereas BayesPy did not factorize with respect to the latent space dimensions.
The experiments were run on a quad-core (i7-4702MQ) Linux computer.
% with a quad-core processor (i7-4702MQ).
Because the packages run practically identical algorithms and thus converge to
similar solutions in a similar number of iterations (50--100, depending on the
dataset), we compared the average CPU time per iteration.

The results are summarized in Table~\ref{tab:speed}. For all datasets, BayesPy
is faster, probably because it uses highly optimized numerical libraries (BLAS
and LAPACK). For PCA, BayesPy also automatically avoids redundant computations
that arise because the latent variables have equal posterior covariance
matrices. However, such broadcasting is not always possible (e.g., if there are
missing values in the PCA data). Thus, the table also presents the results when
BayesPy is forced not to use broadcasting: BayesPy is then slower than
Infer.NET on the smaller PCA dataset but faster on the larger one. When
Infer.NET uses the same factorization for PCA as BayesPy, it can be orders of
magnitude slower.
% On the smaller datasets,
% Infer.NET is slightly faster because BayesPy has some Python overhead. However,
% the total CPU time for both methods is only a few seconds. On the larger
% datasets, BayesPy is significantly faster because it uses BLAS and LAPACK
% libraries for fast numerical operations.\footnote{Assuming that NumPy uses
% BLAS/LAPACK because BayesPy uses NumPy for numerical operations.} In
% addition, if using multi-threaded BLAS and LAPACK on a multi-core computer, the
% difference in wall-clock time can be even larger than in CPU time.
%
%The speed of BayesPy for both models might be improved by using collapsed or
%stochastic variational inference. The accuracy of BayesPy could be improved by
%using a Gaussian-Wishart node, which does not factorize with respect to the
%Gaussian and Wishart variables.
\begin{table}[tb]
\centering
\small
\begin{tabular}{ccccc}
&
Small GMM
&
Large GMM
&
Small PCA
&
Large PCA
\\
\hline
BayesPy & 6 & 90 & 7 (60) & 400 (1\,500)
\\
Infer.NET & 25 & 4\,600 & 37 (350) & 2\,200 (210\,000)
\end{tabular}
\caption{
  The average CPU time in milliseconds per iteration for each dataset. The
  results in parentheses correspond to (a) BayesPy without broadcasting and
  (b) Infer.NET using the same factorization as BayesPy.
}
\label{tab:speed}
\end{table}
\section{Conclusions}
BayesPy provides a simple and efficient way to construct conjugate exponential
family models and to find the variational Bayesian posterior approximation in
Python.
%
In addition to the standard variational message passing, it supports several
advanced methods such as stochastic and collapsed variational inference. Future
plans include support for non-conjugate models and non-parametric models (e.g.,
Gaussian and Dirichlet processes).
% Future plans for BayesPy include implementing more inference engines (e.g.,
% maximum likelihood, expectation propagation and Gibbs sampling), improving the
% VB engine (e.g., collapsed variational inference \citep{Hensman:2012} and
% Riemannian conjugate gradient method \citep{Honkela:2010}) and implementing nodes
% for non-parametric models (e.g., Gaussian processes and Dirichlet processes).
% Acknowledgements should go at the end, before appendices and references
%\acks{?}
\vskip 0.2in
\small
\bibliography{bibliography/bibliography.bib}
\end{document}
%%% Local Variables:
%%% mode: latex
%%% TeX-PDF-mode: t
%%% TeX-master: t
%%% End: