The aim of this project is to provide a unified probabilistic programming framework to express different models and techniques from statistics, machine learning and non-parametric Bayes. It serves as the primary modeling and inference runtime system for bayeslite, an open-source implementation of BayesDB.
Composable generative population models (CGPMs) are a computational abstraction for probabilistic objects. They provide an interface that explicitly separates two operations: the sampler, which draws a random variable from its conditional distribution, and the assessor, which evaluates its conditional density. By encapsulating models as probabilistic programs that implement CGPMs, complex models can be built as compositions of sub-CGPMs and queried in a model-independent way using the Bayesian Query Language.
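To make the sampler/assessor split concrete, here is a minimal sketch in plain Python. It is not the cgpm library's API: the class names (`NormalCGPM`, `LinearGaussianCGPM`, `ChainCGPM`) and their method signatures are hypothetical and chosen only for illustration. Each object exposes a `simulate` method (the sampler) and a `logpdf` method (the assessor), and the chain shows how two such objects could be composed.

```python
import math
import random


def normal_logpdf(x, mu, sigma):
    # Log density of Normal(mu, sigma) evaluated at x.
    z = (x - mu) / sigma
    return -0.5 * z * z - math.log(sigma) - 0.5 * math.log(2 * math.pi)


class NormalCGPM(object):
    """Hypothetical toy object: x ~ Normal(mu, sigma)."""

    def __init__(self, mu, sigma, seed=0):
        self.mu = mu
        self.sigma = sigma
        self.rng = random.Random(seed)

    def simulate(self):
        # Sampler: draw a value of x.
        return self.rng.gauss(self.mu, self.sigma)

    def logpdf(self, x):
        # Assessor: evaluate the log density of x.
        return normal_logpdf(x, self.mu, self.sigma)


class LinearGaussianCGPM(object):
    """Hypothetical toy object: y | x ~ Normal(slope * x + intercept, sigma)."""

    def __init__(self, slope, intercept, sigma, seed=0):
        self.slope = slope
        self.intercept = intercept
        self.sigma = sigma
        self.rng = random.Random(seed)

    def simulate(self, x):
        # Sampler: draw y from its conditional distribution given the input x.
        return self.rng.gauss(self.slope * x + self.intercept, self.sigma)

    def logpdf(self, y, x):
        # Assessor: evaluate the conditional log density of y given x.
        return normal_logpdf(y, self.slope * x + self.intercept, self.sigma)


class ChainCGPM(object):
    """Composition: the output of `parent` becomes the input of `child`."""

    def __init__(self, parent, child):
        self.parent = parent
        self.child = child

    def simulate(self):
        x = self.parent.simulate()
        y = self.child.simulate(x)
        return x, y

    def logpdf(self, x, y):
        # Joint log density factorizes as log p(x) + log p(y | x).
        return self.parent.logpdf(x) + self.child.logpdf(y, x)


model = ChainCGPM(NormalCGPM(0.0, 1.0, seed=1),
                  LinearGaussianCGPM(2.0, 0.5, 0.1, seed=2))
x, y = model.simulate()
joint = model.logpdf(x, y)
```

The caller of `ChainCGPM` only needs `simulate` and `logpdf`, so either component could be swapped for a different model without changing the calling code. This mirrors the model-independent querying described above.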
The easiest way to install cgpm is to use the package on Anaconda Cloud. Please follow these instructions.
cgpm targets Ubuntu 14.04 and 16.04. The package can be installed by cloning this repository and following these instructions. It is highly recommended to install cgpm inside of a virtualenv which was created using the --system-site-packages flag.
- Install dependencies from apt, listed here.
- Retrieve and build the source.

      % git clone git@github.com:probcomp/cgpm
      % cd cgpm
      % pip install --no-deps .

- Verify the installation.

      % python -c 'import cgpm'
      % cd cgpm && ./check.sh
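As a complement to the verification step, the following sketch checks from inside Python whether the interpreter is running in a virtualenv and whether cgpm can be imported. The helper names `in_virtualenv` and `can_import` are hypothetical and not part of cgpm. The virtualenv check relies on the `sys.real_prefix` and `sys.base_prefix` attributes, which is a common heuristic rather than an official guarantee.

```python
import importlib
import sys


def in_virtualenv():
    # Older virtualenv releases set sys.real_prefix; newer ones and the
    # stdlib venv module make sys.base_prefix differ from sys.prefix.
    base = getattr(sys, "base_prefix", sys.prefix)
    return hasattr(sys, "real_prefix") or base != sys.prefix


def can_import(name):
    # Attempt the import and report the outcome instead of raising.
    try:
        importlib.import_module(name)
        return True
    except ImportError:
        return False


if __name__ == "__main__":
    print("inside a virtualenv: %s" % in_virtualenv())
    print("cgpm importable: %s" % can_import("cgpm"))
```

A successful import only shows that the package is on the path; running ./check.sh is still the way to exercise the code itself.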
CGPMs, and their integration as a runtime system for BayesDB, are described in the following technical report:
- Probabilistic Data Analysis with Probabilistic Programming. Saad, F., and Mansinghka, V. arXiv preprint, arXiv:1608.05347, 2017.
Further applications of cgpm and bayeslite to data analysis tasks can be found in:
- Probabilistic Search for Structured Data via Probabilistic Programming and Nonparametric Bayes. Saad, F., Casarsa, L., and Mansinghka, V. arXiv preprint, arXiv:1704.01087, 2017.
- Detecting Dependencies in Sparse, Multivariate Databases Using Probabilistic Programming and Non-parametric Bayes. Saad, F., and Mansinghka, V. Artificial Intelligence and Statistics (AISTATS), 2017.
- A Probabilistic Programming Approach to Probabilistic Data Analysis. Saad, F., and Mansinghka, V. Advances in Neural Information Processing Systems (NIPS), 2016.
Running ./check.sh will run a subset of the tests that are considered complete and stable. To launch the full test suite, including continuous integration tests, run py.test in the root directory. There are more tests in the tests/ directory, but those that do not start with test_ or do start with disabled_ are not considered ready. The tip of every branch merged into master must pass ./check.sh, and be consistent with the code conventions outlined in HACKING.

To run the full test suite, use ./check.sh --integration tests/. Note that the full integration test suite requires installing the C++ crosscat backend.
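The naming convention for tests described above can be sketched as a small filter. This is an illustration only, not code from the repository: the helper names and example file names are hypothetical, and the `.py` suffix check is an added assumption.

```python
import os


def is_ready_test(path):
    """Return True if the file name follows the 'ready' naming convention."""
    name = os.path.basename(path)
    if name.startswith("disabled_"):
        # Explicitly disabled tests are not considered ready.
        return False
    # Assumption: ready tests are Python files whose names start with test_.
    return name.startswith("test_") and name.endswith(".py")


def split_tests(paths):
    # Partition paths into (ready, not_ready), preserving input order.
    ready = [p for p in paths if is_ready_test(p)]
    not_ready = [p for p in paths if not is_ready_test(p)]
    return ready, not_ready


# Hypothetical file names, used only to exercise the filter.
example = [
    "tests/test_normal.py",
    "tests/disabled_test_slow.py",
    "tests/helper.py",
]
ready, not_ready = split_tests(example)
```

In this example only the first path counts as ready; the other two fall into the not-ready group.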
Copyright (c) 2015-2016 MIT Probabilistic Computing Project
Licensed under the Apache License, Version 2.0 (the "License"); you may not use this file except in compliance with the License. You may obtain a copy of the License at
http://www.apache.org/licenses/LICENSE-2.0
Unless required by applicable law or agreed to in writing, software distributed under the License is distributed on an "AS IS" BASIS, WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied. See the License for the specific language governing permissions and limitations under the License.