PrivateMultiplicativeWeights.jl

This package implements MWEM, a simple and practical algorithm for differentially private data release.

MIT Licensed. See LICENSE.md.

Installation

Open a Julia prompt and call:

Pkg.clone("https://github.com/mrtzh/PrivateMultiplicativeWeights.jl.git")

Main Features

Differentially private synthetic data preserving lower order marginals of an input data set
Optimized in-memory implementation for small number of data attributes
Scalable heuristic for large number of data attributes
Easy-to-use interfaces for custom query sets and data representations

Examples

Histogram approximations

Check out histograms.ipynb for details on how to use the algorithm to compute differentially private histogram approximations.

Marginal approximations

The package can also be used to create synthetic data that approximates the lower order marginals of a data set with binary features. For the sake of illustration, we create a random data set with hidden correlations. Columns correspond to data points.

d, n = 20, 1000
data_matrix = rand(0:1, d ,n)
data_matrix[3, :] = data_matrix[1, :] .* data_matrix[2, :]

We can run MWEM to produce synthetic data accurate for 1st, 2nd, 3rd order marginals of the source data.

using PrivateMultiplicativeWeights
mw = mwem(Parities(d, 3), Tabular(data_matrix))

This will convert the data to its explicit histogram representation of size 2^d and may not be useful when d is large. See section on factored histograms for an alternative when the dimension d is large.

Convert histograms to matrices

We can convert synthetic data in histogram representation to a tabular (matrix) representation.

table = Tabular(mw.synthetic, n)

Compute error of approximation

Compute error achieved by MWEM:

maximum_error(mw), mean_squared_error(mw)

Note that these statistics are not differentially private.

Parameters

Parameters can be set flexibly with the MWParameters constructor:

mw = mwem(Parities(d, 3),
          Tabular(data_matrix),
          MWParameters(epsilon=1.0,
                       iterations=10,
                       repetitions=10,
                       verbose=false,
                       noisy_init=false))

Available parameters:

Name	Default	Description
`epsilon`	`1.0`	Privacy parameter per iteration. Each iteration of MWEM is `epsilon`-differentially private. Total privacy guarantees follow via composition theorems.
`iterations`	`10`	Number of iterations of MWEM. Each iteration corresponds to selecting one query via the exponential mechanism, evaluating the query on the data, and updating the internal state.
`repetitions`	`10`	Number of times MWEM cycles through previously measured queries per iteration. This has no additional privacy cost.
`noisy_init`	`false`	Histogram initalization through noise addition. This incurs an additional `epsilon` privacy cost. When `noisy_init` is set to false, the initialization is uniform.
`verbose`	`false`	print timing and error statistics per iteration (information is not differentially private)

The function MWParameters accepts any subset of parameters, e.g., MWParameter(epsilon=0.5, iterations=5).

Data representations

Histogram representation

By default, MWEM works with the histogram representation of a data sets. This means that the data is represented by a vector whose length is equal to the size of domain. For example, data consisting of d binary attributes would be converted to an array of length 2^d. MWEM needs to store and array of this length in main memory, which is often the computational bottleneck.

Factored histograms

When the histogram representation is too large, try using factored histograms. Factored histograms maintain a product distribution over clusters of attributes of the data. Each component is represented using a single histogram. Components are merged as it becomes necessary. This often allows to scale up MWEM by orders of magnitude.

d, n = 100, 1000
data_matrix = rand(0:1, d, n)
data_matrix[3, :] = data_matrix[1, :] .* data_matrix[2, :]
mw = mwem(FactorParities(d, 3), Tabular(data_matrix))

Also see examples.jl.

Query representations

There are two ways to define custom query sets.

Histogram queries

Histogram queries are linear functions in the histogram representation of the data. You can define custom query workloads by using HistogramQueries(query_matrix) instead of Parities(d, 3). Here query matrix is an N x k matrix specifying the query set in its Histogram representation, N is the histogram length and k is the k is the number of queries.

Custom query types

To build query sets with your own implicit representations, sub-type Query and Queries. Implement the functions specified in src/interface.jl.

See src/parities.jl for an example.

Available query sets

Parities(d, k)

Parities of k out of d attributes. This corresponds to approximating k-way marginals of the original data.
FactorParities(d, k)

Parities of k out of d attributes for factored histogram representation.
RangeQueries(N)

Range queries corresponding to all interval queries over a histogram of length N.

Contributing to this package

There are many ways to contribute to this repository:

Experiments
Additional query sets (e.g., two-dimensional range queries)
Additional tests, debugging, optimization
Additional documentation

Citing this package

The MWEM algorithm was presented in the following paper:

@inproceedings{HLM12,
  author = "Moritz Hardt and Katrina Ligett and Frank McSherry",
  title = "A simple and practical algorithm for differentially-private data release",
  booktitle = {Proc.\ $26$th Neural Information Processing Systems (NIPS)},
  year = {2012},
}

Name		Name	Last commit message	Last commit date
Latest commit History 44 Commits
examples		examples
src		src
test		test
.gitignore		.gitignore
.travis.yml		.travis.yml
LICENSE.md		LICENSE.md
README.md		README.md
REQUIRE		REQUIRE

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

Repository files navigation

PrivateMultiplicativeWeights.jl

Installation

Main Features

Examples

Histogram approximations

Marginal approximations

Convert histograms to matrices

Compute error of approximation

Parameters

Data representations

Histogram representation

Factored histograms

Query representations

Histogram queries

Custom query types

Available query sets

Contributing to this package

Citing this package

Status

About

Releases

Packages

Languages

License

dbrakman/PrivateMultiplicativeWeights.jl

Folders and files

Latest commit

History

Repository files navigation

PrivateMultiplicativeWeights.jl

Installation

Main Features

Examples

Histogram approximations

Marginal approximations

Convert histograms to matrices

Compute error of approximation

Parameters

Data representations

Histogram representation

Factored histograms

Query representations

Histogram queries

Custom query types

Available query sets

Contributing to this package

Citing this package

Status

About

Resources

License

Stars

Watchers

Forks

Releases

Packages 0

Languages

Packages