humanleague

Introduction

humanleague is a Python and R package for microsynthesising populations from marginal and (optionally) seed data. The package is implemented in C++ for performance.

The package contains algorithms that use a number of different microsynthesis techniques:

Iterative Proportional Fitting (IPF)
Quasirandom Integer Sampling (QIS) (no seed population)
Quasirandom Integer Sampling of IPF (QISI): A combination of the two techniques whereby the integral population is sampled (without replacement) from a distribution constructed from a dynamic IPF solution.

The latter provides a bridge between deterministic reweighting and combinatorial optimisation, offering advantages of both techniques:

generates high-entropy integral populations
can be used to generate multiple populations for sensitivity analysis
goes some way to address the 'empty cells' issues that can occur in straight IPF
relatively fast computation time
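To make the 'empty cells' point concrete, here is a minimal pure-Python sketch of textbook two-dimensional IPF. It is an illustration only, not humanleague's C++ implementation; the function name `ipf_2d`, the fixed iteration count and the toy seed are assumptions made for the example. A seed cell that starts at zero is only ever rescaled, so it stays at zero however many iterations are run.

```python
def ipf_2d(seed, row_targets, col_targets, iterations=50):
    """Textbook 2-d IPF: alternately rescale rows and columns to hit the targets."""
    table = [row[:] for row in seed]  # work on a copy of the seed
    for _ in range(iterations):
        # scale each row so that it sums to its target
        for i, target in enumerate(row_targets):
            total = sum(table[i])
            if total > 0:
                table[i] = [x * target / total for x in table[i]]
        # scale each column so that it sums to its target
        for j, target in enumerate(col_targets):
            total = sum(row[j] for row in table)
            if total > 0:
                for row in table:
                    row[j] *= target / total
    return table


# hypothetical seed with one empty cell at position (1, 0)
seed = [[1.0, 2.0], [0.0, 1.0]]
fitted = ipf_2d(seed, row_targets=[6, 4], col_targets=[3, 7])
```

In this toy case the fitted table matches both sets of marginals, but `fitted[1][0]` is still exactly zero, so no individual can ever be placed in that state.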

The algorithms:

support arbitrary dimensionality* for both the marginals and the seed.
produce statistical data to ascertain the likelihood/degeneracy of the population (where appropriate).

The package also contains the following utility functions:

a Sobol sequence generator
a function that constructs the closest integer population to a discrete univariate probability distribution
an algorithm for sampling an integer population from a discrete multivariate probability distribution, constrained to the marginal sums in every dimension (see below)
a function to 'flatten' a multidimensional population into a table: this converts a multidimensional array containing the population count for each state into a table listing individuals and their characteristics
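The 'flatten' idea can be illustrated without the package. The sketch below is a plain-Python stand-in for a two-dimensional population, not humanleague's own function; the name `flatten_2d` and the (row state, column state) tuple layout are assumptions chosen for the example.

```python
def flatten_2d(population):
    """List one (row_state, col_state) tuple per individual in a 2-d count table."""
    individuals = []
    for i, row in enumerate(population):
        for j, count in enumerate(row):
            # a cell holding `count` people contributes `count` identical records
            individuals.extend([(i, j)] * count)
    return individuals


# hypothetical population: 4 individuals spread over a 2x2 state space
population = [[1, 0], [2, 1]]
table = flatten_2d(population)
# table == [(0, 0), (1, 0), (1, 0), (1, 1)]
```

Each tuple is one individual, so the length of the table equals the total population count.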

Version 1.0.1 reflects the work described in the Quasirandom Integer Sampling (QIS) paper.

Installation

Python

Requires Python 3.5 or newer.

PyPI

python3 -m pip install humanleague --user

Anaconda

conda install -c conda-forge humanleague

Build, install and test (from cloned repo)

python setup.py install --user
python setup.py test

R

Official release:

> install.packages("humanleague")

For a development version:

> devtools::install_github("virgesmith/humanleague")

Or, for the legacy version:

> devtools::install_github("virgesmith/humanleague@1.0.1")

Documentation and Examples

R

Consult the package documentation, e.g.

> library(humanleague)
> ?humanleague

Python

See here, or

>>> import humanleague as hl
>>> help(hl)

Multidimensional integerisation

The prob2IntFreq function takes a discrete probability distribution and a count, and returns the closest integer population to the distribution that sums to the count. The integerise function is its multidimensional equivalent. In one dimension, for example:

>>> import numpy as np
>>> import humanleague
>>> p=np.array([0.1, 0.2, 0.3, 0.4])
>>> humanleague.prob2IntFreq(p, 11)
{'freq': array([1, 2, 3, 5]), 'rmse': 0.3535533905932736}

produces the optimal (i.e. closest possible) integer population to the discrete distribution.
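One standard way to obtain such a closest integer population is largest-remainder rounding. The sketch below is a plain-Python illustration under that assumption; it is not necessarily how prob2IntFreq is implemented, and the function name `closest_integer_population` is hypothetical. The error measure used here is the root mean square difference between the integer frequencies and the scaled probabilities.

```python
import math


def closest_integer_population(probs, total):
    """Largest-remainder rounding of probs * total to integers summing to total."""
    expected = [p * total for p in probs]
    freq = [math.floor(e) for e in expected]
    shortfall = total - sum(freq)
    # hand the remaining individuals to the states with the largest fractional parts
    by_remainder = sorted(
        range(len(probs)), key=lambda k: expected[k] - freq[k], reverse=True
    )
    for k in by_remainder[:shortfall]:
        freq[k] += 1
    rmse = math.sqrt(
        sum((f - e) ** 2 for f, e in zip(freq, expected)) / len(probs)
    )
    return freq, rmse


freq, rmse = closest_integer_population([0.1, 0.2, 0.3, 0.4], 11)
# freq == [1, 2, 3, 5]
```

For this input the sketch gives the same frequencies as the example above: flooring the scaled probabilities yields 1, 2, 3, 4, and the one remaining individual goes to the last state, which has the largest fractional part.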

The integerise function generalises this problem and applies it to higher dimensions: given an n-dimensional array of real numbers where the 1-d marginal sums in every dimension are integral (and thus the total population is too), it attempts to find an integral array that also satisfies these constraints.
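The precondition on the marginal sums can be checked before attempting integerisation. The helper below is a minimal sketch restricted to two dimensions for brevity; the name `has_integral_marginals` and the tolerance are assumptions for illustration and are not part of the humanleague API.

```python
import math


def has_integral_marginals(array_2d, tol=1e-9):
    """Return True if every row sum and column sum is within tol of an integer."""
    row_sums = [math.fsum(row) for row in array_2d]
    col_sums = [math.fsum(col) for col in zip(*array_2d)]
    return all(abs(s - round(s)) <= tol for s in row_sums + col_sums)


# the same values as the array used in the integerise example in this section
a = [[0.3, 1.2, 2.0, 1.5],
     [0.6, 2.4, 4.0, 3.0],
     [1.5, 6.0, 10.0, 7.5],
     [0.6, 2.4, 4.0, 3.0]]
ok = has_integral_marginals(a)

# a hypothetical counter-example: the row sums are 0.7 and 1.3
bad = has_integral_marginals([[0.5, 0.2], [0.5, 0.8]])
```

A tolerance is used rather than exact equality because sums of floating-point values are not always exactly integral even when the intended values are.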

The QISI algorithm is repurposed to this end. Because it is a sampling algorithm, it cannot guarantee that a solution will be found, nor that any solution it does find is optimal. A failure does not prove that no solution exists for the given input.

>>> a = np.array([[ 0.3,  1.2,  2. ,  1.5],
...               [ 0.6,  2.4,  4. ,  3. ],
...               [ 1.5,  6. , 10. ,  7.5],
...               [ 0.6,  2.4,  4. ,  3. ]])
# marginal sums
>>> sum(a)
array([ 3., 12., 20., 15.])
>>> sum(a.T)
array([ 5., 10., 25., 10.])
# perform integerisation
>>> r = humanleague.integerise(a)
>>> r["conv"]
True
>>> r["result"]
array([[ 0,  2,  2,  1],
       [ 0,  3,  4,  3],
       [ 2,  6, 10,  7],
       [ 1,  1,  4,  4]])
>>> r["rmse"]
0.5766281297335398
# check marginals are preserved
>>> sum(r["result"]) == sum(a)
array([ True,  True,  True,  True])
>>> sum(r["result"].T) == sum(a.T)
array([ True,  True,  True,  True])
