Refactor: Parameter Distributions #88
Conversation
Codecov Report
```diff
@@            Coverage Diff             @@
##           master      #88      +/-   ##
==========================================
+ Coverage   77.84%   79.81%   +1.96%
==========================================
  Files           7        8       +1
  Lines         492      540      +48
==========================================
+ Hits          383      431      +48
  Misses        109      109
==========================================
```
Continue to review full report at Codecov.
src/ParameterDistributions.jl (Outdated)
`#apply transforms`
I find the terms real <-> prior could be confusing. For example, I may think of the prior distribution as also being in the real space, since this is the space on which my prior belief is based. What do you think about `transform_bounded_to_unbounded` and `transform_unbounded_to_bounded`?
Yeah, I mulled over this for a while actually. The terminology came from these two perspectives:
(i) The modelling assumes we are based in the `Real` space, e.g. we think of the parameters as `y` in `Real`, and `F: Real -> Data` is the forward model. The Bayesian methodology we developed works only on an unbounded prior space, so from the mathematical stance we have a single map `G: Prior -> Data`. To use the map `G` we first need a transformation `T: Real -> Prior`; then the action is `F(y) = G(T(y))`.
(ii) The other perspective is that the mathematical stance is `G: Prior -> Data`, so we define a map `S: Prior -> Real` and then apply the model `F`, so `G(x) = F(S(x))`.
From perspective (i) the prior is, e.g., a lognormal distribution and your forward map is the model, but in order to work with it you need to work in a computational space. From (ii) the prior is a normal distribution, and the forward model maps this to the data (the black-box workings don't matter). I preferred the latter because the theory is based on this `Prior` space and not on the `Real` space, and this is actually how our code works.
So coming to the naming: I don't like mentioning `bounded` because this is only one of the four cases. But you are right that the one key purpose is to map to an unbounded space; I do like that. But then is it clear where your `prior` distribution is defined?
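The two compositions can be sketched numerically. This is an illustrative Python sketch, not the package code: `F`, `T`, and `S` are hypothetical one-dimensional stand-ins, with `Real = (0, inf)` (a bounded-below parameter) and `Prior` the whole real line.

```python
import math

# Perspective (i): parameters live in Real space, T maps Real -> Prior.
# Perspective (ii): parameters live in Prior space, S maps Prior -> Real.
# Hypothetical one-dimensional stand-ins for illustration only.

def F(y):          # stand-in forward model on the constrained (Real) space
    return 2.0 * y + 1.0

def T(y):          # Real -> Prior: log-transform removes the lower bound at 0
    return math.log(y)

def S(x):          # Prior -> Real: inverse of T
    return math.exp(x)

def G(x):          # the map the Bayesian machinery sees: Prior -> Data
    return F(S(x))

y = 3.0
# Both perspectives give the same action: F(y) == G(T(y))
assert abs(F(y) - G(T(y))) < 1e-12
```

The round-trip shows why the two views are equivalent in effect; the choice is only about which space the prior is declared in.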
As discussed, let's see what the others think. I think if one of the two includes `unbounded` or `unconstrained`, there should be no confusion.
I agree the naming of `transform_real_to_prior` is a bit confusing. As long as it is explicitly documented in comments and docs it could be fine, since on the surface level a user would probably assume the distribution they are providing is the one in the `Prior` space rather than the `Real` space.
Thanks all! I'll merge this in now
bors r+
Build succeeded:
89: WIP: replace Prior and posterior samples with ParameterDistributions r=odunbar a=odunbar

# Purpose
To follow from PR #88 in replacing the prior distributions and posterior distributions with the new type ParameterDistributions, and adding the requisite functionality to make this possible.

## Contained in the PR
- Implement methods `get_logpdf`, `get_cov`, `get_mean` and replace the implementation in EKP and MCMC. Note this will also allow us to use prior distributions with block-diagonal (i.e. not only diagonal) covariance in the MCMC.
- Add requisite unit tests
- Modify tests in `runtests.jl` that depend on `Priors.jl` to instead use ParameterDistributions
- Remove `Priors.jl`

**A future PR will deal with example cases (not contained in runtests)**

## Additionally
- [x] Created the following issue: when creating EKS, one previously supplied mean and cov separately; these can now be deduced from the prior (which is also an input).

Co-authored-by: odunbar <odunbar@caltech.edu>
94: Update examples to work with the latest CES code r=bielim a=bielim

The goal of this PR is to get all examples synced up with the latest changes in the code base (in particular, PRs #88 and #89):
- [x] `Cloudy_example.jl`
- [x] `learn_noise.jl`
- [x] `plot_GP.jl`

In addition, `get_distribution()` (in `ParameterDistribution.jl`) has been modified to return the array of samples when called for `Samples` (rather than the message "Contains samples only"). `get_distribution` now returns a `Dict` with the parameter names as keys, and as values either the corresponding distribution (in the case of `Parameterized` distributions, such as `Normal(0.0, 1.0)`) or the corresponding samples (in the case of parameters represented by `Samples`) as a parameter_dimension x n_samples array.

Co-authored-by: Melanie <melanie@charney.bieli.email>
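The return structure described for `get_distribution` can be sketched as follows. This is a hypothetical Python analogue for illustration only (the real function lives in `ParameterDistribution.jl` and returns Julia types); the parameter names `u1`, `u2` and the tuple encoding are made up.

```python
# Hypothetical analogue of the described get_distribution behaviour:
# a dict keyed by parameter name, whose values are either a distribution
# object (Parameterized) or a dim x n_samples array (Samples).

def get_distribution(parameter_distribution):
    out = {}
    for name, (kind, payload) in parameter_distribution.items():
        if kind == "Parameterized":
            out[name] = payload              # the distribution object itself
        else:                                # "Samples"
            out[name] = payload              # dim x n_samples array of draws
    return out

pd = {
    "u1": ("Parameterized", ("Normal", 0.0, 1.0)),
    "u2": ("Samples", [[1.0, 2.0, 3.0, 4.0],    # dimension 1, 4 samples
                       [5.0, 6.0, 7.0, 8.0]]),  # dimension 2, 4 samples
}
d = get_distribution(pd)
assert d["u1"] == ("Normal", 0.0, 1.0)
assert len(d["u2"]) == 2 and len(d["u2"][0]) == 4
```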
Purpose
Construct a flexible object to store parameter distributions. We wish for it to work both with given distributions and with a simple collection of samples from a distribution, and we also wish for it to include transformations between constrained and unconstrained spaces. The vision is that this will hold both prior and posterior distributions.
Contained in the PR
In the PR we build a `ParameterDistributions` object, with useful auxiliary functions to sample the distributions within, and some appropriate transformations mapping between the constrained space (corresponding to boundary conditions of parameters) and an unbounded space (where priors are placed). We also provide a suite of unit tests for the objects and functions. The functionality will be integrated into CES in the next PR.
The Primary Structure
A `ParameterDistribution` is initialized with 3 inputs:
1. A `ParameterDistributionType`-derived object in the "unconstrained space" (currently `Samples`, initialized with an array, or `Parameterized`, initialized with a Julia Distributions object)
2. An `Array{ConstraintType}` of derived objects `Constraint` (with ease-of-use constructors `no_constraint`, `bounded_below`, `bounded_above`, or `bounded`), with array size matching the distribution parameter space dimension
3. A `String` name
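As a sketch of how the three inputs fit together, here is a hypothetical Python mirror of the structure. The real object is a Julia struct; the field names and the identity-map constraint below are illustrative, not the package API.

```python
from dataclasses import dataclass
from typing import Callable, List, Union

@dataclass
class Parameterized:          # wraps a distribution object
    distribution: object

@dataclass
class Samples:                # wraps an array of draws
    samples: list

@dataclass
class Constraint:             # a pair of maps between the two spaces
    constrained_to_unconstrained: Callable[[float], float]
    unconstrained_to_constrained: Callable[[float], float]

@dataclass
class ParameterDistribution:
    distribution: Union[Parameterized, Samples]   # in the unconstrained space
    constraint: List[Constraint]                  # one entry per dimension
    name: str

# a 2D parameter given by samples, with identity (no_constraint-like) maps
pd = ParameterDistribution(
    distribution=Samples([[0.1, 0.4], [1.2, -0.3]]),
    constraint=[Constraint(lambda x: x, lambda u: u)] * 2,
    name="my_parameter",
)
assert pd.name == "my_parameter"
assert len(pd.constraint) == 2
```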
Other functions
If applied to `ParameterDistributions`, these typically return a `Dict` with `ParameterDistribution.name` as keys, or an `Array`:
- `get_name`: returns the names
- `get_distribution`: returns the Julia Distribution object, if it is `Parameterized`
- `sample_distribution`: samples the Julia Distribution if `Parameterized`, or draws from the list of samples if `Samples`
- `transform_constrained_to_unconstrained`: identity map if `no_constraint`, log-transform if `bounded_below` or `bounded_above`, logit-transform if `bounded`; otherwise uses the maps provided
- `transform_unconstrained_to_constrained`: the inverses of the above mappings

An example from the runtests:
We create a 6D parameter distribution from 2 parameters. The first parameter is a 4D distribution with the following constraints in real space:
We choose to use a multivariate normal to represent its distribution in the transformed (unbounded) space. In fact we take each dimension as independent (so in this exact case it could also be represented as 4 separate univariate distributions).
We provide a name.
The second parameter distribution is a 2D one. It is given only by 4 samples in the transformed space (where one will typically generate samples). It is bounded in the first dimension by the constraint shown; for the second dimension there is a user-provided transform, using the default constructor.
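The four transform cases (identity, log for one-sided bounds, logit for two-sided bounds) can be sketched numerically. This is an illustrative Python sketch with hypothetical bounds; the package's exact shifts and scales may differ.

```python
import math

# Each constructor returns a (constrained -> unconstrained,
# unconstrained -> constrained) pair of maps.

def no_constraint():
    return (lambda x: x, lambda u: u)

def bounded_below(lb):
    return (lambda x: math.log(x - lb),          # log removes lower bound
            lambda u: math.exp(u) + lb)

def bounded_above(ub):
    return (lambda x: math.log(ub - x),          # log removes upper bound
            lambda u: ub - math.exp(u))

def bounded(lb, ub):
    return (lambda x: math.log((x - lb) / (ub - x)),               # logit
            lambda u: (ub * math.exp(u) + lb) / (math.exp(u) + 1.0))

# round-trip checks: each inverse undoes its forward map
for fwd, inv in (no_constraint(), bounded_below(0.0),
                 bounded_above(10.0), bounded(0.0, 1.0)):
    x = 0.5
    assert abs(inv(fwd(x)) - x) < 1e-12
```

With maps like these, each dimension of a draw from the unbounded (prior) space can be pushed back into its constrained range independently, which is what lets the 4D example above mix all four cases in one parameter.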