
Refactor: Parameter Distributions #88

Merged: 14 commits merged on Dec 7, 2020


odunbar (Collaborator) commented Nov 25, 2020

Purpose

Construct a flexible object to store parameter distributions. We want it to work both with given distributions and with a collection of samples from a distribution, and to include transformations between constrained and unconstrained spaces. The vision is for it to hold both prior and posterior distributions.

Contained in the PR

In this PR we build a ParameterDistributions object, with useful auxiliary functions to sample the distributions it contains, and appropriate transformations mapping from the constrained space (corresponding to boundary conditions of the parameters) to an unbounded space (where priors are placed). We also provide a suite of unit tests for the objects and functions.

The functionality will be integrated into CES in the next PR.

The Primary Structure

A ParameterDistribution is initialized with 3 inputs.

  1. A single ParameterDistributionType-derived object, or an array of them, defined in the "unconstrained space" (currently Samples, initialized with an array, or Parameterized, initialized with a Julia Distributions object)
  2. A single array, or an array of arrays, of ConstraintType-derived Constraint objects (with the ease-of-use constructors no_constraint, bounded_below, bounded_above, and bounded), where each array's length matches the dimension of the corresponding distribution's parameter space
  3. A name String (or an array of names)
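The three constructor inputs listed above can be illustrated with a minimal, hypothetical sketch that uses only Base Julia, so it runs without CES or Distributions.jl. The `Toy*` types, the `dimension` helper and the columns-as-samples layout are assumptions made for illustration, not the CES API; the point is the consistency check that each distribution's dimension matches the number of constraints supplied for it.

```julia
# Hypothetical sketch of the ParameterDistribution layout; NOT the CES source.
abstract type ToyDistributionType end

# Stand-in for `Samples`: each column is one sample (assumed layout).
struct ToySamples <: ToyDistributionType
    samples::Matrix{Float64}
end
dimension(d::ToySamples) = size(d.samples, 1)

# Stand-in for `Constraint`: a pair of mutually inverse maps.
struct ToyConstraint
    constrained_to_unconstrained::Function
    unconstrained_to_constrained::Function
end

struct ToyParameterDistribution
    distributions::Vector{ToyDistributionType}
    constraints::Vector{Vector{ToyConstraint}}
    names::Vector{String}

    function ToyParameterDistribution(ds::Vector{<:ToyDistributionType},
                                      cs::Vector{Vector{ToyConstraint}},
                                      ns::Vector{String})
        # one constraint array and one name per distribution
        length(ds) == length(cs) == length(ns) ||
            throw(ArgumentError("need one constraint array and one name per distribution"))
        # each constraint array must match its distribution's dimension
        for (d, c, n) in zip(ds, cs, ns)
            dimension(d) == length(c) ||
                throw(DimensionMismatch("parameter $n: dimension $(dimension(d)), but $(length(c)) constraints"))
        end
        new(ds, cs, ns)
    end
end

# Convenience constructor for a single parameter (not an array of them).
ToyParameterDistribution(d::ToyDistributionType, c::Vector{ToyConstraint}, n::String) =
    ToyParameterDistribution([d], [c], [n])
```

Accepting either a single parameter or arrays of parameters through two methods mirrors the "single or array" wording of the inputs.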

Other functions

When applied to a ParameterDistribution, these functions typically return either a Dict with the ParameterDistribution names as keys, or an Array.

  • get_name: returns the names
  • get_distribution: returns the Julia Distribution object, if it is Parameterized
  • sample_distribution: samples the Julia Distribution if Parameterized, or draws from the list of samples if Samples
  • transform_constrained_to_unconstrained: identity map if no_constraint, log-transform if bounded_below or bounded_above, logit-transform if bounded, otherwise use the maps provided.
  • transform_unconstrained_to_constrained: the inverses of the above mappings
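The transform rules in the list above can be sketched in plain Julia. This is an illustration under assumptions rather than the CES code: the exact form of the maps (a shifted log, `log(x - lower)` or `log(upper - x)`, for the one-sided bounds, and a logit of the rescaled value for two-sided bounds) is assumed here, and all function names are hypothetical.

```julia
# Hypothetical sketch of the constrained <-> unconstrained maps described above.
# Assumed forms: shifted log for one-sided bounds, rescaled logit for two-sided bounds.

# no_constraint: identity in both directions
to_unconstrained_none(x) = x
to_constrained_none(u) = u

# bounded_below(lower): x in (lower, Inf) <-> u in (-Inf, Inf)
to_unconstrained_below(x, lower) = log(x - lower)
to_constrained_below(u, lower) = exp(u) + lower

# bounded_above(upper): x in (-Inf, upper) <-> u in (-Inf, Inf)
to_unconstrained_above(x, upper) = log(upper - x)
to_constrained_above(u, upper) = upper - exp(u)

# bounded(lower, upper): logit of p = (x - lower) / (upper - lower)
function to_unconstrained_bounded(x, lower, upper)
    p = (x - lower) / (upper - lower)
    return log(p / (1 - p))
end
# inverse: logistic function, rescaled back onto (lower, upper)
to_constrained_bounded(u, lower, upper) = lower + (upper - lower) / (1 + exp(-u))
```

Each pair is a bijection between the constrained interval and the whole real line, which is what lets a prior be placed on the unbounded space.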

An example from the runtests:

We create a 6D parameter distribution from 2 parameters. The first parameter is a 4D distribution with the following constraints in real space:

```julia
c1 = [no_constraint(),
      bounded_below(-1.0), # provide lower bound
      bounded_above(0.4),  # provide upper bound
      bounded(-0.1, 0.2)]  # provide lower and upper bound
```

We choose a multivariate normal to represent its distribution in the transformed (unbounded) space. In fact, we take each dimension as independent, so in this exact case it could also be represented as 4 separate univariate distributions.

```julia
d1 = Parameterized(MvNormal(4, 0.1)) # 4D multivariate normal with 0.1^2 I covariance
```
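Because the covariance 0.1^2 I is diagonal, sampling this 4D normal is equivalent to sampling four independent univariate normals with standard deviation 0.1. A small check of that claim using only Julia's standard library (Distributions.jl is not used, so `MvNormal` itself is not exercised; the seed and sample size are arbitrary choices):

```julia
using Random, Statistics, LinearAlgebra

# Draw n samples of a 4D zero-mean normal with covariance σ^2 I by scaling
# independent standard normals, i.e. each dimension is sampled separately.
rng = MersenneTwister(42)
σ, n = 0.1, 100_000
samples = σ .* randn(rng, 4, n)   # 4 × n, columns are samples

# Statistics.cov treats rows as observations by default,
# so pass dims=2 because the samples here are columns.
C = cov(samples; dims=2)          # should be close to σ^2 I
```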

We provide a name:

```julia
name1 = "constrained_mvnormal"
```

The second parameter distribution is 2D. It is given only by 4 samples in the transformed space (where one will typically generate samples). Its first dimension is bounded by the constraint shown, and its second dimension uses a user-provided transform, built with the default Constraint constructor.

```julia
d2 = Samples([1.0 3.0; 5.0 7.0; 9.0 11.0; 13.0 15.0]) # 4 samples of 2D parameter space
transform = (x -> 3*x + 14)
inverse_transform = (x -> (x-14) / 3)
c2 = [bounded(10,15),
      Constraint(transform, inverse_transform)]
name2 = "constrained_sampled"
# combine both parameters into a single 6D ParameterDistribution
u = ParameterDistribution([d1,d2],[c1,c2],[name1,name2])
```
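To make the second parameter concrete, here is a standard-library-only sketch of pushing its four unconstrained samples back into constrained space, one dimension at a time. Several things are assumptions, not taken from the CES source: `bounded(10,15)` is taken to use the rescaled logistic map as its inverse, the user-supplied `transform` is taken to be the unconstrained-to-constrained direction, and rows of the sample matrix are read as samples, matching the `# 4 samples` comment.

```julia
# Hypothetical illustration; sample layout and map directions are assumptions.
samples = [1.0 3.0; 5.0 7.0; 9.0 11.0; 13.0 15.0]   # rows = samples, cols = dims

# Dimension 1: bounded(10, 15), assumed inverse = rescaled logistic map
lower, upper = 10.0, 15.0
dim1_to_constrained(u) = lower + (upper - lower) / (1 + exp(-u))

# Dimension 2: the user-provided pair, assumed to map unconstrained -> constrained
transform = x -> 3 * x + 14
inverse_transform = x -> (x - 14) / 3

# Apply each dimension's map to its column of samples
constrained = hcat(dim1_to_constrained.(samples[:, 1]), transform.(samples[:, 2]))
```

Whatever the direction convention, the two user-supplied maps must be exact inverses of each other, which is easy to check on the samples themselves.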

codecov bot commented Nov 25, 2020

Codecov Report

Merging #88 (cc07d5e) into master (33742ea) will increase coverage by 1.96%.
The diff coverage is 100.00%.


```diff
@@            Coverage Diff             @@
##           master      #88      +/-   ##
==========================================
+ Coverage   77.84%   79.81%   +1.96%
==========================================
  Files           7        8       +1
  Lines         492      540      +48
==========================================
+ Hits          383      431      +48
  Misses        109      109
```

| Impacted Files | Coverage Δ |
|---|---|
| src/CalibrateEmulateSample.jl | 100.00% <ø> (ø) |
| src/ParameterDistribution.jl | 100.00% <100.00%> (ø) |


Legend: Δ = absolute <relative> (impact), ø = not affected, ? = missing data
Last update 33742ea...cc07d5e.

Review thread on src/ParameterDistributions.jl (outdated, resolved), at the line "#apply transforms":

A contributor commented:

I find the terms real <-> prior confusing. For example, I might think of the prior distribution as also being in the real space, since that is the space my prior belief is based on. What do you think about transform_bounded_to_unbounded and transform_unbounded_to_bounded?

odunbar (Collaborator, Author) replied:

Yeah, I mulled over this for a while actually. The terminology came from these 2 perspectives:
(i) The modelling assumes we are based in the Real space, e.g. we think of the parameters as y in Real, and F: Real -> Data is the forward model. The Bayesian methodology we developed works only on an unbounded Prior space, so from the mathematical stance we have a single map G: Prior -> Data. To use the map G we first need a transformation T: Real -> Prior; then the action is F(y) = G(T(y)).

(ii) The other perspective takes the mathematical stance G: Prior -> Data as primary, so we define a map S: Prior -> Real and then apply the model F, giving G(x) = F(S(x)).

From perspective (i), the prior is e.g. a lognormal distribution and your forward map is the model, but to work with it you need to move to a computational space. From (ii), the prior is a normal distribution, and the forward model maps it to the data (the black-box workings don't matter). I preferred the latter because the theory is based on the Prior space and not on the Real space, and this is actually how our code works.

So coming to the naming: I don't like mentioning bounded, because this is only 1 of the 4 cases. But you are right that one key purpose is to map to an unbounded space, and I do like that; but is it then clear where your prior distribution is defined?
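The two perspectives in the reply above differ only in which map is taken as primary. A small Julia illustration with made-up scalar maps (the forward model `F` and the log/exp pair chosen for `T` and `S` are hypothetical): when `S` is the inverse of `T`, defining `G = F ∘ S` as in perspective (ii) gives back `F(y) = G(T(y))` from perspective (i), so both describe the same forward model.

```julia
# Hypothetical maps for illustration only.
F(y) = 2y^2 + 1        # forward model on the Real (constrained) space, y > 0
T(y) = log(y)          # T: Real -> Prior (unbounded)
S(x) = exp(x)          # S: Prior -> Real, the inverse of T

# Perspective (ii): the forward map on the Prior space is G = F ∘ S
G = F ∘ S

# Perspective (i): F(y) = G(T(y)) for y in the constrained space
ys = [0.5, 1.0, 3.0]
lhs = F.(ys)
rhs = G.(T.(ys))       # equal to lhs up to floating-point rounding
```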

ilopezgp (Contributor) commented Nov 27, 2020:

As discussed, let's see what the others think. I think if one of the two includes unbounded or unconstrained, there should be no confusion.

A contributor commented:

I agree the naming of transform_real_to_prior is a bit confusing. As long as it is explicitly documented in comments and docs it could be fine, since at a surface level a user would probably assume that the distribution they provide is in the Prior space rather than the Real space.

Review thread on src/ParameterDistributions.jl (outdated, resolved).
odunbar changed the title from "[WIP] Refactor: Parameter Distributions" to "Refactor: Parameter Distributions" on Dec 2, 2020

odunbar (Collaborator, Author) commented Dec 7, 2020:

Thanks all! I'll merge this in now

odunbar (Collaborator, Author) commented Dec 7, 2020:

bors r+

bors bot commented Dec 7, 2020:

Build succeeded:

bors bot merged commit 76562c6 into master on Dec 7, 2020
bors bot deleted the orad/parameter-distributions-v2 branch on December 7, 2020, 22:56
bors bot added a commit that referenced this pull request Dec 15, 2020
89: WIP: replace Prior and posterior samples with ParameterDistributions r=odunbar a=odunbar

# Purpose
 
Following on from PR #88, replace the prior and posterior distributions with the new type ParameterDistributions, and add the functionality required to make this possible.

## Contained in the PR

- Implement the methods `get_logpdf`, `get_cov` and `get_mean`, and replace the corresponding implementations in EKP and MCMC. Note this will also allow us to use prior distributions with block-diagonal (i.e. not only diagonal) covariance in the MCMC.
- Add requisite unit tests 
- Modify the `runtests.jl` files that depend on `Priors.jl` to use ParameterDistributions instead
- Remove Priors.jl

**Future PR will deal with example cases (not contained in runtests)**

## Additionally
- [x]  Created the following issue: when creating an EKS, the mean and cov were previously supplied separately; these can now be deduced from the prior (which is also an input).


Co-authored-by: odunbar <odunbar@caltech.edu>
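The block-diagonal priors mentioned in this commit message can be illustrated in a few lines: when a prior is made of independent parameter blocks, its joint mean is the concatenation of the block means and its joint covariance is the block-diagonal concatenation of the block covariances. The sketch below uses Base's `cat(...; dims=(1, 2))`; the block values are made up, and this is an assumption about how a `get_mean`/`get_cov` could assemble results, not the CES implementation.

```julia
using LinearAlgebra

# Hypothetical per-parameter moments: a 4D block with 0.1^2 I covariance
# and a 2D block with a correlated (non-diagonal) covariance.
means = [zeros(4), [1.0, -1.0]]
covs = [Matrix(0.01I, 4, 4), [1.0 0.5; 0.5 2.0]]

# Joint mean: concatenate the block means.
joint_mean = reduce(vcat, means)

# Joint covariance: independent blocks on the diagonal, zeros elsewhere.
joint_cov = cat(covs...; dims=(1, 2))
```

The result is block-diagonal but not diagonal, which is exactly the case a purely diagonal prior representation could not express.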
bors bot added a commit that referenced this pull request Dec 22, 2020
94: Update examples to work with the latest CES code r=bielim a=bielim

The goal of this PR is to get all examples synced up with the latest changes in the code base (in particular, PRs #88 and #89).

- [x]  `Cloudy_example.jl`
- [x]  `learn_noise.jl`
- [x] `plot_GP.jl`

In addition, `get_distribution()` (in `ParameterDistribution.jl`) has been modified to return the array of samples when called on `Samples` (rather than the message "Contains samples only"). `get_distribution` now returns a `Dict` with the parameter names as keys; each value is the corresponding distribution for `Parameterized` parameters (such as Normal(0.0, 1.0)), or the corresponding samples as a parameter_dimension x n_samples array for parameters represented by `Samples`.

Co-authored-by: Melanie <melanie@charney.bieli.email>
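A hypothetical sketch of the `get_distribution` behaviour described in this commit message: a `Dict` keyed by parameter name whose value is either the distribution object or, for sample-based parameters, the parameter_dimension x n_samples array. The `Sketch*` types, the `sketch_get*` functions and the string stand-in for a Distributions.jl object are assumptions so that the example runs with Base Julia only.

```julia
# Hypothetical stand-ins; the real types live in CES and Distributions.jl.
struct SketchParameterized
    distribution::Any              # e.g. Normal(0.0, 1.0) in the real code
end
struct SketchSamples
    samples::Matrix{Float64}       # parameter_dimension × n_samples
end

# Return the distribution object for Parameterized entries, raw array for Samples.
sketch_get(d::SketchParameterized) = d.distribution
sketch_get(d::SketchSamples) = d.samples

# Build the name => distribution-or-samples Dict.
function sketch_get_distribution(names::Vector{String}, dists::Vector)
    return Dict(n => sketch_get(d) for (n, d) in zip(names, dists))
end

d = sketch_get_distribution(["a", "b"],
                            [SketchParameterized("Normal(0.0, 1.0)"),
                             SketchSamples([1.0 5.0 9.0 13.0; 3.0 7.0 11.0 15.0])])
```

Dispatching on the entry type keeps the Dict construction itself uniform across `Parameterized` and `Samples` entries.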

3 participants