
Modular emulator interface #120

Merged: 25 commits into master, Jan 22, 2022

Conversation

@odunbar (Collaborator) commented Nov 24, 2021

Purpose

Reduces the Emulator interface's current dependence on Gaussian processes, so that the GP can be swapped for another statistical emulator.

In the PR

  • New general `Emulator` class, which handles all the data manipulation, e.g. normalization, standardization, and decorrelation
  • General interface functions for `Emulator`: `optimize_hyperparameters!` and `predict`
  • New `MachineLearningTool` type
  • Moved the Gaussian processes into a `GaussianProcess <: MachineLearningTool` class
  • An example (`plot_GP`) demonstrating the new interface
  • Unit tests
  • New docstrings

Additional change

There seem to be ongoing issues with unit testing on Julia 1.5.4, so I have updated the Manifest, Docs.yml, and Test.yml to Julia 1.6.x.

Changes to user experience:

Ingredients:

```julia
gppackage = GPJL()
pred_type = YType()
GPkernel = ...
iopairs = PairedDataContainer(x_data, y_data)
```
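For concreteness, here is one hypothetical way these ingredients could be filled in; the `SE` kernel (from GaussianProcesses.jl, which the `GPJL()` backend wraps) and the data shapes are illustrative assumptions, not prescribed by this PR:

```julia
using GaussianProcesses  # assumed backend behind GPJL(); provides the SE kernel

GPkernel = SE(0.0, 0.0)    # squared-exponential kernel: log lengthscale, log signal std
x_data = rand(2, 50)       # 2 input dimensions × 50 training points (columns are samples)
y_data = rand(3, 50)       # 3 output dimensions × 50 training points
iopairs = PairedDataContainer(x_data, y_data)
```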

Old interface

Set up a `GaussianProcessEmulator` object

```julia
gp = GaussianProcess(
    iopairs,
    gppackage;
    GPkernel=GPkernel,
    obs_noise_cov=nothing,
    normalized=false,
    noise_learn=true,
    truncate_svd=1.0,
    standardize=false,
    prediction_type=pred_type,
    norm_factor=nothing)
```

Then predict with it.

```julia
μ, σ² = GaussianProcessEmulator.predict(gp, new_inputs)
```

This is short, but it is inherently tied to the Gaussian process framework. It also hides away steps such as training, which we may wish to expose. The script below is more general, separating the parameters related to data processing from those specific to the ML tool.

New interface

Set up a `GaussianProcess <: MachineLearningTool` object

```julia
gp = GaussianProcess(
    gppackage;
    kernel=GPkernel,
    noise_learn=true,
    prediction_type=pred_type)
```

and then create the general emulator type using `gp`

```julia
em = Emulator(
    gp,
    iopairs,
    obs_noise_cov=nothing,
    normalize_inputs=false,
    standardize_outputs=false,
    truncate_svd=1.0)
```

Train and predict

```julia
Emulators.optimize_hyperparameters!(em)
μ, σ² = Emulators.predict(em, new_inputs)
```

Adding a new MachineLearningTool

Include a new file `NewTool.jl` at the top of `Emulator.jl`. In this file define (a minimal sketch follows the list):

  1. `struct NewTool <: MachineLearningTool`, with constructor `NewTool(...)`, to hold ML parameters and models
  2. `function build_models!(NewTool, iopairs)` to build and store ML models; called in the `Emulator` constructor
  3. `function optimize_hyperparameters!(NewTool)` to train the stored ML models; called by the method of the same name in `Emulator`
  4. `function predict(NewTool, new_inputs)` to predict with the stored ML models; called by the method of the same name in `Emulator`
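Everything named `RidgeTool` below (a deterministic linear emulator) is hypothetical, including the `get_inputs`/`get_outputs` accessors assumed for `iopairs`; only the four-function interface itself comes from this PR:

```julia
using LinearAlgebra  # for I

struct RidgeTool <: MachineLearningTool
    lambda::Float64                    # ridge regularization strength (hypothetical)
    models::Vector{Vector{Float64}}    # one coefficient vector per output dimension
end
RidgeTool(lambda) = RidgeTool(lambda, Vector{Float64}[])

function build_models!(rt::RidgeTool, iopairs)
    # assumed accessors returning d × N inputs and p × N outputs
    X, Y = get_inputs(iopairs), get_outputs(iopairs)
    for i in 1:size(Y, 1)
        # closed-form ridge regression for output dimension i
        push!(rt.models, (X * X' + rt.lambda * I) \ (X * Y[i, :]))
    end
end

# lambda is fixed in this sketch, so there is nothing to train
optimize_hyperparameters!(rt::RidgeTool) = nothing

function predict(rt::RidgeTool, new_inputs)
    μ = reduce(vcat, (m' * new_inputs for m in rt.models))
    σ² = zero(μ)   # a deterministic tool reports zero predictive variance
    return μ, σ²
end
```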

@odunbar changed the title from "[WIP] Modular emulator interface" to "Modular emulator interface" on Dec 23, 2021
codecov bot commented Dec 23, 2021

Codecov Report

Merging #120 (61178bd) into master (8723d4a) will increase coverage by 1.40%.
The diff coverage is 92.53%.


@@            Coverage Diff             @@
##           master     #120      +/-   ##
==========================================
+ Coverage   90.41%   91.81%   +1.40%     
==========================================
  Files           4        4              
  Lines         386      391       +5     
==========================================
+ Hits          349      359      +10     
+ Misses         37       32       -5     
Impacted Files                  Coverage Δ
src/MarkovChainMonteCarlo.jl    86.99% <80.00%> (-2.44%) ⬇️
src/GaussianProcess.jl          91.74% <91.74%> (ø)
src/Emulator.jl                 94.26% <94.26%> (ø)
src/CalibrateEmulateSample.jl

Δ = absolute <relative> (impact), ø = not affected, ? = missing data

@odunbar requested review from bielim and bischtob on December 23, 2021
@bischtob left a comment

Looks good to me.

@odunbar (Collaborator, Author) commented Jan 5, 2022

bors r+

bors bot added a commit that referenced this pull request Jan 5, 2022: "120: Modular emulator interface r=odunbar a=odunbar" (commit message identical to the PR description above)
bors bot commented Jan 5, 2022

Build failed.

@odunbar (Collaborator, Author) commented Jan 18, 2022

bors r+

bors bot added a commit that referenced this pull request Jan 18, 2022: "120: Modular emulator interface r=odunbar a=odunbar" (commit message identical to the PR description above)
bors bot commented Jan 18, 2022

Build failed.

@tsj5 (Collaborator) commented Jan 18, 2022

Hi all -- the error message for this PR matches the one discussed in issue #125; the root cause is using a Julia 1.6 Manifest with CI scripts that run Julia 1.5. This PR updated 1.5 -> 1.6 in .github/workflows, but not in .buildkite/pipeline.yml. The fix in #126 should correct this, but that PR is currently running into its own problems (described in a comment there).

@tsj5 mentioned this pull request Jan 18, 2022
@odunbar (Collaborator, Author) commented Jan 20, 2022

bors try

bors bot added a commit that referenced this pull request Jan 20, 2022
bors bot commented Jan 20, 2022

try

Build failed.

bors bot added a commit that referenced this pull request Jan 20, 2022
bors bot commented Jan 20, 2022

try

Build failed.

bors bot added a commit that referenced this pull request Jan 20, 2022
bors bot commented Jan 20, 2022

try

Build failed.

@tsj5 (Collaborator) commented Jan 21, 2022

Apologies for my confusion over which branch to add features to. As discussed with @odunbar:

  1. PR #129 (Update examples to use Emulator interface) should be accepted, which will update this PR's feature branch (orad/emulator-interface) with:
    a. updates to the examples to use the new Emulator interface introduced in this PR;
    b. fixes to the buildkite configuration that address the errors encountered above.
  2. This should resolve all outstanding issues with this PR, which can then be accepted (merged into master).

* Fix path to examples/ci

* Suppress warnings from reused plot variables in plot_GP.jl

* Restore learn_noise.jl from /master

* Fix comparisons to nothing

* Update learn_noise example to use Emulator()

* Always optimize! GPs with noise_learn=false

Done because noise is explicitly added to the GP kernel when it's
created. This is needed to reproduce existing behavior in /master.

* Fix reverse_standardize()

* Update examples to use Emulators

* Fix buildkite path to GaussianProcess example

* Fix adding top-level repo LOAD_PATH in GP examples

* Regenerate all Manifests under julia 1.6.5

* Explicit compatibility with julia 1.6.x

* Temporarily use julia 1.6.2 in buildkite
@odunbar (Collaborator, Author) commented Jan 21, 2022

bors try

bors bot added a commit that referenced this pull request Jan 21, 2022
bors bot commented Jan 21, 2022

try

Build succeeded.

@odunbar (Collaborator, Author) commented Jan 21, 2022

bors r+

bors bot commented Jan 22, 2022

Build succeeded.

bors bot merged commit c23c216 into master Jan 22, 2022
bors bot added a commit that referenced this pull request May 6, 2022
130: [WIP] Modular Sampler interface r=tsj5 a=tsj5

This PR re-implements the Sample step of CES to use the [AbstractMCMC](https://github.com/TuringLang/AbstractMCMC.jl) interface used by [Turing.jl](https://turing.ml/dev/). It may be considered a sibling of PR #120.

**Motivation**

The rationale for doing this is as follows (most relevant reasons first):

1. We shouldn't reinvent the wheel here, as CES doesn't claim to innovate in the area of MCMC sampling (taken on its own).
  a. It's reasonable to assume a user of our package who has done MCMC in Julia is familiar with Turing and its interface: Turing.jl is a major part of the Julia ecosystem, playing the role [Stan](https://mc-stan.org/) does for R.
  b. Extensibility via this interface is a design goal of Turing, so one may assume its developers have thought carefully about how best to design an appropriate API (e.g. [thread](https://github.com/TuringLang/AbstractMCMC.jl/discussions/72)). The separation of concerns used by AbstractMCMC (bullet list below) is the logical way to split up the problem.
2. In practice (in the examples), it streamlines code: `MCMCWrapper` objects can be reused, and constructor arguments are taken from the `Emulator` to avoid potential bugs from inconsistencies.
3. MCMCChains implements several diagnostics for MCMC convergence, in the form of [statistics](https://turinglang.github.io/MCMCChains.jl/dev/diagnostics/) and [plots](https://turinglang.github.io/MCMCChains.jl/dev/statsplots/).
4. AbstractMCMC implements thread- and process-parallel sampling of multiple chains. MCMC folklore holds that one obtains the most robust estimates by running "a few" "medium-length" chains from different initial conditions, e.g. [Gilks et al., 1996](https://books.google.com/books/about/Markov_Chain_Monte_Carlo_in_Practice.html?id=ATimDAEACAAJ).
5. Interoperability with all of Turing.jl; this may be useful in the future if the package adds samplers or other features we'd like to use off-the-shelf.

**Implementation**

AbstractMCMC is described in the Turing docs [here](https://turing.ml/dev/docs/for-developers/interface) and [here](https://turing.ml/dev/docs/for-developers/how_turing_implements_abstractmcmc), although this is brief, and reading the [AdvancedMH](https://github.com/TuringLang/AdvancedMH.jl) source was more enlightening.

In addition to AbstractMCMC, the PR uses the [AdvancedMH](https://github.com/TuringLang/AdvancedMH.jl) extensible implementation of Metropolis-Hastings, and [MCMCChains](https://github.com/TuringLang/MCMCChains.jl) for storing sampling runs and sampling from the posterior. Turing.jl itself isn't brought in as a dependency.

This PR splits up the functionality of the existing `MCMC` class into three classes:

- An `AdvancedMH.DensityModel` subtype, which computes the log-likelihood. This simply wraps the Emulator instance.
- An `AdvancedMH.Proposal` subtype, which generates proposal moves for Metropolis-Hastings. This allows us to plug in different sampling algorithms, such as preconditioned Crank-Nicolson (PR #124; not done here).
- `MCMCChains.Chains`, a struct returned as the sampler output. Because the sampling results are not stored in the `MCMCWrapper` object, the wrapper can be reused to configure multiple MCMC runs.

The new `MCMCWrapper` class simply wraps the first two objects, performing the same standardization that the Emulator performed (to ensure consistency, this information is taken from the Emulator instance itself). `sample_posterior!` is replaced by new methods for `sample`, which return instances of `MCMCChains.Chains`.
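As a rough, runnable sketch of this three-way split, using plain AdvancedMH types; the toy Gaussian log-likelihood below merely stands in for the emulator-backed model, and the chain lengths are illustrative:

```julia
using AbstractMCMC: MCMCThreads
using AdvancedMH, Distributions, MCMCChains, LinearAlgebra

# 1. the DensityModel wraps a log-density; in the PR this is emulator-backed
loglik(θ) = logpdf(MvNormal(zeros(2), Matrix(1.0I, 2, 2)), θ)
model = DensityModel(loglik)

# 2. the proposal defines the Metropolis-Hastings move; swapping this object
#    is where an alternative such as preconditioned Crank-Nicolson would plug in
sampler = RWMH(MvNormal(zeros(2), Matrix(0.1I, 2, 2)))

# 3. the output is an MCMCChains.Chains struct, stored outside any wrapper
chain = sample(model, sampler, 10_000; chain_type=Chains)

# thread-parallel sampling of a few medium-length chains, per the motivation above
chains = sample(model, sampler, MCMCThreads(), 2_000, 4; chain_type=Chains)
```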




Co-authored-by: Thomas Jackson <tom.jackson314@gmail.com>