
Normalize GP input #42

Merged: 1 commit merged into master on Mar 2, 2020

Conversation

bielim
Contributor

@bielim bielim commented Feb 20, 2020

Normalize inputs to the GP training and prediction.

Normalization is done by centering the inputs (i.e., subtracting their mean) and multiplying them by the square root of the inverse of the input covariance.
Whether or not the GP is trained on normalized inputs can be specified with an optional input argument to GPObj ("normalized"), which defaults to true. If the GP has been trained on normalized inputs, the "predict" function automatically applies the same normalization when predicting on new inputs.

The idea is that normalization makes the GP hyperparameters less dependent on the specific problem; e.g., the length scales used for the default kernels can then be assumed to be reasonable defaults for many problems.
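For reference, the transformation described above (centering, then multiplying by the inverse matrix square root of the input covariance) is a standard whitening transform. A minimal standalone sketch in Python/NumPy (the package itself is Julia, and the names below are illustrative, not the GPEmulator.jl API):

```python
import numpy as np

def whiten(inputs):
    """Center the inputs and multiply by the inverse matrix square root
    of their covariance. Returns the whitened inputs plus the (mean,
    transform) needed to apply the same normalization to new points."""
    mean = inputs.mean(axis=0)
    cov = np.cov(inputs, rowvar=False)
    # inverse matrix square root via eigendecomposition of the
    # (symmetric, positive-definite) covariance
    eigvals, eigvecs = np.linalg.eigh(cov)
    sqrt_inv_cov = eigvecs @ np.diag(eigvals ** -0.5) @ eigvecs.T
    return (inputs - mean) @ sqrt_inv_cov, mean, sqrt_inv_cov

rng = np.random.default_rng(0)
# correlated, offset inputs (analogous to raw physical units)
X = rng.normal(size=(200, 2)) @ np.array([[2.0, 0.5], [0.0, 1.0]]) + 5.0
Xw, mu, T = whiten(X)
print(np.allclose(Xw.mean(axis=0), 0.0))                  # True: centered
print(np.allclose(np.cov(Xw, rowvar=False), np.eye(2)))   # True: unit covariance
```

After whitening, the inputs have zero mean and identity covariance, so a kernel's length scales act on comparably scaled coordinates in every direction.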

@bielim bielim self-assigned this Feb 20, 2020
@bielim bielim added the enhancement New feature or request label Feb 20, 2020
@ali-ramadhan
Member

ali-ramadhan commented Feb 20, 2020

Normalization is done by centering the inputs (i.e., subtracting their mean) and multiplying them by the square root of the inverse of the input covariance.

Sorry if this is a stupid question but would this work no matter the units of the input or do you have to pick suitable units (potentially non-dimensional)?

Like if the inputs were near-surface temperature profiles in °C would it be bad that tropical data is always very positive while Arctic data can have negative numbers? I guess maybe you want your input data to have units of Kelvin or use potential temperature instead.

@odunbar
Collaborator

odunbar commented Feb 20, 2020

@ali-ramadhan: The transformation is applied internally, just as a means to aid the interpretation of parameters in the Gaussian Process. You don't have to do anything to your inputs (or units) before passing them to the GPObj or the predict function.

So if you give it a training pair of (3 °C, 100 mm), then when you want to predict at 3 °C it will still give you the answer 100 mm. Internally, however, it transforms the 3 °C -> X and then trains on (X, 100 mm).
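The point above can be demonstrated with a toy stand-in for the emulator (a hypothetical class, not the actual GPObj, and a nearest-neighbour lookup substitutes for the GP regression step): the caller only ever sees raw units, while the mean and transform learned at training time are reused internally at prediction time.

```python
import numpy as np

class NormalizingEmulator:
    """Toy stand-in showing that normalization is purely internal:
    the caller passes raw inputs, and the same (mean, transform)
    learned at training time is reapplied when predicting."""

    def __init__(self, X, y):
        self.mean = X.mean(axis=0)
        cov = np.atleast_2d(np.cov(X, rowvar=False))
        vals, vecs = np.linalg.eigh(cov)
        self.T = vecs @ np.diag(vals ** -0.5) @ vecs.T
        self.Xn = (X - self.mean) @ self.T   # "model" is trained on Xn
        self.y = y

    def _normalize(self, X):
        # same normalization applied to new prediction points
        return (np.atleast_2d(X) - self.mean) @ self.T

    def predict(self, Xnew):
        # nearest-neighbour lookup stands in for the GP regression
        Xn = self._normalize(Xnew)
        dists = np.linalg.norm(self.Xn[None, :, :] - Xn[:, None, :], axis=2)
        return self.y[np.argmin(dists, axis=1)]

X = np.array([[1.0, 10.0], [3.0, 25.0], [5.0, 18.0]])
y = np.array([50.0, 100.0, 150.0])
em = NormalizingEmulator(X, y)
print(em.predict(np.array([[3.0, 25.0]])))  # [100.] -- raw units in, raw output back
```

Predicting at a raw training input returns the associated training output unchanged, even though the model internally only ever saw whitened coordinates.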

@ali-ramadhan
Member

Thank you for the clarification @odunbar!

Collaborator

@odunbar odunbar left a comment


Hi Melanie, this all seems reasonable! It may just be me, but I've not seen the first.(X) and last.(X) functions before (e.g., line 200 in GPEmulator.jl). But if it's cleaner to keep the mu and sigma variables together, then I'm fine with this.

This hopefully conditions the inputs into a nicer space. In a later patch it is probably a good idea for us to also include conditioning on the outputs (e.g., an SVD) to decorrelate the GPs; we can discuss this later! Thanks for the work!

@charleskawczynski
Member

bors r+

@codecov

codecov bot commented Mar 2, 2020

Codecov Report

Merging #42 into master will increase coverage by 0.82%.
The diff coverage is 48.14%.


@@            Coverage Diff             @@
##           master      #42      +/-   ##
==========================================
+ Coverage   59.58%   60.41%   +0.82%     
==========================================
  Files          12       12              
  Lines         480      485       +5     
==========================================
+ Hits          286      293       +7     
+ Misses        194      192       -2
Impacted Files Coverage Δ
src/MCMC.jl 95.74% <100%> (+16.74%) ⬆️
src/GPEmulator.jl 41.66% <46.15%> (+0.85%) ⬆️
src/Utilities.jl 33.33% <0%> (-33.34%) ⬇️

Legend: Δ = absolute <relative> (impact), ø = not affected, ? = missing data
Powered by Codecov. Last update f85852c...0d22d17.

Member

@charleskawczynski charleskawczynski left a comment


LGTM

@bors
Contributor

bors bot commented Mar 2, 2020

@bors bors bot merged commit 35d3feb into master Mar 2, 2020
@bors bors bot deleted the mb/normalize_gp_input branch March 2, 2020 21:39
Labels
enhancement New feature or request
4 participants