Normalize GP input #42
Conversation
Sorry if this is a naive question, but does this work regardless of the units of the input, or do you have to pick suitable (potentially non-dimensional) units? For example, if the inputs were near-surface temperature profiles in °C, would it be a problem that tropical data is always strongly positive while Arctic data can be negative? I guess maybe you'd want the input data in Kelvin, or use potential temperature instead.
@ali-ramadhan: The transformation is applied internally, just as a means to aid the interpretation of the parameters in the Gaussian process. You don't have to do anything to your inputs (or units) before passing them to GPObj or the predict function. So if you give it a training pair of (3 °C, 100 mm), then when you want to predict at 3 °C it will still give you the answer 100 mm. Internally, however, it transforms 3 °C -> X and then trains on (X, 100 mm).
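The round-trip behavior described above can be sketched in Python. This is a hypothetical ToyGP class, not the package's GPObj; the "GP posterior" is replaced by a nearest-neighbor lookup purely to show that the normalization is invisible to the caller, because the same transform is applied at training and prediction time.

```python
import numpy as np

# Hypothetical minimal emulator illustrating internal input normalization:
# inputs are centered and scaled by the inverse square root of their
# covariance before training, and the identical transform is applied at
# prediction time, so callers never see the normalized values.
class ToyGP:
    def __init__(self, X, y, normalized=True):
        X = np.atleast_2d(np.asarray(X, dtype=float))
        self.normalized = normalized
        self.mean = X.mean(axis=0)
        cov = np.atleast_2d(np.cov(X, rowvar=False))
        # inverse square root of the input covariance
        vals, vecs = np.linalg.eigh(cov)
        self.sqrt_inv_cov = vecs @ np.diag(vals ** -0.5) @ vecs.T
        self.Xn = self._transform(X)
        self.y = np.asarray(y, dtype=float)

    def _transform(self, X):
        if not self.normalized:
            return X
        return (X - self.mean) @ self.sqrt_inv_cov

    def predict(self, Xnew):
        Xnew = np.atleast_2d(np.asarray(Xnew, dtype=float))
        Xn = self._transform(Xnew)  # same transform as during training
        # stand-in for a real GP posterior mean: nearest-neighbor lookup
        dists = np.linalg.norm(self.Xn[None, :, :] - Xn[:, None, :], axis=2)
        return self.y[np.argmin(dists, axis=1)]

# training pair (3, 100): predicting at 3 still returns 100, in the
# caller's original units, even though training happened in normalized space
gp = ToyGP([[1.0], [3.0], [5.0]], [50.0, 100.0, 150.0])
print(gp.predict([[3.0]]))
```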
Thank you for the clarification @odunbar!
Hi Melanie, this all seems reasonable! It may just be me, but I've not seen the first.(X) and last.(X) functions before (e.g., line 200 in GPEmulator.jl). But if it's cleaner to keep the mu and sigma variables together, then I'm fine with this.
This hopefully conditions the inputs into a nicer space. In a later patch it is probably a good idea for us to also include conditioning on the outputs (e.g., via SVD) to decorrelate the GPs - we can discuss this later! Thanks for the work!
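The output decorrelation suggested above could look roughly like the following sketch (hypothetical, not the project's API): center the multivariate training outputs, take an SVD of the data matrix, and work in the coordinates of the right singular vectors, where the components are mutually uncorrelated so each can be emulated by an independent GP; the same basis inverts the projection exactly.

```python
import numpy as np

# Sketch of SVD-based output decorrelation (hypothetical, not the package's code):
# center the outputs, compute an SVD, and rotate into the singular-vector basis.
rng = np.random.default_rng(0)
mixing = np.array([[2.0, 1.0, 0.0],
                   [0.0, 1.0, 1.0],
                   [0.0, 0.0, 0.5]])
Y = rng.normal(size=(100, 3)) @ mixing     # correlated multivariate outputs
Y_mean = Y.mean(axis=0)
U, S, Vt = np.linalg.svd(Y - Y_mean, full_matrices=False)
Z = (Y - Y_mean) @ Vt.T    # decorrelated coordinates: one GP per column of Z
Y_back = Z @ Vt + Y_mean   # exact inverse of the projection
print(np.allclose(Y_back, Y))
```

Since the sample covariance of Z is diagonal by construction, the per-column GPs do not have to model cross-output correlations, which is the point of the conditioning.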
bors r+
Codecov Report
@@            Coverage Diff             @@
##           master      #42      +/-   ##
==========================================
+ Coverage   59.58%   60.41%   +0.82%
==========================================
  Files          12       12
  Lines         480      485       +5
==========================================
+ Hits          286      293       +7
+ Misses        194      192       -2
Continue to review full report at Codecov.
LGTM
Normalize inputs to the GP training and prediction.
Normalization is done by centering the inputs (i.e., subtracting their mean) and multiplying them by the square root of the inverse of the input covariance.
Whether or not the GP is trained on normalized inputs can be specified with an optional input argument to GPObj ("normalized"), which defaults to true. If the GP has been trained on normalized inputs, the "predict" function automatically applies the same normalization when predicting on new inputs.
The idea is that normalization will make the GP hyperparameters more independent of the problem - e.g., the length scales used for the default kernels can be assumed to be reasonable defaults for many problems.