Implementation of latent component Gaussian process (LCGP). LCGP handles the emulation of multivariate stochastic simulation outputs.
The development of this work is described fully in the following work, cited as:
@phdthesis{chan2023thesis, author = "Moses Y.-H. Chan", title = "High-Dimensional Gaussian Process Methods for Uncertainty Quantification", school = "Northwestern University", year = "2023", }
List of Contents:
The implementation of LCGP can be installed through
pip install lcgp
Note on LBFGS optimizer:
It is strongly recommended that PyTorch-LBFGS is installed to fully utilize this implementation. Installation guide on PyTorch-LBFGS can be found on its repository.
A list of basic tests is provided for user to verify that LCGP is installed correctly. Execute the follow code within the root directory:
pip install pytest pytest-cov
pytest
import numpy as np
from lcgp import LCGP
from lcgp import evaluation # optional evaluation module
# Generate fifty 2-dimensional input and 4-dimensional output
x = np.random.randn(50, 2)
y = np.random.randn(4, 50)
# Define LCGP model
model = LCGP(y=y, x=x)
# Estimate error covariance and hyperparameters
model.fit()
# Prediction
p = model.predict(x0=x) # mean and variance
rmse = evaluation.rmse(y, p[0].numpy())
dss = evaluation.dss(y, p[0].numpy(), p[1].numpy(), use_diag=True)
print('Root mean squared error: {:.3E}'.format(rmse))
print('Dawid-Sebastiani score: {:.3f}'.format(dss))
# Access parameters
print(model)
There are two ways to specify the number of latent components by passing one of the following arguments in initializing an LCGP instance:
q = 5
: Five latent components will be used.q
must be less than or equal to the output dimension.var_threshold = 0.99
: Include q latent components such that 99% of the output variance are explained, using a singular value decomposition.
Note: Only one of the options should be provided at a time.
model_q = LCGP(y=y, x=x, q=5)
model_var = LCGP(y=y, x=x, var_threshold=0.99)
If errors of multiple output dimensions are expected to be similar, the error variances can be grouped in estimation.
For example, the 6-dimensional output is split into two groups: the first two have low errors and the remaining four have high errors.
import numpy as np
x = np.linspace(0, 1, 100)
y = np.row_stack((
np.sin(x), np.cos(x), np.tan(x),
np.sin(x/2), np.cos(x/2), np.tan(x/2)
))
y[:2] += np.random.normal(2, 1e-3, size=(2, 100))
y[2:] += np.random.normal(-2, 1e-1, size=(4, 100))
Then, LCGP can be defined with the argument diag_error_structure
as
a list of output dimensions to group. The following code groups the
first 2 and the remaining 4 output dimensions.
model_diag = LCGP(y=y, x=x, diag_error_structure=[2, 4])
By default, LCGP assigns a separate error variance to each dimension, equivalent to
model_diag = LCGP(y=y, x=x, diag_error_structure=[1]*6)
Three submethods are implemented under LCGP:
- Full posterior (
full
) - ELBO (
elbo
) - Profile likelihood (
proflik
)
Under circumstances where the simulation outputs are stochastic, the full posterior approach should perform the best. If the simulation outputs are deterministic, the profile likelihood method should suffice.
LCGP_models = []
submethods = ['full', 'elbo', 'proflik']
for submethod in submethods:
model = LCGP(y=y, x=x, submethod=submethod)
LCGP_models.append(model)
LCGP standardizes the simulation output by each dimension to facilitate
hyperparameter training. The two choices are implemented through
robust_mean = True
or robust_mean = False
.
robust_mean = False
: The empirical mean and standard deviation are used.robust_mean = True
: The empirical median and median absolute error are used.
model = LCGP(y=y, x=x, robust_mean=False)