I want to compare the performance of Multitask GP (MTGP) with Independent Multioutput GP (MOGP) based on their GPyTorch implementations/examples.
When I calculated the mean absolute percentage error (MAPE) for both models, the MAPE of MOGP was usually lower than that of MTGP (I expected the opposite), or the two MAPEs were very close. I changed the data dimensions and the number of training iterations, but saw the same behavior.
As mentioned in the GPyTorch documentation (https://docs.gpytorch.ai/en/stable/examples/03_Multitask_Exact_GPs/Multitask_GP_Regression.html), when we perform regression on functions that share the same inputs and are similar (e.g., sinusoidal), MTGP should be useful and outperform MOGP; otherwise, there is no point in using MTGP, since it has higher computational complexity than MOGP and takes more execution time.
I also implemented MTGP and MOGP for a more complex application and saw the same issue there.
@Balandat, I would appreciate it if you could explain what the issue is and help resolve it.
You're evaluating both tasks at exactly the same points (train_x). The multi-task setup shines when you observe the different tasks at different locations. In fact, you can show that if your observations are noiseless and all tasks are evaluated at the same points, you get zero benefit from using an MTGP. This phenomenon is called autokrigeability.
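A quick numerical sketch of autokrigeability (plain NumPy with a made-up RBF kernel and an arbitrary task covariance B, not the code from the question): with a separable covariance B ⊗ Kx and noiseless observations of every task at the same inputs, the multitask posterior mean is (B ⊗ Ksx)(B ⊗ Kxx)⁻¹ vec(Y) = (I ⊗ Ksx Kxx⁻¹) vec(Y), so B cancels and the prediction equals the independent-GP prediction.

```python
import numpy as np

def rbf(a, b, ls=0.25):
    # Squared-exponential kernel between two 1-D input arrays.
    return np.exp(-0.5 * (a[:, None] - b[None, :]) ** 2 / ls**2)

train_x = np.linspace(0.0, 1.0, 8)
test_x = np.linspace(0.0, 1.0, 25)
# Two tasks observed NOISELESSLY at the SAME inputs -> autokrigeability applies.
Y = np.stack([np.sin(2 * np.pi * train_x),
              np.cos(2 * np.pi * train_x)], axis=1)  # shape (8, 2)

Kxx = rbf(train_x, train_x) + 1e-6 * np.eye(len(train_x))  # jitter for stability
Ksx = rbf(test_x, train_x)

# Arbitrary full-rank inter-task covariance for the separable kernel B (x) Kx.
B = np.array([[1.0, 0.7],
              [0.7, 1.5]])

# Multitask posterior mean: (B (x) Ksx) (B (x) Kxx)^{-1} vec(Y).
mean_mt = np.kron(B, Ksx) @ np.linalg.solve(np.kron(B, Kxx), Y.T.ravel())
mean_mt = mean_mt.reshape(2, -1).T

# Independent-GP posterior mean, one task at a time (B plays no role).
mean_ind = Ksx @ np.linalg.solve(Kxx, Y)

# B cancels: kron(B, Ksx) kron(B, Kxx)^{-1} = kron(I, Ksx Kxx^{-1}).
print(np.max(np.abs(mean_mt - mean_ind)))
```

With noise, or with tasks observed at different inputs, the cancellation no longer holds and the task covariance starts to help.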
Your test targets test_y are noisy, whereas your model estimates the latent function (the trigonometric functions without the noise). I wouldn't be surprised if this throws off, and potentially dominates, your error metrics. Since you have the ground truth here, you should evaluate against noise-free test targets.
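To illustrate that point (a toy NumPy sketch with made-up data, not the code from the question): even a perfect estimate of the latent function scores a nonzero MAPE against noisy targets, so the noise floor can mask the MTGP-vs-MOGP difference.

```python
import numpy as np

rng = np.random.default_rng(0)
x = np.linspace(0.1, 1.0, 100)
latent = np.sin(2 * np.pi * x) + 2.0  # noise-free ground truth (offset keeps it away from 0)
noisy = latent + 0.2 * rng.standard_normal(x.shape)  # what a noisy test_y looks like

def mape(pred, target):
    return np.mean(np.abs((pred - target) / target))

pred = latent  # pretend the model recovered the latent function exactly
print(mape(pred, latent))  # 0.0: the metric reflects model quality
print(mape(pred, noisy))   # > 0: pure noise floor, not model error
```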
I recommend using rank=2 here to estimate a full task covariance matrix (only 3 free parameters for two tasks, so overfitting is unlikely; that is more of a concern with many more tasks).
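As an illustration of why rank=2 recovers a full task covariance for two tasks, here is a NumPy sketch of the low-rank-plus-diagonal parameterization B = F Fᵀ + diag(v) that GPyTorch's index kernel uses (the specific numbers are made up):

```python
import numpy as np

# Low-rank-plus-diagonal inter-task covariance, B = F F^T + diag(v),
# for 2 tasks with rank 2.
F = np.array([[0.9, 0.2],
              [0.4, 0.8]])   # made-up rank-2 factor
v = np.array([0.1, 0.05])    # made-up per-task variances
B = F @ F.T + np.diag(v)

# A symmetric 2x2 matrix has only 3 free entries, so with rank=2 the
# model can represent any valid task covariance without overfitting.
print(B)
print(np.linalg.eigvalsh(B))  # both eigenvalues positive
```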
You're optimizing the model parameters with Adam. How much have you experimented with the learning rate and other settings? This can be tricky to get right, so you may not be optimizing the model parameters fully. In BoTorch we therefore use L-BFGS-B as the default optimizer, invoked via fit_gpytorch_mll. I recommend you try this here as well.
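A minimal sketch of the idea (plain NumPy/SciPy on a made-up 1-D dataset rather than BoTorch's fit_gpytorch_mll): fit GP hyperparameters by minimizing the exact negative log marginal likelihood with scipy's L-BFGS-B, which runs to convergence instead of a fixed number of Adam steps.

```python
import numpy as np
from scipy.optimize import minimize

rng = np.random.default_rng(0)
train_x = np.linspace(0.0, 1.0, 30)
train_y = np.sin(2 * np.pi * train_x) + 0.1 * rng.standard_normal(30)

def neg_mll(log_params):
    # Hyperparameters are optimized in log space to keep them positive.
    ls, noise = np.exp(log_params)
    K = np.exp(-0.5 * (train_x[:, None] - train_x[None, :]) ** 2 / ls**2)
    K += noise * np.eye(len(train_x))
    L = np.linalg.cholesky(K)
    alpha = np.linalg.solve(L.T, np.linalg.solve(L, train_y))
    # Negative log marginal likelihood, up to the constant n/2 * log(2*pi).
    return 0.5 * train_y @ alpha + np.log(np.diag(L)).sum()

res = minimize(
    neg_mll,
    x0=np.log([1.0, 1.0]),
    method="L-BFGS-B",
    bounds=[(np.log(1e-3), np.log(10.0))] * 2,  # keep lengthscale/noise in a sane range
)
lengthscale, noise = np.exp(res.x)
print(res.success, lengthscale, noise)
```

L-BFGS-B with a well-conditioned likelihood usually converges to a (local) optimum without any step-size tuning, which removes one source of the discrepancy you're seeing.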