Arc Kernel for Conditional Spaces #1023

Open
BCJuan opened this issue Jan 16, 2020 · 7 comments

Comments

@BCJuan
Contributor

BCJuan commented Jan 16, 2020

🚀 Feature Request

Introduction

I am working on Bayesian optimization with Gaussian processes for hyperparameter tuning of ML models. I am an indirect user of your library (through Ax and BoTorch). Many ML models have conditional configuration spaces. I understand that Gaussian processes do not directly support conditional spaces. However, the article "Raiders of the Lost Architecture: Kernels for Bayesian Optimization in Conditional Parameter Spaces" defines a kernel that would be useful in such spaces.

Motivation

It would be useful to implement this kernel so that higher-level libraries can use it. If I have understood the article correctly, it would allow optimization over conditional configuration spaces simply by using this kernel.

In the case where conditionality is related to hierarchy, this would solve issues such as this one. It would also be a good attempt at addressing a limitation that is consistently attributed to GPs.

Pitch

Roughly and simplistically, the kernel applies a cylindrical embedding to the inputs and then evaluates a standard RBF kernel on the embedded points. The final kernel is the product of these per-dimension kernels. Below are the images of the descriptions found in the project's GitHub repository.

[Images: arc, arc1, arc2 (kernel descriptions from the project's GitHub repository)]

I have made myself an implementation:

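from math import pi

import torch
from gpytorch.constraints import Interval, Positive
from gpytorch.kernels import Kernel
from gpytorch.kernels.rbf_kernel import postprocess_rbf


class ArcKernel(Kernel):

    def __init__(self, num_parameters, **kwargs):
        # ard_num_dims must be passed through kwargs; it sizes the parameters below.
        super(ArcKernel, self).__init__(has_lengthscale=True, **kwargs)

        # Angle of the arc, one value per input dimension.
        self.register_parameter(
            name="raw_angle",
            parameter=torch.nn.Parameter(torch.randn(1, 1, self.ard_num_dims))
            )
        angle_constraint = Interval(0, 1)
        self.register_constraint("raw_angle", angle_constraint)

        # Radius of the arc, one value per input dimension.
        self.register_parameter(
            name="raw_radius",
            parameter=torch.nn.Parameter(torch.randn(1, 1, self.ard_num_dims))
            )
        radius_constraint = Positive()
        self.register_constraint("raw_radius", radius_constraint)

        self.num_parameters = num_parameters

    def embedding(self, x):
        # Map each lengthscale-scaled input onto a circle: (r*sin, r*cos).
        # Note: the raw (unconstrained) parameter values are used here;
        # the registered constraints are not applied.
        x_ = x.div(self.lengthscale)
        x_s = self.raw_radius*torch.sin(pi*self.raw_angle*x_)
        x_c = self.raw_radius*torch.cos(pi*self.raw_angle*x_)
        x_ = torch.cat((x_s, x_c), dim=-1).squeeze(0)
        return x_

    def forward(self, x1, x2, diag=False, **params):
        # RBF kernel on the squared distance between the embedded inputs.
        x1_, x2_ = self.embedding(x1), self.embedding(x2)
        return self.covar_dist(x1_, x2_, square_dist=True, diag=diag,
                               dist_postprocess_func=postprocess_rbf,
                               postprocess=True, **params)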

To use it, one would compose ProductStructureKernel, ScaleKernel, RBFKernel and the newly built ArcKernel.
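As a sanity check on the embedding idea, here is a minimal plain-Python sketch for a single scalar input. It does not use GPyTorch; the function names `arc_embed`, `arc_rbf` and `arc_sq_dist_closed_form` and the default hyperparameter values are hypothetical, chosen only for illustration. It assumes the embedding is (radius·sin, radius·cos) of the scaled input, followed by an RBF on the embedded points. The embedding needs both a sine and a cosine coordinate for the squared distance to depend only on the difference between the two inputs.

```python
import math


def arc_embed(x, lengthscale=1.0, radius=1.0, angle=0.5):
    """Map a scalar x to a point on a circle of the given radius."""
    theta = math.pi * angle * x / lengthscale
    return (radius * math.sin(theta), radius * math.cos(theta))


def arc_rbf(x1, x2, **hypers):
    """RBF evaluated on the embedded points: exp(-0.5 * squared distance)."""
    s1, c1 = arc_embed(x1, **hypers)
    s2, c2 = arc_embed(x2, **hypers)
    sq_dist = (s1 - s2) ** 2 + (c1 - c2) ** 2
    return math.exp(-0.5 * sq_dist)


def arc_sq_dist_closed_form(x1, x2, lengthscale=1.0, radius=1.0, angle=0.5):
    """Same squared distance via the identity 2 r^2 (1 - cos(theta1 - theta2))."""
    delta = math.pi * angle * (x1 - x2) / lengthscale
    return 2.0 * radius ** 2 * (1.0 - math.cos(delta))
```

Comparing `arc_rbf(x1, x2, ...)` with `math.exp(-0.5 * arc_sq_dist_closed_form(x1, x2, ...))` for a few inputs is a cheap way to check an implementation of the embedding before moving to tensors.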

I can open a pull request myself if the request is accepted. I should add that I do not have enough knowledge to guarantee that this kernel works exactly as described in the paper. I have written my implementation but have not tested it in any experiment. If desired, I could also do the testing, but I would need some guidance.

Thank you very much in advance.

P.S.: You can find the information summarized in the original authors' repository, under the folder latex. The images attached above also come from that source.

BCJuan changed the title from "Arc Kernel for Hierarchical Spaces" to "Arc Kernel for Conditional Spaces" on Jan 16, 2020
@gpleiss
Member

gpleiss commented Jan 16, 2020

Sure - this seems like it could be useful to have! We're definitely open to a PR :)

@jacobrgardner
Member

@BCJuan If you put up a pull request, my recommendation for the implementation as you've written it would be to have ArcKernel take a base_kernel in __init__ and then use that in forward rather than directly assuming an RBF kernel, e.g.

def forward(self, x1, x2, diag=False, **params):
    x1_, x2_ = self.embedding(x1), self.embedding(x2)
    return self.base_kernel(x1_, x2_)

See for example how ScaleKernel is implemented. This way we could support e.g. Matern base kernels (or, as the paper points out, your "favorite Euclidean covariance"):

base_kernel = MaternKernel(nu=2.5)
base_kernel.raw_lengthscale.requires_grad_(False)  # Don't learn base lengthscale since ArcKernel has one
covar_module = ArcKernel(base_kernel, num_parameters)
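The design point here (embed first, then hand the embedded inputs to any Euclidean kernel) can be illustrated with a short plain-Python sketch. This is not GPyTorch code: `arc_embed`, `rbf`, `matern52` and `make_arc_kernel` are hypothetical helper names, the inputs are scalars, and the base kernels are written with unit lengthscale.

```python
import math


def arc_embed(x, radius=1.0, angle=0.5):
    """Map a scalar x to a point on a circle of the given radius."""
    theta = math.pi * angle * x
    return (radius * math.sin(theta), radius * math.cos(theta))


def rbf(dist):
    """RBF as a function of Euclidean distance (unit lengthscale)."""
    return math.exp(-0.5 * dist ** 2)


def matern52(dist):
    """Matern kernel with nu=2.5 as a function of distance (unit lengthscale)."""
    a = math.sqrt(5.0) * dist
    return (1.0 + a + a * a / 3.0) * math.exp(-a)


def make_arc_kernel(base, **hypers):
    """Return k(x1, x2) = base(||g(x1) - g(x2)||) for a stationary base kernel."""
    def kernel(x1, x2):
        e1 = arc_embed(x1, **hypers)
        e2 = arc_embed(x2, **hypers)
        return base(math.hypot(e1[0] - e2[0], e1[1] - e2[1]))
    return kernel


# The same embedding works with either base kernel.
k_rbf = make_arc_kernel(rbf, radius=1.5, angle=0.7)
k_matern = make_arc_kernel(matern52, radius=1.5, angle=0.7)
```

Because the wrapper only touches the inputs, swapping the base kernel needs no change to the embedding, which is the benefit of taking a `base_kernel` argument.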

@BCJuan
Contributor Author

BCJuan commented Jan 16, 2020

Thank you.

I am writing it properly and testing it so that the PR makes sense: adding setters, support for priors, etc.

I will post as soon as the kernel is in decent shape for a commit and has passed some tests. I greatly welcome any other recommendations.

@Balandat
Collaborator

This is great, @BCJuan. Please let us know over at botorch if you need any help with hooking this up with the acquisition functions.

@BCJuan
Contributor Author

BCJuan commented Jan 17, 2020

Hi,

I have finally cleaned up my implementation, so I am now able to make the PR.

I have reproduced the [Exact GPs](https://gpytorch.readthedocs.io/en/latest/examples/01_Exact_GPs/Simple_GP_Regression.html) example with the kernel, and it seems to work fine. I can upload the notebooks or something similar, as you prefer.

My only concern is how the kernel's parameter sizes are defined. At the moment each parameter is simply a vector with one entry per dimension, but it may need to be something like

        self.register_parameter(
            name="raw_angle",
            parameter=torch.nn.Parameter(torch.zeros(*self.batch_shape, 1, self.ard_num_dims))
            )

@Balandat Thank you. Indeed, I have tried to use it in the BoTorch with Ax tutorial, but there are numerical problems. The following warning appears:

/home/kostal/anaconda3/envs/deep/lib/python3.7/site-packages/gpytorch/utils/cholesky.py:42: RuntimeWarning: A not p.d., added jitter of 1e-08 to the diagonal
  warnings.warn(f"A not p.d., added jitter of {jitter_new} to the diagonal", RuntimeWarning)

And finally the error:

RuntimeError: Lapack Error syev : 2 off-diagonal elements didn't converge to zero at /opt/conda/conda-bld/pytorch_1556653215914/work/aten/src/TH/generic/THTensorLapack.cpp:296
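For context on the jitter warning above, the general idea is to retry a failed Cholesky factorization after adding a small value to the diagonal. The following plain-Python sketch illustrates that idea only; it is not GPyTorch's implementation, and `cholesky`, `jittered_cholesky`, the retry count and the tenfold growth of the jitter are assumptions made for the example.

```python
import math


def cholesky(a):
    """Lower-triangular Cholesky factor; raises ValueError if a is not positive definite."""
    n = len(a)
    low = [[0.0] * n for _ in range(n)]
    for i in range(n):
        for j in range(i + 1):
            s = a[i][j] - sum(low[i][k] * low[j][k] for k in range(j))
            if i == j:
                if s <= 0.0:
                    raise ValueError("matrix is not positive definite")
                low[i][j] = math.sqrt(s)
            else:
                low[i][j] = s / low[j][j]
    return low


def jittered_cholesky(a, jitter=1e-8, max_tries=3):
    """Try a plain Cholesky first, then retry with growing diagonal jitter."""
    try:
        return cholesky(a), 0.0
    except ValueError:
        pass
    for attempt in range(max_tries):
        eps = jitter * 10 ** attempt
        shifted = [[v + (eps if i == j else 0.0) for j, v in enumerate(row)]
                   for i, row in enumerate(a)]
        try:
            return cholesky(shifted), eps
        except ValueError:
            continue
    raise ValueError("matrix is not positive definite even after adding jitter")


# Two identical inputs give a rank-one (singular) 2x2 kernel matrix.
gram = [[1.0, 1.0], [1.0, 1.0]]
factor, used = jittered_cholesky(gram)
```

Kernel matrices become nearly singular when many inputs map to almost the same embedded point, so a warning like this is often worth tracing back to the embedding and its hyperparameters rather than only raising the jitter.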

Also, the angle constraint, which should be Interval(0, 1), is now Positive() because of an error:

RuntimeError: value cannot be converted to type int64_t without overflow: inf
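To make the difference between the two constraints concrete, here is a minimal sketch of how an interval constraint can map an unconstrained raw value into (lower, upper). It assumes a sigmoid-based mapping, which is how GPyTorch's Interval is understood to behave by default; the helper names `to_interval` and `from_interval` are hypothetical, and the sketch is not a diagnosis of the error above.

```python
import math


def to_interval(raw, lower=0.0, upper=1.0):
    """Squash an unconstrained real number into the open interval (lower, upper)."""
    return lower + (upper - lower) / (1.0 + math.exp(-raw))


def from_interval(value, lower=0.0, upper=1.0):
    """Inverse map (logit); only defined strictly inside the interval."""
    p = (value - lower) / (upper - lower)
    if not 0.0 < p < 1.0:
        raise ValueError("value must lie strictly between the bounds")
    return math.log(p / (1.0 - p))
```

The inverse map is undefined exactly at the bounds, so an initial value placed on 0 or 1 has no finite raw counterpart under this kind of transform.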

@Balandat
Collaborator

Happy to help w/ debugging on the botorch end. Once your PR here is up, can you share the full code so I can reproduce?

@BCJuan
Contributor Author

BCJuan commented Jan 25, 2020

Of course, @Balandat. I have changed the course of development a little, though. I will write the tests for the kernel in GPyTorch first, and once it is clear that it works properly I will move on to BoTorch. Maybe it was too bold to go directly to BoTorch without checking in GPyTorch first. Nevertheless, it would be great to test it in BoTorch and see how it differs in conditional spaces from, for example, a Matern kernel. If there is anything I can do, please do not hesitate to ask. Thank you!
