
[Feature Request] Derivative Enabled GPs #1679

Open
yyexela opened this issue Feb 15, 2023 · 15 comments
Labels: enhancement (New feature or request)

Comments

@yyexela
Contributor

yyexela commented Feb 15, 2023

🚀 Feature Request

Hello, I'm interested in studying GPs with derivatives, similar to the example in the notebook provided in this issue. However, that notebook uses LogEI on noisy data, which isn't optimal. As such, I want to be able to use both qNEI and KG on noisy data with this derivative-enabled GP in BoTorch.

Motivation

Derivative-enabled GPs allow for faster convergence rates. This is well studied in the literature and makes intuitive sense, since each objective function evaluation provides much more information. I want code to be available for myself and for future researchers who want to explore derivative-enabled GPs.

Pitch

Ideally, I would want to see a BoTorch model for GPs with derivative information that works without any issues when used with qNEI and KG. I want this implemented for my research, and I have a significant amount of time I can contribute to implementing it myself. However, I'm not sure where to even begin, since I'm not familiar with the BoTorch code-base. I don't want to spend lots of time looking in the wrong place, so I would appreciate any and all help! My current issue is getting qNEI to work. I've added `self._num_outputs = 1 + d` to my derivative-enabled GP class, but I'm getting the following error:

torch._C._LinAlgError: linalg.cholesky: (Batch element 0): The factorization could not be completed because the input is not positive-definite (the leading minor of order 6 is not positive-definite).

which I'm trying to resolve. As I've mentioned earlier, however, I'm not sure this is the right place to look. Thanks!
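
For concreteness, here is roughly the shape of the model I'm working with (a simplified sketch, not my exact code: it follows GPyTorch's derivative-GP example with ConstantMeanGrad and RBFKernelGrad, mixed into BoTorch's GPyTorchModel; the class name and toy objective are just for illustration):

```python
import torch
import gpytorch
from botorch.models.gpytorch import GPyTorchModel


class DerivativeExactGP(gpytorch.models.ExactGP, GPyTorchModel):
    """Exact GP over function values and gradients (sketch)."""

    def __init__(self, train_X, train_Y, likelihood):
        # train_Y has shape n x (1 + d): the function value plus d partials.
        super().__init__(train_X, train_Y, likelihood)
        d = train_X.shape[-1]
        self._num_outputs = 1 + d  # the line mentioned above
        self.mean_module = gpytorch.means.ConstantMeanGrad()
        self.covar_module = gpytorch.kernels.ScaleKernel(
            gpytorch.kernels.RBFKernelGrad(ard_num_dims=d)
        )

    def forward(self, x):
        mean_x = self.mean_module(x)
        covar_x = self.covar_module(x)
        return gpytorch.distributions.MultitaskMultivariateNormal(mean_x, covar_x)


d = 2
train_X = torch.rand(10, d, dtype=torch.double)
# Toy objective f(x) = sum(x^2), whose gradient is 2x.
train_Y = torch.cat([(train_X ** 2).sum(-1, keepdim=True), 2 * train_X], dim=-1)
likelihood = gpytorch.likelihoods.MultitaskGaussianLikelihood(num_tasks=1 + d)
model = DerivativeExactGP(train_X, train_Y, likelihood)
```

Evaluating qNEI against a model like this is where the error above is raised.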

Are you willing to open a pull request? (See CONTRIBUTING)
Yes

Additional context

I also briefly looked into sub-classing BatchedMultiOutputGPyTorchModel for my model, but the documentation says the outputs are independent, which I don't think applies to derivative-enabled GPs. When I subclassed it anyway, I got another error:

RuntimeError: Cannot yet add fantasy observations to multitask GPs, but this is coming soon!

which led me to this issue from 4 years ago, which mentions a problem similar to mine.

Thanks again!

yyexela added the enhancement label Feb 15, 2023
yyexela changed the title from [Feature Request] to [Feature Request] Derivative Enabled GPs Feb 15, 2023
@Balandat
Contributor

Hi @yyexela, great to see you interested in using derivative-enabled GPs with other acquisition functions.

The error you're getting suggests that there may be numerical problems unrelated to the use of derivative-enabled GPs. On the other hand, we haven't really battle-tested derivative-enabled GPs, so there may also be something else going on.

Could you provide a simple repro of the issue you're running into?

@Balandat
Contributor

Also cc @pabloprf re derivative-enabled GPs.

@yyexela
Contributor Author

yyexela commented Feb 15, 2023

Sure! I have two examples ready, one with derivative information (Simple_Repro_D.py.txt) and one without (Simple_Repro_non_D.py.txt). The first fails with `torch._C._LinAlgError: linalg.cholesky` and the second doesn't. Let me know if there is anything else I can help with! EDIT: and what my next steps should be ;)

@yyexela
Contributor Author

yyexela commented Feb 24, 2023

I think I'm getting close...

I have three files which I'm using for testing. One file just has shared functions; the other two define the models. One model is without derivatives and the other is with derivatives, but both use GPyTorch without calling a BoTorch model directly (though both sub-class GPyTorchModel):

With this (and lots of print statements in the source code), I can see that the non-derivative version (I call it non-D) eventually calls sample_cached_cholesky in botorch/utils. Examining the covariance matrix, I see that two of the rows are identical. This is because I'm trying to compute the posterior at X_base (the points already sampled) concatenated with X_test (just zero). Since X_base contains zero, two rows of the covariance matrix are the same (both the first and the last). This is why the following message appears:

/home/alexey/.conda/envs/thesis/lib/python3.10/site-packages/linear_operator/utils/cholesky.py:40: NumericalWarning: A not p.d., added jitter of 1.0e-08 to the diagonal

Tracking the code path for the derivative-enabled case (I call it yes-D), I see that the same function is not called. This is because the function only gets called in MultivariateNormal's __init__ in gpytorch/distributions when either the mean or the covariance used to construct the class is a LinearOperator. For the non-D case, the covariance matrix is:

<linear_operator.operators.constant_mul_linear_operator.ConstantMulLinearOperator object at 0x7f33fe4f36a0>

while for the yes-D case it is not. This results in a call to torch.linalg.cholesky(covariance_matrix), which throws the error:

torch._C._LinAlgError: linalg.cholesky: (Batch element 0): The factorization could not be completed because the input is not positive-definite (the leading minor of order 6 is not positive-definite).

which makes sense, since in my test example from the code above, there are six rows, the last one being identical to the first one!
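
To illustrate with a toy example (this is not BoTorch code, just the failure mode in isolation): a kernel matrix over inputs containing a duplicate point is singular, so a plain Cholesky factorization fails, while linear_operator's psd_safe_cholesky recovers by adding jitter to the diagonal, exactly as in the warning above.

```python
import torch
from linear_operator.utils.cholesky import psd_safe_cholesky

X = torch.tensor([[0.0], [0.4], [0.8], [0.0]], dtype=torch.double)  # 0.0 twice
K = torch.exp(-torch.cdist(X, X) ** 2)  # RBF kernel: rows 0 and 3 are identical

try:
    torch.linalg.cholesky(K)  # typically fails: K is singular (up to rounding)
except RuntimeError as err:  # torch._C._LinAlgError subclasses RuntimeError
    print(err)

L = psd_safe_cholesky(K)  # adds 1e-8 jitter (escalating if needed), then retries
print(torch.dist(L @ L.transpose(-1, -2), K))  # small error from the jitter
```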

So, I have the following questions:

  • Should I try to figure out how to make the yes-D case have a LinearOperator covariance matrix? It seems that for both the yes-D and non-D cases the covariance matrices start as <linear_operator.operators.dense_linear_operator.DenseLinearOperator object at 0x7fc3403fe5f0>. For the non-D case it turns into the ConstantMulLinearOperator above, while for the yes-D case it loses its LinearOperator class.
    or
  • Should I try to add code providing functionality similar to the jitter in .../site-packages/linear_operator/utils/cholesky.py?

Thanks again and let me know if something wasn't clear!

@yyexela
Contributor Author

yyexela commented Mar 1, 2023

Sweet!

So I have a PR open which resolves the issues I was having with the Cholesky decomposition. The simple fix I added computes the Cholesky decomposition with the linear_operator package instead of PyTorch, which allows adding jitter to the diagonal of a non-positive-definite matrix.

The matrix was not positive-definite because, as I mentioned above, two rows of the covariance matrix are identical whenever the acquisition function is evaluated at a point that was previously evaluated on the objective function.
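
Roughly, the shape of the fix (a sketch, not the actual call site in the PR): keeping the covariance wrapped as a LinearOperator means the MultivariateNormal stays lazy and defers factorization to linear_operator's jitter-adding Cholesky, instead of eagerly calling torch.linalg.cholesky.

```python
import torch
from linear_operator import to_linear_operator
from gpytorch.distributions import MultivariateNormal

mean = torch.zeros(3, dtype=torch.double)
covar = torch.tensor(
    [[1.0, 0.3, 1.0],
     [0.3, 1.0, 0.3],
     [1.0, 0.3, 1.0]], dtype=torch.double
)  # singular: rows 0 and 2 are identical

# Passing the dense tensor directly would factorize eagerly via
# torch.linalg.cholesky and raise the hard LinAlgError from above;
# wrapping it keeps the distribution lazy, so factorization goes
# through linear_operator and gets the jitter fallback instead.
mvn = MultivariateNormal(mean, to_linear_operator(covar))
print(mvn.rsample().shape)  # succeeds, with a NumericalWarning about jitter
```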

I'll go ahead and start working on getting KG working too. Let me know what else I can do, for example if there are any issues in the PR :)

@Balandat
Contributor

Balandat commented Mar 5, 2023

Nice work tracking down this issue! It seems a bit weird to me to fix this by wrapping the covariance in a DenseLinearOperator, though, especially since it represents the same object, and ideally the handling of numerical issues / singularity should happen at a higher level than the covariance object.

I think we can consider merging this if it doesn't cause any issues on our end, since the overhead should be very small. But we should also take a closer look at this: ideally we'd be able to use torch Tensors and linear_operator LinearOperators interchangeably in all places without affecting the outcomes / robustness of the code, especially when the underlying representation is the same.

@SebastianAment - curious if you have any thoughts on this.

facebook-github-bot pushed a commit that referenced this issue Mar 11, 2023
Summary:

## Motivation

Derivative-enabled GPs allow for faster convergence rates. This is well studied in the literature and makes intuitive sense, since each objective function evaluation provides much more information. I want code to be available for myself and for future researchers who want to explore derivative-enabled GPs.

This PR resolves an error with the qNEI acquisition function for derivative enabled GPs. For more information, see [here](#1679).

### Have you read the [Contributing Guidelines on pull requests](https://github.com/pytorch/botorch/blob/main/CONTRIBUTING.md#pull-requests)?

Yes, this doesn't affect the outcome of the unit tests, though when I run the unit tests on an unmodified branch of *main* I get `110 failed, 853 passed, 1165 warnings`.

Pull Request resolved: #1716

Test Plan:
I have some files from an issue ([1679](#1679)) I opened which show a simple case where a derivative-enabled Gaussian process fails to compute the qNEI value at a point where an equivalent non-derivative-enabled Gaussian process succeeds.

- [shared.py.txt](https://github.com/pytorch/botorch/files/10820140/shared.py.txt)
- [Simple_Repro_non_D_2.py.txt](https://github.com/pytorch/botorch/files/10820141/Simple_Repro_non_D_2.py.txt)
- [Simple_Repro_non_D.py.txt](https://github.com/pytorch/botorch/files/10820142/Simple_Repro_non_D.py.txt)

Adding the fix in this PR removes the error.

Please let me know if I should add a specific unit test for this.

## Related PRs

N/A

Reviewed By: SebastianAment

Differential Revision: D43822902

Pulled By: esantorella

fbshipit-source-id: 0876bebb9364fca6c10379d92e792cb2dc9865e1
@Balandat
Contributor

@yyexela with the fix landed, is there anything outstanding on this feature request or can this be closed out?

@yyexela
Contributor Author

yyexela commented Mar 19, 2023

Great! Yes, I also want to get KG working with derivative-enabled GPs; after that I can make a tutorial notebook, and then I think this could be closed out. The fix so far just gets qNEI working.
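
For reference, the eventual KG usage I'm targeting looks roughly like this (a sketch with a stock SingleTaskGP standing in for the derivative-enabled model; qKnowledgeGradient fantasizes from the model internally, which is the step that raised the multitask fantasy error quoted earlier):

```python
import torch
from botorch.models import SingleTaskGP
from botorch.acquisition import qKnowledgeGradient
from botorch.optim import optimize_acqf

# Stand-in model; the derivative-enabled GP would be dropped in here.
train_X = torch.rand(10, 2, dtype=torch.double)
train_Y = (train_X ** 2).sum(-1, keepdim=True)
model = SingleTaskGP(train_X, train_Y)

# qKG calls model.fantasize(...) internally; for a multitask (derivative)
# model this is where "Cannot yet add fantasy observations..." is raised.
qkg = qKnowledgeGradient(model, num_fantasies=8)
bounds = torch.stack([torch.zeros(2), torch.ones(2)]).to(torch.double)
candidate, value = optimize_acqf(
    qkg, bounds=bounds, q=1, num_restarts=2, raw_samples=16
)
print(candidate)
```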

@yyexela
Contributor Author

yyexela commented May 19, 2023

I'll get to work on a tutorial notebook now that #2317 has merged :)

@Balandat
Contributor

Amazing!

@Balandat
Contributor

Balandat commented Dec 2, 2023

@yyexela checking in if you had a chance to work on this tutorial?

@yyexela
Contributor Author

yyexela commented Dec 2, 2023

Hey! Sorry, yes, I had something working but life got in the way. I'll try to get something pushed in the next week or so. Thanks for checking in!

@yyexela
Contributor Author

yyexela commented Dec 3, 2023

@Balandat I found my tutorial and it still works, so I was able to get it in sooner than planned :)

@yyexela
Contributor Author

yyexela commented Dec 3, 2023

Where can I see the versions of the packages installed for the smoke test? I was able to run my notebook just fine locally with a fresh conda environment and the latest package versions, but got an error in my PR...

@Balandat
Contributor

Balandat commented Dec 3, 2023

Let's move this discussion onto the PR; I'll respond there.
