Add ScaleKernel to get_covar_module_with_dim_scaled_prior #2619

Open · wants to merge 1 commit into base: main
Conversation


@dai08srhg dai08srhg commented Nov 10, 2024

Motivation

Since version 0.12.0, the dimension-scaled log-normal prior [Hvarfner2024vanilla] has been the default. However, because a ScaleKernel is not applied in get_covar_module_with_dim_scaled_prior(), performance may deteriorate in some cases.
(The best choice of prior depends on the task, but learning the outputscale appears to be beneficial, and at worst harmless, across tasks.)

An example is shown below: the task of minimizing Styblinski-Tang (D=40) was run three times and the average performance compared.
[Figures: StyblinskiTang40_performance and StyblinskiTang40_all, benchmark comparison plots]

Have you read the Contributing Guidelines on pull requests?

I have read it.

Test Plan

Since this is a performance-related change, testing will involve a performance comparison on benchmark functions (such as Styblinski-Tang).

Related PRs

(If this PR adds or changes functionality, please take some time to update the docs at https://github.com/pytorch/botorch, and link to your PR here.)

@facebook-github-bot facebook-github-bot added the CLA Signed Do not delete this pull request or issue due to inactivity. label Nov 10, 2024
@Balandat
Contributor

@hvarfner you have done some evaluations here - any thoughts? My understanding was that we didn't really see any benefit from using a scale kernel across a variety of functions if we standardize the outcomes.

@hvarfner
Contributor

Hi @dai08srhg ,

Thanks for checking this out, and sorry for the late reply! When the new priors were implemented, the ScaleKernel was dropped after a lot of ablation. In fact, there are some cases where the outputscale is actually problematic.

The results in the paper (e.g. Figs. 19-22) show that performance was frequently substantially worse with a ScaleKernel. For high-dimensional problems, the outputscale parameter tends to shrink quite rapidly, leading to very local behavior, which can in turn hurt performance. In some internal testing on mid- and high-dimensional problems, including a ScaleKernel generally did not improve performance either.

Now, the shrinkage does not always happen, and it is not always bad for performance. I re-ran your specific experiment and also observed that including the ScaleKernel was slightly better (see the attached stybtang40.pdf), but the difference I saw was not as stark (10 runs).

With that said, I think the effect of the outputscale (whether learned or not, when it shrinks, and whether that is good or bad) is very interesting, and I would like to understand it better. However, we concluded that including a ScaleKernel does more harm than good, both in terms of regret performance and the exploration-exploitation trade-off.
