Custom Model, Dropout NN Posterior #2127

kstone40 · 2023-11-22T14:53:47Z

kstone40
Nov 22, 2023

Hello!

I am struggling to figure out the best way to create a custom (non-deterministic) model in BoTorch without using a GP from GPyTorch.

According to the docs, the base model API requires a posterior method, which returns a Posterior object.

Does this have to be a GPyTorch posterior? As a simple example, I'd like to build a feed-forward network with dropout, from which I'd sample N times to generate a predictive distribution at the inputs X. What would be the best way to wrap that in a posterior so that it is compatible with a BoTorch custom model?

Thanks!
Kevin

sdaulton · 2023-11-22T15:54:15Z

sdaulton
Nov 22, 2023
Collaborator

Hi @kstone40, a GPyTorch posterior is not required. The steps would be to 1) implement a DropoutPosterior (or something) that is a subclass of Posterior (class Posterior(ABC), botorch/botorch/posteriors/posterior.py, line 22 in 0d66aa0), which will need to implement rsample (defined at line 55 of the same file in 0d66aa0), and then 2) in your custom Model subclass (e.g. DropoutModel), construct and return the DropoutPosterior object in the DropoutModel.posterior method. Does that make sense?
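
A rough, library-free sketch of this wiring may help. It does not import BoTorch: SketchPosterior is a hypothetical stand-in for the Posterior base class, the one-layer "network" and every parameter name are assumptions made for illustration, and the real rsample works on tensors and takes a sample_shape argument rather than an integer.

```python
import random
from abc import ABC, abstractmethod


class SketchPosterior(ABC):
    """Hypothetical stand-in for botorch's Posterior base class."""

    @abstractmethod
    def rsample(self, n_samples):
        """Return n_samples draws, each a list with one value per input point."""


class DropoutPosterior(SketchPosterior):
    def __init__(self, stochastic_forward, X):
        self.stochastic_forward = stochastic_forward  # callable supplied by the model
        self.X = X  # list of q points, each a list of d floats

    def rsample(self, n_samples):
        # One stochastic forward pass over all q points per draw.
        return [self.stochastic_forward(self.X) for _ in range(n_samples)]


class DropoutModel:
    """Toy 'network': a single linear layer with dropout on its inputs."""

    def __init__(self, weights, p_drop=0.5, seed=0):
        self.weights = weights
        self.p_drop = p_drop
        self.rng = random.Random(seed)

    def _stochastic_forward(self, X):
        # Draw one dropout mask and apply it to every point in X.
        keep = [self.rng.random() >= self.p_drop for _ in self.weights]
        scale = 1.0 / (1.0 - self.p_drop)  # inverted-dropout rescaling
        return [
            sum(w * xi * scale for w, xi, k in zip(self.weights, x, keep) if k)
            for x in X
        ]

    def posterior(self, X):
        # The model's posterior method builds and returns the posterior object.
        return DropoutPosterior(self._stochastic_forward, X)


model = DropoutModel(weights=[0.5, -1.0, 2.0])
draws = model.posterior([[1.0, 2.0, 3.0], [0.0, 1.0, 0.0]]).rsample(4)
```

Here draws holds four samples, each with one value per input point. In a real implementation the same roles would be played by subclasses of BoTorch's Model and Posterior.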

kstone40 Aug 4, 2024
Author

Got to this about a million years too late, but yes, this worked. I was actually able to use the TorchPosterior object.

I just sample from the DNN in .posterior() and fit the samples to a Normal distribution. I use a fixed number of dropout samples (100) to do this, which is independent of Posterior.rsample().

It feels a little inferior, though. I'd prefer to create a posterior that takes samples directly from the dropout masks, whether or not the result is normal.

I'm just not sure how to create a Posterior object that depends on the DNN model object without circular references.

I can generate the samples from the Model. Is there a way to return a Posterior from them without fitting or assuming a distribution?

Fine if it has to be normal, honestly; that's not a huge deal at this point.
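
The moment-matching approach described here, drawing a fixed number of stochastic forward passes and summarizing them with a Normal, can be sketched without any library. Everything below is a hypothetical stand-in: noisy_forward replaces the dropout network, and the function names are for illustration only; the default of 100 draws mirrors the number mentioned above.

```python
import random
import statistics


def noisy_forward(x, rng):
    # Hypothetical stand-in for one dropout-enabled forward pass of the DNN.
    return 2.0 * x + rng.gauss(0.0, 0.1)


def fit_normal(x, n_draws=100, seed=0):
    """Summarize n_draws stochastic predictions at x by a mean and std."""
    rng = random.Random(seed)
    draws = [noisy_forward(x, rng) for _ in range(n_draws)]
    return statistics.fmean(draws), statistics.stdev(draws)


mean, std = fit_normal(1.5)
```

The fitted mean and std would then parameterize the Normal handed to the posterior. Any skew or multimodality in the draws is lost at this step, which is the limitation noted above.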

Balandat Aug 4, 2024
Collaborator

@kstone40 have you taken a look at the EnsemblePosterior? This may be directly applicable in your case. See #1064 for more discussion behind the motivation for this.

kstone40 Aug 4, 2024
Author

Hi @Balandat thank you for the response!

I did take a look... But I might have confused myself. I was worried about the number of samples generated.

When I init EnsemblePosterior, should I just simulate a very large number of samples?

It does not appear there is error handling for if rsample's sample_shape is longer than values. If that is just limited by optim_samples during optimization, I can handle the errors that way.

Balandat Aug 4, 2024
Collaborator

> When I init EnsemblePosterior, should I just simulate a very large number of samples?

Yeah, that's the idea. More is better for accuracy, but you ofc need to trade this off against compute / memory.

> It does not appear there is error handling for if rsample's sample_shape is longer than values. If that is just limited by optim_samples during optimization, I can handle the errors that way.

The way rsample is implemented is that it can generate repeated draws from the finite samples, see the logic here.
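
The idea of repeated draws from a finite set of samples can be illustrated with a small, library-free sketch. This is not the EnsemblePosterior code, whose actual logic may differ (for example in how members are weighted); the function name and the uniform choice among members are assumptions.

```python
import random


def resample_members(member_values, n_samples, seed=0):
    """Draw n_samples items from a finite list of ensemble outputs, with replacement."""
    rng = random.Random(seed)
    # Sampling indices with replacement means n_samples may exceed len(member_values).
    idx = rng.choices(range(len(member_values)), k=n_samples)
    return [member_values[i] for i in idx]


members = [0.9, 1.1, 1.4]  # e.g. three stored dropout predictions at one point
draws = resample_members(members, 10)
```

Requesting ten draws from three stored values simply repeats some of them, so no error arises when the requested sample count exceeds the ensemble size.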

Answer selected by kstone40

kstone40 Aug 4, 2024
Author

Aha, that's clever then. Makes sense, thank you!

kstone40 Aug 5, 2024
Author

Hi @Balandat - if you can help me out once again: I am now seeing that these models do not optimize well for q>1. Most of the time, the optimizer selects the same point for however many batch points were requested.

Some details:

I am using ModelList to wrap my DNN
The optimizer throws a warning about "cache_root" being set to False which I suspect might be related

Balandat Aug 17, 2024
Collaborator

Sorry for the delayed response here.

> I am seeing that these models do not optimize well for q>1

Do your models actually provide a joint distribution over the q points if you evaluate them together? Or do you effectively just end up sampling from the marginal distribution of the points? In that case, it would not be surprising that you'd get the same result for each point in the batch (b/c they are essentially independent optimization problems).

kstone40 · 2024-09-07T03:28:53Z

kstone40
Sep 7, 2024
Author

Hi @Balandat ,

Thanks for following up! I have been traveling lately myself, so I'm also getting back to this late.

My implementation is actually in a recently published repository here: https://github.com/MSDLLCpapers/obsidian/blob/main/obsidian/surrogates/custom_torch.py

Since my posterior method accepts X.shape[1] > 1, I believe that q>1 would mean that I am technically evaluating the joint distribution. However, I believe the points will be independent anyway, based on my understanding of the Monte Carlo dropout NN method. Based on this, I think I should always enforce optim_sequential=True.

Anyway, maybe I am making an improper assumption. I thought that for q>1 with custom models, BoTorch would automatically calculate fantasy models when optim_sequential=True. But even when I toggle this, I end up with identical experiments. If, separately, I fit a fantasy model and re-optimize, I get a different result. Do I need to manage this myself in my custom model implementation?

Balandat Sep 8, 2024
Collaborator

> Since my posterior method accepts X.shape[1] > 1, I believe that q>1 would mean that I am technically evaluating the joint distribution. However, I believe they will be independent anyway based on my understanding of the monte-carlo dropout NN method. I think based on this, I should always enforce optim_sequential=True.

Yes, if the model itself is not able to produce nontrivial joint distributions for input sets of q>1 points, then you'll have to re-fit the model to achieve this kind of batch generation behavior.

> for q>1 for custom models, BoTorch automatically would calculate fantasy models when optim_sequential=True

We don't explicitly construct fantasy models in general. The assumption we make is that if we provide a test point X of shape (batch_shape) x q x d to the model, it computes the joint distribution over the q points (for each batch element if a batch shape is present). If optim_sequential=False we then just optimize jointly over the q inputs. If optim_sequential=True we optimize sequentially, but for the k-th step we pass in the previous k-1 generated points as part of the test points but only optimize the acquisition function over the k-th input. This requires the acquisition function to support handling of pending points via X_pending:

botorch/botorch/optim/optimize.py, lines 236 to 240 in 16853b4:

    new_inputs.acq_function.set_X_pending(
        torch.cat([base_X_pending, candidates], dim=-2)
        if base_X_pending is not None
        else candidates
    )

Generally, that uses the fact that the model can generate joint distributions.
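
To see why the joint distribution matters for sequential greedy optimization, consider the toy sketch below. It is plain Python rather than BoTorch: the grid search, both acquisition functions, and the penalty term are made up for illustration. The penalty only imitates the effect a joint posterior would have, namely that a candidate close to a pending point becomes less valuable.

```python
import math


def sequential_greedy(acq, grid, q):
    """Pick q points one at a time; earlier picks are passed along as pending."""
    pending = []
    for _ in range(q):
        # Only the new point is optimized; the pending points stay fixed.
        best = max(grid, key=lambda x: acq(x, pending))
        pending.append(best)
    return pending


def acq_independent(x, pending):
    # Ignores pending points, as a model without a joint distribution would.
    return -((x - 0.3) ** 2)


def acq_pending_aware(x, pending):
    # Hypothetical penalty: value drops near points that are already pending.
    penalty = sum(math.exp(-((x - p) ** 2) / 0.01) for p in pending)
    return -((x - 0.3) ** 2) - penalty


grid = [i / 100 for i in range(101)]
batch_independent = sequential_greedy(acq_independent, grid, q=3)
batch_aware = sequential_greedy(acq_pending_aware, grid, q=3)
```

With acq_independent, every step sees the same objective and returns the same maximizer, so the batch contains q copies of one point. With acq_pending_aware, the picks spread out.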

If your model is not able to do that, then I think you have two options:

1. Do the generation manually: generate the first point, then refit your model, then run optimize_acqf again (on that manually generated "fantasy model" of yours).
2. Write your own acquisition function that essentially does 1. above but wraps this in custom support for X_pending that modifies the model under the hood.
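
The first option might look roughly like the loop below. This is a toy, library-free sketch: the nearest-neighbour "model", the acquisition rule, and the use of the model's own prediction as the fantasized outcome are all assumptions. In practice the inner maximization would be a call to optimize_acqf with q=1 on the refitted model.

```python
def nearest(data, x):
    # Return the (x_obs, y_obs) pair closest to x.
    return min(data, key=lambda pair: abs(pair[0] - x))


def acquisition(data, x):
    # Toy rule: predicted value plus a bonus for being far from observed points.
    x_obs, y_obs = nearest(data, x)
    return y_obs + abs(x_obs - x)


def generate_batch(data, grid, q):
    data = list(data)  # copy so the caller's data is untouched
    batch = []
    for _ in range(q):
        # "Refit": in this toy the model is just the current data set.
        x_next = max(grid, key=lambda x: acquisition(data, x))
        batch.append(x_next)
        # Condition on a fantasized outcome at x_next before the next round.
        y_fantasy = nearest(data, x_next)[1]
        data.append((x_next, y_fantasy))
    return batch


grid = [i / 10 for i in range(11)]
batch = generate_batch([(0.0, 0.0), (1.0, 0.5)], grid, q=3)
```

Because each round conditions on the previous pick, the bonus at that location vanishes and the next round moves elsewhere, which is what yields distinct batch points.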

kstone40 Sep 12, 2024
Author

Thanks for the detailed reply. This is helpful.

I know how to manipulate X_pending in the acquisition function per your 2nd suggestion, but ideally I would have a way to use this custom model with all acquisition functions without requiring a special method for each. So, I'm looking for a general approach.

I see there is a mixin class, FantasizeMixin, with methods fantasize() and condition_on_observations(). Is there a way to implement this more generally within BoTorch to solve my problem? Otherwise, it seems like these methods will at least be helpful for handling it outside of BoTorch.

https://botorch.org/api/_modules/botorch/models/model.html#FantasizeMixin

Thanks again for all your help!

Balandat Sep 14, 2024
Collaborator

Yes, these methods could in principle be used to implement support for pending points more generally. Basically, models would get some property indicating whether they can compute a joint posterior across a batch of q>1 points. If they cannot, an acquisition function (when there are pending points and q=1, including in sequential greedy optimization) could fall back on (i) calling fantasize() on the model, (ii) computing the acquisition function [in t-batch mode] on each of the fantasized models, and (iii) averaging the respective acquisition function values. Having this as a general setup that permeates all (or most) acquisition functions would probably be a nontrivial amount of work, though.
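
The three-step fallback could be sketched as follows. This is a toy in plain Python, not BoTorch's FantasizeMixin: the functions fantasize and averaged_acquisition, the Gaussian outcome sampler, and the nearest-neighbour acquisition value are hypothetical stand-ins chosen only to show the control flow.

```python
import random


def sample_outcome(data, x, rng):
    # Hypothetical posterior draw at x: nearest observed value plus noise.
    y_near = min(data, key=lambda pair: abs(pair[0] - x))[1]
    return y_near + rng.gauss(0.0, 0.1)


def fantasize(data, pending, n_fantasies, rng):
    # (i) One fantasized data set per sampled outcome at the pending points.
    return [
        data + [(x, sample_outcome(data, x, rng)) for x in pending]
        for _ in range(n_fantasies)
    ]


def acquisition(data, x):
    # Toy q=1 acquisition: nearest observed value plus distance to that point.
    x_obs, y_obs = min(data, key=lambda pair: abs(pair[0] - x))
    return y_obs + abs(x_obs - x)


def averaged_acquisition(data, pending, x, n_fantasies=8, seed=0):
    rng = random.Random(seed)
    fantasy_sets = fantasize(data, pending, n_fantasies, rng)
    # (ii) Evaluate the acquisition on each fantasized model, (iii) average.
    values = [acquisition(fantasy, x) for fantasy in fantasy_sets]
    return sum(values) / len(values)


data = [(0.0, 0.0), (1.0, 0.5)]
at_pending = averaged_acquisition(data, pending=[0.5], x=0.5)
away = averaged_acquisition(data, pending=[0.5], x=0.25)
```

At the pending point the distance bonus is zero in every fantasy, so the averaged value reduces to the mean fantasized outcome, while a candidate away from it keeps its bonus.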

kstone40 Sep 14, 2024
Author

OK this is great! I appreciate your guidance. I actually see how I could implement that relatively easily.

Two related residual questions:

1. Would sequential greedy optimization ever lead to distinct selections for q>1 if a nontrivial joint distribution cannot be calculated? This is why you indicated my model would just choose the best point q times, right?
2. Are there any examples in the docs of custom models that can't implement joint distributions? I would think that would not be uncommon (outside of GPs).
