
Conversation

FrankWanger
Contributor

Motivation

There is currently no tutorial on using the probabilities obtained from classification models as constraints in acquisition functions, and such an application holds strong interest for BO-guided laboratory experimentation. A prior discussion can be found in #725.

Have you read the Contributing Guidelines on pull requests?

Yes

Test Plan

In the present tutorial we show how to deal with feasibility constraints that are observed alongside the optimization process (referred to as 'outcome constraints' in the BoTorch documentation, or sometimes as 'black-box constraints'). More specifically, feasibility is modelled by a classification model, and the learned probability is fed to the acquisition function through the `constraints` argument of `SampleReducingMCAcquisitionFunction`. This re-weights the acquisition function as $\alpha_{\text{acqf-con}}=\mathbb{P}(\text{constraint satisfied})\cdot\alpha_{\text{acqf}}$. To fit the API, the probability predicted by the classification model is passed through an inverse sigmoid and negated, since the API treats negative values as feasible.
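Below is a minimal sketch of this wiring, with assumed variable names (`model_list`, `classifier_likelihood`, `best_feasible_y`) that are not taken verbatim from the notebook; the constraint callable mirrors the `pass_con_unsigmoid` helper quoted later in the review thread. As the review discussion below notes, BoTorch internally applies a fatmoid rather than an exact sigmoid, so the inverse-sigmoid conversion here is an approximation:

```python
import torch
from botorch.acquisition.logei import qLogExpectedImprovement


def feasibility_constraint(Z: torch.Tensor) -> torch.Tensor:
    # Z[..., 1] holds posterior samples of the classifier's latent function;
    # map them to P(feasible), then un-sigmoid and negate, since BoTorch's
    # constraint API treats negative values as feasible.
    prob = classifier_likelihood(Z[..., 1]).probs
    return torch.log(1 - prob + 1e-8) - torch.log(prob + 1e-8)


# model_list: assumed to be a ModelListGP over [objective GP, classifier latent GP]
acqf = qLogExpectedImprovement(
    model=model_list,
    best_f=best_feasible_y,  # best feasible objective value observed so far
    constraints=[feasibility_constraint],
)
```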

A 2D synthetic problem based on the Townsend function was used. For the classification model, we implemented an approximate GP with a Bernoulli likelihood. qLogExpectedImprovement was selected as the acquisition function.
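For readers unfamiliar with GP classification, the following is a minimal sketch of a variational GP classifier with a Bernoulli likelihood in GPyTorch, assuming placeholder tensors `train_X` (inputs) and `train_Y` (binary labels in {0, 1}); the notebook's actual model may differ in details (the KeOps warnings mentioned later in the thread suggest it uses KeOps kernels):

```python
import torch
import gpytorch
from gpytorch.models import ApproximateGP
from gpytorch.variational import CholeskyVariationalDistribution, VariationalStrategy


class GPClassifier(ApproximateGP):
    def __init__(self, inducing_points):
        variational_distribution = CholeskyVariationalDistribution(inducing_points.size(0))
        variational_strategy = VariationalStrategy(
            self, inducing_points, variational_distribution, learn_inducing_locations=True
        )
        super().__init__(variational_strategy)
        self.mean_module = gpytorch.means.ConstantMean()
        self.covar_module = gpytorch.kernels.ScaleKernel(gpytorch.kernels.MaternKernel(nu=2.5))

    def forward(self, x):
        return gpytorch.distributions.MultivariateNormal(self.mean_module(x), self.covar_module(x))


def fit_gp_classifier(train_X, train_Y, steps=200, lr=0.1):
    # maximize the variational ELBO on the binary feasibility labels
    model = GPClassifier(inducing_points=train_X)
    likelihood = gpytorch.likelihoods.BernoulliLikelihood()
    model.train()
    likelihood.train()
    optimizer = torch.optim.Adam(model.parameters(), lr=lr)
    mll = gpytorch.mlls.VariationalELBO(likelihood, model, num_data=train_Y.numel())
    for _ in range(steps):
        optimizer.zero_grad()
        loss = -mll(model(train_X), train_Y)
        loss.backward()
        optimizer.step()
    return model, likelihood
```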

Below are the plots of the problem landscape, acquisition function value, constraint probability, and the EI value (before weighting) at different iterations:

At iter=1: *(plots)*

At iter=10: *(plots)*

At iter=50: *(plots)*

The log regret over 50 iterations is plotted against a random (Sobol) baseline. *(plot)*

All images can be reproduced by running the notebook.

Related PRs

Not related to any change of functionality.

@facebook-github-bot
Contributor

Hi @FrankWanger!

Thank you for your pull request and welcome to our community.

Action Required

In order to merge any pull request (code, docs, etc.), we require contributors to sign our Contributor License Agreement, and we don't seem to have one on file for you.

Process

In order for us to review and merge your suggested changes, please sign at https://code.facebook.com/cla. If you are contributing on behalf of someone else (e.g., your employer), the individual CLA may not be sufficient and your employer may need to sign the corporate CLA.

Once the CLA is signed, our tooling will perform checks and validations. Afterwards, the pull request will be tagged with CLA signed. The tagging process may take up to 1 hour after signing. Please give it that time before contacting us about it.

If you have received this in error or have any questions, please contact us at cla@meta.com. Thanks!

@facebook-github-bot added the CLA Signed label on Jan 26, 2025.
@facebook-github-bot
Contributor

Thank you for signing our Contributor License Agreement. We can now accept your code for this (and any) Meta Open Source project. Thanks!


@Balandat left a comment


Thanks a lot for putting this up, this is great.

My main comment (see inline) is on how to leverage the probability of feasibility produced by the classification model directly, rather than converting it twice, but that would require some changes to botorch itself.

Other than that mostly cosmetic comments.

I see that the method finds what appears to be the optimum very quickly - is this consistent across runs? If so it may make sense to reduce the number of iterations somewhat to cut down the runtime of the tutorial.

"$$ \n",
"where $t = \\arctan\\left(\\frac{x_1}{x_2}\\right)$\n",
"\n",
"Here, we follow a natural representation where $y_{\\text{con}}=1$ indicates a feasible condition. We will train a classification model to predict the feasibility of the point. Note that in BoTorch's implementation, **negative values** indicate feasibility, thus we need to do conversion later when feeding feasibility into the pipeline.\n",
Contributor

Suggested change
"Here, we follow a natural representation where $y_{\\text{con}}=1$ indicates a feasible condition. We will train a classification model to predict the feasibility of the point. Note that in BoTorch's implementation, **negative values** indicate feasibility, thus we need to do conversion later when feeding feasibility into the pipeline.\n",
"Here, we follow a natural representation where $y_{\\text{con}}=1$ indicates a feasible condition. We will train a classification model to predict the feasibility of the point. Note that in BoTorch's implementation, **negative values** indicate feasibility, thus we need to do conversion later when feeding feasibility into the pipeline.\n",
"Note that we essentially 'throw away' information contained in the value of $y_{\\text{con}}$ by applying a binary mask - this is for illustration purposes as part of this tutorial, in a real-world application we would model the numerical value of $y_{\\text{con}}$ direction and apply the constraint $y_{\\text{con}}>01$ as part of the optimization.\n",

Contributor

It's a bit confusing here that $y_{\text{con}}$ is being used both in defining the numerical value of the constraint, as well as the binary mask value in the classification model. I suggest using different notation for this to avoid confusing the reader.

Contributor Author

Indeed, I have realised the notation problem. I wanted to add that in many experimental situations the numerical value of the constraint is not directly observable, so the data consist only of binary success/failure outcomes. Here we applied this binary mask to our synthetic problem to throw away information, so that we can simulate what we would obtain in the lab.

Comment on lines +352 to +366
"def pass_con_unsigmoid(Z, model_con, X=None):\n",
" '''\n",
" pass the constraint to the acquisition function\n",
"\n",
" Note: Botorch does sigmoid transformation for the constraint by default, \n",
" therefore we need to unsigmoid our probability (0-1) to (-inf,inf)\n",
" also we need to invert the probability, where -inf means the constraint is satisfied. Finally,we add 1e-8 to avoid log(0).\n",
" '''\n",
" y_con = Z[...,1] #get the constraint\n",
"\n",
" prob = model_con.likelihood(y_con).probs #obtain the probability of y_con(when constraint satisfied)\n",
" prob_unsigmoid_neg = torch.log(1-prob+1e-8)-torch.log(prob+1e-8) #unsigmoid the probability and invert it to adapt to BoTorch's constraint API\n",
" \n",
" return prob_unsigmoid_neg\n"
]
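As a quick illustrative sanity check of this conversion (not part of the notebook): probabilities near 1 map to large negative values, i.e. feasible under BoTorch's convention, and probabilities near 0 map to large positive values:

```python
import torch

p = torch.tensor([0.99, 0.5, 0.01])
print(torch.log(1 - p + 1e-8) - torch.log(p + 1e-8))
# prints approximately: tensor([-4.5951,  0.0000,  4.5951])
```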
Contributor

If the classification model already produces the probabilities of feasibility, it would be great if we could directly use that in the acquisition function, rather than converting it back first. @SebastianAment do you see any major challenges to just accept an additional "probability_of_feasibility" argument to SampleReducingMCAcquisitionFunction (and possibly in other places) and then just use that in the probability weighting?

Even if there are no issues, getting such a change into botorch would require some eng work, so I wouldn't want to block this PR on that. That said, the probability-of-feasibility conversion internally is not a standard sigmoid (see https://github.com/pytorch/botorch/blob/main/botorch/utils/objective.py#L178), so ideally, for the time being (until we can accept the probability directly), we could apply the actual inverse of what is being applied in botorch.

Contributor

> do you see any major challenges to just accept an additional "probability_of_feasibility" argument to SampleReducingMCAcquisitionFunction (and possibly in other places) and then just use that in the probability weighting?

That should be pretty straightforward, mainly taking care of appropriate reshaping, since we are usually applying the feasibility weighting on a per-sample basis, and probability_of_feasibility won't share the MC dimension.

Regarding the inversion of the sigmoid, we are currently using a sigmoid with inverse quadratic asymptotic behavior, which could likely be inverted analytically as well, but that will not be necessary once we support this in the acquisition function directly.
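
To make the reshaping point concrete, here is a small, self-contained illustration with hypothetical shapes (not botorch internals): per-sample acquisition values carry an MC sample dimension, while a per-point probability of feasibility does not, so it must be broadcast across that dimension:

```python
import torch

mc_samples, batch, q = 128, 4, 2
acqf_vals = torch.rand(mc_samples, batch, q)  # (sample_shape, batch, q)
prob_feas = torch.rand(batch, q)              # (batch, q), values in [0, 1]

weighted = acqf_vals * prob_feas.unsqueeze(0)  # broadcast over the MC dimension
assert weighted.shape == (mc_samples, batch, q)
```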

FrankWanger and others added 7 commits January 27, 2025 17:10
Co-authored-by: Max Balandat <Balandat@users.noreply.github.com>
@FrankWanger
Contributor Author

> Thanks a lot for putting this up, this is great.
>
> My main comment (see inline) is on how to leverage the probability of feasibility produced by the classification model directly, rather than converting it twice, but that would require some changes to botorch itself.
>
> Other than that mostly cosmetic comments.
>
> I see that the method finds what appears to be the optimum very quickly - is this consistent across runs? If so it may make sense to reduce the number of iterations somewhat to cut down the runtime of the tutorial.

Thank you so much! I've addressed most of the formatting issues; there is only one that I am not sure how to remove - the KeOps warnings. I've switched to macOS and it did not help. In terms of the results, yes, they are quite consistent across runs, so I have halved the iterations to 25 and slightly increased the frequency of the plots.

@Balandat
Contributor

Great. I may just manually strip the output from the notebook source to keep it clean.

I'll get this merged in since it's in great shape already, but still curious to hear @SebastianAment's thoughts on supporting this better in the acquisition functions themselves (which would be a separate PR anyway).

@facebook-github-bot
Contributor

@Balandat has imported this pull request. If you are a Meta employee, you can view this diff on Phabricator.


codecov bot commented Jan 28, 2025

Codecov Report

All modified and coverable lines are covered by tests ✅

Project coverage is 99.98%. Comparing base (2144440) to head (e13322d).
Report is 2 commits behind head on main.

Additional details and impacted files
@@           Coverage Diff           @@
##             main    #2700   +/-   ##
=======================================
  Coverage   99.98%   99.98%           
=======================================
  Files         202      202           
  Lines       18588    18588           
=======================================
  Hits        18586    18586           
  Misses          2        2           


@facebook-github-bot
Contributor

@Balandat merged this pull request in aeda83a.

facebook-github-bot pushed a commit that referenced this pull request Apr 3, 2025
…isition Functions (#2776)

Summary:

## Motivation

Using classifiers as output constraints in MC-based acquisition functions is a topic discussed at least in #725 and in #2700. The current solution is to take the probabilities from the classifier and to un-sigmoid them. This is a very unintuitive approach, especially as sometimes a sigmoid and sometimes a fatmoid is used. This PR introduces a new attribute for `SampleReducingMCAcquisitionFunction`s named `probabilities_of_feasibility` that expects a callable returning a tensor holding values between zero and one, where one means feasible and zero infeasible.
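(A hypothetical sketch of such a callable, based on the description above; `classifier_likelihood` is a placeholder and the final botorch API may differ:)

```python
def prob_of_feasibility(Z):
    # Z[..., 1]: posterior samples of the classifier's latent function.
    # Returns values in [0, 1], where 1 = feasible and 0 = infeasible;
    # no un-sigmoid conversion is needed.
    return classifier_likelihood(Z[..., 1]).probs
```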

Currently, it is only implemented in the abstract `SampleReducingMCAcquisitionFunction` using the additional attribute. As the `constraints` argument is just a special case of the `probabilities_of_feasibility` argument, where the output of the callable is not applied directly to the objective but further processed by a sigmoid or fatmoid, one could also think about uniting both functionalities into one argument and modifying `fat` to `List[bool | None] | bool` to indicate whether a fatmoid, a sigmoid, or nothing is applied. When the user just provides a bool, it applies either a fatmoid or a sigmoid for all constraints. This approach would also have the advantage that only `compute_smoothed_feasibility_indicator` needs to be modified and almost nothing for the individual acqfs (besides updating the types for `constraints`). Furthermore, it follows the approach that we took when we implemented individual `eta`s for the constraints. So I would favor this one in contrast to the one actually outlined in the code ;) I am looking forward to your ideas on this.

SebastianAment: In #2700, you mention that from your perspective `probabilities_of_feasibility` would not be applied on a per-sample basis like the regular constraints. Why? In the community notebook by FrankWanger using the un-sigmoid trick, it is applied on a per-sample basis. I would keep it on the per-sample basis, and if a classifier for some reason does not return the probabilities on a per-sample basis, it would be the task of the user to expand the tensor accordingly. What do you think?

### Have you read the [Contributing Guidelines on pull requests](https://github.com/pytorch/botorch/blob/main/CONTRIBUTING.md#pull-requests)?

Yes.

Pull Request resolved: #2776

Test Plan:
Unit tests; not yet added due to the draft status and pending architecture choices.

cc: Balandat

Reviewed By: saitcakmak

Differential Revision: D72342434

Pulled By: Balandat

fbshipit-source-id: 6fe6d7201d1a9388dde90e0a46f087f06dba958a