[Bug] Inconsistent behavior: `standardize` vs. `Standardize` with n < 2 #2422
-
This isn't supported as of BoTorch 0.11.1; we should update the documentation to avoid references to `standardize`. As for behavior with empty data, the GP will give the same posterior distribution at any point, so an acquisition function like UCB will have the same value at every point. To avoid relying on a model whose fit is likely poor, and to ensure diversity, it's common to generate the first five or so candidates using Sobol quasi-random points, e.g. with `draw_sobol_samples`.
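For example, a minimal sketch of that Sobol initialization (the bounds and counts here are just placeholders):

```python
import torch
from botorch.utils.sampling import draw_sobol_samples

# Search space: the 2-d unit cube, given as a 2 x d tensor of bounds.
bounds = torch.tensor([[0.0, 0.0], [1.0, 1.0]])

# Draw 5 quasi-random initial candidates; the output has shape n x q x d.
X_init = draw_sobol_samples(bounds=bounds, n=5, q=1, seed=0).squeeze(1)
print(X_init.shape)  # torch.Size([5, 2])
```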
-
Hi @esantorella, thanks for the quick answer ✌🏼. I'm aware of the conceptual problems that arise when working with empty training data, and I'm also clear about the difference between the two scalers, but let me reply point by point:
Thanks for the info, I now also saw your PR that implemented these changes. If you plan to discourage users from using `standardize`, then updating the documentation accordingly makes sense to me.
I think I don't completely agree with this statement. While this is true for most situations and, in the average use case, people are probably better off resorting to alternatives (like Sobol sampling), your claim does not hold in all scenarios:
What I'm trying to say here: I do get your point, but because of the way standardization is handled, it's currently impossible to create a single recommendation model that behaves consistently (i.e. applies the exact same logic) across all training data sizes. That is, if you created a plot of the "performance" of that model, with the number of training data points on the x-axis, you would not be able to get a smooth curve that fully extends to 0 on its left end, because that case currently requires applying a different model logic – even though conceptually the same logic could be applied, as my examples above demonstrate (see also the sketch below). Perhaps I'm overlooking an important aspect here, so please feel free to correct me if I'm mistaken 😃
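For illustration, here is a hypothetical sketch (not BoTorch code) of the kind of uniform logic I have in mind, which degrades gracefully for n < 2:

```python
import torch

def safe_standardize(Y: torch.Tensor) -> torch.Tensor:
    # Hypothetical uniform rule: standardize when possible, otherwise
    # degrade gracefully (n == 1: center only; n == 0: identity).
    n = Y.shape[-2]
    if n == 0:
        return Y  # nothing to standardize
    mean = Y.mean(dim=-2, keepdim=True)
    if n == 1:
        return Y - mean  # stdv is undefined; only center
    std = Y.std(dim=-2, keepdim=True).clamp_min(1e-8)
    return (Y - mean) / std
```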
-
That all sounds right to me, with the proviso that you'd need to use an acquisition function that supports this use case. There are two approaches we could go with here:
Would it serve your use case to just not use a transform when data is empty?
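A minimal sketch of what that could look like (the threshold of 2 reflects the n < 2 cases discussed here):

```python
import torch
from botorch.models import SingleTaskGP
from botorch.models.transforms.outcome import Standardize

def build_gp(train_X: torch.Tensor, train_Y: torch.Tensor) -> SingleTaskGP:
    # Only attach the outcome transform once standardization is well-defined.
    outcome_transform = Standardize(m=1) if train_Y.shape[-2] >= 2 else None
    return SingleTaskGP(train_X, train_Y, outcome_transform=outcome_transform)
```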
-
Simply not transforming in case of empty data would certainly work and would have been my manual workaround for it 👍🏼 However, I see a discrepancy between my expectation and the current implementation, and I think the current logic of `Standardize` is inconsistent. To explain what I mean, let us consider the two scenarios that can happen:
The current version applies a manual fix for the single-data-point case. Long story short: to me it would seem much more consistent to either say "nope, not applying any custom logic here" for all cases with n < 2, or to handle both degenerate cases with the same kind of logic (see the sketch below).
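A sketch of the asymmetry I mean, based on the behavior discussed in this thread (the exact error for empty data may vary by version):

```python
import torch
from botorch.models.transforms.outcome import Standardize

# n == 1: special-cased with a stdv of 1, so only the mean is subtracted.
Y_one = torch.tensor([[3.0]])
print(Standardize(m=1)(Y_one)[0])  # tensor([[0.]])

# n == 0: there is no analogous special case; this errors out instead.
Y_empty = torch.empty(0, 1)
try:
    Standardize(m=1)(Y_empty)
except Exception as e:
    print(type(e).__name__, e)
```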
-
I agree with this, and found it confusing to understand what was happening in the `Standardize` code.
-
On second thought: the second model seems highly overconfident in the presence of such large noise, so setting the standard deviation to 1 may be a bad choice when there is observed noise. It might be more sensible to set it based on the observed noise level. We could also conclude that we are not going to get reasonable predictions with 0 or 1 data points and disallow those cases entirely.
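A purely hypothetical sketch of that alternative (not BoTorch's actual logic):

```python
from typing import Optional

import torch

def fallback_stdvs(Y: torch.Tensor, Yvar: Optional[torch.Tensor]) -> torch.Tensor:
    # Hypothetical stdv choice for the n == 1 case: fall back to the
    # observed noise scale when it is available, else to 1.
    if Yvar is not None:
        return Yvar.sqrt().clamp_min(1e-8)
    return torch.ones_like(Y)

# With a large observed noise, the implied outcome scale is large too,
# so the resulting posterior is no longer overconfident.
print(fallback_stdvs(torch.tensor([[3.0]]), torch.tensor([[100.0]])))  # tensor([[10.]])
```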
-
Hi @esantorella, thanks for creating the examples. Before we continue our discussion on whether or not to allow the degenerate cases, let's perhaps first figure out what's going wrong here! First of all, thanks for bringing up the observed-noise aspect.
-
Yeah, good question. I'm not going to have a chance to look closely into this right away, but my best guess would be that failing to standardize leads to difficulty with model fit, similar to #2392. The prior probability of having a data point at 1000 is very low, especially since the provided `Yvar` nearly rules out the possibility that this is noise. So the marginal likelihood might be very flat, and tiny, near the optimum, causing numerical convergence troubles. Just a guess though!
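A sketch of the failure mode I have in mind (the values are illustrative):

```python
import torch
from botorch.fit import fit_gpytorch_mll
from botorch.models import SingleTaskGP
from gpytorch.mlls import ExactMarginalLogLikelihood

# One unstandardized outcome far from the zero prior mean, with a tiny
# observed noise that nearly rules out explaining the value as noise.
train_X = torch.tensor([[0.5]], dtype=torch.float64)
train_Y = torch.tensor([[1000.0]], dtype=torch.float64)
train_Yvar = torch.full_like(train_Y, 1e-6)

gp = SingleTaskGP(train_X, train_Y, train_Yvar=train_Yvar)
mll = ExactMarginalLogLikelihood(gp.likelihood, gp)
# If the marginal likelihood surface is nearly flat here, fitting may warn
# about convergence or land on a poor optimum.
fit_gpytorch_mll(mll)
```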
-
I opened #2421 for removing references to `standardize` from the documentation.
-
🐛 Bug
I am currently playing around with situations where there's no training data available yet and noticed that the behaviors of `utils.standardize` and `transforms.Standardize` are inconsistent.

To reproduce
Here is a minimal example adapted from the code on the landing page. When you run the original code for the GP creation (in the comments), everything works fine. However, when you run the displayed version, you get the error shown below.
Code snippet to reproduce
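Roughly along these lines, adapted from the landing-page example (the empty training tensors and the exact model call are illustrative assumptions, not the verbatim snippet):

```python
import torch
from botorch.models import SingleTaskGP
from botorch.models.transforms.outcome import Standardize
from botorch.utils.transforms import standardize

# No training data available yet.
train_X = torch.empty(0, 2, dtype=torch.float64)
train_Y = torch.empty(0, 1, dtype=torch.float64)

# Landing-page style construction: `utils.standardize` passes the empty
# tensor through, and model construction works fine.
# gp = SingleTaskGP(train_X, standardize(train_Y))

# Displayed version: `transforms.Standardize` raises during construction.
gp = SingleTaskGP(train_X, train_Y, outcome_transform=Standardize(m=1))
```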
Stack trace/error message
Expected Behavior
In both cases, the resulting GP posterior should simply match the posterior of an untransformed GP, and the standardization applied along the way should not mess with the computation. I haven't checked what exact logic is applied internally when there is no data / only one data point being passed to the transformation, but my intuition would tell me that the transformation should effectively reduce to a no-op in these degenerate cases.
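For instance (assuming, as the discussion above suggests, that a transform-free `SingleTaskGP` accepts empty training data):

```python
import torch
from botorch.models import SingleTaskGP

# With no training data and no transform, the posterior is just the prior.
train_X = torch.empty(0, 2, dtype=torch.float64)
train_Y = torch.empty(0, 1, dtype=torch.float64)
gp = SingleTaskGP(train_X, train_Y)

test_X = torch.rand(4, 2, dtype=torch.float64)
post = gp.posterior(test_X)
print(post.mean)      # prior mean at test_X
print(post.variance)  # prior variance at test_X
```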
System information