[ML] `categorize_text` aggregation can suffer array index out of bounds exception
#105836
Comments
Pinging @elastic/ml-core (Team:ML)
droberts195 added a commit to droberts195/elasticsearch that referenced this issue, Mar 5, 2024
Previously the `categorize_text` aggregation could throw an exception if nested as a sub-aggregation of another aggregation that produced empty buckets at the end of its results. This change avoids this possibility. Fixes elastic#105836
droberts195 added a commit to droberts195/elasticsearch that referenced this issue, Mar 6, 2024
…astic#105987) Previously the `categorize_text` aggregation could throw an exception if nested as a sub-aggregation of another aggregation that produced empty buckets at the end of its results. This change avoids this possibility. Fixes elastic#105836
The `categorize_text` aggregation can suffer from an array index out of bounds exception if it is nested beneath another aggregation and that other aggregation generates a large number of empty buckets at the end of its output range.
A concrete example is `categorize_text` nested under `date_histogram` over a time range that causes the `date_histogram` to have many buckets with `doc_count` zero at the end of the time range, for example when the time range of the search extends beyond the end of the data.
One way to reproduce the problem is as follows:
`data`.
The output is as follows:
The buggy line is:
elasticsearch/x-pack/plugin/ml/src/main/java/org/elasticsearch/xpack/ml/aggs/categorization/CategorizeTextAggregator.java
Line 116 in f12070e
It is possible that one of the "ords to collect" doesn't actually have a categorizer, because there are no documents within it. If the empty bucket still has an array element then that's fine, as we handle `null` on the next line, but if it's beyond the end of the array then it causes an exception.
This problem is not always seen with every search extending into empty buckets, because when a `BigArray` is resized it may be resized to a bigger size than requested, and the extra capacity may result in the array being big enough to have an entry for each bucket even though we didn't explicitly ask for this:
elasticsearch/server/src/main/java/org/elasticsearch/common/util/BigArrays.java
Lines 861 to 869 in c9be8d9
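
The failure mode described above can be illustrated with a small toy model. This is a sketch under stated assumptions, not the Elasticsearch code: `SparseOrdLookup`, `OrdArray`, `countUnguarded` and `countGuarded` are hypothetical names, and the "grow by about one eighth" rule is only a stand-in for whatever over-allocation the real `BigArrays` applies. It shows an array that only grows when a bucket receives a document, a lookup that fails for trailing empty bucket ordinals, and one possible guard (which may differ from the actual fix).

```java
import java.util.Arrays;

public class SparseOrdLookup {

    /** Toy stand-in (hypothetical) for an object array indexed by bucket ordinal. */
    static final class OrdArray {
        private String[] slots = new String[0];

        long size() {
            return slots.length;
        }

        /** Grow to at least minSize, over-allocating by roughly 1/8 (illustrative assumption). */
        void grow(long minSize) {
            if (minSize <= slots.length) {
                return;
            }
            int newSize = (int) (minSize + (minSize >> 3));
            slots = Arrays.copyOf(slots, newSize);
        }

        /** Only buckets that actually receive a document call this, so only they grow the array. */
        void set(long ord, String value) {
            grow(ord + 1);
            slots[(int) ord] = value;
        }

        /** Throws ArrayIndexOutOfBoundsException when ord >= size(). */
        String get(long ord) {
            return slots[(int) ord];
        }
    }

    /** Mirrors the problematic pattern: index first, handle null afterwards. */
    static int countUnguarded(OrdArray categorizers, long[] ordsToCollect) {
        int nonEmpty = 0;
        for (long ord : ordsToCollect) {
            String categorizer = categorizers.get(ord); // fails for ords past the end
            if (categorizer != null) {
                nonEmpty++;
            }
        }
        return nonEmpty;
    }

    /** One possible guard: treat ords beyond the array as empty buckets. */
    static int countGuarded(OrdArray categorizers, long[] ordsToCollect) {
        int nonEmpty = 0;
        for (long ord : ordsToCollect) {
            String categorizer = ord < categorizers.size() ? categorizers.get(ord) : null;
            if (categorizer != null) {
                nonEmpty++;
            }
        }
        return nonEmpty;
    }

    public static void main(String[] args) {
        OrdArray categorizers = new OrdArray();
        for (long ord = 0; ord < 3; ord++) {
            categorizers.set(ord, "categorizer-" + ord); // buckets 0..2 contain documents
        }
        long[] ordsToCollect = {0, 1, 2, 3, 4, 5}; // buckets 3..5 are empty trailing buckets

        System.out.println("guarded non-empty count: " + countGuarded(categorizers, ordsToCollect));
        try {
            countUnguarded(categorizers, ordsToCollect);
        } catch (ArrayIndexOutOfBoundsException e) {
            System.out.println("unguarded lookup failed: " + e.getMessage());
        }
    }
}
```

In this toy model, filling ordinals 0 to 7 leaves the array with one spare slot, so a request that includes a single trailing empty bucket succeeds without the guard while a request with two trailing empty buckets fails. That is the same kind of size-dependent behaviour the issue describes for over-allocated arrays.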