-
Notifications
You must be signed in to change notification settings - Fork 24.9k
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Increase max number of dimensions from 16 to 21 #95340
Increase max number of dimensions from 16 to 21 #95340
Conversation
Pinging @elastic/es-analytics-geo (Team:Analytics) |
@@ -67,7 +67,7 @@ public class TimeSeriesIdFieldMapper extends MetadataFieldMapper { | |||
* comfortable given that dimensions are typically going to be less than a | |||
* hundred bytes each, but we're being paranoid here. | |||
*/ | |||
private static final int DIMENSION_VALUE_LIMIT = 1024; | |||
private static final int DIMENSION_VALUE_LIMIT = 1536; |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I don't think we can increase this constant? The limit per field is DIMENSION_VALUE_LIMIT + DIMENSION_NAME_LIMIT. Otherwise we may spend up to ~2kb per field and then we hit the 32kb limit with less then 21 fields?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Ok I missed that...the tsid includes both the name and the value. Which means the right way of testing the limit iis to have a field name of 512 bytes and a value that is 1024 with 21 (test ok) or 22 (test ko) dimensions.
The tsid encodes both the field name and the field value. As a result, 1536 bytes is the sum of the field name max size and the field value max size, respectively 512 and 1024.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Looks good. I left 2 questions.
int dimensionFieldLimit = 21; | ||
final Exception ex = expectThrows(IllegalArgumentException.class, () -> createTimeSeriesIndex(mapping -> { | ||
mapping.startObject("routing_field").field("type", "keyword").field("time_series_dimension", true).endObject(); | ||
for (int i = 0; i < dimensionFieldLimit; i++) { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Maybe I'm reading this incorrectly. This creates a mapping with 21 dimension fields. But this should succeed? Only if there are 22 dimensions this should fail?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
There is the routing_field that is a dimension so it is actually 22. I need to include something in the routing path otherwise the test fails because there is no routing path.
.putList(IndexMetadata.INDEX_ROUTING_PATH.getKey(), routingPaths.get()) | ||
.put(IndexSettings.TIME_SERIES_START_TIME.getKey(), "2000-01-08T23:40:53.384Z") | ||
.put(IndexSettings.TIME_SERIES_END_TIME.getKey(), "2106-01-08T23:40:53.384Z") | ||
.put(MapperService.INDEX_MAPPING_DIMENSION_FIELDS_LIMIT_SETTING.getKey(), dimensionsFieldLimit) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Now we always concrete set the limit. Maybe also test with the default?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Left one minor comment otherwise LGTM
.put(IndexSettings.TIME_SERIES_START_TIME.getKey(), "2000-01-08T23:40:53.384Z") | ||
.put(IndexSettings.TIME_SERIES_END_TIME.getKey(), "2106-01-08T23:40:53.384Z"); | ||
|
||
if (dimensionsFieldLimit > 0) { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
maybe make dimensionsFieldLimit
java.lang.Integer
and have a test which passes dow null?
Here we increase the maximum number of dimension fields
for a time series index from 16 to 21. The maximum size of each.
The maximum allowed size for a field in Lucene is 32 Kb.
When encoding the tsid we include all dimension field names and
all dimension field values. As a result we have, max field name 512
bytes, max field value 1024 bytes. 32 Kb / (512 + 1024) bytes = 21.3.
So the maximum number of dimensions is 21.
See #93564