Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Increase max number of dimensions from 16 to 21 #95340

Conversation

salvatore-campagna
Copy link
Contributor

@salvatore-campagna salvatore-campagna commented Apr 18, 2023

Here we increase the maximum number of dimension fields
for a time series index from 16 to 21. The maximum size of each.
The maximum allowed size for a field in Lucene is 32 Kb.

When encoding the tsid we include all dimension field names and
all dimension field values. As a result we have, max field name 512
bytes, max field value 1024 bytes. 32 Kb / (512 + 1024) bytes = 21.3.
So the maximum number of dimensions is 21.

See #93564

@elasticsearchmachine elasticsearchmachine added needs:triage Requires assignment of a team area label v8.8.0 labels Apr 18, 2023
@salvatore-campagna salvatore-campagna added Team:Analytics Meta label for analytical engine team (ESQL/Aggs/Geo) :StorageEngine/TSDB You know, for Metrics and removed needs:triage Requires assignment of a team area label labels Apr 18, 2023
@elasticsearchmachine
Copy link
Collaborator

Pinging @elastic/es-analytics-geo (Team:Analytics)

@@ -67,7 +67,7 @@ public class TimeSeriesIdFieldMapper extends MetadataFieldMapper {
* comfortable given that dimensions are typically going to be less than a
* hundred bytes each, but we're being paranoid here.
*/
private static final int DIMENSION_VALUE_LIMIT = 1024;
private static final int DIMENSION_VALUE_LIMIT = 1536;
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I don't think we can increase this constant? The limit per field is DIMENSION_VALUE_LIMIT + DIMENSION_NAME_LIMIT. Otherwise we may spend up to ~2kb per field and then we hit the 32kb limit with less then 21 fields?

Copy link
Contributor Author

@salvatore-campagna salvatore-campagna Apr 19, 2023

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Ok I missed that...the tsid includes both the name and the value. Which means the right way of testing the limit iis to have a field name of 512 bytes and a value that is 1024 with 21 (test ok) or 22 (test ko) dimensions.

The tsid encodes both the field name and the field value. As a result,
1536 bytes is the sum of the field name max size and the field value
max size, respectively 512 and 1024.
Copy link
Member

@martijnvg martijnvg left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Looks good. I left 2 questions.

int dimensionFieldLimit = 21;
final Exception ex = expectThrows(IllegalArgumentException.class, () -> createTimeSeriesIndex(mapping -> {
mapping.startObject("routing_field").field("type", "keyword").field("time_series_dimension", true).endObject();
for (int i = 0; i < dimensionFieldLimit; i++) {
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Maybe I'm reading this incorrectly. This creates a mapping with 21 dimension fields. But this should succeed? Only if there are 22 dimensions this should fail?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

There is the routing_field that is a dimension so it is actually 22. I need to include something in the routing path otherwise the test fails because there is no routing path.

.putList(IndexMetadata.INDEX_ROUTING_PATH.getKey(), routingPaths.get())
.put(IndexSettings.TIME_SERIES_START_TIME.getKey(), "2000-01-08T23:40:53.384Z")
.put(IndexSettings.TIME_SERIES_END_TIME.getKey(), "2106-01-08T23:40:53.384Z")
.put(MapperService.INDEX_MAPPING_DIMENSION_FIELDS_LIMIT_SETTING.getKey(), dimensionsFieldLimit)
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Now we always concrete set the limit. Maybe also test with the default?

Copy link
Member

@martijnvg martijnvg left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Left one minor comment otherwise LGTM

.put(IndexSettings.TIME_SERIES_START_TIME.getKey(), "2000-01-08T23:40:53.384Z")
.put(IndexSettings.TIME_SERIES_END_TIME.getKey(), "2106-01-08T23:40:53.384Z");

if (dimensionsFieldLimit > 0) {
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

maybe make dimensionsFieldLimit java.lang.Integer and have a test which passes dow null?

@salvatore-campagna salvatore-campagna merged commit 580e8a6 into elastic:main Apr 20, 2023
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
>non-issue :StorageEngine/TSDB You know, for Metrics Team:Analytics Meta label for analytical engine team (ESQL/Aggs/Geo) v8.8.0
Projects
None yet
Development

Successfully merging this pull request may close these issues.

3 participants