Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Default LogsDB value for ignore_dynamic_beyond_limit #115265

Conversation

salvatore-campagna
Copy link
Contributor

@salvatore-campagna salvatore-campagna commented Oct 21, 2024

When ingesting logs, it's important to ensure that documents are not dropped due to mapping issues, also when dealing with dynamically mapped fields. Elasticsearch provides two key settings that help manage the total number of field mappings and handle situations where this limit might be exceeded:

  1. index.mapping.total_fields.limit: This setting defines the maximum number of fields allowed in an index. If this limit is reached, any further mapped fields would cause indexing to fail.

  2. index.mapping.total_fields.ignore_dynamic_beyond_limit: This setting determines whether Elasticsearch should ignore any dynamically mapped fields that exceed the limit defined by index.mapping.total_fields.limit. If set to false, indexing will fail once the limit is surpassed. However, if set to true, Elasticsearch will continue indexing the document but will silently ignore any additional dynamically mapped fields beyond the limit.

To prevent indexing failures due to dynamic mapping issues, especially in logs where the schema might change frequently, we change the default value of index.mapping.total_fields.ignore_dynamic_beyond_limit from false to true in LogsDB. This change ensures that even when the number of dynamically mapped fields exceeds the set limit, documents will still be indexed, and additional fields will simply be ignored rather than causing an indexing failure.

This adjustment is important for LogsDB, where dynamically mapped fields may be common, and we want to make sure to avoid documents from being dropped.

@elasticsearchmachine
Copy link
Collaborator

Pinging @elastic/es-storage-engine (Team:StorageEngine)

@salvatore-campagna salvatore-campagna marked this pull request as draft October 21, 2024 17:09
@salvatore-campagna
Copy link
Contributor Author

@elasticmachine update branch

@elasticmachine
Copy link
Collaborator

There are no new commits on the base branch.

@salvatore-campagna
Copy link
Contributor Author

@elasticmachine update branch

@elasticmachine
Copy link
Collaborator

There are no new commits on the base branch.

@salvatore-campagna
Copy link
Contributor Author

@elasticmachine update branch

@salvatore-campagna salvatore-campagna changed the title Default LogsDB value for index.mapping.total_fields.ignore_dynamic_beyond_limit Default LogsDB value for ignore_dynamic_beyond_limit Oct 22, 2024
@salvatore-campagna salvatore-campagna marked this pull request as ready for review October 22, 2024 14:30
@salvatore-campagna salvatore-campagna marked this pull request as draft October 22, 2024 14:32
@salvatore-campagna salvatore-campagna marked this pull request as ready for review October 22, 2024 14:41
public static final IndexVersion UPGRADE_TO_LUCENE_10_0_0 = def(9_000_00_0, Version.LUCENE_10_0_0);

public static final IndexVersion LOGSDB_DEFAULT_IGNORE_DYNAMIC_BEYOND_LIMIT = def(9_000_00_1, Version.LUCENE_10_0_0);
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Do you know if this is okay to backport? I suspect there could be problems when backporting to 8.16.

Copy link
Contributor Author

@salvatore-campagna salvatore-campagna Oct 22, 2024

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The way I see it is

  • No issue for users who have explicitly set the value (whether false or true) since that overrides the default.

  • Change in behavior for users relying on the default false: After upgrading, documents that would have been
    rejected (due to exceeding the field limit) will now be indexed with extra fields ignored only for LogsDB anyway.

While this change doesn't technically "break" anything, it may surprise users who were relying on documents being rejected in case of mapping issues. Moving from default false to default true is eventually going to accept more documents which would have been rejected before. So at least by changing this we are not "throwing more errors" but likely we will have less indexing errors.

I am fine backporting only to 8.17 anyway.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I understand your reasoning and it makes sense. I suspect that just technically this may be hard to backport even to 8.17. F.e. UPGRADE_TO_LUCENE_10_0_0 does not exist in 8.x. We may need guidance from someone on this.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Based on recent version in this file, shouldn't new version be: 9_001_00_0? Since we're are not creating a patch version?

When back porting, I don't think the same version id can be used. Based on the latest index version in 8.x, I believe this should be: 8_518_00_0 . However after reading the comment in IndexVersions class, I don't fully understand how the main branch should updated after the backport has merged. I think a new index version constant needs to be added with version id 8_518_00_0 ? We have then two versions, for stateful and stateless. But I may be wrong here. @thecoop would you be able to give advice here what index versioning and back porting now that main uses a different major index version?

Additionally I also think we don't need to backport to this 8.16 branch.

Copy link
Contributor Author

@salvatore-campagna salvatore-campagna Oct 24, 2024

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I would expect that when backporting this we have a compiler failure because the index version I added here uses a Lucene version that does not exist in the other branch. In that case I expect for the backport I just need to add the appropriate new version (the next one). Something like 8_518_00_0 or a newer one if this one is already created by another backport.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

When backporting, you need to introduce a new version constant 8_518_00_0 which represents 'this change, but with lucene 9' as a backport, and checks on the 8.x branch need to use this constant like indexVersion >= BACKPORT_CONSTANT

You then need to also understand that new constant on main - so when you're checking whether this new functionality is present, the check needs to be something like indexVersion.between(BACKPORT_VERSION, 9_000_00_0) || indexVersion >= MAIN_CONSTANT

@salvatore-campagna
Copy link
Contributor Author

@elasticmachine update branch

@salvatore-campagna
Copy link
Contributor Author

@elasticmachine update branch

@@ -176,7 +176,7 @@ private void forEncode(int token, int tokenBits, long[] in, DataOutput out) thro
* Encode the given longs using a combination of delta-coding, GCD factorization and bit packing.
*/
void encode(long[] in, DataOutput out) throws IOException {
assert in.length == numericBlockSize;
// assert in.length == numericBlockSize;
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Why are these asserts commented out?

Copy link
Contributor Author

@salvatore-campagna salvatore-campagna Oct 25, 2024

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This is a mistake in merging main probably. This file is not expected to be touched here.

@salvatore-campagna
Copy link
Contributor Author

@elasticmachine update branch

Copy link
Member

@martijnvg martijnvg left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

LGTM

@salvatore-campagna
Copy link
Contributor Author

@elasticmachine update branch

@salvatore-campagna
Copy link
Contributor Author

@elasticmachine update branch

@salvatore-campagna
Copy link
Contributor Author

@elasticmachine update branch

@elasticmachine
Copy link
Collaborator

There are no new commits on the base branch.

@salvatore-campagna
Copy link
Contributor Author

@elasticmachine update branch

@elasticmachine
Copy link
Collaborator

There are no new commits on the base branch.

@salvatore-campagna
Copy link
Contributor Author

@elasticmachine update branch

@elasticmachine
Copy link
Collaborator

There are no new commits on the base branch.

IndexVersions.UPGRADE_TO_LUCENE_10_0_0
) || indexVersionCreated.onOrAfter(IndexVersions.LOGSDB_DEFAULT_IGNORE_DYNAMIC_BEYOND_LIMIT);
return String.valueOf(isLogsDBIndexMode && isNewIndexVersion);
},
Copy link
Contributor Author

@salvatore-campagna salvatore-campagna Oct 30, 2024

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@thecoop @martijnvg I update the check here to include a new version LOGSDB_DEFAULT_IGNORE_DYNAMIC_BEYOND_LIMIT_BACKPORT which uses Lucene 9.
When backporting this, I will just remove any reference to the one required in main LOGSDB_DEFAULT_IGNORE_DYNAMIC_BEYOND_LIMIT and just do a check on version being onOrAfter version LOGSDB_DEFAULT_IGNORE_DYNAMIC_BEYOND_LIMIT_BACKPORT.

Copy link
Member

@thecoop thecoop left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

IndexVersion LGTM

@salvatore-campagna
Copy link
Contributor Author

@elasticmachine update branch

@salvatore-campagna
Copy link
Contributor Author

@elasticmachine update branch

@salvatore-campagna salvatore-campagna merged commit 3cbbcc5 into elastic:main Oct 31, 2024
16 checks passed
jfreden pushed a commit to jfreden/elasticsearch that referenced this pull request Nov 4, 2024
When ingesting logs, it's important to ensure that documents are not dropped due to mapping issues, also when dealing with dynamically mapped fields. Elasticsearch provides two key settings that help manage the total number of field mappings and handle situations where this limit might be exceeded:

1. **`index.mapping.total_fields.limit`**: This setting defines the maximum number of fields allowed in an index. If this limit is reached, any further mapped fields would cause indexing to fail.

2. **`index.mapping.total_fields.ignore_dynamic_beyond_limit`**: This setting determines whether Elasticsearch should ignore any dynamically mapped fields that exceed the limit defined by `index.mapping.total_fields.limit`. If set to `false`, indexing will fail once the limit is surpassed. However, if set to `true`, Elasticsearch will continue indexing the document but will silently ignore any additional dynamically mapped fields beyond the limit.

To prevent indexing failures due to dynamic mapping issues, especially in logs where the schema might change frequently, we change the default value of **`index.mapping.total_fields.ignore_dynamic_beyond_limit` from `false` to `true` in LogsDB**. This change ensures that even when the number of dynamically mapped fields exceeds the set limit, documents will still be indexed, and additional fields will simply be ignored rather than causing an indexing failure.

This adjustment is important for LogsDB, where dynamically mapped fields may be common, and we want to make sure to avoid documents from being dropped.
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Projects
None yet
Development

Successfully merging this pull request may close these issues.

6 participants