Add size or number of fields limit on dynamic mapping #11443
Comments
Hi @yanjunh This feels like the wrong approach. Why would we index fields up to some arbitrary limit and then silently drop the rest? I'm afraid the proposed solution is not a solution at all. Instead, you need to change how you're indexing the data, or how you're adding fields.
I think the idea is to catch mistakes and prevent the cluster from becoming unstable.
@nik9000 sure, but this is still the wrong solution. We're not going to add a setting which arbitrarily rejects fields once X many fields already exist.
@clintongormley We have been telling developers about the limitation, but we can't prevent them from producing unfriendly data. We thought about detecting it when the data flows through Logstash, but that's not a good place to do it. What's the recommended way to guard against unbounded mapping growth?
Strong language. Seriously though, for my use case I turn the mapping to non-dynamic and only add fields "intentionally", but that's not always possible. Also, I imagine this is something you could catch in code review and by giving all developers their own instance to test against. But I'm still for adding some extra defence against it to Elasticsearch.
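(A minimal sketch of the non-dynamic approach described above, using the typeless mapping syntax of recent Elasticsearch releases; the index and field names are hypothetical.)

```
# Hypothetical index "logs": with "dynamic": false at the root, only the
# explicitly mapped fields are indexed; unexpected fields are kept in
# _source but never added to the mapping.
curl -XPUT 'localhost:9200/logs' -H 'Content-Type: application/json' -d '{
  "mappings": {
    "dynamic": false,
    "properties": {
      "message":    { "type": "text" },
      "@timestamp": { "type": "date" }
    }
  }
}'
```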
@nik9000 Forgive my wording.
@yanjunh I'll reopen the issue for further discussion.
Going from dynamic:true to dynamic:false would silently ignore new fields, which hides the data loss. I would be open to discussing switching from dynamic:true to dynamic:strict when the limit is reached, so that offending documents are rejected outright.
If dynamic:strict rejects the document then I'm +1 on that. So long as the setting is dynamically updatable, it shouldn't prevent people from doing this if they want to - just get in their way and remind them it's a bad idea.
I agree with dynamic:strict. My original thought was to keep the good part of the document and just drop the bad fields, but dropping the whole document might be a better idea. That would force the data producers to structure their data in the right way, and the data consumers could then effectively search and aggregate on well-structured data. In either approach, being able to protect the search cluster from bad data would be very useful.
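(To illustrate the behavior being agreed on, a hedged sketch with a hypothetical index: under dynamic:strict, a document that introduces an unmapped field is rejected whole, rather than indexed with the offending field silently dropped.)

```
# Hypothetical index "events" with strict dynamic mapping.
curl -XPUT 'localhost:9200/events' -H 'Content-Type: application/json' -d '{
  "mappings": {
    "dynamic": "strict",
    "properties": {
      "message": { "type": "text" }
    }
  }
}'

# This document carries an unmapped UUID key, so the whole request fails
# with a strict_dynamic_mapping_exception instead of silently losing data.
curl -XPOST 'localhost:9200/events/_doc' -H 'Content-Type: application/json' -d '{
  "message": "ok",
  "de305d54-75b4-431b-adb2-eb6b9e546014": "unexpected"
}'
```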
+1 for setting a limit on the number of (dynamic) fields, and moving to dynamic:false when that limit is reached. I'm also trying to protect Elasticsearch from becoming unstable due to too many fields.
We use a custom analyzer to mark dynamically added fields, which allows us to monitor the count of added fields. There is more explanation in the following blog post: https://breml.github.io/blog/2015/10/28/tagging-of-dynamically-added-fields-in-elasticsearch/
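(The blog post's exact technique may differ; as one sketch of the idea, a dynamic template can stamp every dynamically created string field with a dedicated marker analyzer, so the dynamic fields can be counted by inspecting the mapping. All names here are illustrative.)

```
# Sketch only: the analyzer "dynamically_added" exists purely as a marker.
curl -XPUT 'localhost:9200/logs-tagged' -H 'Content-Type: application/json' -d '{
  "settings": {
    "analysis": {
      "analyzer": {
        "dynamically_added": { "type": "custom", "tokenizer": "standard" }
      }
    }
  },
  "mappings": {
    "dynamic_templates": [
      {
        "tag_dynamic_strings": {
          "match_mapping_type": "string",
          "mapping": { "type": "text", "analyzer": "dynamically_added" }
        }
      }
    ]
  }
}'

# Counting fields in the mapping that carry the marker analyzer then
# reveals how many fields were added dynamically.
curl -XGET 'localhost:9200/logs-tagged/_mapping'
```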
+1 on providing an option for the safeguard. Another use case is key-value logging with the Logstash KV filter and dynamic mappings, so that these fields are automatically extracted and generated. Having a limit the user can set can help prevent an unintentional explosion in mappings when someone decides to use UUIDs as keys, etc.
Original issue description:
Object mappings are dynamic by default. The mapping explodes in size when the incoming document stream contains dynamic keys such as UUID strings, timestamp strings, etc. For example: {"fields":{"de305d54-75b4-431b-adb2-eb6b9e546014":{...}}}. This can kill the cluster very quickly at a high indexing rate. We need a way to limit the mapping by size or by number of fields. When the limit is reached, the object mapping behavior would change from dynamic:true to dynamic:false.
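(For reference, later Elasticsearch releases added soft limits in this spirit, notably index.mapping.total_fields.limit, which rejects mapping updates beyond the limit rather than silently switching to dynamic:false. A minimal, hedged example of raising it on a hypothetical index:)

```
# index.mapping.total_fields.limit defaults to 1000 and is dynamically
# updatable; a mapping update that would exceed it is rejected.
curl -XPUT 'localhost:9200/logs/_settings' -H 'Content-Type: application/json' -d '{
  "index.mapping.total_fields.limit": 2000
}'
```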