
Enforce a limit on the depth of the JSON object. #35063

Merged

Conversation

jtibshirani
Contributor

@jtibshirani jtibshirani commented Oct 29, 2018

Currently, we error if a JSON field could contain fields deeper than the configured limit. The default value is 20, which matches index.mapping.depth.limit.

I was also wondering if it made sense to introduce a 'lenient' mode, in the general spirit of ignore_malformed. A couple options:

  • If the depth limit is exceeded, we ignore the entire JSON field and continue parsing the rest of the document.
  • We add fields that obey the depth limit, and only ignore those fields that are too deeply nested.
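The second lenient option above could be sketched roughly like this (a simplified Python illustration only, not Elasticsearch's actual Java implementation; the flatten helper and its parameters are hypothetical):

```python
def flatten(obj, depth_limit=20, lenient=False, prefix="", depth=1):
    """Flatten a JSON-like dict into dotted key paths, enforcing a
    maximum nesting depth. In strict mode an over-deep subtree raises;
    in lenient mode only the offending subtree is skipped."""
    tokens = {}
    for key, value in obj.items():
        path = f"{prefix}.{key}" if prefix else key
        if isinstance(value, dict):
            if depth >= depth_limit:
                if lenient:
                    continue  # skip only the too-deeply-nested subtree
                raise ValueError(f"depth limit {depth_limit} exceeded at '{path}'")
            tokens.update(flatten(value, depth_limit, lenient, path, depth + 1))
        else:
            tokens[path] = value
    return tokens

doc = {"a": {"b": {"c": 1}}, "d": 2}
print(flatten(doc))                               # {'a.b.c': 1, 'd': 2}
print(flatten(doc, depth_limit=2, lenient=True))  # {'d': 2}
```

The first option (dropping the entire JSON field) would instead catch the error one level up and discard everything collected for the field.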

@jtibshirani jtibshirani added the >feature and :Search Foundations/Mapping (Index mappings, including merging and defining field types) labels on Oct 29, 2018
@elasticmachine
Collaborator

Pinging @elastic/es-search-aggs

@jtibshirani
Contributor Author

@elasticmachine test this please

Contributor

@romseygeek romseygeek left a comment


LGTM.

@colings86
Contributor

Couple of thoughts on this:

  1. Should we just use the index.mapping.depth.limit setting instead of having an explicit setting in the mapping? This would however require that we throw an error on hitting this limit as we do if we hit this limit elsewhere. The advantage I see is that there is consistency in how we deal with deep objects across the entire mapping.
  2. Assuming we do want a separate depth limit on these fields and we want to have an ignore_malformed-style mode, I think it should ignore the entire object if the depth is exceeded, as this seems more consistent with the spirit of the ignore_malformed setting.

@Mpdreamz
Member

I like the spirit of this PR, allowing you to store complex object graphs without exploding the mappings.

I wonder if this usecase is best served through a dedicated mapping type instead?

e.g given a document:

{
    "name" : "x",
    "metadata" : {}
}

A user would be able to say "map my metadata field as the metadata type", which would mean we accept arbitrary JSON to any depth but don't index anything underneath it or create mappings for any of the sub-fields.
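For illustration, the mapping for such a field might look like the following (the field type name metadata and this request shape are hypothetical, not a released Elasticsearch API):

```
PUT my-index
{
  "mappings": {
    "properties": {
      "name":     { "type": "keyword" },
      "metadata": { "type": "metadata" }
    }
  }
}
```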

@colings86
Contributor

@Mpdreamz Note that this is already a dedicated field type, as this is for the queryable object fields feature: #25312

We do have a line of confusion here in that we keep talking about this feature as "object fields", which we should change since we already have the concept of "object fields" in Elasticsearch. We should try to come up with a better name for this new field.

@Mpdreamz
Member

Mpdreamz commented Oct 30, 2018

Totally missed #25312 is moving ahead! Awesome, really looking forward to this feature.

Agreed on naming: anything other than object fields would do (key_value, json field type, map fields, dictionary fields) 😄

@jtibshirani
Contributor Author

jtibshirani commented Oct 30, 2018

@colings86 would you be able to give more detail on why a single mapping depth limit would be preferable? I don't have great context on what index.mapping.depth.limit is meant to protect against, and don't intuitively see why a single value would make sense for both cases. To help compare, the main motivation behind the depth_limit here is to (1) prevent us from creating extremely long keyed tokens of the form key1.key2.key3. ... .value, and (2) be more robust to pathological input.
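Point (1) can be illustrated with a rough sketch (plain Python, hypothetical helper; the '=' separator is arbitrary and the real keyed-token encoding differs): each extra level of nesting lengthens the key prefix of every token emitted beneath it.

```python
def keyed_tokens(obj, prefix=""):
    """Yield one keyed token per leaf value, of the form
    key1.key2.key3=value. Deeper nesting means longer tokens."""
    for key, value in obj.items():
        path = f"{prefix}.{key}" if prefix else key
        if isinstance(value, dict):
            yield from keyed_tokens(value, path)
        else:
            yield f"{path}={value}"

print(list(keyed_tokens({"key1": {"key2": {"key3": "value"}}})))
# ['key1.key2.key3=value']
```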

@Mpdreamz thanks for keeping us on top of things :) I'm brainstorming a new name for these fields.

@colings86
Contributor

@jtibshirani my thinking in keeping it as a single mapping depth limit was to simplify things for the user, so they only need to reason about the depth of the JSON in their documents and not have to worry about whether certain parts of the JSON are allowed to nest deeper than other parts.

I think the main reason for adding index.mapping.depth.limit was to help protect against mapping explosion, where the mapping for the index ends up being unwieldy and causes problems in the cluster. Considering this, I can see an argument that the setting isn't actually appropriate for the queryable object fields, since the mapping depth doesn't actually increase with this field type: anything nested in JSON under this field is included in this single mapping field definition.
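The mapping-explosion concern can be illustrated with a toy sketch (Python for illustration only, hypothetical helper, not Elasticsearch code): with dynamic object mapping, every distinct nested key path adds a mapping entry, whereas a single JSON-style field keeps one mapping entry no matter what documents contain.

```python
def count_dynamic_mappings(docs):
    """Count the distinct field paths that dynamic object mapping
    would add to the index mapping for these documents."""
    paths = set()

    def walk(obj, prefix=""):
        for key, value in obj.items():
            path = f"{prefix}.{key}" if prefix else key
            paths.add(path)
            if isinstance(value, dict):
                walk(value, path)

    for doc in docs:
        walk(doc)
    return len(paths)

docs = [{"labels": {"env": "prod"}},
        {"labels": {"team": {"name": "search"}}}]
print(count_dynamic_mappings(docs))  # → 4, and it keeps growing with new keys
```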

Given the above I'm happy for us to have separate settings if you still think that's the better way to go, since there is a good argument for why index.mapping.depth.limit might not fit well with this new limit.

@jtibshirani
Contributor Author

jtibshirani commented Oct 31, 2018

@colings86 thanks for the explanation, in that case I do think having separate options makes sense here.

For now I'm planning to skip adding a 'lenient' mode, as I'd like to get more usage feedback before adding another config option. Merging, but let me know if you have other thoughts and I can address them in a follow-up.

@jtibshirani jtibshirani merged commit 74afb2a into elastic:object-fields Oct 31, 2018
@jtibshirani jtibshirani deleted the json-field-depth-limit branch October 31, 2018 18:52
jtibshirani added a commit to jtibshirani/elasticsearch that referenced this pull request Nov 8, 2018
jtibshirani added a commit to jtibshirani/elasticsearch that referenced this pull request Nov 9, 2018