Support flattened field type from Elasticsearch #25820

Bargs · 2018-11-16T18:28:42Z

A new object field type is coming to ES. I played around with the feature branch a bit today and collected some thoughts and findings. From what I've seen so far, there are some small updates to Kibana we'll definitely want to make and some things we should discuss.

Need to add JSON type to Kibana (or whatever name ES lands on for this field type). Currently it shows up in the index pattern as unknown.
Autocomplete on values doesn't work because the field is not aggregatable.
Autocomplete on field names doesn't work because we don't know what sub-fields the object has.
In KQL we implement wildcard field names ourselves based on the fields in the index pattern, so something like head*:application/json actually works but headers.con*:application/json does not, because we don't know about those fields. This could be pretty confusing to users.
Filters can't be created from the doc table (similar to our treatment of arrays of objects today).
Filters can't be created from the "Add Filter" UI in the filter bar. This is worse than no autocomplete in the query bar because we don't allow free text input for the field name; currently the field name must be in the index pattern.
Highlighting would be nice. However, we don't currently highlight values inside arrays of objects, so a lack of highlighting in a JSON document won't be totally surprising to current users.
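
To illustrate the KQL wildcard item in the list above, here is a minimal sketch. It assumes wildcard field names are expanded by matching the pattern against the field names the index pattern already knows about. The function name and the sample field list are hypothetical, and this is not Kibana's actual implementation.

```python
from fnmatch import fnmatchcase


def expand_field_pattern(pattern, known_fields):
    """Return the known field names that match a wildcard field pattern."""
    return [name for name in known_fields if fnmatchcase(name, pattern)]


# Hypothetical index pattern: the object field is known only by its root name,
# because its sub-fields never show up in the mappings.
known_fields = ["@timestamp", "headers", "message"]

print(expand_field_pattern("head*", known_fields))         # matches the root field
print(expand_field_pattern("headers.con*", known_fields))  # nothing to match against
```

Because expansion only ever sees `headers`, a pattern that targets a sub-field has nothing to expand to, which is the confusing behaviour described above.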

I only spent about an hour with it, so there may be more things I'm missing. It would definitely be good to get more eyes on it.

Feature branch here if anyone else wants to check it out: https://github.com/elastic/elasticsearch/tree/object-fields


elasticmachine · 2018-11-16T18:28:43Z

Pinging @elastic/kibana-app

jtibshirani · 2018-12-03T23:35:31Z

Thanks @Bargs for taking a look at the branch! I had a couple of thoughts/questions.

First, from talking to the Beats team, I think it would be valuable to add support for terms aggregations, both on the root field (like headers) and the keyed values (headers.content-type). This would also allow Kibana to support autocompletion of values when adding a filter. I am not completely sure this is possible to do in a performant way, but it is something I am looking into.

Next, we had discussed whether there was a way to expose the list of subfields/keys that are available. I don't think it makes sense to return this as part of the mappings or field capabilities, because there may be a huge number of distinct subfields (and the number of field mappings is assumed to be bounded to a reasonable number). A more sensible approach might be to index the subfield names into a separate Lucene field, and allow for a terms aggregation that returns the most popular subfields. However, this doesn't fit perfectly with the current API around JSON fields, and would require storing additional information. The alternative would be to accept that we don't support autocomplete on these subfields and, when filtering, allow free text input for the field name. I'm curious as to your thoughts here.

Bargs · 2018-12-12T23:44:43Z

Sorry for the delay in response, I was out all last week.

From a technical standpoint I think storing the subfield names in an index and doing a terms agg on them would work for autocompleting the field names in Kibana. But while chatting with @lukasolson I realized that even with the field names we would still be missing type information, and as a result would not be able to intelligently suggest query types. If we go down the path of trying to make these feel like regular searchable fields for average users, I think we need to go all the way, so we would need that type information too. I have a feeling that probably complicates things even more. If so, we might be able to do without autocomplete on JSON fields for the time being. The most important thing to me is that the autocomplete doesn't appear broken or unpredictable to a normal user, but I think we might be able to solve that by adding some warnings in the UI if the user is searching on a JSON field.

jtibshirani · 2019-03-29T00:54:50Z

I've now started to pick up work on JSON fields again and have a couple of updates. First, we worked out how to support keyword-style aggregations like terms, so it should be possible for Kibana to autocomplete on the field values.

Second, after thinking about it more, I think it could be valuable to provide access to the possible keys in the JSON field. Even beyond Kibana, this seems generally useful as part of a search workflow on these fields: a client could first retrieve the common keys in the JSON field, present them to the user, then allow for searches on these keys. Otherwise these keys must be known in advance, or can only be discovered by encountering them within documents returned from another search.

To support this, we could index the keys into a separate field, and the common keys would then be retrieved through a terms aggregation. The API could look something like this:

json_field searches only field values
json_field.some_key searches only values belonging to the key some_key
json_field._keys searches only field keys (new option, not implemented currently)
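
As a rough illustration of the three search targets listed above, the sketch below splits one JSON object into its values, its per-key values, and its keys. The helper names and the sample object are hypothetical, and this is only a mental model of the proposal, not how Elasticsearch indexes the field internally.

```python
def flatten_leaves(obj, prefix=""):
    """Yield (dotted_key, value) pairs for every leaf of a nested JSON object."""
    for key, value in obj.items():
        path = f"{prefix}.{key}" if prefix else key
        items = value if isinstance(value, list) else [value]
        for item in items:
            if isinstance(item, dict):
                yield from flatten_leaves(item, path)
            else:
                # Leaf values are treated as plain strings, keyword-style.
                yield path, str(item)


def search_targets(field, obj):
    """Map each searchable name from the proposal to the terms it would cover."""
    leaves = list(flatten_leaves(obj))
    targets = {field: sorted({value for _, value in leaves})}  # all values
    for key, value in leaves:
        targets.setdefault(f"{field}.{key}", []).append(value)  # values of one key
    targets[f"{field}._keys"] = sorted({key for key, _ in leaves})  # proposed, not implemented
    return targets


doc = {"content-type": "application/json", "origin": {"host": "example.com", "port": 443}}
for name, terms in search_targets("json_field", doc).items():
    print(name, terms)
```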

@Bargs @jpountz @jimczi I was curious about your thoughts on the above. The downsides are that this API isn't as elegant, and it may involve indexing more information. Note that this relates to @jimczi's question here about whether keyed JSON fields should use the _field_names field: elastic/elasticsearch#40069 (comment)

Bargs · 2019-03-29T21:26:49Z

Sounds good to me! We'd love to have a way to retrieve a list of the keys, whatever form the API takes.

jpountz · 2019-04-01T13:53:04Z

How would this work in practice? Is my understanding correct that Kibana would first retrieve fields from the field capabilities APIs, and as a second step for each json field it would run a terms aggregation on the json_field._keys field to further populate the list of fields that can be searched or aggregated (probably with a reasonable value of terminate_after to keep performance ok)?
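
The two-step flow described in this question could be sketched as follows. The `_keys` sub-field is the proposed, unimplemented option from the previous comment, and the function names, aggregation name, and default limits are assumptions made for illustration. `terminate_after` is the existing search request parameter.

```python
def keys_agg_request(json_field, max_keys=50, terminate_after=10000):
    """Build a search body asking for the most common keys of one JSON field."""
    return {
        "size": 0,  # only the aggregation result is needed, no hits
        "terminate_after": terminate_after,  # stop collecting early per shard
        "aggs": {
            "common_keys": {
                "terms": {"field": f"{json_field}._keys", "size": max_keys}
            }
        },
    }


def merge_fields(field_caps_fields, json_field, agg_response):
    """Append discovered sub-fields to the fields returned by field capabilities."""
    buckets = agg_response["aggregations"]["common_keys"]["buckets"]
    discovered = [f"{json_field}.{bucket['key']}" for bucket in buckets]
    known = list(field_caps_fields)
    return known + [name for name in discovered if name not in known]


# Hypothetical response shape for the aggregation above.
fake_response = {
    "aggregations": {
        "common_keys": {
            "buckets": [
                {"key": "content-type", "doc_count": 9},
                {"key": "host", "doc_count": 4},
            ]
        }
    }
}
print(merge_fields(["@timestamp", "headers"], "headers", fake_response))
```

Both `max_keys` and `terminate_after` cap the work done, so the merged list is a sample of the keys rather than a complete inventory.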

At first sight this sounds like a good idea to me, but I'd like to double check that we are ok with the complexity that it introduces in Kibana as well as the trade-off: because there is no upper bound on the number of fields, and making sure to collect all fields would be too slow anyway, most of the time we would only collect a subset of the sub-fields that exist in a JSON object.

timroes · 2019-04-01T14:01:39Z

I would like to put a spotlight on something @Bargs already mentioned earlier. Just knowing the field names (via _keys) doesn't help us that much (or only in a couple of places). It would still not allow us to build e.g. any visualization on those fields, since we need the type information for those fields to work with them. Without proper types we cannot add them properly to the index pattern, so we could maybe build a smaller workaround solution for searches, but not really support them across Kibana. From a technical point of view, how are those types actually treated inside Elasticsearch? Are all values just keyword types (in which case we could hardcode that too)?

jpountz · 2019-04-01T16:49:47Z

@timroes Yes they would behave almost exactly like a keyword field.

timroes · 2019-04-02T10:56:51Z

@jpountz I am a bit worried about the "almost" in that sentence :D Could you tell us what the actual differences are? We need to know whether we can simply treat them as "keyword" fields (which would then apply in all places), or, if we can't, we would need to know about that type difference somehow.

jpountz · 2019-04-02T11:48:42Z

@timroes Here are the differences to my knowledge (please review @jtibshirani):

produced scores will be different because field statistics such as the document count for a field would be different,
min_doc_count=0 on terms aggregations is unsupported (we are hoping to address it in the near future).

Other than that, they should support the same set of queries and aggregations.
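
A minimal sketch of what "treat them like keyword fields" could mean on the client side, under the two differences listed above. The field-spec shape, the `parent` marker, and both function names are hypothetical and not Kibana's actual index pattern format.

```python
def keyword_field_specs(json_field, keys):
    """Describe each discovered key as a keyword-like field of its parent."""
    return [
        {
            "name": f"{json_field}.{key}",
            "type": "keyword",      # hardcoded, since every value behaves like a keyword
            "searchable": True,
            "aggregatable": True,
            "parent": json_field,   # hypothetical marker for JSON sub-fields
        }
        for key in sorted(keys)
    ]


def terms_agg(field_spec, size=10, min_doc_count=1):
    """Build a terms aggregation, rejecting the one unsupported option."""
    if field_spec.get("parent") and min_doc_count == 0:
        raise ValueError("min_doc_count=0 is not supported on JSON sub-fields")
    return {
        "terms": {
            "field": field_spec["name"],
            "size": size,
            "min_doc_count": min_doc_count,
        }
    }


specs = keyword_field_specs("headers", ["host", "content-type"])
print(terms_agg(specs[0]))
```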

timroes · 2019-04-02T11:57:22Z

Okay that sounds fine to me. We don't mind too much about the score (especially not about specific values) and we're not using min_doc_count=0 on terms aggregations as far as I am aware (it anyway sounds like a weird parameter value to me).

So the plans here sound rather reasonable. I would just suggest that, when creating an index pattern that contains JSON fields, we give the user an option to choose whether or not they want "to use those in Kibana". Otherwise it sounds to me like we could bloat the index pattern quite a lot, and maybe users don't actually want to use those fields.

jtibshirani · 2019-04-02T23:53:38Z

@timroes In addition to the aggregation limitation that @jpountz mentioned (which we hope to address), I tried to list the restrictions here: https://github.com/elastic/elasticsearch/blob/object-fields/docs/reference/mapping/types/embedded-json.asciidoc#supported-operations. You'll notice that certain query types like regexp are not supported.

How would this work in practice? ... I'd like to double check that we are ok with the complexity that it introduces in Kibana as well as the trade-off.

I'm also hoping to understand this better. Would it be possible to walk through how Kibana would load + display the keys, given the current proposal of running a terms aggregation on json_field._keys?

peterschretlen · 2019-06-06T13:51:03Z

If I understand the purpose of embedded_json, it lets us avoid really large mappings and field lists, when typically only a handful of fields are of interest (and some fields may be very sparsely populated).

Do we expect (can we assume) these fields are known ahead of time? Or do they need to be discovered via autocomplete (which may not even be possible to do well if the number of keys is large)?

Say we expect only a handful of embedded fields are of interest, and that handful doesn't change much - these are two big assumptions - then how about defining the embedded json fields as part of the index pattern (similar to how we do now for scripted fields)? It would not be as nice to work with as automatically discovered fields via autocomplete, but then the field type can be defined and there can be as many or as few as you want.

elasticmachine · 2020-03-27T09:10:06Z

Pinging @elastic/kibana-app-arch (Team:AppArch)

andrewkcarter · 2020-04-30T18:50:35Z

Not being able to create visualizations in Kibana against fields in a flattened object is a real blocker to adopting the field type in our mappings. This flattened field type is exactly what we need, as a portion of our document is both dynamic and substantial in field count. But we have a use case that requires us to maintain search capabilities on that data.

Is there any chance this feature gets prioritized in the near future?

misabelcc · 2020-09-01T09:54:34Z

Is there any news about creating visualizations in Kibana?

petersedivec · 2020-11-19T10:36:13Z

Just chiming in: I agree completely with what @andrewkcarter mentioned above. We would really like to adopt the flattened object for our use case, but without support in Kibana it's tough for us to adopt. We would greatly appreciate feedback from Elastic on this.

MrBones757 · 2021-05-07T06:53:13Z

Stumbled across this issue looking to create visualizations with the same field type.
After some messing around, we found that we were able to make use of this field type by using the filters sub aggregation. While not quite as handy as others (e.g. Terms), it does seem to work quite well in our initial use, though we have not tested this in anger to see what it's like with a larger dataset, as we're still in dev for this use case.
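
Expressed as an Elasticsearch request body, the workaround above might look roughly like this sketch. It assumes each bucket is a term filter on one keyed sub-field of a flattened field. The field name `labels.env`, the aggregation name, and the function name are hypothetical.

```python
def filters_agg_request(field, values):
    """Build a filters aggregation with one named bucket per expected value."""
    return {
        "size": 0,
        "aggs": {
            "by_value": {
                "filters": {
                    # Unlike terms, every bucket has to be spelled out up front.
                    "filters": {value: {"term": {field: value}} for value in values}
                }
            }
        },
    }


print(filters_agg_request("labels.env", ["prod", "staging"]))
```

The cost compared with a terms aggregation is that the values must be known in advance, which matches the "not quite as handy" caveat.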

madisonb · 2021-11-02T19:51:26Z

Hi folks - I think I've got an update for this thread that gets us some progress here.

Elasticsearch now supports Runtime Fields, and Kibana supports the ability to add runtime fields to your index pattern.

The particularly relevant part of the documentation is the support for automatically pulling values from _source when you don't specify a runtime script, as noted in this section. I'm testing on a 7.15.1 cluster right now, and if I specify the exact path within my flattened field, Kibana behaves as expected and is able to run aggregations and other queries on the field like normal.

Examples/Screenshots:

Let's say I have a flattened doc object within my index mapping, and I'm interested in understanding raw values from tweets from the Twitter firehose. I throw my entire tweet JSON into that root doc field.

Now I need to access a value to do some kind of aggregation, like the user's reported location. I add a new runtime mapping with the direct path within my flattened object, doc.user.location like so

Now I can run an aggregation to find my top location hits
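
Outside the Kibana UI, a comparable search request could be sketched like this. It assumes the request-level `runtime_mappings` section is used in place of a runtime field on the index pattern. The function and aggregation names are hypothetical; the path comes from the example above.

```python
def top_values_request(path, runtime_type="keyword", size=10):
    """Build a terms aggregation over a script-less runtime field."""
    return {
        "size": 0,
        # No script is given, so the value is looked up in _source under the same path.
        "runtime_mappings": {path: {"type": runtime_type}},
        "aggs": {"top_values": {"terms": {"field": path, "size": size}}},
    }


print(top_values_request("doc.user.location"))
```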

Neat, but what about numeric values?

Set the runtime field to directly access the user's current follower count at doc.user.followers_count. (note the "Double" data type)

I do the same thing as prior for the user name field that I care about, doc.user.screen_name

Now show me the top users ranked by their number of followers:
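
As a search request, this ranking could be sketched as a terms aggregation ordered by a max sub-aggregation. The two paths are the ones named above; the aggregation names and the function name are hypothetical, and `runtime_mappings` again stands in for the runtime fields defined in Kibana.

```python
def top_by_metric_request(bucket_field, metric_field, size=10):
    """Rank the values of one runtime field by the maximum of another."""
    return {
        "size": 0,
        "runtime_mappings": {
            bucket_field: {"type": "keyword"},
            metric_field: {"type": "double"},  # numeric values need a numeric runtime type
        },
        "aggs": {
            "top_users": {
                "terms": {
                    "field": bucket_field,
                    "size": size,
                    "order": {"max_followers": "desc"},  # sort buckets by the sub-aggregation
                },
                "aggs": {"max_followers": {"max": {"field": metric_field}}},
            }
        },
    }


print(top_by_metric_request("doc.user.screen_name", "doc.user.followers_count"))
```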

Sweet!

Filters work too, here I've got some custom data that contains a value I care about. Same setup as prior for the doc.direction field:

From my testing so far, there does seem to be a performance impact, but I'm not sure whether it comes from the resources I've thrown at my cluster and my data volume or from the runtime fields themselves. I have not tested the other runtime data types yet (like geo or ip), but keyword and numeric data seem to be working in my initial testing, and I just wanted to share.

ceeeekay · 2022-10-13T23:56:06Z

@madisonb Amazing, thank you.

elasticmachine · 2022-11-28T21:22:12Z

Pinging @elastic/kibana-data-discovery (Team:DataDiscovery)

BenB196 · 2023-06-13T00:01:18Z

I wanted to poke this issue to see if there has been any discussion/plan for this. Many Elastic integrations (https://github.com/search?q=repo%3Aelastic%2Fintegrations+%22%7C+flattened+%7C%22&type=code&p=1) (~63 as of searching) have adopted the flattened field type, and many of these are useful fields you'd want to search/aggregate on. Not having the ability to do this natively in Kibana is extremely limiting.

While you can use something like runtime fields, this severely degrades performance¹ and doesn't provide the best user experience for people unfamiliar with runtime fields/Painless.

¹ Testing a simple terms count agg goes from ~250ms via the native flattened field to ~8 seconds via a runtime field!

nicholas-r-king · 2023-11-09T04:07:27Z

We attempted to use runtime fields along with the flattened type and it caused severe degradation in performance (CPU util went from 10% -> 90% across all 9 nodes). We are also in need of some update on this ticket. (Enterprise ECK customer)

BenB196 · 2023-11-09T11:31:17Z

With the release of 8.11 and the addition of ES|QL, I wonder if this can be used as an alternative. I haven't actually tested this, and oddly, the limitations section doesn't mention whether flattened is supported or not, but maybe it is?

kertal · 2023-11-09T13:40:06Z

@nicholas-r-king this sounds like a problem with Elasticsearch, or are these Kibana nodes?

kertal · 2023-11-09T14:12:05Z

With the release of 8.11 and the addition of ES|QL, I wonder if this can be used as an alternative. I haven't actually tested this, and oddly, the limitations section doesn't mention whether flattened is supported or not, but maybe it is?

@BenB196 I guess flattened fields were simply left out of the list of what's not supported, so currently they are not supported in ES|QL.

Bargs added the enhancement and Team:Visualizations labels Nov 16, 2018

Bargs mentioned this issue Nov 16, 2018

Flattened object fields design + implementation elastic/elasticsearch#33003

Closed

11 tasks

peterschretlen changed the title ~~Support New ES Object Field Type~~ Support Embedded JSON Field Type Jun 6, 2019

timroes changed the title ~~Support Embedded JSON Field Type~~ Support flattened field type from Elasticsearch Sep 19, 2019

timroes mentioned this issue Sep 19, 2019

Nested field support #1084

Open

timroes added the Feature:New Field Type and Team:AppArch labels Mar 27, 2020

jimczi mentioned this issue Feb 2, 2021

Dotted field names that conflict with objects elastic/elasticsearch#63530

Closed

jsoriano mentioned this issue Feb 15, 2021

Add docker Integration elastic/integrations#632

Merged

2 tasks

axw mentioned this issue Feb 21, 2021

Make mapping explosion on tags visible elastic/apm-server#1292

Closed

ChrsMark mentioned this issue Sep 1, 2021

[Agent] Support labels dedot in k8s provider elastic/beats#27019

Closed

petrklapka added the NeededFor:Observability label Oct 21, 2021

mattkime mentioned this issue Dec 21, 2021

[data views] ES field type support inventory #120289

Closed

legrego removed the EnableJiraSync label Aug 18, 2022

petrklapka added the Feature:Data Views and Team:DataDiscovery labels and removed the Team:AppServicesSv label Nov 28, 2022

rajvi-patel-22 mentioned this issue Aug 7, 2023

[Azure] Azure app state overview dashboard contains flattened data type elastic/integrations#7259

Closed

andrewkroh mentioned this issue Oct 11, 2023

Update elastic-package to use Package Spec 3.0.0-rc1 and fix v3 packages elastic/integrations#8115

Merged

4 tasks

jsoriano mentioned this issue Nov 29, 2023

Add support for subobjects: false elastic/package-spec#349

Closed

strawgate mentioned this issue Mar 5, 2024

Publish documents with domain-specific mappings legrego/homeassistant-elasticsearch#124

Closed

jsanz mentioned this issue Mar 8, 2024

KQL support for range queries on flattened field types #178264

Closed

BenB196 mentioned this issue Mar 12, 2024

ESQL should support the "flattened" field type elastic/elasticsearch#105637

Open

kertal mentioned this issue Apr 10, 2024

[Icebox] Field type support #180463

Open

andrewkroh mentioned this issue May 14, 2024

[okta.system] Utilize 'subobjects: false' for debugContext.debugData elastic/integrations#9863

Closed

BenB196 mentioned this issue Jul 1, 2024

[AI Assistant] Add Knowledge base for Painless Scripting #184285

Open

felixbarny mentioned this issue Aug 6, 2024

[Logs+] Enable JSON parsing for logs by default elastic/elasticsearch#96651

Open
