Support flattened field type from Elasticsearch #25820

Open
Tracked by #180463
Bargs opened this issue Nov 16, 2018 · 28 comments
Labels
enhancement New value added to drive a business result Feature:Data Views Data Views code and UI - index patterns before 8.0 Feature:New Field Type Add support for an Elasticsearch field type in Kibana impact:high Addressing this issue will have a high level of impact on the quality/strength of our product. loe:small Small Level of Effort NeededFor:Observability Issues the Observability team has dependencies on. Team:DataDiscovery Discover, search (e.g. data plugin and KQL), data views, saved searches. For ES|QL, use Team:ES|QL. Team:Visualizations Visualization editors, elastic-charts and infrastructure

Comments

@Bargs
Contributor

Bargs commented Nov 16, 2018

A new object field type is coming to ES. I played around with the feature branch a bit today and collected some thoughts and findings. From what I've seen so far, there are some small updates to Kibana we'll definitely want to make and some things we should discuss.

  • Need to add a JSON type to Kibana (or whatever name ES lands on for this field type). Currently it shows up in the index pattern as unknown.
  • Autocomplete on values doesn't work because the field is not aggregatable
  • Autocomplete on field names doesn’t work, because we don't know what sub-fields the object has
  • In KQL we implement wildcard field names ourselves based on the fields in the index pattern, so something like head*:application/json actually works but headers.con*:application/json does not because we don’t know about those fields. This could be pretty confusing to users
  • Filters can’t be created from the doc table (similar to our treatment of arrays of objects today)
  • Filters can't be created from the "Add Filter" UI in the filter bar (this is worse than no autocomplete in the query bar because we don’t allow free text input for the field name. Currently the field name must be in the index pattern)
  • Highlighting would be nice; however, we don’t currently highlight values inside arrays of objects, so a lack of highlighting in a JSON document won’t be totally surprising to current users.
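
The wildcard behaviour described in the list above can be illustrated with a small sketch. This is not Kibana's actual implementation; it only assumes that wildcard field names are expanded against the list of fields known to the index pattern. The function name and the sample field list are hypothetical.

```python
from __future__ import annotations

from fnmatch import fnmatchcase


def expand_field_pattern(pattern: str, known_fields: list[str]) -> list[str]:
    """Return the known field names that match a wildcard field pattern."""
    return [name for name in known_fields if fnmatchcase(name, pattern)]


# Hypothetical index pattern: the root JSON field `headers` is listed,
# but none of its keys (such as `headers.content-type`) are known.
known_fields = ["@timestamp", "headers", "status"]

print(expand_field_pattern("head*", known_fields))         # ['headers']
print(expand_field_pattern("headers.con*", known_fields))  # []
```

Because the second pattern expands to nothing, a query such as headers.con*:application/json has no field to run against, even though matching documents exist.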

I only spent about an hour with it so there may be more things I'm missing, would definitely be good to get more eyes on it.

Feature branch here if anyone else wants to check it out: https://github.com/elastic/elasticsearch/tree/object-fields

@Bargs Bargs added enhancement New value added to drive a business result Team:Visualizations Visualization editors, elastic-charts and infrastructure labels Nov 16, 2018
@elasticmachine
Contributor

Pinging @elastic/kibana-app

@jtibshirani

jtibshirani commented Dec 3, 2018

Thanks @Bargs for taking a look at the branch! I had a couple thoughts/ questions.

First, from talking to the Beats team, I think it would be valuable to add support for terms aggregations, both on the root field (like headers) and the keyed values (headers.content-type). This would also allow Kibana to support autocompletion of values when adding a filter. I am not completely sure this is possible to do in a performant way, but is something I am looking into.

Next, we had discussed whether there was a way to expose the list of subfields/keys that are available. I don't think it makes sense to return this as part of the mappings or field capabilities, because there may be a huge number of distinct subfields (and the number of field mappings is assumed to be bounded to a reasonable number). A more sensible approach might be to index the subfield names into a separate Lucene field, and allow for a terms aggregation that returns the most popular subfields. However, this doesn't fit perfectly with the current API around json fields, and would require storing additional information. The alternative would be to accept that we don't support autocomplete on these subfields, and when filtering allow for free text input on the field name. I'm curious as to your thoughts here.

@Bargs
Contributor Author

Bargs commented Dec 12, 2018

Sorry for the delay in response, I was out all last week.

From a technical standpoint I think storing the subfield names in an index and doing a terms agg on them would work for autocompleting the field names in Kibana. But while chatting with @lukasolson I realized that even with the field names we would still be missing type information, and as a result would not be able to intelligently suggest query types. If we go down the path of trying to make these feel like regular searchable fields for average users I think we need to go all the way, so we would need that type information too. I have a feeling that probably complicates things even more. If so, we might be able to do without autocomplete on json fields for the time being. The most important thing to me is that the autocomplete doesn't appear broken or unpredictable to a normal user, but I think we might be able to solve that by adding some warnings in the UI if the user is searching on a JSON field.

@jtibshirani

jtibshirani commented Mar 29, 2019

I've now started to pick up work on JSON fields again and have a couple of updates. First, we worked out how to support keyword-style aggregations like terms, so it should be possible for Kibana to autocomplete on the field values.

Second, after thinking about it more, I think it could be valuable to provide access to the possible keys in the JSON field. Even beyond Kibana, this seems generally useful as part of a search workflow on these fields: a client could first retrieve the common keys in the JSON field, present them to the user, then allow for searches on these keys. Otherwise these keys must be known in advance, or can only be discovered by encountering them within documents returned from another search.

To support this, we could index the keys into a separate field, and the common keys would then be retrieved through a terms aggregation. The API could look something like this:

  • json_field searches only field values
  • json_field.some_key searches only values belonging to the key some_key
  • json_field._keys searches only field keys (new option, not implemented currently)
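
As a rough sketch of how the three paths above might look as search request bodies, assuming plain `term` queries are used: the helper name is hypothetical, and `json_field._keys` is only the option proposed in this comment, not an implemented field.

```python
def term_query(path: str, value: str) -> dict:
    """Build a search body with a single term query against the given path."""
    return {"query": {"term": {path: value}}}


# Matches the value under any key of the JSON field.
any_value = term_query("json_field", "application/json")

# Matches the value only under the key `some_key`.
keyed_value = term_query("json_field.some_key", "application/json")

# Proposed only: matches documents whose JSON field contains the key `some_key`.
key_name = term_query("json_field._keys", "some_key")

print(any_value)
print(keyed_value)
print(key_name)
```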

@Bargs @jpountz @jimczi I was curious about your thoughts on the above. The downsides are that this API isn't as elegant, and it may involve indexing more information. Note that this relates to @jimczi's question here about whether keyed JSON fields should use the _field_names field: elastic/elasticsearch#40069 (comment)

@Bargs
Contributor Author

Bargs commented Mar 29, 2019

Sounds good to me! We'd love to have a way to retrieve a list of the keys, whatever form the API takes.

@jpountz

jpountz commented Apr 1, 2019

How would this work in practice? Is my understanding correct that Kibana would first retrieve fields from the field capabilities APIs, and as a second step for each json field it would run a terms aggregation on the json_field._keys field to further populate the list of fields that can be searched or aggregated (probably with a reasonable value of terminate_after to keep performance ok)?
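
The two-step flow asked about here could be sketched roughly as follows. It assumes the proposed (not implemented) `json_field._keys` field exists; the function names, the bucket list, and the default sizes are illustrative assumptions rather than anything Kibana or Elasticsearch defines.

```python
from __future__ import annotations


def keys_request(json_field: str, size: int = 50, terminate_after: int = 10_000) -> dict:
    """Step 2: terms aggregation over the proposed `_keys` subfield.

    `terminate_after` caps the documents collected per shard, so only a
    sample of the keys is returned when there are very many of them.
    """
    return {
        "size": 0,
        "terminate_after": terminate_after,
        "aggs": {
            "common_keys": {"terms": {"field": f"{json_field}._keys", "size": size}}
        },
    }


def merge_fields(field_caps_names: list[str], json_field: str, key_buckets: list[dict]) -> list[str]:
    """Combine names from field capabilities (step 1) with discovered keys."""
    discovered = [f"{json_field}.{bucket['key']}" for bucket in key_buckets]
    return sorted(set(field_caps_names) | set(discovered))


# Hypothetical aggregation buckets, shaped like a terms aggregation response.
buckets = [{"key": "content-type", "doc_count": 3}]
print(keys_request("headers"))
print(merge_fields(["headers", "status"], "headers", buckets))
```

The merged list is only ever a subset of the keys that exist, which is the trade-off raised in the next paragraph.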

At first sight this sounds like a good idea to me, but I'd like to double check that we are ok with the complexity that it introduces in Kibana as well as the trade-off: because there is no upper bound on the number of fields, and because making sure to collect all fields would be too slow anyway, most of the time we would only collect a subset of the subfields that exist in a json object.

@timroes
Contributor

timroes commented Apr 1, 2019

I would like to put a spotlight on something @Bargs already mentioned earlier. Just knowing the field names (via _keys) doesn't help us that much (or only in a couple of places). It would still not allow us to build e.g. any visualization on those fields, since we need to have the type information for those fields available to work with them. Without proper types we cannot add them properly to the index pattern, so we could maybe build a smaller workaround solution for searches, but not really support them across Kibana. From a technical point of view, how are those types actually treated inside Elasticsearch? Are all values just keyword types (in which case we could hardcode that too)?

@jpountz

jpountz commented Apr 1, 2019

@timroes Yes they would behave almost exactly like a keyword field.

@timroes
Contributor

timroes commented Apr 2, 2019

@jpountz I am a bit worried about the "almost" in that sentence :D Could you tell what are the actual differences? Because we need to know if we are able to simply treat them as "keyword" fields (but that would then apply to all places), or if we can't, in which case we would need to know about that type difference somehow.

@jpountz

jpountz commented Apr 2, 2019

@timroes Here are the differences to my knowledge (please review @jtibshirani):

  • produced scores will be different because field statistics such as the document count for a field would be different,
  • min_doc_count=0 on terms aggregations is unsupported (we are hoping to address it in the near future).

Other than that, they should support the same set of queries and aggregations.

@timroes
Contributor

timroes commented Apr 2, 2019

Okay that sounds fine to me. We don't mind too much about the score (especially not about specific values) and we're not using min_doc_count=0 on terms aggregations as far as I am aware (it anyway sounds like a weird parameter value to me).

So the plans here sound rather reasonable. I would just suggest that, while creating the index pattern, we give the user a flag if it contains any JSON fields, asking whether or not they want "to use those in Kibana". It sounds to me like we could otherwise bloat the index pattern quite a lot, and maybe users don't actually want to use them.

@jtibshirani

jtibshirani commented Apr 2, 2019

@timroes In addition to the aggregation limitation that @jpountz mentioned (which we hope to address), I tried to list the restrictions here: https://github.com/elastic/elasticsearch/blob/object-fields/docs/reference/mapping/types/embedded-json.asciidoc#supported-operations. You'll notice that certain query types like regexp are not supported.

How would this work in practice? ... I'd like to double check that we are ok with the complexity that it introduces in Kibana as well as the trade-off.

I'm also hoping to understand this better, would it be possible to walk through how Kibana would load + display the keys, given the current proposal of running a terms aggregation on json_field._keys?

@peterschretlen peterschretlen changed the title Support New ES Object Field Type Support Embedded JSON Field Type Jun 6, 2019
@peterschretlen
Contributor

If I understand the purpose of embedded_json, it lets us avoid really large mappings and field lists, when typically only a handful of fields are of interest (and some fields may be very sparsely populated).

Do we expect (can we assume) these fields are known ahead of time? Or do they need to be discovered via autocomplete (which may not even be possible to do well if the number of keys is large)?

Say we expect only a handful of embedded fields are of interest, and that handful doesn't change much - these are two big assumptions - then how about defining the embedded json fields as part of the index pattern (similar to how we do now for scripted fields)? It would not be as nice to work with as automatically discovered fields via autocomplete, but then the field type can be defined and there can be as many or as few as you want.
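
One way to picture this suggestion is a data structure in which the user declares the embedded fields of interest, with their types, alongside the mapped fields. This is purely an illustrative sketch; the class and attribute names are hypothetical and do not correspond to Kibana's index pattern model.

```python
from __future__ import annotations

from dataclasses import dataclass, field


@dataclass
class DeclaredSubfield:
    """A user-declared key inside an embedded JSON field."""
    path: str              # e.g. "headers.content-type"
    type: str = "keyword"  # chosen by the user, since the mapping has no type


@dataclass
class IndexPattern:
    title: str
    mapped_fields: dict[str, str]  # field name -> type from the mapping
    declared_subfields: list[DeclaredSubfield] = field(default_factory=list)

    def field_types(self) -> dict[str, str]:
        """Mapped fields plus the declared subfields, as one lookup table."""
        merged = dict(self.mapped_fields)
        for sub in self.declared_subfields:
            merged[sub.path] = sub.type
        return merged


pattern = IndexPattern(
    title="logs-*",
    mapped_fields={"@timestamp": "date", "headers": "embedded_json"},
    declared_subfields=[DeclaredSubfield("headers.content-type")],
)
print(pattern.field_types())
```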

@timroes timroes changed the title Support Embedded JSON Field Type Support flattened field type from Elasticsearch Sep 19, 2019
@timroes timroes added Feature:New Field Type Add support for an Elasticsearch field type in Kibana Team:AppArch labels Mar 27, 2020
@elasticmachine
Contributor

Pinging @elastic/kibana-app-arch (Team:AppArch)

@andrewkcarter

Not being able to create visualizations in Kibana against fields in a flattened object is a real blocker to adopting the field type in our mappings. This flattened field type is exactly what we need, as a portion of our document is both dynamic and substantial in field count. But we have a use case that requires us to maintain search capabilities on that data.

Is there any chance this feature gets prioritized in the near future?

@misabelcc

Any news about creating visualizations in Kibana?

@petersedivec

Just chiming in: I agree completely with what @andrewkcarter mentioned above. We would really like to adopt the flattened object for our use case, but without support in Kibana it's tough for us to adopt. We would greatly appreciate feedback from Elastic on this.

@MrBones757

Stumbled across this issue looking to create visualizations with the same field type.
After some messing around, we found that we were able to make use of this field type by using the filters sub-aggregation. While not quite as handy as others (e.g. Terms), it does seem to work quite well from our initial use, though we have not tested this in anger to see what it's like with a larger dataset, as we're still in dev for this use case.
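
For readers who want to try the same workaround directly against Elasticsearch, a `filters` aggregation with one named `term` filter per value might look like the sketch below. The field path and values are hypothetical examples, and each value of interest has to be listed up front, which is why this is less handy than a terms aggregation.

```python
from __future__ import annotations


def filters_agg_body(field: str, values: list[str]) -> dict:
    """Build a search body with one named filter bucket per expected value."""
    return {
        "size": 0,
        "aggs": {
            "by_value": {
                "filters": {
                    "filters": {value: {"term": {field: value}} for value in values}
                }
            }
        },
    }


# Hypothetical flattened field `labels` with a key `env`.
body = filters_agg_body("labels.env", ["prod", "staging"])
print(body)
```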

@petrklapka petrklapka added the NeededFor:Observability Issues the Observability team has dependencies on. label Oct 21, 2021
@madisonb

madisonb commented Nov 2, 2021

Hi folks - I think I've got an update for this thread that gets us some progress here.

Elasticsearch now supports Runtime Fields, and Kibana supports the ability to add runtime fields to your index pattern.

The particularly relevant part of the documentation is the support for automatically pulling from _source, as noted in this section, when you don't specify a runtime script. I'm testing on a 7.15.1 cluster right now, and if I specify the exact path within my flattened field, Kibana behaves as expected and is able to run aggregations and other queries on the field like normal.

Examples/Screenshots:

Let's say I have a flattened doc object within my index mapping, and I'm interested in understanding raw values from tweets from the Twitter firehose. I throw my entire tweet JSON into that root doc field.

Screen Shot 2021-11-02 at 3 26 05 PM

Now I need to access a value to do some kind of aggregation, like the user's reported location. I add a new runtime mapping with the direct path within my flattened object, doc.user.location like so

Screen Shot 2021-11-02 at 3 30 35 PM

Now I can run an aggregation to find my top location hits

image
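
Outside the Kibana UI, a roughly equivalent search request can declare the runtime field inline through `runtime_mappings`. The sketch below assumes the behaviour described above, where a runtime field without a script reads the value at the same path from _source; the helper name is hypothetical.

```python
def top_values_body(path: str, runtime_type: str = "keyword", size: int = 10) -> dict:
    """Terms aggregation over a script-less runtime field named after its _source path."""
    return {
        "size": 0,
        # No "script" key: the value is looked up in _source under the same name.
        "runtime_mappings": {path: {"type": runtime_type}},
        "aggs": {"top_values": {"terms": {"field": path, "size": size}}},
    }


location_body = top_values_body("doc.user.location")
print(location_body)
```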

Neat, but what about numeric values?

Set the runtime field to directly access the user's current follower count at doc.user.followers_count. (note the "Double" data type)
image

I do the same thing as prior for the user name field that I care about, doc.user.screen_name

Now show me the top users ranked by their number of followers:

image

Sweet!
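
As a sketch of what such a ranking could look like as a raw search request, the body below declares both runtime fields inline and orders a terms aggregation by a `max` sub-aggregation. The aggregation names are arbitrary, and the request is an assumption about an equivalent query, not what Kibana generated for the screenshot.

```python
top_users_body = {
    "size": 0,
    "runtime_mappings": {
        # Script-less runtime fields, read from _source at the same paths.
        "doc.user.screen_name": {"type": "keyword"},
        "doc.user.followers_count": {"type": "double"},
    },
    "aggs": {
        "top_users": {
            "terms": {
                "field": "doc.user.screen_name",
                "size": 10,
                # Rank users by the sub-aggregation defined below.
                "order": {"max_followers": "desc"},
            },
            "aggs": {
                "max_followers": {"max": {"field": "doc.user.followers_count"}}
            },
        }
    },
}
print(top_users_body)
```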

Filters work too, here I've got some custom data that contains a value I care about. Same setup as prior for the doc.direction field:

image

From my testing so far there does seem to be a performance impact, but I'm not sure whether it comes from the runtime fields themselves or from the resources I've thrown at my cluster and my data volume. I have not tested the other runtime data types yet (like geo or ip), but keyword and numeric data seem to be working in my initial testing, and I just wanted to share.

@exalate-issue-sync exalate-issue-sync bot added impact:high Addressing this issue will have a high level of impact on the quality/strength of our product. loe:small Small Level of Effort and removed loe:medium Medium Level of Effort impact:low Addressing this issue will have a low level of impact on the quality/strength of our product. labels Nov 8, 2021
@ceeeekay

@madisonb Amazing, thank you.

@petrklapka petrklapka added Feature:Data Views Data Views code and UI - index patterns before 8.0 Team:DataDiscovery Discover, search (e.g. data plugin and KQL), data views, saved searches. For ES|QL, use Team:ES|QL. and removed Team:AppServicesSv labels Nov 28, 2022
@elasticmachine
Contributor

Pinging @elastic/kibana-data-discovery (Team:DataDiscovery)

@BenB196

BenB196 commented Jun 13, 2023

I wanted to poke this issue to see if there has been any discussion/plan for this. Many Elastic integrations (https://github.com/search?q=repo%3Aelastic%2Fintegrations+%22%7C+flattened+%7C%22&type=code&p=1) (~63 as of searching) have adopted the flattened field type, and many of these are useful fields you'd want to search/aggregate on. Not having the ability to do this natively in Kibana is extremely limiting.

While you can use something like runtime fields, this severely degrades performance [1] and doesn't provide the best user experience for people unfamiliar with runtime fields/painless.

Footnotes

  1. Testing a simple terms count agg goes from ~250ms via the native flattened field to ~8 seconds via a runtime field!
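
To make the two variants in this comparison concrete, the sketch below builds the same terms aggregation once directly against a key of a flattened field and once through a script-less runtime field. The field name is a hypothetical example and the helper is illustrative only; the timings quoted in the footnote come from the commenter's own test, not from this code.

```python
def terms_body(field: str, use_runtime_field: bool) -> dict:
    """Terms aggregation on `field`, optionally declared as a runtime keyword field."""
    body = {
        "size": 0,
        "aggs": {"top_values": {"terms": {"field": field}}},
    }
    if use_runtime_field:
        # Values are extracted from _source per document at query time.
        body["runtime_mappings"] = {field: {"type": "keyword"}}
    return body


# Hypothetical flattened field `labels` with a key `env`.
native_body = terms_body("labels.env", use_runtime_field=False)
runtime_body = terms_body("labels.env", use_runtime_field=True)
print(native_body)
print(runtime_body)
```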

@nicholas-r-king

We attempted to use runtime fields along with the flattened type and it caused severe degradation in performance (CPU util went from 10% -> 90% across all 9 nodes). We are also in need of some update on this ticket. (Enterprise ECK customer)

@BenB196

BenB196 commented Nov 9, 2023

With the release of 8.11 and the addition of ES|QL, I wonder if this can be used as an alternative. I haven't actually tested this, and oddly, the limitations section doesn't mention whether flattened is supported or not, but maybe it is?

@kertal
Member

kertal commented Nov 9, 2023

@nicholas-r-king this sounds like a problem with Elasticsearch, or are these Kibana nodes?

@kertal
Member

kertal commented Nov 9, 2023

With the release of 8.11 and the addition of ES|QL, I wonder if this can be used as an alternative. I haven't actually tested this, and oddly, the limitations section doesn't mention whether flattened is supported or not, but maybe it is?

@BenB196 I guess flattened fields were simply left out of the list of what's not supported, so currently they are not supported in ES|QL.
