Apply context filters when retrieving list of fields in application #95558

jimczi · 2021-03-26T17:44:43Z

In elastic/elasticsearch#56195 we added the ability to filter the field capabilities output with an index filter.
The idea is that _field_caps can dynamically retrieve a list of fields that are relevant for the context.
If a user in Discover has a range filtering the last 3 days, we should restrict the list of available fields for suggestions to the ones that appear in the range.
Our Observability solution also uses constant_keyword to differentiate data streams so applying the current filters to the _field_caps call could limit the fields to a specific metricset like cpu for instance.

This change is important for our Solutions in order to have a way to limit the number of fields in large index patterns when the context is narrowed (by a range filter or a filter on the dataset).
Today the list of fields of an Index Pattern is retrieved once on each app through field_caps without taking the context into account.
We should make it even more dynamic and apply the context of the apps more eagerly when possible (changing the range filter should update the list of available fields).

The text was updated successfully, but these errors were encountered:

elasticmachine · 2021-03-26T18:18:40Z

Pinging @elastic/kibana-app-services (Team:AppServices)

mattkime · 2021-03-26T22:39:19Z

I think the next step for this would be to design a specific UX that would benefit from this enhancement. Which solution?

@ruflin You spoke up on the ES issue, I'm curious if you have any ideas.

ruflin · 2021-03-29T07:55:13Z

++ on what @jimczi described. On my end I had initially mainly Dashboards in mind that a dashboard could define its prefilters for fieldcaps and looking at a mysql dashboard, only the related mysql fields are suggested.

On the solution side a few ideas:

APM: APM has a specific set of indices. In the future these are across logs, metrics, traces but can be prefiltered to only show the releveant ones. @sqren
Endpoint UI: Same as API, which indices to be used is known in advance. @kevinlog
Metrics UI: I expect his is also a predefined subset to filter on. @jasonrhodes

sorenlouv · 2021-03-30T09:21:31Z

If a user in Discover has a range filtering the last 3 days, we should restrict the list of available fields for suggestions to the ones that appear in the range.

Yes! I'd love that. We could really need that in Observability for the search bar that currently suggests many irrelevant fields. I think that would also mostly solve #94879
cc @andrewvc

kevinlog · 2021-03-30T11:44:01Z

cc @peluja1012 @XavierM - this could be useful for future iterations of the global search bar in the Security app. We could restrict some field suggestions.

@sqren

Yes! I'd love that. We could really need that in Observability for the search bar that currently suggests many irrelevant fields. I think that would also mostly solve #94879

This would also be helpful for us in the Security UI, however I think we'd still like to "map" the suggestions to a more human readable format instead of always showing the fields as the appear in the document. What @andrewvc mentions here: #94879 (comment) is still relevant and should also be solved.

mattkime · 2021-03-31T16:54:53Z

Could a solutions team drive the planning for this effort? This needs a thorough UX story - How does the user set the filter? (either directly or indirectly via an existing control) Do these filters persist between Kibana apps? Perhaps there are other questions we should be asking but those are the first that come to mind.

From there I'd need to figure out what is needed from the index patterns api.

andrewvc · 2021-03-31T18:26:59Z

++ to @kevinlog , it doesn't hit the usability threshold we need.

While I think this proposal is useful it doesn't solve #94879 , though it might be a step in the right direction. What that issue calls for is a human curated list of fields that are useful. If Elasticsearch mappings had the ability to tag fields as 'friendly' or 'high priority', that would be more helpful, though I'm not convinced that's the right approach. That would look something like what is below:

{
  "monitor.name": {
    type: "text",
    friendly_name: "Monitor Name",
    description: "The monitor's human readable name"
    priority: true // definitely needs a better name, 'suggested?'
  }
}

The point is we just want to show the most frequently used fields. In any given solution there are lots of internal fields that are used for various purposes but rarely queried by users. We should not suggest these.

The real question is where do we do this work. Do we do it in ES? In Kibana? If Kibana, do we do it in an Index Pattern? Something else on top of one?

My $0.02 is that it's better to do it in Kibana and bundle a field list with each solution or package. I'd just make this a feature of the Kuery Bar UI component.

I'm on vacation starting next week lasting through mid-month, but I'd be glad to move this forward when I return, though I won't have time before this.

mattkime · 2021-04-01T03:28:02Z

@andrewvc

The point is we just want to show the most frequently used fields.

Index patterns already has this via the field 'count' property. Discover is the only app that makes use of it but it certainly could be expanded. If a particular solution creates index patterns, count values for the various fields could be preset. In the UI, this value is labeled 'popularity'

andrewvc · 2021-04-02T21:16:10Z

@mattkime I realize the word 'frequently' did not convey my intent accurately. I meant it to mean the most useful fields, not the ones that were literally the most accessed.

Let's think about users who don't know every field in the Uptime or APM schemas. They don't want a field named monitor.duration.us, and then to query by microseconds, they just want "Monitor Duration", in seconds or ms. They don't even want to see most of the fields we have, many of which are esoteric, or best for internal use.

This list needs to be manually curated, not algorithmically generated. This whole problem is analogous to the situation between the new Exploratory View and Lens. Lens is a power tool with full access to the schema, and assumes the user is comfortable with, or willing to learn the under the hood schema. The reason the exploratory view is so powerful is it presents users with commonly used fields exclusively, and doesn't use complex schema dot notation, rather favoring friendly names, and only showing a small number of the most important fields.

andrewvc · 2021-04-02T21:17:24Z

That said, this specific ES feature is probably still a nice win for the Kuery bar in the discover app, and may be part of a larger approach for solutions. (IMHO)

jasonrhodes · 2021-04-02T21:47:06Z

I'm not sure the manually-curated solution @andrewvc is describing would ever work for us in the Metrics UI (and I'm not sure about the Logs UI, either), largely because so much of what a user will be interacting with will be dynamic on some level. So even if we are able to tell in advance which are the most useful fields a user may want to query on, knowing if those fields are even present in the data (and in the time range being queried) would be incredibly useful to us. @simianhacker has taken several stabs at trying to do this for us in the Metrics UI to no avail.

It would be fantastic if a field's definition somehow included the human readable details alongside its mapping (this feels like we're veering back into all of the field customization from Kibana Index Patterns?) so that when we detect the existence of monitor.duration.us in the queried data, the field is presented to the user as Monitor Duration (just as an example using words already mentioned here so far).

ruflin · 2021-04-06T09:20:24Z

Elasticsearch supports metadata for each field mapping. It would be possible to add this description there. I remember when we introduced this @jpountz mentioned to not encourage "random" data there as it might explode the template size. But maybe something we should look into.

Lets assume for a moment it is in the template. Kibana could directly read it from there. It would also allow that for certain fields like monitor.duration.us which have different descriptions in different context to have it specified multiple times. So metrics-uptime-* template has one description, metrics-nginx.stub_status-* has a different description that fits better the context. Kibana would show the correct one in the correct context based on which indices are crawled.

jpountz · 2021-04-06T17:01:35Z

Cluster state storage isn't free so we are careful with how we use it. We have plans on the roadmap to deduplicate mappings on data streams, so if you plan on moving forward with things like that, please let us know ahead of time so that we can prioritize accordingly.

I'd need to think more about whether field descriptions would be a good fit for metadata. On the one hand it feels ok because a field's description is metadata about a field, but on the other hand our current features wouldn't allow setting field descriptions on dynamically mapped fields. I'm also unsure how we'd handle i18n, should we store the field's description in all supported languages in the metadata? This would make mappings very hard to read. Filtering the list of fields so that it only shows those that are relevant to the data feels very useful, let's move the discussion about field descriptions to a separate issue?

Regarding relevancy, one idea that has been floating around would be the ability to store telemetry about a cluster's usage within a cluster for our users' purposes. For instance storing information about field access could help provide users with better field suggestions, but we could also leverage this information to make recommendations about which fields it would make sense to move to runtime fields in order to save space, and I'm sure we'll find other use-cases as we think more about it.

ruflin · 2021-04-07T08:02:36Z

The description of a field could become quite extensive in some scenarios. So pushing all this to the cluster state does not seem like the ideal place. I'm wondering if there are other options where we could store "meta" information about a field which does not have to end up in the cluster state. Usually this information is not required during query time but is retrieved one by one when a user wants to get more information about the field. Specifying it in the template itself would still be convenient ...

jasonrhodes · 2021-04-08T14:05:23Z

We're really edging back towards storing this info in the Data Views née (Kibana) Index Patterns, eh? If we can solve the sync/caching issue, maybe it's not the worst idea?

ruflin · 2021-04-08T19:52:52Z

I think there is an issue here with storing it in index patterns. Take my example above with metrics-uptime-* and metrics-nginx.stub_status-*. This would mean, we need an index pattern for each of these. From a technical point of view, this would be ideal but a user would now see on the left side an index pattern drop down with a LOT of entries. This is why we created metrics-* index pattern but it causes issues for us as index patterns are not really aligned with our indexing strategy. Maybe that is the more fundamental issue to be solved.

ruflin · 2021-04-21T19:16:34Z

This issue derailed a bit into how to store additional information about fields and other discussions. What @jimczi described initial is much simpler and is about exposing all the benefits we get from the data stream naming scheme in Kibana for field caps. The benefits for the users are that they only see relevant fields (older related issue: #24709) and much better performance. Elasticsearch already has all the required features, Kibana should adopt it.

jasonrhodes · 2021-05-04T03:05:19Z

@ruflin yeah, that sounds good. Is there a simple example of how to implement this from the Kibana side? I'd love to test this out in Metrics/Logs...

jimczi · 2021-05-04T12:08:23Z

@jasonrhodes , you'd need to call the field capabilities API augmented with the active filters:

GET metrics-*/_field_caps?fields=*
{
  "index_filter": {
    "term": {
        "data_stream.dataset": "system.cpu"
     }
  }
}

Currently Kibana calls this API without any context (no index_filter) even if filters are defined.

mattkime · 2021-05-13T23:14:11Z

@jimczi I'm digging in to make sure I have a complete understanding of the basics for this effort.

For the most part, the idea of applying filter criteria from the search bar to field lists is pretty straight forward. However, you mention constant_keyword usage and I'm thinking about how this might be exposed to the user. Unless we plan to educate our users about how to use this field, we need to provide a user interface for selecting it. Do you have any thoughts on how we expose this field and its values?

jpountz · 2021-05-14T12:45:32Z

@mattkime I don't think that the goal is to expose constant keywords to users. I believe that constant keywords were mentioned because they work especially well with this index_filter parameter, but to me the goal is to make Kibana pass all the filters that it knows about to Elasticsearch and then Elasticsearch will figure out whether it can leverage these filters to narrow down field suggestions. For instance if a user already filtered based on http.request.method: GET and the active time range is on the past week, Kibana could send the following filter to Elasticsearch's _field_caps:

GET logs-*/_field_caps?fields=*
{
  "index_filter": {
    "bool": {
      "filter": [
        { "term": { "http.request.method": "GET" } },
        { "range": { "@timestamp": { "gte": "now-7d" } } }
      ]
    }
  }
}

In that specific case, Elasticsearch will automatically ignore old indices as well as indices that don't have a http.request.method field, which will likely yield a much smaller list of field suggestions that if no filter had been provided.

ppisljar · 2021-05-26T13:44:59Z

reading all above this is what i propose:

we either extend the getFields method of index pattern to take in optional Filter[] or add a new getFilteredFields method. This method will include provided filters to its query to the field caps API. Index pattern (internal) field list will not be affected by this and will always contain all the fields.
applications that want to consume this are free to do so, for example visualize could set all the filters its sending to a visualization to this method and make sure the field lists it exposes only contain relevant fields.

i have one question still: aren't we putting a lot of effort to work around the original problem we have? we are misusing index patterns in metrics UI and some other solutions to store too many fields. we should rather have multiple index patterns for each of the data type they are storing.

jasonrhodes · 2021-05-26T18:38:29Z

@ppisljar are you referring to the metricbeat-* index mapping? It's unfortunately rather set in stone, but data streams are the solution to that. We're quite some ways away from when all customers will be storing all data in more segmented data streams, though, so until then this will help a lot.

ppisljar · 2021-09-15T09:21:02Z

yes i am referring to the metricbeat-* index mapping. It seems you use multiple dataviews in elasticsearch with dense fields and well defined scope, but you fail to transfer that in kibana with a single mapping that just matches everything.

Also i agree with @mattkime , this needs a good user story and UX. I just can't imagine how exactly do we make use of this ? (for example you are looking at a dashboard, you want to add a filter to it. you expect field list to contain less fields. but which ones really ? (your dashboard contains visualizations from dozen different data views, could have fixed (alternative) time ranges defined for specific panels etc etc. So i just can't imagine what exactly should we filter on in such cases.

sorenlouv · 2021-09-15T10:09:16Z

Also i agree with @mattkime , this needs a good user story and UX.
@ppisljar

I can contribute with a use case from APM.

APM transactions are stored in indicies like apm-8.0.0-transactions-* (for data streams this will be traces-apm* but won't change anything).

A user can have (micro)services in different programming languages. Transactions from these will be ingested to the same indicies (aka not an index per service), so services will share the same mapping.

This causes field suggestions to "bleed" over to services where they are not relevant. For instance, jvm.* fields are only relevant for java services but are also displayed for the ruby service:

In this case, selecting a jvm.* field in the ruby service will always return 0 results,

For field suggestions we don't have this problem because we filter the suggestions with a terms agg to only get relevant values. In the following (overly) simple example we only expect to see service.name suggestions matching the currently selected service (opbeans-ruby) which is exactly what we see:

Similarly on the service overview we expect to see suggestions for all services:

It would be AWESOME If we could have the same filtering mechanism for field suggestions like we have for value suggestions, so only relevant fields show up for the selected service.

jimczi · 2021-09-15T10:50:16Z

A user can have (micro)services in different programming languages. Transactions from these will be ingested to the same indicies (aka not an index per service), so services will share the same mapping.

Why is this set in stone ? Why not using one index per language ? If the mapping of these languages is distinct it would be beneficial to separate them in their own indices.

It would be AWESOME If we could have the same filtering mechanism for field suggestions like we have for value suggestions, so only relevant fields show up for the selected service.

Implementing value suggestions through an aggregation is a bad practice on time-based indices. It is not precise for a big cost in terms of performance and latency. Value and by extension field suggestion needs to be fast so we cannot expect the same flexibility than we have at the query DSL. The new terms_enum API was added to replace this bad habit of using a plain aggregation to get suggestions. We need a better mechanism and that needs to start with the design. If you have a single data stream with different services that don't share the same fields, the recommendation is to split into multiple indices.
We don't want to rely on slow features that only shine when tested with 10 documents in a demo.

sorenlouv · 2021-09-15T13:33:51Z

Why is this set in stone ? Why not using one index per language ?

We did initially consider having a datastream per service but since each service already have 4 data streams (logs, metrics, errors, traces) this would result in 100s maybe even 1000s of datastreams for customers with many services.
Therefore the ES team suggested that we didn't split data streams per service, and instead stuck with 4 data streams in total.

jpountz · 2021-09-15T14:00:05Z

I was involved in these discussions and my recollection is that there was a bit more nuance. We did indeed advise against using different data streams per service to avoid index/shard explosion, however we still think that we should split data streams that would have different mappings. So ideally services should be grouped together depending on whether they would have very similar mappings or not. (There is still some nuance there, e.g. if 1,000 fields are common and only 2 fields differ, maybe it's still a better trade-off to put data into the same data stream to keep the number of data streams under control. The Elasticsearch team is happy to be consulted when cases like that arise.)

In an ideal world, APM would be able to have different granularities for each type of data that it records, e.g. maybe there could be a single data stream for internal metrics since all services have the same mappings for their internal metrics while there would be multiple data streams for traces as we would group services that have the same mappings for traces together. (I can certainly appreciate how it makes the architecture more complex.)

sorenlouv · 2021-09-15T15:16:34Z

however we still think that we should split data streams that would have different mappings

Okay, this sounds interesting. Something that we could consider is to split data streams by APM agent (so separate data streams for python, node, dotnet etc).
I'll take this back to the team and see if this is something that is still possible - not sure whether it is considered a breaking change at this point.

axw · 2021-09-16T01:35:17Z

In an ideal world, APM would be able to have different granularities for each type of data that it records, e.g. maybe there could be a single data stream for internal metrics since all services have the same mappings for their internal metrics while there would be multiple data streams for traces as we would group services that have the same mappings for traces together. (I can certainly appreciate how it makes the architecture more complex.)

That's what we've ended up doing. We produce multiple data streams, most of which are not service-specific. There is one service-specific data stream which contains custom application metrics and runtime/language-specific metrics. We should be able to add a _field_caps filter for that data stream. @sqren I'll point you at more details offline

jpountz · 2021-09-16T08:38:45Z

Wonderful!

ppisljar · 2021-09-20T11:34:43Z

We should be able to add a _field_caps filter for that data stream. @sqren I'll point you at more details offline

Why is field_caps filter still needed ? If you will split up your data streams so they dont have 1000s of fields, is this still an issue ?

axw · 2021-09-21T01:33:41Z

@ppisljar taking the specific example that @sqren shared above:

For one of the data streams, there are still service-specific fields like jvm.*. We shouldn't show those as field suggestions when in the context of the "opbeans-ruby" service, since it is not running in a JVM and will never produce those metric fields.

ppisljar · 2022-04-12T08:10:23Z

resolved by #121367

jimczi changed the title ~~Apply filters and range to field list and suggestions~~ Apply filters and range when retrieving list of fields in application Mar 26, 2021

jimczi changed the title ~~Apply filters and range when retrieving list of fields in application~~ Apply context filters when retrieving list of fields in application Mar 26, 2021

wylieconlon added enhancement New value added to drive a business result Feature:Data Views Data Views code and UI - index patterns before 8.0 Team:AppServices labels Mar 26, 2021

sorenlouv mentioned this issue Mar 30, 2021

[Observability] Query bar should, by default, only show relevant fields #94879

Open

mattkime mentioned this issue Apr 6, 2021

[Meta] Index Patterns Roadmap #78376

Closed

63 tasks

jasonrhodes mentioned this issue May 4, 2021

[Metrics UI] Use field caps dataset filtering when possible #99234

Closed

walterra mentioned this issue May 27, 2021

Support to get just populated fields from a Kibana index pattern #100779

Closed

flash1293 mentioned this issue Jun 7, 2021

[Lens] Add telemetry for field list available/emptiness #101467

Closed

exalate-issue-sync bot added impact:low Addressing this issue will have a low level of impact on the quality/strength of our product. loe:small Small Level of Effort labels Jun 21, 2021

flash1293 mentioned this issue Sep 22, 2021

[Lens] Switch field list filtering to filtered field caps instead of sampling #112782

Closed

exalate-issue-sync bot added loe:medium Medium Level of Effort and removed loe:small Small Level of Effort labels Oct 5, 2021

flash1293 added the NeededFor:VisEditors label Oct 19, 2021

exalate-issue-sync bot added impact:medium Addressing this issue will have a medium level of impact on the quality/strength of our product. and removed impact:low Addressing this issue will have a low level of impact on the quality/strength of our product. labels Nov 2, 2021

exalate-issue-sync bot added loe:small Small Level of Effort and removed loe:medium Medium Level of Effort labels Nov 19, 2021

ppisljar closed this as completed Apr 12, 2022

exalate-issue-sync bot reopened this Apr 12, 2022

exalate-issue-sync bot closed this as completed Apr 12, 2022

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

Apply context filters when retrieving list of fields in application #95558

Apply context filters when retrieving list of fields in application #95558

jimczi commented Mar 26, 2021 •

edited

Loading

elasticmachine commented Mar 26, 2021

mattkime commented Mar 26, 2021

ruflin commented Mar 29, 2021

sorenlouv commented Mar 30, 2021

kevinlog commented Mar 30, 2021

mattkime commented Mar 31, 2021

andrewvc commented Mar 31, 2021

mattkime commented Apr 1, 2021

andrewvc commented Apr 2, 2021

andrewvc commented Apr 2, 2021

jasonrhodes commented Apr 2, 2021

ruflin commented Apr 6, 2021

jpountz commented Apr 6, 2021

ruflin commented Apr 7, 2021

jasonrhodes commented Apr 8, 2021

ruflin commented Apr 8, 2021

ruflin commented Apr 21, 2021

jasonrhodes commented May 4, 2021

jimczi commented May 4, 2021 •

edited

Loading

mattkime commented May 13, 2021

jpountz commented May 14, 2021 •

edited

Loading

ppisljar commented May 26, 2021

jasonrhodes commented May 26, 2021

ppisljar commented Sep 15, 2021

sorenlouv commented Sep 15, 2021 •

edited

Loading

jimczi commented Sep 15, 2021

sorenlouv commented Sep 15, 2021

jpountz commented Sep 15, 2021

sorenlouv commented Sep 15, 2021

axw commented Sep 16, 2021

jpountz commented Sep 16, 2021

ppisljar commented Sep 20, 2021

axw commented Sep 21, 2021

ppisljar commented Apr 12, 2022

Apply context filters when retrieving list of fields in application #95558

Apply context filters when retrieving list of fields in application #95558

Comments

jimczi commented Mar 26, 2021 • edited Loading

elasticmachine commented Mar 26, 2021

mattkime commented Mar 26, 2021

ruflin commented Mar 29, 2021

sorenlouv commented Mar 30, 2021

kevinlog commented Mar 30, 2021

mattkime commented Mar 31, 2021

andrewvc commented Mar 31, 2021

mattkime commented Apr 1, 2021

andrewvc commented Apr 2, 2021

andrewvc commented Apr 2, 2021

jasonrhodes commented Apr 2, 2021

ruflin commented Apr 6, 2021

jpountz commented Apr 6, 2021

ruflin commented Apr 7, 2021

jasonrhodes commented Apr 8, 2021

ruflin commented Apr 8, 2021

ruflin commented Apr 21, 2021

jasonrhodes commented May 4, 2021

jimczi commented May 4, 2021 • edited Loading

mattkime commented May 13, 2021

jpountz commented May 14, 2021 • edited Loading

ppisljar commented May 26, 2021

jasonrhodes commented May 26, 2021

ppisljar commented Sep 15, 2021

sorenlouv commented Sep 15, 2021 • edited Loading

jimczi commented Sep 15, 2021

sorenlouv commented Sep 15, 2021

jpountz commented Sep 15, 2021

sorenlouv commented Sep 15, 2021

axw commented Sep 16, 2021

jpountz commented Sep 16, 2021

ppisljar commented Sep 20, 2021

axw commented Sep 21, 2021

ppisljar commented Apr 12, 2022

jimczi commented Mar 26, 2021 •

edited

Loading

jimczi commented May 4, 2021 •

edited

Loading

jpountz commented May 14, 2021 •

edited

Loading

sorenlouv commented Sep 15, 2021 •

edited

Loading