Apply context filters when retrieving list of fields in application #95558
Pinging @elastic/kibana-app-services (Team:AppServices)
I think the next step for this would be to design a specific UX that would benefit from this enhancement. Which solution? @ruflin, you spoke up on the ES issue; I'm curious if you have any ideas.
++ on what @jimczi described. On my end I initially had mainly dashboards in mind: a dashboard could define its prefilters for field caps, so that when looking at a MySQL dashboard, only the related MySQL fields are suggested. On the solution side, a few ideas come to mind.
Yes! I'd love that. We could really use that in Observability for the search bar, which currently suggests many irrelevant fields. I think it would also mostly solve #94879.
cc @peluja1012 @XavierM - this could be useful for future iterations of the global search bar in the Security app. We could restrict some field suggestions.
This would also be helpful for us in the Security UI. However, I think we'd still like to "map" the suggestions to a more human-readable format instead of always showing the fields as they appear in the document. What @andrewvc mentions in #94879 (comment) is still relevant and should also be solved.
Could a solutions team drive the planning for this effort? This needs a thorough UX story: How does the user set the filter (either directly, or indirectly via an existing control)? Do these filters persist between Kibana apps? Perhaps there are other questions we should be asking, but those are the first that come to mind. From there I'd need to figure out what is needed from the index patterns API.
++ to @kevinlog, it doesn't hit the usability threshold we need. While I think this proposal is useful, it doesn't solve #94879, though it might be a step in the right direction. What that issue calls for is a human-curated list of fields that are useful. If Elasticsearch mappings had the ability to tag fields as 'friendly' or 'high priority', that would be more helpful, though I'm not convinced that's the right approach. That would look something like what is below:
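A minimal sketch of what that tagging might look like, expressed through Elasticsearch's existing field-level `meta` parameter (the `friendly` and `priority` keys are hypothetical, not an established convention, and the index and field names are illustrative):

```
PUT heartbeat-example
{
  "mappings": {
    "properties": {
      "monitor.status": {
        "type": "keyword",
        "meta": {
          "friendly": "true",
          "priority": "high"
        }
      }
    }
  }
}
```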
The point is we just want to show the most frequently used fields. In any given solution there are lots of internal fields that are used for various purposes but rarely queried by users; we should not suggest these. The real question is where we do this work. Do we do it in ES? In Kibana? If Kibana, do we do it in an index pattern? Something else on top of one? My $0.02 is that it's better to do it in Kibana and bundle a field list with each solution or package; I'd just make this a feature of the Kuery Bar UI component. I'm on vacation from next week through mid-month, but I'd be glad to move this forward when I return, though I won't have time before then.
Index patterns already have this via the field 'count' property. Discover is the only app that makes use of it, but it could certainly be expanded. If a particular solution creates index patterns, the count values for the various fields could be preset. In the UI, this value is labeled 'popularity'.
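A rough sketch of how a solution might preset popularity when creating an index pattern through Kibana's saved objects API (the `fieldAttrs` layout shown here is an assumption and may vary between versions; the saved object id, title, and field names are illustrative):

```
POST /api/saved_objects/index-pattern/metricbeat-preset
{
  "attributes": {
    "title": "metricbeat-*",
    "timeFieldName": "@timestamp",
    "fieldAttrs": "{\"host.name\":{\"count\":10},\"system.cpu.user.pct\":{\"count\":8}}"
  }
}
```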
@mattkime I realize the word 'frequently' did not convey my intent accurately. I meant the most useful fields, not the ones that were literally accessed most often. Think about users who don't know every field in the Uptime or APM schemas: they don't want some obscure internal field name suggested to them. This list needs to be manually curated, not algorithmically generated. This whole problem is analogous to the situation between the new Exploratory View and Lens. Lens is a power tool with full access to the schema, and assumes the user is comfortable with, or willing to learn, the under-the-hood schema. The reason the Exploratory View is so powerful is that it presents users exclusively with commonly used fields, doesn't use complex schema dot notation, favors friendly names instead, and only shows a small number of the most important fields.
That said, this specific ES feature is probably still a nice win for the Kuery bar in the Discover app, and may be part of a larger approach for solutions. (IMHO)
I'm not sure the manually-curated solution @andrewvc is describing would ever work for us in the Metrics UI (and I'm not sure about the Logs UI, either), largely because so much of what a user interacts with is dynamic on some level. So even if we are able to tell in advance which fields a user may find most useful to query on, knowing whether those fields are even present in the data (and in the time range being queried) would be incredibly useful to us. @simianhacker has taken several stabs at trying to do this for the Metrics UI, to no avail. It would also be fantastic if a field's definition somehow included human-readable details alongside its mapping (this feels like we're veering back into all of the field customization from Kibana index patterns?), so that when we detect that a field exists in the data we can also present it in a friendlier way.
Elasticsearch supports metadata for each field mapping. It would be possible to add this description there. I remember when we introduced this, @jpountz mentioned that we should not encourage "random" data there as it might explode the template size. But maybe it is something we should look into. Let's assume for a moment it is in the template: Kibana could directly read it from there. It would also allow certain fields, metric fields for instance, to carry additional context such as a unit alongside the description.
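Kibana could pick this up because `_field_caps` already surfaces mapping `meta` in its response; a sketch of what that looks like (index and field names and the meta values are illustrative):

```
GET metrics-*/_field_caps?fields=system.cpu.user.pct

{
  "indices": ["metrics-example"],
  "fields": {
    "system.cpu.user.pct": {
      "float": {
        "type": "float",
        "searchable": true,
        "aggregatable": true,
        "meta": { "unit": ["percent"] }
      }
    }
  }
}
```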
Cluster state storage isn't free, so we are careful with how we use it. We have plans on the roadmap to deduplicate mappings on data streams, so if you plan on moving forward with things like that, please let us know ahead of time so that we can prioritize accordingly.

I'd need to think more about whether field descriptions would be a good fit for metadata. On the one hand it feels ok because a field's description is metadata about a field, but on the other hand our current features wouldn't allow setting field descriptions on dynamically mapped fields. I'm also unsure how we'd handle i18n: should we store the field's description in all supported languages in the metadata? This would make mappings very hard to read. Filtering the list of fields so that it only shows those that are relevant to the data feels very useful; let's move the discussion about field descriptions to a separate issue?

Regarding relevancy, one idea that has been floating around is the ability to store telemetry about a cluster's usage within the cluster itself, for our users' purposes. For instance, storing information about field access could help provide users with better field suggestions, but we could also leverage this information to make recommendations about which fields it would make sense to move to runtime fields in order to save space, and I'm sure we'll find other use-cases as we think more about it.
The description of a field could become quite extensive in some scenarios, so pushing all of this into the cluster state does not seem like the ideal place. I'm wondering if there are other options for storing "meta" information about a field that don't have to end up in the cluster state. Usually this information is not required at query time, but is retrieved one field at a time when a user wants to get more information about the field. Specifying it in the template itself would still be convenient ...
We're really edging back towards storing this info in Data Views, née (Kibana) Index Patterns, eh? If we can solve the sync/caching issue, maybe it's not the worst idea?
I think there is an issue with storing it in index patterns. Take my example above with field metadata in the template: if the information lives in index patterns instead, it has to be duplicated and kept in sync across every index pattern that contains the field, whereas in the template it lives in a single place alongside the mapping.
This issue derailed a bit into how to store additional information about fields, and other discussions. What @jimczi initially described is much simpler: it is about exposing in Kibana, for field caps, all the benefits we get from the data stream naming scheme. The benefits for users are that they only see relevant fields (older related issue: #24709) and get much better performance. Elasticsearch already has all the required features; Kibana should adopt them.
@ruflin yeah, that sounds good. Is there a simple example of how to implement this from the Kibana side? I'd love to test this out in Metrics/Logs...
@jasonrhodes , you'd need to call the field capabilities API augmented with the active filters:
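A minimal sketch of such a call, assuming a `metricbeat-*` index pattern and an active time-range filter on `@timestamp` (the index pattern and range values are illustrative):

```
POST metricbeat-*/_field_caps?fields=*
{
  "index_filter": {
    "range": {
      "@timestamp": {
        "gte": "now-3d"
      }
    }
  }
}
```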
Currently Kibana calls this API without any context (no `index_filter`), even if filters are defined.
@jimczi I'm digging in to make sure I have a complete understanding of the basics of this effort. For the most part, the idea of applying filter criteria from the search bar to field lists is pretty straightforward. However, you mention `constant_keyword` fields, and I'd like to understand how they fit into this.
@mattkime I don't think the goal is to expose constant keywords to users. I believe constant keywords were mentioned because they work especially well with this kind of index filtering: a filter on a `constant_keyword` field can be resolved from an index's mappings alone, without looking at the data.
In that specific case, Elasticsearch will automatically ignore old indices, as well as indices that don't have a matching `constant_keyword` value, so the field list only reflects the data that is actually relevant.
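A sketch of such a filtered call, assuming a `data_stream.dataset` field mapped as `constant_keyword` (as it is in Fleet-managed data streams; the index pattern and dataset value are illustrative). Indices whose constant value doesn't match can be skipped from the mapping alone, so the call stays cheap:

```
POST metrics-*/_field_caps?fields=*
{
  "index_filter": {
    "term": {
      "data_stream.dataset": "system.cpu"
    }
  }
}
```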
Reading all of the above, this is what I propose:
I have one question still: aren't we putting a lot of effort into working around the original problem? We are misusing index patterns in the Metrics UI and some other solutions by storing too many fields; we should instead have multiple index patterns, one for each type of data being stored.
@ppisljar are you referring to the metricbeat-* index mapping? It's unfortunately rather set in stone, but data streams are the solution to that. We're quite a ways away from all customers storing all their data in more segmented data streams, though, so until then this will help a lot.
Yes, I am referring to the metricbeat-* index mapping. It seems you use multiple data streams in Elasticsearch, with dense fields and a well-defined scope, but fail to carry that over into Kibana, where a single index pattern just matches everything. Also, I agree with @mattkime: this needs a good user story and UX. I just can't imagine how exactly we would make use of this. For example, you are looking at a dashboard and want to add a filter to it; you expect the field list to contain fewer fields, but which ones exactly? Your dashboard may contain visualizations from a dozen different data views, could have fixed (alternative) time ranges defined for specific panels, and so on. So I just can't imagine what exactly we should filter on in such cases.
I can contribute a use case from APM. A user can have (micro)services in different programming languages, but transactions from all of them are ingested into the same indices (i.e. not an index per service), so services share the same mapping. This causes field suggestions to "bleed" over to services where they are not relevant: when a specific service is selected, fields that only exist for other services are still suggested. For value suggestions we don't have this problem, because we filter the suggestions with a terms aggregation to only get relevant values: when a single service is selected we only expect to see values that occur in that service's documents, and on the service overview we expect to see suggestions for all services. It would be AWESOME if we could have the same filtering mechanism for field suggestions that we have for value suggestions, so only relevant fields show up for the selected service.
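A rough sketch of the kind of filtered terms aggregation described for value suggestions (the index pattern, field names, and service name are illustrative):

```
POST apm-*-transaction-*/_search
{
  "size": 0,
  "query": {
    "term": { "service.name": "opbeans-java" }
  },
  "aggs": {
    "suggestions": {
      "terms": { "field": "labels.environment", "size": 10 }
    }
  }
}
```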
Why is this set in stone? Why not use one index per language? If the mappings for these languages are distinct, it would be beneficial to separate them into their own indices.
Implementing value suggestions through an aggregation is bad practice on time-based indices: it is imprecise, and comes at a big cost in performance and latency. Value suggestions, and by extension field suggestions, need to be fast, so we cannot expect the same flexibility that we have with the query DSL. The new terms_enum API was added to replace this bad habit of using a plain aggregation to get suggestions. We need a better mechanism, and that needs to start with the design. If you have a single data stream with different services that don't share the same fields, the recommendation is to split it into multiple indices.
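For reference, a minimal sketch of the `_terms_enum` API mentioned above (the index pattern, field, and prefix are illustrative; note that it also accepts an `index_filter`, so irrelevant indices can be skipped):

```
POST apm-*-transaction-*/_terms_enum
{
  "field": "service.name",
  "string": "opb",
  "index_filter": {
    "range": { "@timestamp": { "gte": "now-24h" } }
  }
}
```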
We did initially consider having a data stream per service, but since each service already has four data streams (logs, metrics, errors, traces), this would result in hundreds, maybe even thousands, of data streams for customers with many services.
I was involved in these discussions and my recollection is that there was a bit more nuance. We did indeed advise against using different data streams per service to avoid index/shard explosion, however we still think that we should split data streams that would have different mappings. So ideally services should be grouped together depending on whether they would have very similar mappings or not. (There is still some nuance there, e.g. if 1,000 fields are common and only 2 fields differ, maybe it's still a better trade-off to put data into the same data stream to keep the number of data streams under control. The Elasticsearch team is happy to be consulted when cases like that arise.) In an ideal world, APM would be able to have different granularities for each type of data that it records, e.g. maybe there could be a single data stream for internal metrics since all services have the same mappings for their internal metrics, while there would be multiple data streams for traces as we would group services that have the same mappings for traces together. (I can certainly appreciate how it makes the architecture more complex.)
Okay, this sounds interesting. Something that we could consider is to split data streams by APM agent (so separate data streams for python, node, dotnet, etc.).
That's what we've ended up doing. We produce multiple data streams, most of which are not service-specific. There is one service-specific data stream, which contains custom application metrics and runtime/language-specific metrics. We should be able to add a `_field_caps` filter for that data stream. @sqren I'll point you at more details offline.
Wonderful!
Why is the `field_caps` filter still needed? If you split up your data streams so they don't have thousands of fields, is this still an issue?
@ppisljar taking the specific example that @sqren shared above: for one of the data streams, there are still service-specific fields (the custom application metrics), so even with well-scoped data streams, filtering the field list by the selected service remains useful.
Resolved by #121367.
In elastic/elasticsearch#56195 we added the ability to filter the field capabilities output with an index filter. The idea is that `_field_caps` can dynamically retrieve a list of fields that are relevant for the context. If a user in Discover has a range filter on the last 3 days, we should restrict the list of available fields for suggestions to the ones that appear in that range.

Our Observability solution also uses `constant_keyword` to differentiate data streams, so applying the current filters to the `_field_caps` call could limit the fields to a specific `metricset` like `cpu`, for instance. This change is important for our Solutions in order to have a way to limit the number of fields in large index patterns when the context is narrowed (by a range filter or a filter on the dataset).

Today the list of fields of an index pattern is retrieved once in each app through `field_caps` without taking the context into account. We should make it even more dynamic and apply the context of the apps more eagerly when possible (changing the range filter should update the list of available fields).