Support to get just populated fields from a Kibana index pattern #100779
Comments
Pinging @elastic/kibana-app-services (Team:AppServices) |
@timroes @flash1293 This sounds similar to what you're doing in Discover and Lens. We should verify that the needs are the same and find a shared solution. |
We just talked about this a bit and I think the index filter is a bit different in what it does. field caps index filter
sample documents
Suggestion (for places which use a form of document sampling today)
Given these pros and cons of the approaches, I don't think simply switching over to field caps index filters instead of sampling documents is a viable approach, because in very common real-world cases (there is just a single mapping and it contains many more fields than necessary) the outcome would be much worse. There is however additional information in the field caps index filter - whether or not it's even possible that there is any data in a field. One option would be to do both - sample some documents and query the field caps API with an index filter to get three categories of fields:
The app could use these three categories to power the UI, e.g. in Lens:
|
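To illustrate the "do both" suggestion, here is a minimal sketch of how the two sources could be merged. The three category names (`populated`, `possiblyPopulated`, `empty`) and the function `categorizeFields` are assumptions made for this example; they are one plausible reading of the proposal, not names taken from Kibana or from the comment above.

```typescript
// Hypothetical grouping, for illustration only. The category names are
// assumptions, not part of any Kibana or Elasticsearch API.
interface FieldCategories {
  populated: string[]; // seen with a value in the sampled documents
  possiblyPopulated: string[]; // not sampled, but reported by filtered field caps
  empty: string[]; // in the mappings, but excluded by the index filter
}

function categorizeFields(
  allFields: string[], // every field of the index pattern (unfiltered field caps)
  filteredFields: string[], // fields reported by field caps with an index filter
  sampledFields: string[] // fields found populated in sampled documents
): FieldCategories {
  const filtered = new Set(filteredFields);
  const sampled = new Set(sampledFields);
  const result: FieldCategories = { populated: [], possiblyPopulated: [], empty: [] };

  for (const field of allFields) {
    if (sampled.has(field)) {
      result.populated.push(field);
    } else if (filtered.has(field)) {
      result.possiblyPopulated.push(field);
    } else {
      result.empty.push(field);
    }
  }
  return result;
}
```

A UI could then list `populated` fields first, show `possiblyPopulated` fields in a secondary section, and hide or collapse `empty` ones.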
Thanks for the detailed notes, @flash1293
Can you explain why that is? Is the filtering not respecting all the criteria (including date range)? |
@rayafratkina The field caps API does not check individual documents for values - it operates on the mappings. This means that if an index includes a field in its mapping, field caps will report this field and Kibana will show it, even if there isn't a single document which actually has a value indexed for this field (which makes the field useless for most purposes).
The "index filter" aspect is about only checking the mappings of indices which are known to have data for certain filters, based on index-level metadata. This is an optimization Elasticsearch uses to avoid querying indices unnecessarily - e.g. the index metadata stores the minimum and maximum date of any document in the index, so it's possible to exclude indices (and the fields specified in their mappings) without looking at the data itself. The same is done for different datasets (e.g. separate indices for system metrics vs. Apache metrics https://www.elastic.co/fr/blog/an-introduction-to-the-elastic-data-stream-naming-scheme), so in some cases it's possible to drastically reduce the number of fields relative to all fields in all mappings matching the whole index pattern.
Coming back to your question, false positives can happen because the granularity of the filter is limited to indices instead of individual documents. But AFAIK it's also not possible to reliably exclude indices for all kinds of filters - date ranges and filters on constant keyword fields definitely work, and I think most other types of filters are simply ignored in this case. @jimczi can definitely explain this better. |
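As a rough sketch of the date-range case described above, the following builds a field caps request that carries an index filter. The `_field_caps` endpoint, its `fields` query parameter and the `index_filter` body parameter belong to the Elasticsearch field capabilities API; the helper `buildFieldCapsRequest`, the `FieldCapsRequest` type and the example index and field names are assumptions made for illustration.

```typescript
// Hypothetical request shape for illustration; not a Kibana or
// Elasticsearch client type.
interface FieldCapsRequest {
  method: 'POST';
  path: string;
  body: {
    index_filter: {
      range: Record<string, { gte: string; lte: string }>;
    };
  };
}

// Builds a field caps request restricted to indices that can contain
// documents inside the given time range. Elasticsearch decides this per
// index, so fields from a matching index are still reported even when no
// document in that index has a value for them.
function buildFieldCapsRequest(
  indexPattern: string,
  timeField: string,
  from: string,
  to: string
): FieldCapsRequest {
  return {
    method: 'POST',
    path: `/${encodeURIComponent(indexPattern)}/_field_caps?fields=*`,
    body: {
      index_filter: {
        range: { [timeField]: { gte: from, lte: to } },
      },
    },
  };
}

// Example: only consider indices that may hold data from the last 15 minutes.
const request = buildFieldCapsRequest('metricbeat-*', '@timestamp', 'now-15m', 'now');
```

The resulting object can be sent with whichever HTTP or Elasticsearch client the caller already uses.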
resolved by #121367 |
Follow up to #78590 and #98259.
To reduce the number of fields passed on to components like the data grid for large indices such as filebeat, we implemented custom code that retrieves a random sample of documents and finds out which fields are actually populated.
For example, for an out of the box metricbeat index, this reduces the list of passed on fields from 3000+ to ~120 fields.
This has both usability and "work-around" reasons. Some React components we consume (for example the data grid's dropdown to select visible columns) aren't well optimized for large numbers of fields and slow down pages. Additionally, indices with lots of fields might contain empty ones, depending on the use case. A user might have a hard time finding fields that actually contain data by trial and error.
It would be great if the Kibana index pattern could expose a method getPopulatedFields() that encapsulates functionality like this.
This feature is related to the discussion in #95558.
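A minimal sketch of the core of such a method, assuming the sampled documents are already available as plain objects (for example the `_source` of each hit). The name `collectPopulatedFields` and the rule that `null`, `undefined` and empty arrays count as "not populated" are assumptions for this example, not the code currently used in Kibana.

```typescript
// Hypothetical helper, for illustration only; not part of the Kibana
// index pattern API.
type SampledDoc = Record<string, unknown>;

// Returns the sorted, dotted paths of all fields that hold at least one
// value in at least one of the sampled documents.
function collectPopulatedFields(docs: SampledDoc[]): string[] {
  const populated = new Set<string>();

  const visit = (value: unknown, path: string): void => {
    // Assumption: null and undefined mean "no value".
    if (value === null || value === undefined) return;
    if (Array.isArray(value)) {
      // An empty array contributes nothing; otherwise check each item.
      value.forEach((item) => visit(item, path));
      return;
    }
    if (typeof value === 'object') {
      const obj = value as Record<string, unknown>;
      for (const key of Object.keys(obj)) {
        visit(obj[key], path ? `${path}.${key}` : key);
      }
      return;
    }
    // A primitive value: the field at this path is populated.
    populated.add(path);
  };

  docs.forEach((doc) => visit(doc, ''));
  return Array.from(populated).sort();
}
```

Because the result depends on which documents were sampled, a field that is rarely populated can be missed; a larger sample lowers that risk at the cost of a heavier request.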