Support to get just populated fields from a Kibana index pattern #100779

walterra · 2021-05-27T13:42:10Z

Follow up to #78590 and #98259.

To reduce the number of fields passed on to components like data grid for large indices like filebeat, we implemented custom code that retrieves a random sample of documents and finds out which fields are actually populated.

For example, for an out-of-the-box metricbeat index, this reduces the list of passed-on fields from 3000+ to ~120.
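
A minimal sketch of the "which fields are populated" step, assuming the sample documents have already been fetched and are available as plain `_source` objects. The name `collectPopulatedFields` and the `Doc` type are hypothetical and not part of any Kibana API; how the random sample is retrieved is left out.

```typescript
// Hypothetical helper, not an existing Kibana function.
type Doc = Record<string, unknown>;

// Returns the dotted paths of all fields that hold a value
// in at least one of the sampled documents.
function collectPopulatedFields(docs: Doc[]): Set<string> {
  const populated = new Set<string>();

  const visit = (value: unknown, path: string): void => {
    // null/undefined count as "not populated".
    if (value === null || value === undefined) return;
    if (Array.isArray(value)) {
      // Array elements belong to the same field; an empty array adds nothing.
      value.forEach((item) => visit(item, path));
      return;
    }
    if (typeof value === 'object') {
      // Nested objects contribute dotted paths such as "host.name".
      for (const [key, child] of Object.entries(value as Doc)) {
        visit(child, path ? `${path}.${key}` : key);
      }
      return;
    }
    // Leaf value (string, number, boolean): the field is populated.
    if (path) populated.add(path);
  };

  docs.forEach((doc) => visit(doc, ''));
  return populated;
}
```

In this sketch, empty strings count as populated; a real implementation might choose differently.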

This has both usability and "work-around" reasons. Some React components we consume (for example the data grid's dropdown to select visible columns) aren't well optimized for large numbers of fields and slow down pages. Additionally, indices with lots of fields might contain empty ones, depending on the use case. A user might have a hard time finding fields that actually contain data through trial and error.

It would be great if Kibana index pattern could expose a method getPopulatedFields() that encapsulates functionality like this.

This feature is related to the discussion in #95558.


elasticmachine · 2021-05-27T13:43:04Z

Pinging @elastic/kibana-app-services (Team:AppServices)

mattkime · 2021-05-27T14:23:53Z

@timroes @flash1293 This sounds similar to what you're doing in discover and lens. We should verify that the needs are the same and find a shared solution.

flash1293 · 2021-05-27T14:49:30Z

We just talked about this a bit and I think the index filter is a bit different in what it does.

field caps index filter

No false-negatives (if field caps says a field doesn't exist, it's guaranteed to not show up in results)
false-positives (it's always possible a field is reported as available, but doesn't hold any data for the current time range)
works very well for the "default way" data is indexed in data streams (per index most fields always hold data)
doesn't work well for messy unorganized mappings which grew organically over time and contain obsolete definitions

sample documents

No false-positives (if there are values in the sample documents, there will be at least some results)
false-negatives (sample documents might not include fields, but others do and they would return data)
works well if documents are relatively homogenous (high chance to get a good sample)
doesn't work well for some special cases (e.g. just started ingesting a new field in a large index and only a few documents have it yet, but the user knows for sure there is some data)

Suggestion (for places which use a form of document sampling today)

Given these pros and cons of the approaches, I don't think simply switching over to field caps index filters instead of sampling documents is a viable approach, because in very common real-world cases (there is just a single mapping and it contains many more fields than necessary) the outcome would be much worse.

There is however additional information in the field caps index filter - whether or not it's even possible there is any data in fields.

One option would be to do both - sample some documents and query the field caps API with an index filter to get three categories of fields:

Available - field caps and sample documents confirm these fields hold data
(Probably) empty - field caps reports this field as part of the mapping of the current indices, but there was no data in the sample documents
Definitely empty - field caps didn't include this in the index-filtered response - the field is in one of the mappings of the entire index pattern, but not in the indices selected by the current time range and filter
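
The three categories above could be derived roughly as follows. This is only a sketch: `classifyFields`, `FieldAvailability`, and the three inputs (all fields of the index pattern, the fields returned by the index-filtered field caps request, and the fields found in sample documents) are hypothetical names and assume both lookups have already been made.

```typescript
// Hypothetical types and function, for illustration only.
type FieldAvailability = 'available' | 'probably_empty' | 'definitely_empty';

function classifyFields(
  allFields: string[],          // every field in the mappings of the whole index pattern
  fieldCapsFields: Set<string>, // fields returned by field caps with an index filter
  sampledFields: Set<string>    // fields that had values in the sample documents
): Map<string, FieldAvailability> {
  const result = new Map<string, FieldAvailability>();
  for (const field of allFields) {
    if (sampledFields.has(field)) {
      // Field caps has no false negatives, so a sampled field is expected
      // to be in fieldCapsFields as well; the sample is checked first
      // because it is direct evidence of data.
      result.set(field, 'available');
    } else if (fieldCapsFields.has(field)) {
      // Mapped in the currently relevant indices, but no value was sampled.
      result.set(field, 'probably_empty');
    } else {
      // Not part of the index-filtered field caps response.
      result.set(field, 'definitely_empty');
    }
  }
  return result;
}
```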

The app could use these three categories to power the UI, e.g. in Lens:

Available - show prominently
(Probably) empty - show de-emphasized (collapsed by default or sorted to the bottom of the list)
Definitely empty - don't show in fields list at all, but don't treat them being used in a config as an error

rayafratkina · 2021-06-01T22:27:58Z

Thanks for the detailed notes, @flash1293
One clarification: you mentioned for field caps index filter

false-positives (it's always possible a field is reported as available, but doesn't hold any data for the current time range)

Can you explain why that is? Is the filtering not respecting all the criteria (including date range)?

flash1293 · 2021-06-02T08:11:36Z

@rayafratkina The field caps API is not checking individual documents for values - it operates on the mappings. This means if there is an index which includes a field in its mapping, field caps will report this field and Kibana will show it even if there isn't a single document which actually has a value indexed for this field (which means it's useless for most purposes).

The "index filter" aspect is about only checking the mappings of indices which are known to have data for certain filters, based on index-level metadata. This is an optimization Elasticsearch uses to not query indices unnecessarily - e.g. the index metadata stores the minimum and maximum date of any document in the index, so it's possible to exclude indices (and the fields specified in their mappings) without looking at the data itself. The same is done for different datasets (e.g. separate indices for system metrics vs. apache metrics https://www.elastic.co/fr/blog/an-introduction-to-the-elastic-data-stream-naming-scheme), so in some cases it's possible to drastically reduce the number of fields relative to all fields in all mappings matching the whole index pattern.
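
For illustration, a field caps request with an index filter on a date range might be assembled like this. The helper `buildFieldCapsRequest` and the `FieldCapsRequest` shape are hypothetical; only the `_field_caps` endpoint, its `fields` parameter, and the `index_filter` body key come from the Elasticsearch API. The time field name and date values are assumptions supplied by the caller.

```typescript
// Hypothetical request description; sending it is left to whatever HTTP client is in use.
interface FieldCapsRequest {
  method: 'POST';
  path: string;
  body: {
    index_filter: {
      range: Record<string, { gte: string; lte: string }>;
    };
  };
}

function buildFieldCapsRequest(
  indexPattern: string,
  timeField: string,
  from: string,
  to: string
): FieldCapsRequest {
  return {
    method: 'POST',
    // fields=* asks for all fields of the matching indices.
    path: `/${encodeURIComponent(indexPattern)}/_field_caps?fields=*`,
    // Indices that cannot match this range are skipped based on their metadata,
    // so fields that only exist in their mappings are not reported.
    body: { index_filter: { range: { [timeField]: { gte: from, lte: to } } } },
  };
}
```

The filter only decides which indices are considered; it does not check individual documents for values.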

Coming back to your question, false positives can happen because the granularity of the filter is limited to indices instead of individual documents. But AFAIK it's also not possible to reliably exclude indices for all kinds of filters - date ranges and filters on constant keyword fields definitely work, but I think most other types of filters are simply ignored in this case.

@jimczi can definitely explain this better.

ppisljar · 2022-04-12T08:09:40Z

resolved by #121367

botelastic bot added the needs-team Issues missing a team label label May 27, 2021

peteharverson added Feature:Data Views Data Views code and UI - index patterns before 8.0 and removed needs-team Issues missing a team label labels May 27, 2021

botelastic bot added the needs-team Issues missing a team label label May 27, 2021

peteharverson added enhancement New value added to drive a business result Team:AppServices labels May 27, 2021

botelastic bot removed the needs-team Issues missing a team label label May 27, 2021

exalate-issue-sync bot added impact:low Addressing this issue will have a low level of impact on the quality/strength of our product. loe:small Small Level of Effort labels Jun 21, 2021

thomasneirynck mentioned this issue Feb 3, 2022

Add extra filters to field caps elastic/elasticsearch#82966

Closed

exalate-issue-sync bot added loe:medium Medium Level of Effort and removed loe:small Small Level of Effort labels Apr 4, 2022

ppisljar closed this as completed Apr 12, 2022
