Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

[Lens] Switch field list filtering to filtered field caps instead of sampling #112782

Closed
flash1293 opened this issue Sep 22, 2021 · 17 comments · Fixed by #122915
Closed

[Lens] Switch field list filtering to filtered field caps instead of sampling #112782

flash1293 opened this issue Sep 22, 2021 · 17 comments · Fixed by #122915
Assignees
Labels
enhancement New value added to drive a business result Feature:Lens Team:Visualizations Visualization editors, elastic-charts and infrastructure

Comments

@flash1293
Copy link
Contributor

Blocked on #95558

The current approach of field list filtering in Lens is problematic because false negatives can occur (see #58330). To fix this in a forward-facing way, it should be replaced by querying the field caps API with the currently applied filters.

As this can negatively affect the behavior of index structures which might still be used by some people, there should be an advanced setting flag to switch back to sampling (similar to how legacy chart implementations are treated). There should be telemetry for this setting - if there is no significant usage, sampling can be removed completely.

@flash1293 flash1293 added enhancement New value added to drive a business result Team:Visualizations Visualization editors, elastic-charts and infrastructure Feature:Lens labels Sep 22, 2021
@ghudgins
Copy link
Contributor

ghudgins commented Nov 2, 2021

can we verify if this implementation change will also effect the field dropdown (that has the same sections)? it would be really great if this changed both places as I have seen #58330 occur (false negatives) in the field list as well

@flash1293
Copy link
Contributor Author

@ghudgins yes, this is powered by the same underlying information

@mattkime
Copy link
Contributor

Perhaps we could provide dataView.getFilteredFields(Filter[])

@ppisljar @sixstringcode any thoughts

@flash1293
Copy link
Contributor Author

@mattkime I think this makes a lot of sense (and would be a great fit for our architecture) - just one small note: This function would also need to take time range and current query as arguments (or there needs to be some way to convert them into filters, not sure whether that's easily possible at the moment)

@mattkime
Copy link
Contributor

@flash1293 What format do you have the time range and current query in? Are you using the query bar for this? Looks like there's more flexibility than I initially thought - https://www.elastic.co/guide/en/elasticsearch/reference/master/search-field-caps.html#search-field-caps-api-request-body

At any rate, my goal is to properly pass through the query / filter. We should verify the details of how this should work.

@sixstringcode
Copy link

@mattkime we can move forward with the getFilteredFields change. Does this change apply to the field list in Discover also?

@flash1293
Copy link
Contributor Author

flash1293 commented Nov 24, 2021

@mattkime I’m also completely fine with passing a raw Elasticsearch query in there - it’s not an issue to compile filter plus timerange plus query as part of the consumer and it would keep the API flexible. In Lens state, timerange is a data “TimeRange” object, query is a data “Query” object (the ones returned by the query bar ui component)

@sixstringcode Discover can’t use this in the current state because the field list is built by the actual loaded documents - this makes it always “correct” in the sense of the doc table and the field list being in sync (this is the main problem we want to solve with this api in Lens)

@mattkime
Copy link
Contributor

I think the function signature would need to be a bit different from what I stated earlier. We'd add functionality to getFieldsForWildcard and create a helper function that takes the dataView and the query. Same final result.

Are we certain that all queries could be passed into the field caps call?

@flash1293 would passing the raw ES query have type checking?

@flash1293
Copy link
Contributor Author

Are we certain that all queries could be passed into the field caps call?

not sure what you mean, the Elasticsearch docs state a query object can be passed in (like the one used for search requests)

I don’t think we have proper type checking for ES query objects in Kibana. For consumers of the filter bar (like Lens), I think the following should be the easiest to use:
interface SearchContext {
query?: Query;
filter?: Filter[];
timeRange?: TimeRange
}

then the new function itself would put together the parts into a single Es query

@ppisljar
Copy link
Member

searchsource doesn't know about timerange (its just a filter for it), dataviews seem a similar level of abstraction, so i would be ok if we just pass in filters and queries

@flash1293
Copy link
Contributor Author

Sounds fine to me as well

@mattkime
Copy link
Contributor

@flash1293

I might be having trouble understanding the following statement from the docs -

query object Allows to filter indices if the provided query rewrites to match_none on every shard.

What if it doesn't rewrite to match_none?

@flash1293
Copy link
Contributor Author

flash1293 commented Nov 27, 2021

@mattkime Someone from the ES team can provide a more thorough explanation, this is my understanding:
The filtered field caps api is not actually applying the query object like a regular search request would (e.g. comparing it to the Lucene indices in the data shards) instead, it’s only checking the shard meta data whether the query has any chance of returning data. For a lot of things, it’s not possible to tell (e.g. whether a fuzzy search string will match), but it’s possible in two key cases:

  • if a range filter on the time field is out of bounds for all data stored in the shard
  • If a constant keyword field has a different value than the one specified in the query (constant keyword means just a single value is allowed)

in these cases the query can be broken down to “won’t match anything” per individual shard purely based on shard meta data without looking at the actual data. If this happens for all shards of an index, the index’s fields won’t become part of the final field caps response - this is what we are looking for for Lens. If a field is in the full field list of a data view but not in the filtered field list for the currently applied filters/query/timerange, they are “empty” and hidden from the UI by default, otherwise they are shown as “available”.

Based on this, to answer your question directly - if it doesn’t rewrite to match_none, it’s possible the shard contains data that will be included in the current query, so Elasticsearch has to take the fields of the index into account for the response of the fields caps api (marking them “available” in Lens)

@mattkime
Copy link
Contributor

mattkime commented Dec 6, 2021

@ppisljar Could you drive the filter / query portion of this interface? Or suggest someone else? Everything looks very straight forward but since I'm not familiar with that portion of the problem I might slow things down.

@mattkime
Copy link
Contributor

mattkime commented Dec 6, 2021

I spoke with @ppisljar and we came up with -

dataViews.getFieldsForWildcard and dataViews.getFieldsForIndexPattern can be extended to take filter: BoolQuery; The filter would be created with buildEsQuery by the dataViews API consumer. At least thats my first pass at it - I'm avoiding requiring kbn-es-query as part of the data views plugin as it adds 30k - at least it did last time I checked.

buildEsQuery is here - packages/kbn-es-query/src/es_query/build_es_query.ts

Anyway, it seems some of these details might changes but they'd be inconsequential. Does this sound good @flash1293 ?

@flash1293
Copy link
Contributor Author

@mattkime sounds good to me, this will be fine for our use case.

@flash1293
Copy link
Contributor Author

This isn't blocked anymore - #121367 gives us the required API.

We can call getFieldsForIndexPattern with the provided filter and use the response to filter down the regular field list from the data view.

This should be implemented on the server as part of the existing existing_fields endpoint switching between this API and the existing sampling approach based on a newly introduced advanced setting switch.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
enhancement New value added to drive a business result Feature:Lens Team:Visualizations Visualization editors, elastic-charts and infrastructure
Projects
None yet
Development

Successfully merging a pull request may close this issue.

5 participants