[Lens] Field existence via 500 sample is not intuitive #58330

Closed
timroes opened this issue Feb 24, 2020 · 12 comments
Labels
discuss enhancement New value added to drive a business result Feature:Lens Team:Visualizations Visualization editors, elastic-charts and infrastructure

Comments

@timroes
Contributor

timroes commented Feb 24, 2020

We currently only sample the first 500 documents within the configured time range and filters (once #52826 is fixed). Those 500 documents might not be very representative of all the documents matching these filters, queries, and time range, and thus a lot of available fields might be hidden.
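
To illustrate how a fixed-size sample can produce false negatives, here is a minimal TypeScript sketch. The function name `fieldsSeenInSample`, the document shape, and the example data are assumptions made up for illustration; this is not the actual Lens implementation.

```typescript
type Doc = Record<string, unknown>;

// Collect the names of top-level fields that hold a value in at least one
// of the first `sampleSize` documents.
function fieldsSeenInSample(docs: Doc[], sampleSize: number): Set<string> {
  const seen = new Set<string>();
  for (const doc of docs.slice(0, sampleSize)) {
    for (const name of Object.keys(doc)) {
      const value = doc[name];
      if (value !== null && value !== undefined) {
        seen.add(name);
      }
    }
  }
  return seen;
}

// Hypothetical data: 1000 documents, "rare.field" only appears in the last one.
const docs: Doc[] = [];
for (let i = 0; i < 1000; i++) {
  docs.push(i === 999 ? { "@timestamp": i, "rare.field": 1 } : { "@timestamp": i });
}

const sampled = fieldsSeenInSample(docs, 500);
const complete = fieldsSeenInSample(docs, docs.length);
```

With this data, `sampled` misses `rare.field` although the field does have data in the full set, which is the kind of hidden field users are asking about.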

We currently see a lot of confusion among users about fields not appearing because of that. It's currently the most common question raised across all sources (forums, issues, Twitter, ...).

We should discuss how we can handle that in a less confusing way. I have a couple of suggestions for what we could do to improve this situation:

  • Increase the sample size. I don't think this will actually help us much. We'd just increase the query time to load the fields, and even if we go up tenfold to 5000 documents, the sample might still be too small to be meaningful for the dataset. Gathering the true data via a proper terms aggregation is also just too expensive in general.
  • If a user searches for fields, we might also show them fields without data that match their search (at least if no other fields match), since I think a common first attempt to solve this issue is for users to search for the field.
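
The search fallback from the second suggestion could look roughly like the following TypeScript sketch. The `Field` shape and the `searchFields` function are hypothetical names introduced for illustration, and the matching rule (case-insensitive substring) is an assumption.

```typescript
interface Field {
  name: string;
  hasData: boolean; // true when the field was seen in the sampled documents
}

// Return matching fields that have data; if none of the matches have data,
// fall back to showing the matching fields without data instead of an empty list.
function searchFields(fields: Field[], query: string): Field[] {
  const needle = query.trim().toLowerCase();
  const matches = fields.filter((f) => f.name.toLowerCase().includes(needle));
  const withData = matches.filter((f) => f.hasData);
  return withData.length > 0 ? withData : matches;
}

// Hypothetical field list.
const fields: Field[] = [
  { name: "host.name", hasData: true },
  { name: "host.os.family", hasData: false },
  { name: "rare.field", hasData: false },
];

const hostResults = searchFields(fields, "host");
const rareResults = searchFields(fields, "rare");
```

Here a search for "host" only returns `host.name`, while a search for "rare" falls back to the empty field `rare.field` because nothing with data matches.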

I am not sure if we have better solutions, but I think given how often this issue pops up, we need to think about how we can create a better UX here.

Similar discussion: #40277

cc @cchaos

@timroes timroes added discuss enhancement New value added to drive a business result Team:Visualizations Visualization editors, elastic-charts and infrastructure Feature:Lens labels Feb 24, 2020
@elasticmachine
Contributor

Pinging @elastic/kibana-app (Team:KibanaApp)

@cchaos
Contributor

cchaos commented Feb 24, 2020

Have you thought about some sort of lazy loading where the fields list shows the fields from these first 500 documents, but then continues to query the rest, adding fields as it gets more information?

@timroes
Contributor Author

timroes commented Feb 24, 2020

A couple of other suggestions that came up:

  • Maybe rename the "Filter by field type" label that opens the popup to "Filter fields", since it's currently not clear from the label that the popup dialog contains the filter for showing fields without data.
  • Add a button or descriptive text at the end of the field list, stating that more fields without data are hidden, with a "show more" button directly at the end of the list. Users who scroll through the full list without finding their field will then see actionable next steps right where they end up.
  • As suggested by Caroline, we could do a bit more lazy loading of the field list. We could load the first 500 documents and show all the fields found in there. Once we have that information, we run a potentially slower exists query on all the fields that are still hidden and show those fields later. Since we only need to do that for fields we don't already know to have data from the first 500 documents, we might never even need that second query for smaller or less sparse datasets, since all their fields might already appear within the first 500 documents. My main concern about that approach is how we mix the newly loaded fields into the field list without disturbing the user's interaction: they might be working with the list right at that point, and changing the list on the fly (jumping content) always creates a horrible UX.
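
A rough TypeScript sketch of the two-phase idea from the last suggestion follows. All names (`FieldListState`, `fieldsNeedingSecondQuery`, `appendLateFields`) are hypothetical, and appending late fields to a separate group instead of re-sorting the list is just one assumed way to avoid jumping content, not a decided design.

```typescript
interface FieldListState {
  fromSample: string[]; // fields found in the first 500 documents
  loadedLater: string[]; // fields confirmed by the slower follow-up query
}

// Only fields that were not seen in the sample need the second (exists) query.
// An empty result means the second query can be skipped entirely.
function fieldsNeedingSecondQuery(allFields: string[], fromSample: string[]): string[] {
  const known = new Set(fromSample);
  return allFields.filter((name) => !known.has(name));
}

// Append newly confirmed fields after the existing ones so that entries the
// user is already looking at keep their position.
function appendLateFields(state: FieldListState, confirmed: string[]): FieldListState {
  const shown = new Set([...state.fromSample, ...state.loadedLater]);
  const fresh = confirmed.filter((name) => !shown.has(name));
  return { fromSample: state.fromSample, loadedLater: [...state.loadedLater, ...fresh] };
}

// Hypothetical example data.
const allFields = ["@timestamp", "host.name", "rare.field", "unused.field"];
const initial: FieldListState = { fromSample: ["@timestamp", "host.name"], loadedLater: [] };

const pending = fieldsNeedingSecondQuery(allFields, initial.fromSample);
// Assume the follow-up exists query confirmed only "rare.field".
const updated = appendLateFields(initial, ["rare.field"]);
```

In this example only `rare.field` and `unused.field` would be sent to the second query, and the two fields from the sample stay in place when `rare.field` is added.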

@nreese
Contributor

nreese commented Feb 24, 2020

Why even pull documents at all? Why not just load the field list from the index pattern saved object? Then, lazy load field details so that when a user hovers over a field for details, a terms aggregation or whatever is used to fetch the field details in a separate request.

@wylieconlon
Contributor

@nreese Because we have the beats problem: metricbeat has 3900 individual fields in the default configuration, and obviously not all of those are used. So we want to provide the best possible list of fields as quickly as possible on first load and as filters are added.

@wylieconlon
Contributor

wylieconlon commented Feb 24, 2020

To cross-link in some of the related discussion over time:

What would the ideal solution to this problem be? Would it require Elasticsearch support?

@wylieconlon
Contributor

I couldn't find a discussion in the ES repo, so I added one: elastic/elasticsearch#52730

@AlonaNadler

Sampling more documents by default is not recommended. We already make multiple queries, so it wouldn't be good to increase the sample size.

The scenario I find the most unsettling is when the preview comes up empty but the field shows data once it is dropped; it makes Lens seem unreliable. I understand it is due to sparse data, so maybe we should try to optimize only this use case.

Regardless, I like @timroes's suggestion:

If a user searches for fields, we might also show them fields without data that match their search (at least if no other fields match), since I think a common first attempt to solve this issue is for users to search for the field.

@nreese
Contributor

nreese commented Feb 25, 2020

It sounds like the problem is a result of a design decision to support a large field list that is sparsely populated. I would recommend not optimizing the experience for beats. I think the beats problem needs to be solved upstream of Lens. Lens could then focus on just showing the available fields in the index pattern and lazy load value details with aggregations, providing a view of the entire data set for a time range as needed, instead of querying for the first 500 documents and showing values for that poorly chosen sample.

@flash1293
Contributor

As discussed with the Elasticsearch team, it might be possible to replace the current approach with either a multi-search or a filters aggregation, kicking off a separate search for the existence of each field. If this is justifiable from a performance and resource usage perspective, it would be the preferred solution because it would provide 100% accurate results (instead of the potential false negatives of the current solution).

We are going to explore this approach by creating a POC and testing it against common production configurations.
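
As a sketch of what the filters aggregation variant could look like, the following TypeScript builds a request body with one named `exists` filter per field and reads the per-field `doc_count` from the response buckets. The helper names and the `existence` aggregation name are assumptions for illustration; the time range and user filters would still have to be added as a `query`, which is omitted here. This is not the code from the POC.

```typescript
interface ExistsFilter {
  exists: { field: string };
}

interface FieldExistenceRequest {
  size: 0; // no hits needed, only the aggregation result
  aggs: { existence: { filters: { filters: Record<string, ExistsFilter> } } };
}

// Build one named filter per field, keyed by the field name.
function buildFieldExistenceRequest(fields: string[]): FieldExistenceRequest {
  const filters: Record<string, ExistsFilter> = {};
  for (const field of fields) {
    filters[field] = { exists: { field } };
  }
  return { size: 0, aggs: { existence: { filters: { filters } } } };
}

// Named filters come back as buckets keyed by filter name, each with a doc_count.
function fieldsWithData(buckets: Record<string, { doc_count: number }>): string[] {
  return Object.keys(buckets).filter((name) => buckets[name].doc_count > 0);
}

const request = buildFieldExistenceRequest(["host.name", "rare.field"]);

// Hypothetical response buckets for the request above.
const existing = fieldsWithData({
  "host.name": { doc_count: 1200 },
  "rare.field": { doc_count: 0 },
});
```

Because every field is checked against all matching documents, a field with even a single value gets a non-zero `doc_count`, which avoids the false negatives of the sample. The cost of running thousands of such filters is exactly what the POC would need to measure.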

@ghudgins
Contributor

The POC didn't yield an implementation change, so we're keeping this open. We need to continue to collaborate with the Elasticsearch team on field_caps.

@flash1293
Contributor

Fixed by #112782; we don't use sampling anymore.


8 participants