Filter bar value suggestions - filter by time #15887

shaharmor · 2018-01-08T09:00:42Z

Kibana version: 6.0

Elasticsearch version: 6.0

Description of the problem including expected versus actual behavior:
When using the filter value suggestions feature, the query that Kibana makes is not filtered by the time filter of the dashboard itself, which can cause a big load on ES.

There are some limitations in place to help mitigate it, but I see no reason to query indices that are out of the range of the dashboard.
In fact, I'm pretty sure it can be harmful, because it might show field values that don't have any data for the selected range.

Steps to reproduce:

Enable filter suggestions
Add a filter
Observe the suggestions query

Edit by @lizozom: This will be available and enabled by default starting in 7.15.0. However, due to #100174, you could also consider turning this off (Advanced Settings > autocomplete:useTimeRange) to get autocomplete suggestions from your entire data set with much smaller performance implications.


Bargs · 2018-01-08T21:15:47Z

> I see no reason to query indices that are out of the range of the dashboard.

We chose to do this intentionally. Just because a field value isn't in the current result set doesn't always mean a user isn't interested in filtering on it. Finding out that no docs match a filter in a certain time range might be just as important as finding out that docs do match.

We made a number of optimizations to the suggestions request so that it should be performant enough. Are you experiencing actual issues related to these requests? If so, we can see if there are any further optimizations that might help your case.

We could add an advanced option allowing admins to choose whether the time filter applies to these suggestions, but I'd prefer to leave that as a last resort. What do you think @lukasolson ?

lukasolson · 2018-01-08T21:52:50Z

Just as @Bargs said, this was an intentional decision. I'd also prefer not to introduce an additional advanced setting for this. I'd be interested to hear more of the reasoning behind wanting to limit the query... Is it performance related, or related to the accuracy of the results?

shaharmor · 2018-01-09T10:41:26Z

It is related to both performance & accuracy :)

We have a cluster with daily indices, and whenever a user starts to filter, the query runs over around 4000 shards, with more added every day.
Most users only view the last few days at most (and we also limit it in the UI), so that 4k-shard query is enormous compared to what they would run if they queried only the time range they are looking at.
I think that a time filter (in all Kibana queries) is a must when handling large clusters with tons of indices & shards.

Our data is based on sports events, and the bigger the event the more docs you have during that event time.
When users add a filter and view only the last 24h, for example, it's possible that in the past there was a much bigger event that had a lot more traffic, and that event takes precedence over the other events that occurred during that 24h range, making the filter unusable for them.

Eventually only 10 results are returned, which might not be what the user is looking for.

In fact, now that I think of it, I think that the dashboard filters/query should also be added to the suggestions query.

shaharmor · 2018-01-09T11:14:33Z

Another thing is I don't understand why you are setting execution_hint: 'map', if the regex is always a match all regex. (And you can't pass anything else).

Wouldn't it be much faster to remove both the include & execution_hint options?

jccq · 2018-01-09T14:37:07Z

a checkbox to "use current filters" in the suggestion would be pretty cool & useful

Bargs · 2018-01-09T22:29:57Z

> Another thing is I don't understand why you are setting execution_hint: 'map', if the regex is always a match all regex. (And you can't pass anything else).

It's not always a match all, we use what the user has typed into the box to filter down the results. In the example you provided above where a user doesn't see the value they want in the first 10 results, I'd expect them to start typing in part of the value to get more targeted suggestions, just like any autocomplete implementation. The opposite scenario doesn't work as well. If we start with the time range applied and the result set is too narrow, the user has no way to get more suggestions other than expanding the range in the time picker which won't be at all obvious to most people. This is why I'd prefer to cast a net that's too wide rather than too narrow.

As for performance, could you collect some timings of the suggestions request with and without a date range applied? You should be able to do that in Console. I'd be surprised if there's a big difference since we're using terminate_after, but my intuition could be wrong.
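For anyone wanting to run that comparison, here is a minimal Python sketch of the request body being discussed, with the typed text turned into the `include` pattern and the date range as an optional filter. The function names, the set of escaped characters, and the `time_field` default are assumptions for illustration, not Kibana's actual code; the remaining values mirror the queries posted later in this thread.

```python
import re

# Characters treated as regex operators. This set is an assumption for
# illustration; check the Elasticsearch regexp syntax for your version.
_RESERVED = re.compile(r'[.?+*|{}\[\]()"\\#@&<>~]')


def escape_regex(text):
    """Backslash-escape regex operators so typed text matches literally."""
    return _RESERVED.sub(lambda m: "\\" + m.group(0), text)


def build_suggestions_body(field, typed="", time_range=None, time_field="@timestamp"):
    """Build a terms-aggregation body for value suggestions.

    typed: what the user has typed so far (empty string means match all).
    time_range: optional (gte, lte) tuple in epoch milliseconds.
    """
    body = {
        "size": 0,
        "timeout": "1s",
        "terminate_after": 100000,
        "aggs": {
            "suggestions": {
                "terms": {
                    "field": field,
                    # Prefix match on the typed text; ".*" alone matches everything.
                    "include": escape_regex(typed) + ".*",
                    "execution_hint": "map",
                    "shard_size": 10,
                }
            }
        },
    }
    if time_range is not None:
        gte, lte = time_range
        body["query"] = {
            "bool": {
                "filter": [
                    {
                        "range": {
                            time_field: {
                                "gte": gte,
                                "lte": lte,
                                "format": "epoch_millis",
                            }
                        }
                    }
                ]
            }
        }
    return body
```

Calling `build_suggestions_body("dimensions.city.name")` gives the unfiltered variant, and passing `time_range=(1514764800000, 1514851200000)` gives the filtered one, so both can be pasted into Console and timed.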

shaharmor · 2018-01-10T08:15:57Z

You are right about the "include" part. I missed that it's also used in the html of the filter suggestions; I was looking only at the initial run.
I understand what you're saying about "hiding" some values.
Have you tried to ask Kibana users if they need it? Or if they'd prefer the performance advantages of having a time filter? Anyway I think it should be left for the user to decide and not have it decided for them.

Regarding the performance benchmark:
TLDR:
Cache was cleared after every run, and each test was run 5 times. Results were pretty much the same.
Without a time filter, query takes ~20s, which is obviously way too long for someone that wants to see values for filtering.
With a time filter set to 24h, query takes 1s, much better :)

Without a time filter, the filter bar suggestions are useless (at least in our case).

Here are the detailed benchmark results:

Querying for the dimensions.city.name field without any time filter:
Query:

{
  "size": 0,
  "timeout": "1s",
  "terminate_after": 100000,
  "aggs": {
    "suggestions": {
      "terms": {
        "field": "dimensions.city.name",
        "include": ".*",
        "execution_hint": "map",
        "shard_size": 10
      }
    }
  }
}

Results:

{
  "took": 20079,
  "timed_out": false,
  "terminated_early": true,
  "num_reduce_phases": 12,
  "_shards": {
    "total": 5658,
    "successful": 5658,
    "skipped": 0,
    "failed": 0
  },
  "hits": {
    "total": 565800000,
    "max_score": 0,
    "hits": []
  },
  "aggregations": {
    "suggestions": {
      "doc_count_error_upper_bound": 3664931,
      "sum_other_doc_count": 429597084,
      "buckets": [
        {
          "key": "N/A",
          "doc_count": 96146685
        },
        {
          "key": "Bucharest",
          "doc_count": 9687895
        },
        {
          "key": "Lima",
          "doc_count": 5284617
        },
        {
          "key": "Bogotá",
          "doc_count": 4655048
        },
        {
          "key": "Santiago",
          "doc_count": 4522195
        },
        {
          "key": "Mexico City",
          "doc_count": 4113729
        },
        {
          "key": "Istanbul",
          "doc_count": 3868593
        },
        {
          "key": "Buenos Aires",
          "doc_count": 2794301
        },
        {
          "key": "Paris",
          "doc_count": 2735908
        },
        {
          "key": "Stockholm",
          "doc_count": 2393945
        }
      ]
    }
  }
}

With a 24h time filter:
Query:

{
  "size": 0,
  "timeout": "1s",
  "terminate_after": 100000,
  "aggs": {
    "suggestions": {
      "terms": {
        "field": "dimensions.city.name",
        "include": ".*",
        "execution_hint": "map",
        "shard_size": 10
      }
    }
  },
  "query": {
    "bool": {
      "filter": [{
        "range": {
          "@timestamp": {
            "gte": 1514764800000,
            "lte": 1514851200000,
            "format": "epoch_millis"
          }
        }
      }]
    }
  }
}

Results:

{
  "took": 1143,
  "timed_out": false,
  "terminated_early": true,
  "_shards": {
    "total": 5658,
    "successful": 5658,
    "skipped": 5514,
    "failed": 0
  },
  "hits": {
    "total": 14400000,
    "max_score": 0,
    "hits": []
  },
  "aggregations": {
    "suggestions": {
      "doc_count_error_upper_bound": 83093,
      "sum_other_doc_count": 11012179,
      "buckets": [
        {
          "key": "N/A",
          "doc_count": 2471314
        },
        {
          "key": "Bucharest",
          "doc_count": 252441
        },
        {
          "key": "Istanbul",
          "doc_count": 119272
        },
        {
          "key": "Stockholm",
          "doc_count": 106234
        },
        {
          "key": "Paris",
          "doc_count": 86609
        },
        {
          "key": "Lima",
          "doc_count": 82574
        },
        {
          "key": "Mexico City",
          "doc_count": 78821
        },
        {
          "key": "Hyderabad",
          "doc_count": 70036
        },
        {
          "key": "Bogotá",
          "doc_count": 60564
        },
        {
          "key": "Santiago",
          "doc_count": 59956
        }
      ]
    }
  }
}
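To put the two responses side by side, here is a small Python sketch that derives the number of shards actually queried and the latency ratio from the `took` and `_shards` fields shown above. The helper name `compare_runs` is hypothetical; the numbers are copied from the two responses in this comment.

```python
def compare_runs(unfiltered, filtered):
    """Summarize two search responses using only `took` and `_shards`."""

    def shards_queried(resp):
        # Skipped shards were ruled out without running the query on them.
        return resp["_shards"]["total"] - resp["_shards"]["skipped"]

    return {
        "shards_queried_unfiltered": shards_queried(unfiltered),
        "shards_queried_filtered": shards_queried(filtered),
        "speedup": unfiltered["took"] / filtered["took"],
    }


# Values copied from the two responses above.
no_time_filter = {"took": 20079, "_shards": {"total": 5658, "skipped": 0}}
with_24h_filter = {"took": 1143, "_shards": {"total": 5658, "skipped": 5514}}

summary = compare_runs(no_time_filter, with_24h_filter)
print(summary)
```

With the 24h filter only 144 of the 5658 shards are queried, and the request is roughly 17.6 times faster.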

trevan · 2018-01-10T15:31:17Z

We've patched Kibana to add the timespan to the suggestions queries, both because it makes them a bit faster for our use case and because, when you are filtering, it makes more sense to only show users values that will actually match data. Showing values outside of the existing timespan just means that they get no results, which is confusing to them since the filter was auto-suggested.

I like the idea of @shaharmor where the existing filter/query should also be used as well to give even more targeted suggestions.

jccq · 2018-01-11T06:53:54Z

@Bargs having a checkbox in that filter (preset on) "use current filters and time selection" would provide the benefits to both audiences while maintaining clarity.

shaharmor · 2018-01-11T07:16:38Z

It should also be configurable at the Kibana level for all filters at once

lukasolson · 2018-01-11T20:13:49Z

> "took": 20079,
> "timed_out": false,
> "terminated_early": true,

Hmm... Interesting that even though the request includes "timeout": "1s", the request isn't timing out after 1 second. @jpountz Any idea why this would be happening? Maybe some combination of using timeout with terminate_after?

jpountz · 2018-01-15T14:18:26Z

It's very hard to configure a query so that latency is below a given threshold, so Elasticsearch only stops processing more documents after the timeout is expired. However, there are a couple steps that still need to be performed after matches have been processed:

1. build a representation of the shard-level results
2. send those results back to the coordinating node
3. merge those results together

This would suggest these steps take close to 19 seconds. execution_hint: map should ensure that step 1 is reasonably fast, and step 2 should be fast on a local network, so I suspect the problem is with step 3. It is not too surprising that there are problems given the number of queried shards, but I wouldn't have expected such a long time for merging results. Maybe there is some memory pressure as well?

Is the response time consistently reproducible? If yes, could we try to capture hot threads a couple times while the query is running?
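As a rough illustration of the merge step, here is a toy Python sketch that combines per-shard term buckets into a global top-N. It is a simplification under stated assumptions, not Elasticsearch's implementation: the function name and the sample data are hypothetical, and real reduction happens in batches across many shards.

```python
from collections import Counter


def merge_shard_buckets(shard_results, size=10):
    """Sum doc counts per term across shards and keep the `size` largest.

    shard_results: list of dicts mapping term -> doc_count, one per shard.
    Each shard only reports its own top terms, so merged counts can
    undercount terms that fell outside a shard's local top list.
    """
    merged = Counter()
    for buckets in shard_results:
        merged.update(buckets)  # adds counts for terms already seen
    return merged.most_common(size)


# Hypothetical per-shard results for illustration.
shards = [
    {"Lima": 5, "Paris": 3},
    {"Paris": 4, "Bogotá": 1},
]
top = merge_shard_buckets(shards, size=2)
print(top)
```

The work in this step grows with the number of shard responses, which is why querying thousands of shards makes the merge a plausible suspect.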

shaharmor · 2018-01-15T14:49:25Z

@jpountz per your request:

I re-ran the same test again, same results: 17s - 19s response time with no cache.

I ran the query and while it was running I ran the hot_threads command 3 times, each with threads=10&interval=3000ms.

There are 3 servers that hold the shards in question, so the hot_threads command was run with a filter on those 3 servers alone. (Each hot_threads log contains all 3 servers)

If there are more details you need let me know.

Here are the results:
1st hot_threads: https://pastebin.com/W2X3Tuju
2nd: https://pastebin.com/AFMcBpCP
3rd: https://pastebin.com/2v3kL3v0
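For reference, the hot_threads calls described above can be sketched as a URL builder in Python. The base URL and node names are hypothetical placeholders; `threads` and `interval` are the parameters mentioned in this comment.

```python
from urllib.parse import urlencode


def hot_threads_url(base_url, nodes, threads=10, interval="3000ms"):
    """Build a nodes hot_threads URL restricted to the given nodes."""
    node_filter = ",".join(nodes)
    query = urlencode({"threads": threads, "interval": interval})
    return f"{base_url}/_nodes/{node_filter}/hot_threads?{query}"


# Hypothetical cluster address and node names.
url = hot_threads_url("http://localhost:9200", ["node-1", "node-2", "node-3"])
print(url)
```

Issuing a GET against that URL several times while the slow query is running gives the kind of capture requested here.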

jpountz · 2018-03-29T17:06:52Z

Argh I had lost track of this discussion and now the pastebins have expired. Sorry for that. @shaharmor Do you still have them by any chance?

shaharmor · 2018-03-31T18:03:53Z

Unfortunately no, but I will try to run it again

tylersmalley · 2019-09-05T21:35:20Z

There is another instance of a user having issues with the pressure created by the KQL queries here: https://discuss.elastic.co/t/kql-value-suggestions-are-killing-my-cluster/196556/7

@lukasolson @Bargs @stacey-gammon it seems like it might be worth bringing this discussion up again since KQL is now the default. Trevan raised a good point about limiting the suggestions to the current timespan, since values from outside it would otherwise lead to no results. In addition, the inability to scale due to the auto-complete query hitting every shard is concerning.

lukasolson · 2019-09-05T22:23:07Z

Seeing as how we now allow configuring things like terminate_after I don't think it'd be a bad idea to add another advanced setting to allow filtering the suggestions by existing filters (time filter + other existing filters).

elasticmachine · 2020-02-20T15:50:34Z

Pinging @elastic/kibana-app-arch (Team:AppArch)

fbaligand · 2020-06-18T08:56:33Z

I'm very interested in this feature!
Currently, it causes some frustration, because we click on a suggested value and get "no results".

lukasolson · 2021-03-03T19:29:44Z

I believe this is resolved by #81515.

lizozom · 2021-03-04T14:59:12Z

@lukasolson I actually changed it for the suggestions in the search bar, but not for the filters.

Bargs added release_note:enhancement discuss :Discovery Feature:Filters labels Jan 8, 2018

Bargs mentioned this issue Apr 26, 2018

Kuery enhancements #17795

Closed

timroes added Team:Visualizations Visualization editors, elastic-charts and infrastructure and removed :Discovery labels Sep 16, 2018

lukasolson added the Feature:KQL KQL label Oct 28, 2019

timroes added Team:AppArch and removed Team:Visualizations Visualization editors, elastic-charts and infrastructure labels Feb 20, 2020

alexh97 assigned lukasolson Mar 9, 2020

lukasolson assigned lizozom and unassigned lukasolson Oct 30, 2020

lizozom mentioned this issue Nov 2, 2020

[Autocomplete] Support useTimeFilter option #81515

Merged

lukasolson closed this as completed Mar 3, 2021

lukasolson reopened this Mar 11, 2021

Dosant added the EnableJiraSync label May 11, 2021

exalate-issue-sync bot added impact:low Addressing this issue will have a low level of impact on the quality/strength of our product. loe:small Small Level of Effort labels May 13, 2021

lizozom mentioned this issue Jul 20, 2021

Autocomplete for search ( and filter panel) is not working for cross cluster search index patterns #104515

Closed

lizozom mentioned this issue Jul 29, 2021

filter FilterBar suggestions by time (according to flag) #107192

Merged


lizozom closed this as completed in #107192 Aug 9, 2021
