Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Validate that dashboard filters run efficient elasticsearch queries #5003

Closed
tommyers-elastic opened this issue Jan 16, 2023 · 10 comments
Closed
Assignees

Comments

@tommyers-elastic
Copy link
Contributor

tommyers-elastic commented Jan 16, 2023

Filtering in dashboards can happen in several places, as detailed below. This issue exists to validate that the resulting elasticsearch queries are the most efficient they can be; these queries are opaque to the user, as well as package devs, but can be inspected in the UI.

For sections 1-3 below, the query structure is the same:

query": {
    "bool": {
      "filter": [
        {
          "match_phrase": {
            "cloud.account.id": "elastic-obs-integrations-dev"
          }
        }
      ],
      ...

For section 4, the query structure is slightly different:

  "query": {
    "bool": {
      "must": [
        {
          "bool": {
            "must": [],
            "filter": [
              {
                "bool": {
                  "should": [
                    {
                      "match_phrase": {
                        "agent.type": "metricbeat"
                      }
                    }
                  ],
                  "minimum_should_match": 1
                }
              }
            ],
           ...
  1. Dashboard-level controls panels

Screenshot 2023-01-16 at 09 55 59

2. Dashboard-level query filters

Screenshot 2023-01-16 at 09 55 37

  1. Panel-level query filters

Screenshot 2023-01-16 at 09 54 57

  1. Panel-level options filter (for non-lens visualisations)

Screenshot 2023-01-16 at 09 52 48

In addition, here is an example of a full request for a panel generating CPU metrics, filtered by datastream.

Show
{
  "size": 0,
  "query": {
    "bool": {
      "must": [
        {
          "range": {
            "@timestamp": {
              "gte": "2023-01-16T00:00:00.000Z",
              "lte": "2023-01-16T23:59:59.999Z",
              "format": "strict_date_optional_time"
            }
          }
        }
      ],
      "filter": [
        {
          "match_phrase": {
            "data_stream.dataset": "gcp.compute"
          }
        }
      ],
      "should": [],
      "must_not": []
    }
  },
  "aggs": {
    "61ca57f1-469d-11e7-af02-69e470af7417": {
      "terms": {
        "field": "cloud.instance.name",
        "order": {
          "61ca57f2-469d-11e7-af02-69e470af7417-SORT": "desc"
        }
      },
      "aggs": {
        "61ca57f2-469d-11e7-af02-69e470af7417-SORT": {
          "avg": {
            "field": "gcp.compute.instance.cpu.usage.pct"
          }
        },
        "timeseries": {
          "date_histogram": {
            "field": "@timestamp",
            "min_doc_count": 0,
            "time_zone": "Europe/London",
            "extended_bounds": {
              "min": 1673827200000,
              "max": 1673913599999
            },
            "fixed_interval": "5m"
          },
          "aggs": {
            "61ca57f2-469d-11e7-af02-69e470af7417": {
              "avg": {
                "field": "gcp.compute.instance.cpu.usage.pct"
              }
            }
          }
        }
      },
      "meta": {
        "timeField": "@timestamp",
        "panelId": "61ca57f0-469d-11e7-af02-69e470af7417",
        "seriesId": "61ca57f1-469d-11e7-af02-69e470af7417",
        "intervalString": "5m",
        "indexPatternString": "metrics-*"
      }
    }
  },
  "runtime_mappings": {}
}
@ruflin
Copy link
Member

ruflin commented Jan 17, 2023

@jpountz It would be great to get someone from the Elasticsearch team to take a quick look at these queries. The reason I'm bringing this up is that I want to make sure that the queries built by Kibana are internally by Elasticsearch converted to efficient queries using data_stream.dataset pre filtering.

For example there is a trip bool nesting with match_phrase which I would manually not necessarily write like this. Same for match_phrase which I assume should be fine because of elastic/elasticsearch#85165 (thanks for the pointer).

@ruflin
Copy link
Member

ruflin commented Feb 20, 2023

@dakrone Could someone in your team take a quick look at this?  🤦 Sorry for the wrong ping

@martijnvg Could someone in your team take a quick look at this?

@martijnvg
Copy link
Member

I personally wouldn't use the match_phrase to filter on a constant keyword field. But it does what is expected and is able to rewrite to a match all docs or a match no docs query just by looking at the mapping.

The change I would make to this query, is to move the range query from the must clause to the filter clause. I don't think query time boost / scoring is needed here, since requested size is 0. This would make the range query eligible for the query cache and subsequent usages of this range query instance in different search request could make use of a cached result.

@ruflin
Copy link
Member

ruflin commented Feb 22, 2023

@martijnvg As the queries are generated by Lens / Kibana, we don't have control about it. Pulling in @drewdaemon : Does this fit in your area or who should we ping on the Kibana side to get this "reviewed"? @martijnvg Ideally you would work directly with Kibana on this as it will affect all users that build queries in Kibana.

@martijnvg
Copy link
Member

martijnvg commented Feb 22, 2023

I like to mention that in case when no scoring is required, it is always better to use the filter clause of a boolean query instead of the must clause.

@drewdaemon
Copy link
Contributor

@mattkime, @lukasolson, could one of you weigh in here?

@lukasolson
Copy link
Member

From the examples shown in the description of the issue, I don't see any queries that are placed in a should (or must) clause that aren't enclosed in a higher level filter clause.

The change I would make to this query, is to move the range query from the must clause to the filter clause.

@martijnvg Which query are you referring to? I don't see an example given where the range query isn't inside a filter clause.

@martijnvg
Copy link
Member

@lukasolson I was referring to the collapsed search request at the end of the issue description. It has a range query in a must clause which could be moved to the filter clause.

Regarding the range query, how are shorter periods like 15 minutes defined? Are those periods rounded and if so how?

@lukasolson
Copy link
Member

lukasolson commented Mar 6, 2023

Ah, that makes sense. I've added that to our meta issue regarding query performance here: elastic/kibana#101041

Right now we aren't doing any rounding with the date ranges. If you select a period of "last 15 minutes" then we will convert that to an absolute time range on the browser (not send it as "now-15" or anything like that). We have an issue related to rounding here: elastic/kibana#94280

@ruflin
Copy link
Member

ruflin commented Mar 28, 2023

@lukasolson The issue elastic/kibana#101041 refers to Discover, but I assume all these optimisations would also be available to visualisations? Because that is where this issue initially comes from.

@martijnvg I plan to close this issue as the goal was to get the discussion started around performance of dashboards. I expect you and the team to keep watching this and potentially pushing forward. Reason is if Elasticsearch makes improvements on the query speed but the benefits are not used in Kibana, it will not be available to Elasticsearch.

@ruflin ruflin closed this as completed Mar 28, 2023
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

No branches or pull requests

5 participants