Improve discover query #69049

nik9000 · 2020-06-12T17:48:54Z

I bumped into a _search generated by Discover that had a few things in it that looked like they'd slow Elasticsearch down. I'm wondering if we can do anything to speed this up:

curl -XPOST -HContent-Type:application/json ????????   -d'{
  "version": true,  <--- do we really need this?
  "size": 500,      <--- this is fairly large. too big to fit on the screen, right?
  "sort": [
    {
      "@timestamp": {
        "order": "desc",
        "unmapped_type": "boolean" <---- wat
      }
    }
  ],
  "aggs": {    <---- having an agg in the same query as `size` turns off agg caching and doesn't let us early-terminate fetching the top hits
    "2": {
      "date_histogram": {
        "field": "@timestamp",
        "fixed_interval": "30m",
        "time_zone": "America/Chicago",
        "min_doc_count": 1
      }
    }
  },
  "stored_fields": [  <---- stored_fields should be pretty rare. I'd expect leaving this off would mostly produce all you need and adding it will fetch more than you need.
    "*"
  ],
  "script_fields": {},
  "docvalue_fields": [   <----- that is a lot of fields. doc_values are a column store so aren't going to be efficient to fetch. I guess you do this to get a formatted date. https://github.com/elastic/elasticsearch/issues/55363 will help with that.
    {
      "field": "@timestamp",
      "format": "date_time"
    },
    .... 11 other date_time fields
  ],
  "_source": {   <----- this is confusing. I think it means "don't filter" but I'd have to look it up. It's way, way less confusing to leave this out if you don't need any filtering.
    "excludes": []
  },
  "query": {
    "bool": {   <----- This looks big but it is 100% ok. We'll rewrite it down to just the range query.
      "must": [],
      "filter": [
        {
          "match_all": {}
        },
        {
          "range": {
            "@timestamp": {
              "gte": "2020-06-08T05:00:00.000Z",
              "lte": "2020-06-09T02:06:38.662Z",
              "format": "strict_date_optional_time"
            }
          }
        }
      ],
      "should": [],
      "must_not": []
    }
  },
  "highlight": {
    "pre_tags": [
      "@kibana-highlighted-field@"
    ],
    "post_tags": [
      "@/kibana-highlighted-field@"
    ],
    "fields": {
      "*": {}       <--- this is expensive. There are 100 fields in this index. In this particular case the search query may be too simple to highlight. I'm not sure. But I am certain that if the query *isn't* super simple then this will be expensive.
    },
    "fragment_size": 2147483647 <---- this is asking ES to OOM if there are large documents.
  }
}


elasticmachine · 2020-06-12T17:48:56Z

Pinging @elastic/kibana-app (Team:KibanaApp)

nik9000 · 2020-06-15T17:08:13Z

The doc_values fields I pointed out duplicate #68672. But the other things are still their own unique "fun" bits of search.

kertal · 2020-06-15T22:04:56Z

About

"aggs": {    <---- having an agg in the same query as `size` turns off agg caching and doesn't let us early-terminate fetching the top hits

Here's an issue to split one into 2 queries because of performance: #69134
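As a rough illustration of that split (a sketch only, not the actual Kibana change), the combined body can be separated into a documents request without aggregations and an aggregation request with `size` set to 0, which is the shape the Elasticsearch shard request cache can cache by default. The helper name `split_discover_request` and the shortened example body are assumptions made for this example.

```python
import copy


def split_discover_request(body):
    """Split one combined _search body into a hits-only and an aggs-only body.

    Hypothetical helper for illustration; it only rearranges dictionaries.
    """
    # Documents request: everything except the aggregations.
    hits_body = copy.deepcopy(body)
    hits_body.pop("aggs", None)

    # Aggregation request: only the query and the aggs, with no hits fetched.
    aggs_body = {
        "size": 0,
        "query": copy.deepcopy(body.get("query", {"match_all": {}})),
    }
    if "aggs" in body:
        aggs_body["aggs"] = copy.deepcopy(body["aggs"])
    return hits_body, aggs_body


# Shortened version of the request from the issue description.
combined = {
    "size": 500,
    "sort": [{"@timestamp": {"order": "desc"}}],
    "aggs": {
        "2": {
            "date_histogram": {
                "field": "@timestamp",
                "fixed_interval": "30m",
                "time_zone": "America/Chicago",
                "min_doc_count": 1,
            }
        }
    },
    "query": {
        "range": {
            "@timestamp": {
                "gte": "2020-06-08T05:00:00.000Z",
                "lte": "2020-06-09T02:06:38.662Z",
                "format": "strict_date_optional_time",
            }
        }
    },
}

hits_body, aggs_body = split_discover_request(combined)
```

The two bodies would then be sent as separate _search requests, so the histogram can be served independently of the document list.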

"size": 500,      <--- this is fairly large. too big to fit on the screen, right?

True, but only part of the result is rendered. Of course, this could be optimized by fetching a smaller size and using e.g. search_after to get more on demand.
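A minimal sketch of that idea, assuming the client keeps the last hit of the previous page: each follow-up request reuses the same query and sort, asks for a smaller `size`, and passes the last hit's `sort` values as `search_after`. The helper name `next_page_body`, the page size of 50, and the sample hit are hypothetical; a real implementation would likely also add a tiebreaker field to the sort so that documents with equal timestamps are not skipped.

```python
import copy


def next_page_body(body, last_hit, page_size=50):
    """Return a copy of `body` that fetches the page following `last_hit`.

    `last_hit` is a hit from the previous response, or None for the first page.
    """
    page = copy.deepcopy(body)
    page["size"] = page_size
    if last_hit is not None:
        # Each hit of a sorted search carries its sort values under "sort";
        # search_after resumes right after those values.
        page["search_after"] = last_hit["sort"]
    return page


base = {
    "size": 500,
    "sort": [{"@timestamp": {"order": "desc"}}],
    "query": {"match_all": {}},
}

first_page = next_page_body(base, None)

# Illustrative last hit of the first page (the values are made up).
last_hit = {"_id": "example-id", "sort": [1591668398662]}
second_page = next_page_body(base, last_hit)
```

The original body is left untouched, so the same base request can be reused for every page.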

"version": true,  <--- do we really need this?
"stored_fields": [  <---- stored_fields should be pretty rare. I'd expect leaving this off would mostly produce all you need and adding it will fetch more than you need.
    "*"
  ],

This and more is maintained by @elastic/kibana-app-arch. Dear team, could you provide feedback here?

tbc.

lukasolson · 2020-06-23T18:30:02Z

I believe version is required in discover because if auto-refresh is on, and a document is updated in ES, we will only see the updated doc if we request the version (see #10385).

Regarding stored_fields, I think that has been included historically just in case there are fields that are stored in the mapping... We might be able to have an advanced option for users to specifically opt into this behavior and opt out by default.
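To make the proposal concrete, here is a small sketch of what such opt-in behavior could look like when the request body is assembled. The function and option names (`build_fetch_options`, `include_version`, `include_stored_fields`) are hypothetical and do not correspond to existing Kibana settings; `version` stays on by default because of the auto-refresh case described above.

```python
def build_fetch_options(include_version=True, include_stored_fields=False):
    """Build the fetch-related part of a Discover-style _search body.

    Hypothetical option names, for illustration only.
    """
    options = {}
    if include_version:
        # Kept by default so updated documents are noticed on auto-refresh.
        options["version"] = True
    if include_stored_fields:
        # Only request stored fields when the user explicitly opts in.
        options["stored_fields"] = ["*"]
    return options


default_options = build_fetch_options()
opted_in_options = build_fetch_options(include_stored_fields=True)
```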

jtibshirani · 2020-08-01T00:22:27Z

"query": {
  "bool": {   <----- This looks big but it is 100% ok. We'll rewrite it down to just the range query.
    "must": [],
    "filter": [
      {
        "match_all": {}
      },
      ...

I'm not actually sure that we rewrite this correctly. I noticed that we only remove match_all clauses if there is at least one must clause. I opened a Lucene PR with a possible fix: apache/lucene-solr#1709.

I noticed that we weren't removing the match_all while using the search profiler to debug why discover took a long time to load on a cluster.
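The intended simplification can be illustrated on the query DSL itself. The following is only a Python sketch of the idea, not Lucene's actual rewrite logic: a match_all clause in `filter` is redundant as long as at least one other required (`must` or `filter`) clause remains, so it can be dropped in that case. The function name `drop_redundant_match_all` is hypothetical, and the sketch assumes each clause list is given as a list.

```python
def drop_redundant_match_all(bool_query):
    """Remove match_all filter clauses when other required clauses remain."""
    # Shallow-copy the clause lists so the input is not modified.
    simplified = {
        key: (list(value) if isinstance(value, list) else value)
        for key, value in bool_query.items()
    }
    filters = simplified.get("filter")
    if not isinstance(filters, list):
        return simplified

    kept = [clause for clause in filters if "match_all" not in clause]
    # Drop match_all only if a must or filter clause is left; otherwise the
    # bool query would lose its only required clause and could change meaning.
    if kept or simplified.get("must"):
        simplified["filter"] = kept
    return simplified


discover_bool = {
    "must": [],
    "filter": [
        {"match_all": {}},
        {
            "range": {
                "@timestamp": {
                    "gte": "2020-06-08T05:00:00.000Z",
                    "lte": "2020-06-09T02:06:38.662Z",
                    "format": "strict_date_optional_time",
                }
            }
        },
    ],
    "should": [],
    "must_not": [],
}

simplified = drop_redundant_match_all(discover_bool)
```

For the Discover query this leaves only the range clause in `filter`, while a bool query whose single required clause is match_all is returned unchanged.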

timroes · 2021-09-15T08:30:27Z

Since (quote) "@nik9000 seemed to be happy around the Discover query the last time he looked at it", we're going to close this. Please feel free to reopen or create individual specific issues if there are more improvements left that we can make.

nik9000 added Feature:Discover Discover Application Team:Visualizations Visualization editors, elastic-charts and infrastructure labels Jun 12, 2020

nik9000 assigned kertal Jun 12, 2020

lukasolson added Team:AppArch Feature:Search Querying infrastructure in Kibana labels Jun 23, 2020

lukasolson added the enhancement New value added to drive a business result label Jun 23, 2020

jtibshirani mentioned this issue Aug 24, 2020

Load date fields through 'fields' instead of 'docvalue_fields'. #75813

Closed

lukeelmers mentioned this issue Nov 3, 2020

[data.search.searchSource] Update SearchSource to use Fields API. #82383

Merged

exalate-issue-sync bot added impact:low Addressing this issue will have a low level of impact on the quality/strength of our product. loe:small Small Level of Effort labels Jun 2, 2021

kertal mentioned this issue Jul 7, 2021

[Discover][Main] Split single query into three #102385

Closed


kertal mentioned this issue Jul 27, 2021

[Discover][Main] Split single query into 2 queries for faster results #104818

Merged


timroes added Team:DataDiscovery Discover App Team (Document Explorer, Saved Search, Surrounding documents, Graph) and removed Team:Visualizations Visualization editors, elastic-charts and infrastructure labels Aug 31, 2021

timroes closed this as completed Sep 15, 2021
