
auto_date_histogram generates too many buckets for smaller time range #43577

Closed
timroes opened this issue Jun 25, 2019 · 4 comments · Fixed by #44461

@timroes
Contributor

timroes commented Jun 25, 2019

It seems that auto_date_histogram sometimes generates too many buckets, especially when filtering on a small overall time range. It doesn't happen constantly (I assume it depends somewhat on the exact time you send the request), but it happened reliably enough that I could always reproduce it within at most a minute of trying.

I used the following request (after loading the Kibana Flights sample data set):

GET kibana_sample_data_flights/_search
{
  "size": 0, 
  "query": {
    "range": {
      "timestamp": {
        "gte": "now-20m",
        "lte": "now"
      }
    }
  },
  "aggs": {
    "foo": {
      "auto_date_histogram": {
        "field": "timestamp",
        "buckets": 1
      }
    }
  }
}

This request returned the following response:

{
  "took" : 1,
  "timed_out" : false,
  "_shards" : {
    "total" : 1,
    "successful" : 1,
    "skipped" : 0,
    "failed" : 0
  },
  "hits" : {
    "total" : {
      "value" : 4,
      "relation" : "eq"
    },
    "max_score" : null,
    "hits" : [ ]
  },
  "aggregations" : {
    "foo" : {
      "buckets" : [
        {
          "key_as_string" : "2019-06-25T11:14:00.000Z",
          "key" : 1561461240000,
          "doc_count" : 1
        },
        {
          "key_as_string" : "2019-06-25T11:24:00.000Z",
          "key" : 1561461840000,
          "doc_count" : 3
        }
      ],
      "interval" : "10m"
    }
  }
}

As you can see, even though we asked for 1 bucket, it generated 2 buckets. When playing around with the overall filter, the behavior never triggered for me once the time range was larger than 30 minutes, but I was able to reproduce it rather often with time ranges below 30 minutes (it wasn't tied to exactly a 20-minute range). I tested this mainly on master. I was also not able to reproduce the behavior when I didn't use now as the upper bound but tried something like now-1440m to now-1420m instead.

cc @polyfractal

@elasticmachine
Collaborator

Pinging @elastic/es-analytics-geo

@polyfractal
Contributor

/cc @pcsanwald, this looks somewhat similar to the failure in #39497; perhaps it's related?

@polyfractal
Contributor

Had a quick look and found the problem area, or at least the point where multiple buckets are being generated; I don't know enough about the auto histogram to say where the actual cause is.

When you execute the above query (at a time that is not a multiple of 30 minutes), multiple buckets are generated when the coordinator merges consecutive buckets.

In InternalAutoDateHistogram#mergeConsecutiveBuckets(), we merge buckets when i % mergeInterval == 0. Under this condition, we end up with ~18 reduced buckets in total: 10 have accumulated into sameKeyedBuckets and 8 are left to process. The first 10 merge into one bucket (because the mergeInterval is 10), and the leftover 8 get merged into a separate bucket after the loop exits.
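
To make the failure mode concrete, here is a minimal standalone sketch of that merge pattern, assuming the simplified accumulate-and-flush behavior described above. This is not the actual InternalAutoDateHistogram code, and the counts are just the ones from this reproduction:

import java.util.ArrayList;
import java.util.List;

// Simplified sketch of the merge pattern described above, NOT the real
// InternalAutoDateHistogram implementation: buckets accumulate into a batch
// that is flushed every mergeInterval iterations, and whatever remains after
// the loop is flushed into one extra bucket.
public class MergeSketch {
    public static void main(String[] args) {
        int reducedBuckets = 18; // ~18 reduced buckets in this reproduction
        int mergeInterval = 10;  // merge interval chosen by the agg

        List<Integer> mergedBucketSizes = new ArrayList<>();
        int accumulated = 0;     // plays the role of sameKeyedBuckets
        for (int i = 0; i < reducedBuckets; i++) {
            accumulated++;
            if (accumulated == mergeInterval) {
                mergedBucketSizes.add(accumulated); // first 10 merge into one bucket
                accumulated = 0;
            }
        }
        if (accumulated > 0) {
            mergedBucketSizes.add(accumulated);     // leftover 8 become a second bucket
        }
        System.out.println(mergedBucketSizes);      // prints [10, 8]: two buckets, not one
    }
}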

Depending on when the agg executes, it sometimes switches to 5-minute intervals, so the exact values change slightly, but the behavior is basically the same.

By pure accident, I executed at a half-hour mark (12:30) and got a single bucket. Once the half-hour mark passed (12:31) it went back to two buckets.

I don't know enough about the merging logic to know what's going on, but I think this is probably the area that's broken.

@pcsanwald
Contributor

This does seem quite similar to the failure case and would explain why it's been so tricky to reproduce (I'm currently doing a very long run).
