
auto_date_histogram generates too many buckets for smaller time range #43577

Closed
timroes opened this issue Jun 25, 2019 · 4 comments · Fixed by #44461

@timroes
Contributor

timroes commented Jun 25, 2019

It seems that auto_date_histogram sometimes generates too many buckets, especially when filtering on a small overall time range. It doesn't happen constantly (I assume it depends somewhat on the exact time you send the request), but it happened reliably enough that I could always reproduce it within at most a minute of trying.

I used the following request (after loading the Kibana Flights sample data set):

GET kibana_sample_data_flights/_search
{
  "size": 0, 
  "query": {
    "range": {
      "timestamp": {
        "gte": "now-20m",
        "lte": "now"
      }
    }
  },
  "aggs": {
    "foo": {
      "auto_date_histogram": {
        "field": "timestamp",
        "buckets": 1
      }
    }
  }
}

This request returned the following response:

{
  "took" : 1,
  "timed_out" : false,
  "_shards" : {
    "total" : 1,
    "successful" : 1,
    "skipped" : 0,
    "failed" : 0
  },
  "hits" : {
    "total" : {
      "value" : 4,
      "relation" : "eq"
    },
    "max_score" : null,
    "hits" : [ ]
  },
  "aggregations" : {
    "foo" : {
      "buckets" : [
        {
          "key_as_string" : "2019-06-25T11:14:00.000Z",
          "key" : 1561461240000,
          "doc_count" : 1
        },
        {
          "key_as_string" : "2019-06-25T11:24:00.000Z",
          "key" : 1561461840000,
          "doc_count" : 3
        }
      ],
      "interval" : "10m"
    }
  }
}

As you can see, even though we asked for 1 bucket, it generated 2 buckets. When playing around with the overall filter, the behavior never triggered for me once the time range was larger than 30 minutes, but I was able to reproduce it rather often with time ranges below 30 minutes (it wasn't tied to exactly a 20-minute range). I tested this mainly on master. I was also not able to reproduce the behavior when I didn't use now as the upper bound but tried something like now-1440m to now-1420m instead.

cc @polyfractal

@elasticmachine
Collaborator

Pinging @elastic/es-analytics-geo

@polyfractal
Contributor

/cc @pcsanwald, this looks somewhat similar to the failure in #39497; perhaps it's related?

@polyfractal
Contributor

Had a quick look and found the problem area, or at least the point where multiple buckets are being generated; I don't know enough about the auto histogram to say where the actual cause is.

When you execute the above query (at a time that is not a multiple of 30 minutes), multiple buckets are generated when the coordinator merges consecutive buckets.

In InternalAutoDateHistogram#mergeConsecutiveBuckets(), we merge buckets when i % mergeInterval == 0. Under this condition, we end up with ~18 reduced buckets in total: 10 have accumulated into sameKeyedBuckets and 8 are left to process. The first 10 merge into one bucket (because the mergeInterval is 10), and the leftover 8 get merged into a separate bucket after the loop exits.
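
To make the failure mode concrete, here is a minimal standalone sketch of that merge pattern, assuming the simplified accumulate-and-flush behavior described above. This is not the actual InternalAutoDateHistogram code, and the counts are just the ones from this reproduction:

import java.util.ArrayList;
import java.util.List;

// Simplified sketch of the merge pattern described above, NOT the real
// InternalAutoDateHistogram implementation: buckets accumulate into a batch
// that is flushed every mergeInterval iterations, and whatever remains after
// the loop is flushed into one extra bucket.
public class MergeSketch {
    public static void main(String[] args) {
        int reducedBuckets = 18; // ~18 reduced buckets in this reproduction
        int mergeInterval = 10;  // merge interval chosen by the agg

        List<Integer> mergedBucketSizes = new ArrayList<>();
        int accumulated = 0;     // plays the role of sameKeyedBuckets
        for (int i = 0; i < reducedBuckets; i++) {
            accumulated++;
            if (accumulated == mergeInterval) {
                mergedBucketSizes.add(accumulated); // first 10 merge into one bucket
                accumulated = 0;
            }
        }
        if (accumulated > 0) {
            mergedBucketSizes.add(accumulated);     // leftover 8 become a second bucket
        }
        System.out.println(mergedBucketSizes);      // prints [10, 8]: two buckets, not one
    }
}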

Depending on when the agg executes, it sometimes switches to 5-minute intervals, so the exact values change slightly, but the behavior is basically the same.

By pure accident, I executed at a half-hour mark (12:30) and got a single bucket. Once the half-hour mark passed (12:31) it went back to two buckets.

I don't know enough about the merging logic to know what's going on, but I think this is probably the area that's broken.

@pcsanwald
Contributor

This does seem quite similar to the failure case and would explain why it's been so tricky to reproduce (I'm currently doing a very long run).
