date_histogram issue when using "pre_zone_adjust_large_interval" and a timezone with DST #9491
Comments
I had the time to look into this issue and figure out the exact reasons for this to happen and how it might be fixed. The issue is with the
The problem is to determine the offset to subtract from the rounded key in order to take it back to the correct UTC time. The desired offset might be different from the originally added offset because of a DST switch (either the original shift or the rounding resulted in a timestamp that occurs in a different DST configuration).
Prior to fix #8655, the offset used to shift back was the offset at the rounded key. This worked well in most cases for large intervals, but not for hour intervals around the DST switch. But as we saw, there is a problem with large intervals now: if the timezone offset at the original key is different from what it is at the rounded key (which points to midnight), then we'll end up with a rounded key one hour away from midnight. For example
I figured out that in order to solve this, we can use the getOffsetFromLocal method of the timezone class. This method is the opposite of
I changed the code to use this method and all the tests pass, including tests I added for that issue (that fail on the latest code). But then, I found another rare case that fails when using
After looking at the work done on
I will post a pull request with my fix and my additional tests as soon as I figure out if anything should also be changed in the strategies that deal with intervals such as "2d" (I think this problem is irrelevant there).
Here are the tests I added in order to cover
@Test
public void testAdjustPreTimeZone() {
    Rounding tzRounding;
    // Day interval
    tzRounding = TimeZoneRounding.builder(DateTimeUnit.DAY_OF_MONTH).preZone(DateTimeZone.forID("Asia/Jerusalem")).preZoneAdjustLargeInterval(true).build();
    assertThat(tzRounding.round(time("2014-11-11T17:00:00", DateTimeZone.forID("Asia/Jerusalem"))), equalTo(time("2014-11-11T00:00:00", DateTimeZone.forID("Asia/Jerusalem"))));
    // DST on
    assertThat(tzRounding.round(time("2014-08-11T17:00:00", DateTimeZone.forID("Asia/Jerusalem"))), equalTo(time("2014-08-11T00:00:00", DateTimeZone.forID("Asia/Jerusalem"))));
    // Day of switching DST on -> off
    assertThat(tzRounding.round(time("2014-10-26T17:00:00", DateTimeZone.forID("Asia/Jerusalem"))), equalTo(time("2014-10-26T00:00:00", DateTimeZone.forID("Asia/Jerusalem"))));
    // Day of switching DST off -> on
    assertThat(tzRounding.round(time("2015-03-27T17:00:00", DateTimeZone.forID("Asia/Jerusalem"))), equalTo(time("2015-03-27T00:00:00", DateTimeZone.forID("Asia/Jerusalem"))));
    // Month interval
    tzRounding = TimeZoneRounding.builder(DateTimeUnit.MONTH_OF_YEAR).preZone(DateTimeZone.forID("Asia/Jerusalem")).preZoneAdjustLargeInterval(true).build();
    assertThat(tzRounding.round(time("2014-11-11T17:00:00", DateTimeZone.forID("Asia/Jerusalem"))), equalTo(time("2014-11-01T00:00:00", DateTimeZone.forID("Asia/Jerusalem"))));
    // DST on
    assertThat(tzRounding.round(time("2014-10-10T17:00:00", DateTimeZone.forID("Asia/Jerusalem"))), equalTo(time("2014-10-01T00:00:00", DateTimeZone.forID("Asia/Jerusalem"))));
    // Year interval
    tzRounding = TimeZoneRounding.builder(DateTimeUnit.YEAR_OF_CENTURY).preZone(DateTimeZone.forID("Asia/Jerusalem")).preZoneAdjustLargeInterval(true).build();
    assertThat(tzRounding.round(time("2014-11-11T17:00:00", DateTimeZone.forID("Asia/Jerusalem"))), equalTo(time("2014-01-01T00:00:00", DateTimeZone.forID("Asia/Jerusalem"))));
    // Two timestamps in same year ("Double buckets" bug in 1.3.7)
    tzRounding = TimeZoneRounding.builder(DateTimeUnit.YEAR_OF_CENTURY).preZone(DateTimeZone.forID("Asia/Jerusalem")).preZoneAdjustLargeInterval(true).build();
    assertThat(tzRounding.round(time("2014-11-11T17:00:00", DateTimeZone.forID("Asia/Jerusalem"))),
            equalTo(tzRounding.round(time("2014-08-11T17:00:00", DateTimeZone.forID("Asia/Jerusalem")))));
}
Here is a test that covers the ambiguous hours bug:
@Test
public void testAmbiguousHoursAfterDSTSwitch() {
    Rounding tzRounding;
    tzRounding = TimeZoneRounding.builder(DateTimeUnit.HOUR_OF_DAY).preZone(DateTimeZone.forID("Asia/Jerusalem")).preZoneAdjustLargeInterval(true).build();
    assertThat(tzRounding.round(time("2014-10-25T22:30:00", DateTimeZone.UTC)), equalTo(time("2014-10-25T22:00:00", DateTimeZone.UTC)));
    assertThat(tzRounding.round(time("2014-10-25T23:30:00", DateTimeZone.UTC)), equalTo(time("2014-10-25T23:00:00", DateTimeZone.UTC)));
}
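For readers who want to see the offset problem in isolation, here is a minimal sketch of the idea described in this comment, not the actual Elasticsearch code. It assumes plain java.time instead of Joda-Time, and uses ZoneRules.getOffset(LocalDateTime) as a stand-in for Joda's getOffsetFromLocal. The class and method names are made up for illustration. It rounds a timestamp on the day Asia/Jerusalem leaves DST down to local midnight, once shifting back with the offset at the original timestamp and once with the offset looked up at the rounded local time.

```java
import java.time.Instant;
import java.time.LocalDateTime;
import java.time.ZoneId;
import java.time.ZoneOffset;
import java.time.temporal.ChronoUnit;

// Hypothetical sketch class; not part of Elasticsearch.
class LocalOffsetRoundingSketch {
    static final ZoneId ZONE = ZoneId.of("Asia/Jerusalem");

    // Shifts to local time and back with the offset valid at the ORIGINAL instant.
    static Instant roundToDayUsingOriginalOffset(Instant utc) {
        ZoneOffset offset = ZONE.getRules().getOffset(utc);
        LocalDateTime localMidnight = LocalDateTime.ofInstant(utc, offset).truncatedTo(ChronoUnit.DAYS);
        return localMidnight.toInstant(offset);
    }

    // Shifts back with the offset valid at the ROUNDED local time
    // (java.time analogue of looking the offset up from a local timestamp).
    static Instant roundToDayUsingLocalOffset(Instant utc) {
        LocalDateTime localMidnight = LocalDateTime.ofInstant(utc, ZONE).truncatedTo(ChronoUnit.DAYS);
        ZoneOffset offsetAtMidnight = ZONE.getRules().getOffset(localMidnight);
        return localMidnight.toInstant(offsetAtMidnight);
    }

    // Helper: interprets a local date-time string in ZONE.
    static Instant at(String localTime) {
        return LocalDateTime.parse(localTime).atZone(ZONE).toInstant();
    }

    public static void main(String[] args) {
        // 17:00 local on the day DST switches off: the offset is already +02:00,
        // but local midnight of that day was still at +03:00.
        Instant input = at("2014-10-26T17:00:00");
        System.out.println(roundToDayUsingOriginalOffset(input)); // 2014-10-25T22:00:00Z, i.e. 01:00 local
        System.out.println(roundToDayUsingLocalOffset(input));    // 2014-10-25T21:00:00Z, i.e. local midnight
    }
}
```

Note that this sketch only covers the day-level case. Local times that occur twice after the switch (the ambiguous hours in the test above) cannot be resolved by a lookup from local time alone, so hour intervals need separate handling.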
I can confirm I've seen this issue before - back in the facet days with 0.90. I recall opening an issue on this but can't find it now. Good catch! |
Thanks for the great test case. Since I'm currently trying to clean up the time zone management for 2.0 I included your test case from the first comment. I can reproduce the issue on 1.4 and current 2.0 branch and I was able to make it pass using the following changes to TimeZoneRounding.TimeTimeZoneRoundingFloor:
This basically follows your suggestions and the work in #9637. I will also add your other Rounding tests there and see if they pass with my intended implementation. However, the |
Yes, the code for roundKey that you included looks exactly like my final solution, that's great. About removing |
Yes, I think that is what it comes down to. I was able to use your test cases above with minor modifications to test on 2.0 branch. I think the plan is to backport the parts that are bug fixes to 1.x branch, but you can also propose a PR for that. This will still have to work with pre/postZone etc... because those will only be cleaned up later. |
Alright, thanks a lot! I will do so in the next days. |
This fix enhances the internal time zone conversion in the TimeZoneRounding classes that were the cause of issues with strange date bucket keys in elastic#9491 and elastic#7673. Closes elastic#9491 Closes elastic#7673
Don't you prefer to have the tests in the same PR with your fix? If you do not, I will open a PR later today. |
No, this is great, will merge that after the fix is on the branch. Many thanks. |
Hey there.
Since we upgraded to version 1.3.7, we noticed that we sometimes get weird double buckets when running our date_histogram aggregations.
After trying some things we figured out it only happens when we set pre_zone_adjust_large_interval to true, pre_zone to our local time zone (which uses DST), and interval is 'day' or bigger.
In the aggregation's result we see that we get two buckets corresponding to the same interval, but with an hour difference between their keys.
Here is a simple example that should reproduce this issue:
And the result:
As you can see, although the two timestamps unarguably belong to the same year, they're put into different buckets. As we figured out, it happens because the timezone offset is different for the two timestamps as one of them occurs when DST is on and the other is not.
We use the Asia/Jerusalem timezone, but the same problem happens in every timezone with DST (e.g. CET), as long as pre_zone_adjust_large_interval is used (when it's not used, both documents go to the same bucket, "2014-01-01T00:00:00.000Z").
This problem never happened before we upgraded to 1.3.7. Moreover, while digging into the code we figured out that this bug is a direct result of the proposed fix for issue #8339. While that fix indeed solved the timezone problem in 'hour' intervals around the DST switch, it caused the new problem with bigger intervals.
The problem is also in version 1.4.2.
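To make the double buckets concrete outside of Elasticsearch, here is a rough sketch of how we understand the effect. It is an assumption about the mechanism, not the real rounding code: it uses java.time rather than Joda-Time, the class and method names are made up, and it simply adds the zone offset at the original timestamp, floors to the start of the year, and subtracts that same offset again.

```java
import java.time.Duration;
import java.time.Instant;
import java.time.LocalDateTime;
import java.time.ZoneId;
import java.time.ZoneOffset;

// Hypothetical sketch class; not part of Elasticsearch.
class DoubleBucketSketch {
    static final ZoneId ZONE = ZoneId.of("Asia/Jerusalem");

    // Floors to the local start of year, shifting in and out of local time
    // with the offset that applies at the ORIGINAL timestamp.
    static Instant yearKeyWithOriginalOffset(Instant utc) {
        ZoneOffset offset = ZONE.getRules().getOffset(utc);
        LocalDateTime yearStart = LocalDateTime.ofInstant(utc, offset)
                .toLocalDate()
                .withDayOfYear(1)
                .atStartOfDay();
        return yearStart.toInstant(offset);
    }

    // Helper: interprets a local date-time string in ZONE.
    static Instant at(String localTime) {
        return LocalDateTime.parse(localTime).atZone(ZONE).toInstant();
    }

    public static void main(String[] args) {
        Instant winterKey = yearKeyWithOriginalOffset(at("2014-11-11T17:00:00")); // DST off, +02:00
        Instant summerKey = yearKeyWithOriginalOffset(at("2014-08-11T17:00:00")); // DST on, +03:00
        System.out.println(winterKey);
        System.out.println(summerKey);
        // Both timestamps are in 2014, yet the keys are one hour apart.
        System.out.println(Duration.between(summerKey, winterKey)); // PT1H
    }
}
```

Under these assumptions, the two documents land in different year buckets only because their own offsets differ, which matches the one-hour gap between the keys we see in the aggregation result.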