Support offset argument in the GROUP BY time(...) call #6504

Merged
merged 1 commit into from
May 2, 2016

Conversation

jsternberg
Contributor

An offset of time(1m, now()) will anchor the offset to the start time
of the query. The default offset is 0s which is the current default
anyway.

This fixes #2074 by making time zone offset support unnecessary. Time
comparisons can use timezones inside of the time clause and the offset
needed for non-hour timezone differences can be used as part of the
offset argument.

@jsternberg jsternberg added this to the 0.13.0 milestone Apr 29, 2016
@jsternberg
Contributor Author

@pauldix @jwilder

I think this does what we talked about and should make timezone support beyond the normal time parsing code unnecessary. If we want, we can possibly add support later for specifying the offset as a string in the offset column. Any formatting of the output can be handled on the client side.

@jsternberg
Contributor Author

Some more work may still be needed to support things like 1 day, though. I limited the specified offset to something lower than the duration, but after reading the other issue, I'm not sure that's good enough to solve this ticket completely.

@jsternberg jsternberg force-pushed the js-2074-group-by-offsets branch 2 times, most recently from a7fdb84 to dbec384 Compare April 29, 2016 13:30
@jwilder
Contributor

jwilder commented Apr 29, 2016

@benbjohnson

@benbjohnson
Contributor

@jsternberg Overall it lgtm. However, why is the offset limited to less than the time duration? You should be able to offset by -7h for a time zone even if you have 1h intervals.

@jsternberg
Contributor Author

jsternberg commented Apr 29, 2016

@benbjohnson I wasn't sure about that. My reasoning was that the offset is only useful for moving the buckets and moving the bucket more than the interval doesn't make a lot of sense because of modulo math. If you want to limit the time based on a time zone, then you can just use the timezone in the time field like this:

SELECT mean(value) FROM cpu
    WHERE time >= '2010-01-01T00:00:00-07:00'
    AND time < '2010-01-02T00:00:00-07:00'
    GROUP BY time(1h)

A timezone offset becomes useful when you're calculating an aggregate grouped by day, and this works:

SELECT mean(value) FROM cpu
    WHERE time >= '2010-01-01T00:00:00-07:00'
    AND time < '2010-01-07T00:00:00-07:00'
    GROUP BY time(1d, -7h)

This will shift the buckets by 7 hours so that they start at midnight in the MST time zone.

@benbjohnson
Contributor

7h is a poor example since it would line up with a 1h bucket. However, for time zones that don't align to an hour, I could see users wanting to simply put the number of minutes for the offset and not worry about trying to mod by the bucket.

For example, Nepal has a UTC+05:45 offset. It would be nice to just put GROUP BY time(1h, 345m) for the offset.

I don't think that limiting the offset gains us anything.

@jsternberg
Contributor Author

It's possible we could just do the mod ourselves for user-friendliness. The time positions are still going to be limited by the duration. So you could say time(1h, 345m) and it would translate the offset to 45m.
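The reduction described here can be sketched in a few lines. This is only an illustration, not the code from the PR: the function name `normalize_offset` is hypothetical, and wrapping negative offsets into a non-negative value is an assumption about the intended behavior.

```python
from datetime import timedelta


def normalize_offset(offset: timedelta, interval: timedelta) -> timedelta:
    """Reduce an offset into the range [0, interval).

    Hypothetical helper: assumes negative offsets should wrap around
    rather than be rejected.
    """
    if interval <= timedelta(0):
        raise ValueError("interval must be positive")
    # Python's % on timedelta takes the sign of the divisor, so the result
    # is never negative when the interval is positive.
    return offset % interval


# time(1h, 345m) behaves like time(1h, 45m)
print(normalize_offset(timedelta(minutes=345), timedelta(hours=1)))  # 0:45:00
```

With a 1h interval, any whole-hour offset such as -7h reduces to zero, which is why such an offset only has a visible effect on longer intervals like 1d.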

@jsternberg
Contributor Author

I've modified the commit to perform the modulo automatically.

> select * from cpu
name: cpu
---------
time                            host            value
2016-04-29T13:35:16.204223091Z  server02        2

> select count(value) from cpu where time > now() - 4h group by time(1h, 345m)
name: cpu
---------
time                    count
2016-04-29T10:45:00Z    0
2016-04-29T11:45:00Z    0
2016-04-29T12:45:00Z    1
2016-04-29T13:45:00Z    0
2016-04-29T14:45:00Z    0
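The placement of the single point in the 12:45 bucket can be checked by hand. The sketch below assumes buckets are aligned to the Unix epoch and shifted later by the offset; `bucket_start` is a hypothetical helper written for this illustration, not a function from InfluxDB.

```python
from datetime import datetime, timedelta, timezone

EPOCH = datetime(1970, 1, 1, tzinfo=timezone.utc)


def bucket_start(ts, interval, offset=timedelta(0)):
    """Start of the bucket containing ts, assuming epoch-aligned buckets
    shifted by offset (an assumption made for this sketch)."""
    # How far ts sits past the most recent shifted boundary.
    into_bucket = (ts - EPOCH - offset) % interval
    return ts - into_bucket


# The point from the output above, truncated to microsecond precision.
point = datetime(2016, 4, 29, 13, 35, 16, 204223, tzinfo=timezone.utc)
start = bucket_start(point, timedelta(hours=1), timedelta(minutes=345))
print(start.isoformat())  # 2016-04-29T12:45:00+00:00
```

Because the offset only enters through a modulo, passing 345m or 45m gives the same bucket boundaries.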

@benbjohnson
Contributor

👍

@jsternberg jsternberg force-pushed the js-2074-group-by-offsets branch 2 times, most recently from ae2b5cb to 4226125 Compare May 2, 2016 02:33
@jwilder
Contributor

jwilder commented May 2, 2016

This needs an InfluxQL docs change.

@pauldix can you take a look as well?

@jsternberg
Contributor Author

There's also an ongoing discussion about the name now(): as used here, it has no relation to the actual current time and instead refers to the start time of the query. I talked with @benbjohnson elsewhere about using start() instead, but there was concern that people would then expect start() to work in other parts of the query.

@benbjohnson
Contributor

The discussion of start() can be moved to a separate issue. It doesn't seem like a blocker since now() should be supported as well if we do also use start().

@jsternberg
Contributor Author

I disagree with that. If we support now(), I think it should refer to the current time rather than the start time. We can support both, but I don't think they should be aliases.

An offset of `time(1m, now())` will anchor the offset to the current
time of the query. The default offset is `0s` which is the current
default anyway.

This fixes #2074 by making time zone offset support unnecessary. Time
comparisons can use timezones inside of the time clause and the offset
needed for non-hour timezone differences can be used as part of the
offset argument.
@jsternberg
Contributor Author

I updated the PR to have now() get filled in with the now time instead of the start time and we can later extend it to allow start().
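One way to picture what anchoring the offset to now() means: the offset becomes whatever remainder the current time leaves within the interval, so a bucket boundary falls exactly on the current time. The sketch below assumes epoch-aligned buckets; `offset_from_now`, `bucket_start`, and the sample timestamp are all hypothetical and are not taken from the PR.

```python
from datetime import datetime, timedelta, timezone

EPOCH = datetime(1970, 1, 1, tzinfo=timezone.utc)


def offset_from_now(now, interval):
    """Offset that puts a bucket boundary exactly at `now` (assumed semantics)."""
    return (now - EPOCH) % interval


def bucket_start(ts, interval, offset=timedelta(0)):
    """Start of the epoch-aligned, offset-shifted bucket containing ts."""
    return ts - (ts - EPOCH - offset) % interval


now = datetime(2016, 5, 2, 2, 33, 17, tzinfo=timezone.utc)  # made-up query time
interval = timedelta(minutes=1)
offset = offset_from_now(now, interval)
print(offset)  # 0:00:17
```

Under these assumptions, `time(1m, now())` at that moment would behave like `time(1m, 17s)`: a point 30 seconds older than `now` lands in the bucket that starts one minute before `now`.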

@jwilder
Contributor

jwilder commented May 2, 2016

LGTM 👍

@jsternberg jsternberg merged commit 0cc99a3 into master May 2, 2016
@jsternberg jsternberg deleted the js-2074-group-by-offsets branch May 2, 2016 19:00
@savraj

savraj commented May 2, 2016

Does the right thing happen when my query is half in and half out of DST? I'm going to assume not, and that should be made clear in the documentation. If I query from Friday, March 11 to Monday, March 14 (DST started on March 13, 2016), my UTC offset in the Eastern time zone will be -5 and then, at 2am on Sunday, March 13, it switches to -4.

@jsternberg
Contributor Author

This doesn't really address timezones specifically. All outputs are still in UTC and it's up to the client to determine how to display the values. This is mostly for shifting the buckets so that they line up with timezones whose UTC offset is not a whole number of hours, such as India at UTC+05:30. You can do GROUP BY time(1h, 30m) to get the appropriate bucketing of intervals.
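The DST concern raised above can be made concrete with a small sketch. It assumes epoch-aligned buckets where a positive offset moves boundaries later (consistent with the `time(1h, 345m)` output earlier in this thread), and it models Eastern time with two fixed offsets instead of a timezone database. `bucket_start` is a hypothetical helper, and the 1d interval with a 5h offset is an illustrative choice.

```python
from datetime import datetime, timedelta, timezone

EPOCH = datetime(1970, 1, 1, tzinfo=timezone.utc)
EST = timezone(timedelta(hours=-5), "EST")  # before March 13, 2016
EDT = timezone(timedelta(hours=-4), "EDT")  # after DST started


def bucket_start(ts, interval, offset=timedelta(0)):
    """Start of the epoch-aligned, offset-shifted bucket containing ts."""
    return ts - (ts - EPOCH - offset) % interval


interval = timedelta(days=1)
offset = timedelta(hours=5)  # daily boundaries at 05:00 UTC

before = bucket_start(datetime(2016, 3, 12, 15, 0, tzinfo=timezone.utc), interval, offset)
after = bucket_start(datetime(2016, 3, 14, 15, 0, tzinfo=timezone.utc), interval, offset)

print(before.astimezone(EST).time())  # 00:00:00
print(after.astimezone(EDT).time())   # 01:00:00
```

The fixed offset lines up with local midnight before the switch and lands on 01:00 local time afterwards, so a query spanning the transition cannot get local-midnight buckets on both sides from a single offset.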


Successfully merging this pull request may close these issues.

Support per-query timezone offsets
4 participants