Align downsampling intervals to the Gregorian calendar. #657
Note that, for performance reasons, it might be desirable to use Joda DateTime
This feature supports the alignment of downsampling intervals to the Gregorian calendar based on four different time categories:

- DAILY: The start time of each interval is computed as the start of the day in which the first data point occurs, based on a specified time zone (or the default JVM time zone, if no time zone has been specified). The end time of each interval is computed as the end of the day in which the first data point occurs. For instance, if the specified time zone is UTC, and the timestamp of the first data point is 2016-01-05T05:32:00Z, then the start of the interval will be computed as 2016-01-05T00:00:00.000Z, while the end of the interval will be computed as 2016-01-05T23:59:59.999Z.
- WEEKLY: The start time of each interval is computed as the start of the week in which the first data point occurs, based on a specified time zone (or the default JVM time zone, if no time zone has been specified). The end time of each interval is computed as the end of the week in which the first data point occurs. Weeks are considered to begin on Sundays (in the future, it might be a good idea to allow for variations based on a configuration setting). For instance, if the specified time zone is UTC, and the timestamp of the first data point is 2016-01-05T05:32:00Z, then the start of the interval will be computed as 2016-01-03T00:00:00.000Z, while the end of the interval will be computed as 2016-01-09T23:59:59.999Z.
- MONTHLY: The start time of each interval is computed as the start of the month in which the first data point occurs, based on a specified time zone (or the default JVM time zone, if no time zone has been specified). The end time of each interval is computed as the end of the month in which the first data point occurs. For instance, if the specified time zone is UTC, and the timestamp of the first data point is 2016-01-05T05:32:00Z, then the start of the interval will be computed as 2016-01-01T00:00:00.000Z, while the end of the interval will be computed as 2016-01-31T23:59:59.999Z.
- YEARLY: The start time of each interval is computed as the start of the year in which the first data point occurs, based on a specified time zone (or the default JVM time zone, if no time zone has been specified). The end time of each interval is computed as the end of the year in which the first data point occurs. For instance, if the specified time zone is UTC, and the timestamp of the first data point is 2016-01-05T05:32:00Z, then the start of the interval will be computed as 2016-01-01T00:00:00.000Z, while the end of the interval will be computed as 2016-12-31T23:59:59.999Z.

This feature also allows for the alignment of intervals that are multiples of one year, one month, one week, or one day. In cases where a given interval is a multiple of more than one time category, the larger time category will be used. For instance, an interval of 24 months will be interpreted as an interval of two years, and will be aligned to the calendar accordingly. As such, if the specified time zone is UTC, and the timestamp of the first data point is 2016-03-05T05:32:00Z, then the start of the interval will be computed as 2016-01-01T00:00:00.000Z, while the end of the interval will be computed as 2017-12-31T23:59:59.999Z. This is in keeping with the principle of least astonishment.

To specify the time zone for a given HTTP query, include a query string parameter named "tz" with a value equal to a JVM time zone id (e.g. "UTC"). If a time zone is not included in the query string, the default JVM time zone will be used.

To specify that a given HTTP query should use the calendar alignment feature for downsampling, include a query string parameter named "use_calendar" with a value of "true". You can stipulate that all HTTP queries should use the calendar alignment feature by including a "tsd.query.downsample.use_calendar" configuration setting within the opentsdb.conf file and by setting its value to "true" (the default value is "false"). This config file setting can be overridden on a per-query basis by including the "use_calendar" parameter in the query string as specified above.
@manolama Could we get this merged into the 2.3.0 branch (currently put)? I'm pulling it into our internal build and will test it there.
@johann8384 Out of curiosity, how did your testing go? We have been using this patch in a production system for a couple of months now, and have encountered no problems so far.
I didn't find any issues with it.
Finally taking a crack at this for v2.3. I'm rebasing it to work with the added "all" downsampling and there are some test cases I need to run it through to make sure it's good. Thanks!
Ok, it looked pretty good but there were a few tweaks I needed to make such as handling hourly downsampling and aligning to useful boundaries if someone gives an odd interval. Also cleaned it up a bit so it's a little faster, advancing calendars instead of creating a new one. I'll post that after I fix up the UTs.
…lendar query parameters. This will enable aligning downsampling intervals to gregorian calendar to use timezone entry
This feature builds on the skeleton provided by @moarcaccio in
Pull Request #548, adding in all of the functionality requested by
@manolama in his Pull Request comments, and resolving a number of
defects which rendered the Pull Request unusable. After waiting
several months for @moarcaccio to complete the proposed
feature, we decided to move forward with our own Pull Request.
This feature supports the alignment of downsampling intervals to the
Gregorian calendar based on four different time categories:
- DAILY: The start time of each interval is computed as the start of the
day in which the first data point occurs, based on a specified time zone
(or the default JVM time zone, if no time zone has been specified).
The end time of each interval is computed as the end of the day in which
the first data point occurs. For instance, if the specified time zone
is UTC, and the timestamp of the first data point is 2016-01-05T05:32:00Z,
then the start of the interval will be computed as 2016-01-05T00:00:00.000Z,
while the end of the interval will be computed as 2016-01-05T23:59:59.999Z.
- WEEKLY: The start time of each interval is computed as the start of the
week in which the first data point occurs, based on a specified time zone
(or the default JVM time zone, if no time zone has been specified).
The end time of each interval is computed as the end of the week in which
the first data point occurs. Weeks are considered to begin on Sundays (in
the future, it might be a good idea to allow for variations based on a
configuration setting). For instance, if the specified time zone is UTC,
and the timestamp of the first data point is 2016-01-05T05:32:00Z, then
the start of the interval will be computed as 2016-01-03T00:00:00.000Z,
while the end of the interval will be computed as 2016-01-09T23:59:59.999Z.
- MONTHLY: The start time of each interval is computed as the start of the
month in which the first data point occurs, based on a specified time zone
(or the default JVM time zone, if no time zone has been specified).
The end time of each interval is computed as the end of the month in which
the first data point occurs. For instance, if the specified time zone
is UTC, and the timestamp of the first data point is 2016-01-05T05:32:00Z,
then the start of the interval will be computed as 2016-01-01T00:00:00.000Z,
while the end of the interval will be computed as 2016-01-31T23:59:59.999Z.
- YEARLY: The start time of each interval is computed as the start of the
year in which the first data point occurs, based on a specified time zone
(or the default JVM time zone, if no time zone has been specified).
The end time of each interval is computed as the end of the year in which
the first data point occurs. For instance, if the specified time zone
is UTC, and the timestamp of the first data point is 2016-01-05T05:32:00Z,
then the start of the interval will be computed as 2016-01-01T00:00:00.000Z,
while the end of the interval will be computed as 2016-12-31T23:59:59.999Z.
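
The four categories above can be sketched roughly as follows. This is only an illustration of the described semantics, written against java.time rather than the calendar classes the patch itself uses; the class `CalendarAlignSketch` and its method names are hypothetical and do not appear in this Pull Request. The end of an interval is taken to be one millisecond before the start of the next one, which matches the `23:59:59.999` values in the examples.

```java
import java.time.DayOfWeek;
import java.time.Instant;
import java.time.LocalDate;
import java.time.ZoneId;
import java.time.temporal.TemporalAdjusters;

// Hypothetical helper for illustration only; not the code in this Pull Request.
final class CalendarAlignSketch {
  enum Category { DAILY, WEEKLY, MONTHLY, YEARLY }

  // First calendar day of the interval containing the timestamp, in the given zone.
  static LocalDate startDate(Instant ts, ZoneId zone, Category category) {
    LocalDate day = ts.atZone(zone).toLocalDate();
    switch (category) {
      case DAILY:   return day;
      // Weeks begin on Sunday, as stated in the description.
      case WEEKLY:  return day.with(TemporalAdjusters.previousOrSame(DayOfWeek.SUNDAY));
      case MONTHLY: return day.withDayOfMonth(1);
      case YEARLY:  return day.withDayOfYear(1);
      default: throw new IllegalArgumentException("unknown category: " + category);
    }
  }

  // Start of the interval: midnight (in the zone) of the first calendar day.
  static Instant start(Instant ts, ZoneId zone, Category category) {
    return startDate(ts, zone, category).atStartOfDay(zone).toInstant();
  }

  // End of the interval: one millisecond before the next interval starts.
  static Instant end(Instant ts, ZoneId zone, Category category) {
    LocalDate first = startDate(ts, zone, category);
    LocalDate next;
    switch (category) {
      case DAILY:   next = first.plusDays(1);   break;
      case WEEKLY:  next = first.plusWeeks(1);  break;
      case MONTHLY: next = first.plusMonths(1); break;
      case YEARLY:  next = first.plusYears(1);  break;
      default: throw new IllegalArgumentException("unknown category: " + category);
    }
    return next.atStartOfDay(zone).toInstant().minusMillis(1);
  }
}
```

With the first data point at 2016-01-05T05:32:00Z and zone UTC, `start(..., Category.WEEKLY)` gives 2016-01-03T00:00:00Z and `end(..., Category.WEEKLY)` gives 2016-01-09T23:59:59.999Z, as in the WEEKLY example.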
This feature also allows for the alignment of intervals that are multiples
of one year, one month, one week, or one day. In cases where a given
interval is a multiple of more than one time category, the larger time
category will be used. For instance, an interval of 24 months will be
interpreted as an interval of two years, and will be aligned to the calendar
accordingly. As such, if the specified time zone is UTC,
and the timestamp of the first data point is 2016-03-05T05:32:00Z, then
the start of the interval will be computed as 2016-01-01T00:00:00.000Z,
while the end of the interval will be computed as 2017-12-31T23:59:59.999Z.
This is in keeping with the principle of least astonishment.
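
The handling of multiples can be sketched in the same spirit. The class `MultipleIntervalSketch` and its methods are hypothetical names introduced for illustration. The promotion rules are assumptions drawn from the 24-month example: months that divide evenly by 12 become years, and days that divide evenly by 7 become weeks. The patch itself may promote units differently.

```java
import java.time.DayOfWeek;
import java.time.Instant;
import java.time.LocalDate;
import java.time.ZoneId;
import java.time.temporal.ChronoUnit;
import java.time.temporal.TemporalAdjusters;

// Hypothetical helper for illustration only; not the code in this Pull Request.
final class MultipleIntervalSketch {
  // An interval length expressed as a count of calendar units.
  static final class Span {
    final long count;
    final ChronoUnit unit;
    Span(long count, ChronoUnit unit) { this.count = count; this.unit = unit; }
  }

  // Promote to the larger category when the count divides evenly (assumed rules).
  static Span normalize(long count, ChronoUnit unit) {
    if (unit == ChronoUnit.MONTHS && count % 12 == 0) {
      return new Span(count / 12, ChronoUnit.YEARS);
    }
    if (unit == ChronoUnit.DAYS && count % 7 == 0) {
      return new Span(count / 7, ChronoUnit.WEEKS);
    }
    return new Span(count, unit);
  }

  // Returns {start, end} of the interval containing the first data point.
  static Instant[] bounds(Instant first, ZoneId zone, long count, ChronoUnit unit) {
    Span span = normalize(count, unit);
    LocalDate day = first.atZone(zone).toLocalDate();
    LocalDate start;
    switch (span.unit) {
      case YEARS:  start = day.withDayOfYear(1); break;
      case MONTHS: start = day.withDayOfMonth(1); break;
      case WEEKS:  start = day.with(TemporalAdjusters.previousOrSame(DayOfWeek.SUNDAY)); break;
      case DAYS:   start = day; break;
      default: throw new IllegalArgumentException("unsupported unit: " + span.unit);
    }
    // The interval covers `count` whole units and ends 1 ms before the next one.
    LocalDate next = start.plus(span.count, span.unit);
    return new Instant[] {
      start.atStartOfDay(zone).toInstant(),
      next.atStartOfDay(zone).toInstant().minusMillis(1)
    };
  }
}
```

For a 24-month interval in UTC with the first data point at 2016-03-05T05:32:00Z, `bounds` returns 2016-01-01T00:00:00Z and 2017-12-31T23:59:59.999Z, matching the example above.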
To specify the time zone for a given HTTP query, include a query string
parameter named "tz" with a value equal to a JVM time zone id (e.g. "UTC").
If a time zone is not included in the query string, the default JVM time zone will
be used.
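
A minimal sketch of the time zone fallback described above, assuming the raw value of the "tz" parameter is available as a string (null when absent). The class `TimeZoneParamSketch` is a hypothetical name, and how the patch actually reads the parameter is not shown here.

```java
import java.util.TimeZone;

// Hypothetical helper for illustration only; not the code in this Pull Request.
final class TimeZoneParamSketch {
  // Use the "tz" value when present, otherwise the default JVM time zone.
  static TimeZone resolve(String tzParam) {
    if (tzParam == null || tzParam.isEmpty()) {
      return TimeZone.getDefault();
    }
    // Note: TimeZone.getTimeZone returns GMT when the id is not recognized.
    return TimeZone.getTimeZone(tzParam);
  }
}
```

Because `TimeZone.getTimeZone` silently falls back to GMT for an unrecognized id, a misspelled "tz" value would not raise an error in this sketch; a caller wanting stricter behavior could check the id against `TimeZone.getAvailableIDs()` first.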
To specify that a given HTTP query should use the calendar alignment feature
for downsampling, include a query string parameter named "use_calendar" with
a value of "true". You can stipulate that all HTTP queries should use the
calendar alignment feature by including a "tsd.query.downsample.use_calendar"
configuration setting within the opentsdb.conf file and by setting its value
to "true" (the default value is "false"). The value of this config file setting can be
overridden on a per-query basis by including the "use_calendar" parameter in
the query string as specified above.
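
The precedence between the config file setting and the query string parameter can be sketched as below. The class `UseCalendarSketch` and the use of plain maps for the configuration and query parameters are assumptions made for illustration; only the names "use_calendar" and "tsd.query.downsample.use_calendar" come from the description above.

```java
import java.util.Map;

// Hypothetical helper for illustration only; not the code in this Pull Request.
final class UseCalendarSketch {
  static final String CONFIG_KEY = "tsd.query.downsample.use_calendar";
  static final String QUERY_PARAM = "use_calendar";

  // The per-query parameter wins when present; otherwise the config value applies.
  static boolean useCalendar(Map<String, String> config, Map<String, String> queryParams) {
    String perQuery = queryParams.get(QUERY_PARAM);
    if (perQuery != null) {
      return Boolean.parseBoolean(perQuery);
    }
    // Boolean.parseBoolean(null) is false, matching the documented default.
    return Boolean.parseBoolean(config.get(CONFIG_KEY));
  }
}
```

Under this sketch, a query carrying `use_calendar=false` disables alignment even when opentsdb.conf sets `tsd.query.downsample.use_calendar` to "true", and a query without the parameter follows the config file.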