Aggregation functions should provide interpolation when aggregating multiple series #728

jaredgisin · 2014-07-10T20:46:13Z

I was surprised to see that the aggregation functions of a general-purpose time series database do not perform interpolation.

http://opentsdb.net/docs/build/html/user_guide/query/aggregators.html

Given multiple series that report the same value over time at irregular intervals (as happens in the real world), summing them with a query like:

select sum(value) from /disk.capacity.*/ group by time(5s)

produces a single value. I would expect it to first downsample each series matching /disk.capacity.*/ to 5s intervals and then, if necessary, interpolate each of those series to produce a correct aggregation.

This only roughly works if the group by time interval matches the reporting interval of the original series. Even then, it can produce a wavy summation line out of otherwise flat time series graphs, because the aggregation function has no idea how many values it should expect in the sum and so cannot compute appropriate values to fill in for series that are missing from an interval.
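To make the expected behavior concrete, here is a minimal Python sketch of "downsample each series, interpolate the gaps, then sum". It is not how InfluxDB or OpenTSDB implements anything; the function names, the choice of mean for downsampling, and the rule of holding the nearest value at the edges are all assumptions made for illustration.

```python
from collections import defaultdict

def downsample(points, step):
    """Average (timestamp, value) points into buckets of `step` seconds."""
    buckets = defaultdict(list)
    for ts, value in points:
        buckets[ts - ts % step].append(value)
    return {start: sum(vals) / len(vals) for start, vals in buckets.items()}

def interpolate(buckets, grid):
    """Return a value for every bucket start in `grid`.

    Gaps between known buckets are filled linearly; outside the known
    range the nearest known value is held (an assumption of this sketch).
    Each series is assumed to contain at least one point.
    """
    known = sorted(buckets)
    out = {}
    for t in grid:
        if t in buckets:
            out[t] = buckets[t]
            continue
        before = [k for k in known if k < t]
        after = [k for k in known if k > t]
        if before and after:
            t0, t1 = before[-1], after[0]
            frac = (t - t0) / (t1 - t0)
            out[t] = buckets[t0] + frac * (buckets[t1] - buckets[t0])
        elif before:
            out[t] = buckets[before[-1]]
        else:
            out[t] = buckets[after[0]]
    return out

def sum_series(series_list, step, start, end):
    """Sum several series on a shared grid after filling their gaps."""
    grid = range(start, end, step)
    filled = [interpolate(downsample(s, step), grid) for s in series_list]
    return {t: sum(f[t] for f in filled) for t in grid}

# Two flat series (hypothetical data) reporting at different, irregular times.
disk_a = [(0, 100.0), (5, 100.0), (10, 100.0), (15, 100.0)]
disk_b = [(1, 50.0), (12, 50.0)]  # nothing lands in the 5s or 15s bucket

total = sum_series([disk_a, disk_b], step=5, start=0, end=20)
```

With these inputs every 5s bucket sums to 150.0, so the aggregate of two flat lines stays flat instead of dipping wherever disk_b happened not to report.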

The problem is made worse by the fact that the output sum in the query above, for identical flat-line graphs, is multiplied when the group by time() value is multiplied. For example, in the above, group by time(10s) will produce sums that are twice what is expected.
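One reading of this symptom, assuming sum() simply adds every raw point that falls into a time bucket, can be reproduced in a few lines of Python. The helper below is hypothetical and only models that assumption; it is not taken from the database's code.

```python
from collections import defaultdict

def bucket_aggregate(points, step, agg):
    """Group (timestamp, value) points into `step`-second buckets and
    apply `agg` to the raw values in each bucket."""
    buckets = defaultdict(list)
    for ts, value in points:
        buckets[ts - ts % step].append(value)
    return {start: agg(vals) for start, vals in sorted(buckets.items())}

def mean(vals):
    return sum(vals) / len(vals)

# A flat series (hypothetical data) that reports 100.0 every 5 seconds.
flat = [(t, 100.0) for t in range(0, 20, 5)]

sum_5s = bucket_aggregate(flat, 5, sum)     # one point per bucket
sum_10s = bucket_aggregate(flat, 10, sum)   # two points per bucket
mean_10s = bucket_aggregate(flat, 10, mean)
```

Under this assumption each 10s bucket holds two points, so sum_10s reports 200.0 per bucket while sum_5s and mean_10s both report 100.0. Averaging within each series before summing across series would keep the total independent of the bucket width.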

Is this something that is on the roadmap to address?

pauldix · 2014-07-10T21:24:23Z

That's because aggregators in OpenTSDB are aggregators over multiple series that get combined into one. In Influx an aggregator is more traditional; that is, it aggregates over multiple points in a single series.

In the case of your query, each series gets aggregated individually and you get back as many series as match that regex. If you want empty intervals filled, you can use fill(0) as an option. We'll be adding other fill options later, like mean, previous, and next.

What you're probably looking to do is to merge many time series into one and then aggregate over that. Issue #72 will give you the ability to do that. But that still won't help you if you have intervals with missing data. In that case you'd probably want something more like a join where you can specify what value gets assigned for any series that is missing an interval.
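As a rough illustration of the join idea, the Python sketch below sums already-bucketed series on a shared grid and lets the caller choose what a series contributes when it has no value for an interval. The function, the series names, and the "zero" and "previous" policies are hypothetical; they loosely mirror fill(0) and the planned previous option, but their exact semantics here are an assumption.

```python
def join_and_sum(bucketed, grid, fill="zero"):
    """Sum series per interval, substituting a fill value for any series
    that is missing that interval.

    bucketed: dict mapping series name -> {bucket_start: value}
    fill: "zero" contributes 0.0; "previous" reuses the last value seen
          for that series (0.0 if it has not reported yet).
    """
    totals = {}
    last_seen = {}
    for t in grid:
        total = 0.0
        for name, buckets in bucketed.items():
            if t in buckets:
                value = buckets[t]
                last_seen[name] = value
            elif fill == "previous" and name in last_seen:
                value = last_seen[name]
            else:
                value = 0.0
            total += value
        totals[t] = total
    return totals

# Hypothetical bucketed data: series b is missing the interval at t=5.
bucketed = {
    "disk.capacity.a": {0: 100.0, 5: 100.0, 10: 100.0},
    "disk.capacity.b": {0: 50.0, 10: 50.0},
}
grid = [0, 5, 10]

with_zero = join_and_sum(bucketed, grid, fill="zero")
with_previous = join_and_sum(bucketed, grid, fill="previous")
```

Filling with zero yields 150.0, 100.0, 150.0, which is the wavy line described in the original report, while carrying the previous value forward yields 150.0 for every interval.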

Influx isn't just limited to storing data at fixed intervals of time. As such, it has functions designed to calculate intervals over multiple points.

I'm closing this out for now, but feel free to keep commenting and I will open issues that map to feature requests that come out of the discussion. In the future, the proper place to initiate a discussion around feature development or general questions is the mailing list.

Thanks

nazgu1 · 2015-01-03T13:01:58Z

"We'll be adding other options to fill later like mean, previous, and next."

When do you plan to add these options to fill()?

pauldix · 2015-01-04T16:39:23Z

Either as part of the 0.9.0 release or in a point release shortly afterwards.

pauldix closed this as completed Jul 10, 2014

zzzuzik mentioned this issue Dec 23, 2014

sum(value) is multiplied by "group by time()" #1262

Closed
