Aggregation functions should provide interpolation when aggregating multiple series #728

Closed
jaredgisin opened this issue Jul 10, 2014 · 3 comments

Comments

@jaredgisin

For a general-purpose time series database, I was surprised to see that the aggregation functions do not perform interpolation.

http://opentsdb.net/docs/build/html/user_guide/query/aggregators.html

Given multiple series that report the same value over time at irregular intervals (as happens in the real world), summing such series with a query like:

select sum(value) from /disk.capacity.*/ group by time(5s)

produces a single value. I would expect it to first downsample each series matching /disk.capacity.*/ to 5s intervals and then, if necessary, interpolate each of those series to produce a correct aggregation.
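To make the expected behaviour concrete, here is a minimal Python sketch of "downsample each series, interpolate the gaps, then sum". It is only an illustration of the request, not InfluxDB or OpenTSDB code; the function names (`downsample_mean`, `interpolate`, `sum_series`), the sample data, and the choice of linear interpolation with edge values held constant are all assumptions.

```python
from collections import defaultdict


def downsample_mean(points, step):
    """Average (timestamp, value) points into buckets of `step` seconds."""
    buckets = defaultdict(list)
    for ts, value in points:
        buckets[ts - ts % step].append(value)
    return {start: sum(vals) / len(vals) for start, vals in buckets.items()}


def interpolate(buckets, grid):
    """Fill missing buckets linearly; hold the nearest value at the edges."""
    known = sorted(buckets)
    out = {}
    for t in grid:
        if t in buckets:
            out[t] = buckets[t]
            continue
        before = [k for k in known if k < t]
        after = [k for k in known if k > t]
        if before and after:
            t0, t1 = before[-1], after[0]
            frac = (t - t0) / (t1 - t0)
            out[t] = buckets[t0] + frac * (buckets[t1] - buckets[t0])
        elif before:
            out[t] = buckets[before[-1]]
        elif after:
            out[t] = buckets[after[0]]
    return out


def sum_series(series_list, step, start, end):
    """Downsample and interpolate every series, then sum bucket by bucket."""
    grid = range(start, end, step)
    filled = [interpolate(downsample_mean(s, step), grid) for s in series_list]
    return {t: sum(f[t] for f in filled if t in f) for t in grid}


# Two flat series; the second reports irregularly and has no point in bucket 5.
disk_a = [(0, 100.0), (5, 100.0), (10, 100.0), (15, 100.0)]
disk_b = [(1, 50.0), (12, 50.0), (16, 50.0)]

total = sum_series([disk_a, disk_b], step=5, start=0, end=20)
print(total)  # {0: 150.0, 5: 150.0, 10: 150.0, 15: 150.0}
```

Without the interpolation step, bucket 5 would only contain `disk_a` and the sum would dip to 100 there.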

This only roughly works if the group by time interval matches the reporting interval of the original series. Even then it can produce a wavy summation line out of otherwise flat time series graphs, because the aggregation function has no idea how many values to expect in the sum, so it cannot compute appropriate values to stand in for the series that are missing from a given interval.

The problem seems to be made worse by the fact that, for identical flat-line graphs, the sum returned by the query above is multiplied when the group by time() value is multiplied. For example, with the query above, group by time(10s) produces sums that are twice what is expected.
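A small Python model of that effect, assuming a sum over a time bucket simply adds every raw point that falls into it. This is a guess at the mechanism, not a reading of the InfluxDB source; `bucket_sum`, `bucket_mean`, and the sample series are hypothetical.

```python
def bucket_sum(points, step):
    """Add up every raw (timestamp, value) point that lands in each bucket."""
    sums = {}
    for ts, value in points:
        start = ts - ts % step
        sums[start] = sums.get(start, 0) + value
    return sums


def bucket_mean(points, step):
    """Average the raw points per bucket instead of adding them."""
    grouped = {}
    for ts, value in points:
        grouped.setdefault(ts - ts % step, []).append(value)
    return {start: sum(vals) / len(vals) for start, vals in grouped.items()}


# A flat series that reports the value 10 every 5 seconds.
flat = [(t, 10) for t in range(0, 20, 5)]

print(bucket_sum(flat, 5))    # {0: 10, 5: 10, 10: 10, 15: 10}
print(bucket_sum(flat, 10))   # {0: 20, 10: 20}
print(bucket_mean(flat, 10))  # {0: 10.0, 10: 10.0}
```

Under this assumption, doubling the window doubles the number of points per bucket and therefore the sum, while a per-bucket mean stays flat. That is why downsampling each series before summing across series matters.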

Is this something that is on the roadmap to address?

@pauldix
Member

pauldix commented Jul 10, 2014

That's because aggregators in OpenTSDB operate over multiple series that get combined into one. In Influx an aggregator is more traditional: it aggregates over multiple points in a single series.

In the case of your query, each series is aggregated individually, and you get back as many series as match that regex. If you want empty intervals filled, you can use fill(0) as an option. We'll be adding other options to fill later like mean, previous, and next.
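As a rough illustration of what filling empty intervals means for a single series, here is a Python sketch. It is not InfluxDB's implementation: `fill_buckets` and its `mode` names are hypothetical, and only a zero fill and a "previous" fill are modelled.

```python
def fill_buckets(buckets, grid, mode="zero"):
    """Fill buckets that have no data, either with 0 or with the last seen value."""
    out = {}
    last = None
    for t in grid:
        if t in buckets:
            last = buckets[t]
            out[t] = last
        elif mode == "zero":
            out[t] = 0
        elif mode == "previous" and last is not None:
            out[t] = last
        # Any other case leaves the interval empty.
    return out


# One series, already aggregated into 5s buckets, with gaps at 5 and 15.
capacity = {0: 50, 10: 50}
grid = range(0, 20, 5)

print(fill_buckets(capacity, grid, "zero"))      # {0: 50, 5: 0, 10: 50, 15: 0}
print(fill_buckets(capacity, grid, "previous"))  # {0: 50, 5: 50, 10: 50, 15: 50}
```

For a gauge such as disk capacity, a zero fill makes a flat series dip to 0 in every empty interval, whereas a "previous" style fill keeps it flat.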

What you're probably looking to do is to merge many time series into one and then aggregate over that. Issue #72 will give you the ability to do that. But that still won't help you if you have intervals with missing data. In that case you'd probably want something more like a join where you can specify what value gets assigned for any series that is missing an interval.
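The join idea can be sketched in Python as follows. This is only a model of the concept under stated assumptions, not the feature tracked in issue #72; `join_sum`, the series names, and the per-series `defaults` mapping are hypothetical.

```python
def join_sum(named_buckets, grid, defaults):
    """Sum series per interval, using a per-series default where one is missing."""
    return {
        t: sum(b.get(t, defaults[name]) for name, b in named_buckets.items())
        for t in grid
    }


# Two series already aggregated into 5s buckets; "b" has no data for bucket 5.
series = {
    "disk.capacity.a": {0: 100, 5: 100, 10: 100},
    "disk.capacity.b": {0: 50, 10: 50},
}
grid = range(0, 15, 5)

zero_default = {"disk.capacity.a": 0, "disk.capacity.b": 0}
flat_default = {"disk.capacity.a": 100, "disk.capacity.b": 50}

print(join_sum(series, grid, zero_default))  # {0: 150, 5: 100, 10: 150}
print(join_sum(series, grid, flat_default))  # {0: 150, 5: 150, 10: 150}
```

The value chosen for a missing interval decides whether the combined line dips or stays flat, which is the part a plain merge cannot express.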

Influx isn't just limited to storing data at fixed intervals of time. As such, it has functions designed to calculate intervals over multiple points.

I'm closing this out for now, but feel free to keep commenting and I will open issues that map to feature requests that come out of the discussion. In the future, the proper place to initiate a discussion around feature development or general questions is the mailing list.

Thanks

@nazgu1

nazgu1 commented Jan 3, 2015

> We'll be adding other options to fill later like mean, previous, and next.

When do you plan to add these options to fill()?

@pauldix
Member

pauldix commented Jan 4, 2015

Either as part of the 0.9.0 release or in a point release shortly afterwards
