Fix for derivative() CQ's with empty buckets #3383

kruckenb · 2015-07-18T13:46:26Z

As mentioned beginning at #3247 (comment)
Setting recompute intervals using recompute-previous-n results in empty intervals when the interval doesn't overlap every sampling rate window.
The GROUP BY time() in the CQ already buckets on the correct intervals, so recompute-previous-n is unnecessary.
This is a fix against 0.9.2 branch, would be awesome if this could ship as part of 0.9.2

kruckenb · 2015-07-18T14:08:12Z

It looks like the tests are failing due to networking issues?

…ng with GROUP BY clause

otoolep · 2015-07-18T17:07:15Z

@kruckenb - the builds are failing due to compilation errors. Please run go test ./... on your tree before pushing up code, that way you can check yourself if there are build breaks. Check CONTRIBUTING.md for more details

[collectd] 2015/07/18 14:36:08 collectd UDP closed
--- PASS: TestService_BatchSize (0.01s)
    service_test.go:73: closing service
    service_test.go:73: closing service
    service_test.go:73: closing service
[collectd] 2015/07/18 14:36:08 collectd ReadFromUDP error: read udp 127.0.0.1:56594: use of closed network connection
[collectd] 2015/07/18 14:36:08 collectd UDP closed
--- PASS: TestService_BatchDuration (0.25s)
    service_test.go:138: closing service
PASS
ok      github.com/influxdb/influxdb/services/collectd  0.264s
?       github.com/influxdb/influxdb/services/collectd/test_client  [no test files]
# github.com/influxdb/influxdb/services/continuous_querier
services/continuous_querier/service_test.go:97: s.Config.RecomputePreviousN undefined (type *Config has no field or method RecomputePreviousN)
FAIL    github.com/influxdb/influxdb/services/continuous_querier [build failed]

kruckenb · 2015-07-20T03:29:14Z

@otoolep passes tests now.

pauldix · 2015-07-23T20:19:01Z

Not sure if this is right. The point of recompute-previous-n was to have the CQ service recompute intervals that have already passed. This is to deal with the problem of delayed data coming in.

kruckenb · 2015-07-23T20:36:44Z

@pauldix This patch doesn't remove computing previous intervals. Previous intervals are still computed back to recompute-no-older-than, and the CQ itself handles the bucketing with it's GROUP BY time() clause.

The problem with recompute-previous-n is that it effectively adds another layer of windowing, and results in some intervals having no results (and being filled in with 0's). For example, if two samples happen to fall into the same recompute-previous-n window, the CQ will only generate a result for the second sample, and the result for the first sample timestamp will be 0. This happens periodically, depending on how the windows for samples and running CQ's line up.

This fixes another issue that is also at the root of #3247: if you run a CQ with a GROUP BY time(30s) against a series with 30-second samples, the result will mostly/always be zero, because every interval (limited by the recompute-previous-n window) will have a single sample and derivative() requires at least 2 samples. If you change the CQ GROUP BY to a longer time-span then you'll start to get some 0 results because of the windowing problem mentioned above. Eliminating the recompute-previous-n with this patch fixes this problem as long as recompute-no-older-than is longer than 2x the sampling period.

I'm not sure I'm explaining it clearly (there's a lot to this), if it would help I can post some before-and-after details.

pauldix · 2015-07-23T20:43:18Z

The original goal with the combination of recompute-no-older-than and recompute-previous-n was to give the user controls over how much gets computed when they have CQs with different group by times.

For instance, if you have a CQ with a group by time of 1h, you'd only want to recompute the interval before it. However, if you have a CQ with a group by time of 30s, you may want to compute the last 10 intervals just to make sure that you pull back lagged data.

Having only a single time interval means that you could end up not recomputing those 1h buckets. Unless every CQ query always forces at least the current period and the previous period to be computed, which would solve that problem.

They could set the recompute-no-older-than to something like 10m which would be their window of lagged data, but CQs with longer group by times would still be ok because we always recompute the previous interval.

That make sense?

kruckenb · 2015-07-23T20:54:22Z

@pauldix I think I know how to resolve this so it uses recompute-no-older-than and recompute-previous-n to calculate the full recompute window, but eliminates the intermediate time windows (the key to the fix).

Use pre-patch computation to calculate full recompute window based on recompute-no-older-than and recompute-previous-n values
Run the CQ query once using full time window from Step 1 and rely on the CQ GROUP BY time() clause to compute values for intermediate intervals. This approach will provide enough samples to derivative() to avoid 0-value intervals.

If that approach sounds right I'll code it up.

pauldix · 2015-07-23T20:57:23Z

@kruckenb yeah, I think that sounds right. Would be nice to get some unit tests around #1 to ensure that the number of intervals/length of CQ time is sane given different settings and group by intervals.

This reverts commit f6b360c.

… bucketing with GROUP BY clause" This reverts commit 86f1e47.

kruckenb · 2015-07-27T14:32:40Z

Refactored to support both recompute-previous-n and recompute-no-older-than

pauldix · 2015-07-27T19:26:08Z

I think there might be one small issue on the setting of the time if recompute-no-older-than is hit. For instance, say they have recompute-previous-n of 3 and recompute-no-older-than set to 20m. Then say they have a CQ where the group by time is 1h.

In that case the recompute-previous-n would go back 3 hours. Your logic then sets the start time to be 20m previous to now. Which would end up writing bad results for that previous window in some cases.

What we actually want to happen is that in all cases, the start time is rounded down to the previous interval. So in the case of my example, we'd actually always compute the last hour and the current hour.

In the case of a 1m group by time, we'd always compute the previous 3 minutes and the current 3 minutes.

sebito91 · 2015-09-23T16:37:33Z

+1 on this commit!

We're looking to graph rate metrics like packets/sec for network traffic, and using the deriv is one key way to implement this. Having gone back and forth trying to get consistent data in our graphs (using grafana), it's somewhat confusing as to the best approach.

Overall question, does it make more sense to apply a query as below in a CQ or just have it run on its own? Seems like a CQ may lead to more consistent data with the ability to tweak the re-calc knobs, but is that best practice for such a (seemingly common) metric?

SELECT derivative(last(value)) FROM "net_packets_recv" WHERE "host" = 'hostname' AND "interface" = 'bond0' and time > now() - 1h group by time(10s)

I'm curious to try out the patch from @kruckenb for a CQ with the above query to ensure we don't receive 0 values. At this point, whether from GUI or CLI we get periodic 0-values returned if we happen to query between intervals. If I'm in a tight loop accessing the same set of data (say a read query from 100 hosts at the same time), some will get all 0's but some will get full data...

> select derivative(last(value)) from net_packets_recv where host='hostname' and interface='enp129s0f0' and time > now() - 1h group by time(10s)
name: net_packets_recv
----------------------
time            derivative
2015-09-23T15:31:10Z    0

<wait about one/two seconds>

> select derivative(last(value)) from net_packets_recv where host='hostname' and interface='enp129s0f0' and time > now() - 1h group by time(10s)
name: net_packets_recv
----------------------
time            derivative
2015-09-23T15:31:30Z    216.8
2015-09-23T15:31:40Z    216.8
2015-09-23T15:31:50Z    217.9
2015-09-23T15:32:00Z    230.5
...
...

sebito91 · 2015-09-25T18:54:11Z

Following up yet again, found a post with a tip buried deep within...though it does present its own problems. For derivative functions (likely any other rate calcs), you should append fill(0) to your query:

https://groups.google.com/forum/#!topic/influxdb/YbbWL2KCyVk

This does result in some wonky first-bucket calculation periodically but overall does what you need.

jsternberg · 2016-03-16T18:03:22Z

Any fix for derivative should also fix derivatives when querying normally. We may revisit this kind of solution later, but I think we're going to try implementing the suggestion in #5943.

Thanks for your contribution. I'm going to close this PR.

kruckenb changed the title ~~compute continuous queries back to 'recompute-no-older-than'~~ compute continuous queries back to 'recompute-no-older-than', let 'GROUP BY' handle bucketing Jul 18, 2015

kruckenb changed the title ~~compute continuous queries back to 'recompute-no-older-than', let 'GROUP BY' handle bucketing~~ Fix for derivative() CQ's with empty buckets Jul 18, 2015

kruckenb mentioned this pull request Jul 18, 2015

Continuous Query with derivative fails silently #3247

Closed

compute continuous queries back to 'recompute-no-older-than', bucketi…

86f1e47

…ng with GROUP BY clause

kruckenb force-pushed the fix-3247 branch from 9cefc1c to 86f1e47 Compare July 18, 2015 14:35

remove references to RecomputePreviousN

f6b360c

kruckenb added 3 commits July 23, 2015 23:18

Revert "remove references to RecomputePreviousN"

d4aa715

This reverts commit f6b360c.

Revert "compute continuous queries back to 'recompute-no-older-than',…

cf05570

… bucketing with GROUP BY clause" This reverts commit 86f1e47.

execute continuous query using single time window

bb1d667

sebito91 mentioned this pull request Sep 23, 2015

InfluxDB: New Editor to support nested functions and function parameters grafana/grafana#2802

Closed

jsternberg closed this Mar 16, 2016

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

Fix for derivative() CQ's with empty buckets #3383

Fix for derivative() CQ's with empty buckets #3383

kruckenb commented Jul 18, 2015

kruckenb commented Jul 18, 2015

otoolep commented Jul 18, 2015

kruckenb commented Jul 20, 2015

pauldix commented Jul 23, 2015

kruckenb commented Jul 23, 2015

pauldix commented Jul 23, 2015

kruckenb commented Jul 23, 2015

pauldix commented Jul 23, 2015

kruckenb commented Jul 27, 2015

pauldix commented Jul 27, 2015

sebito91 commented Sep 23, 2015

sebito91 commented Sep 25, 2015

jsternberg commented Mar 16, 2016

Fix for derivative() CQ's with empty buckets #3383

Fix for derivative() CQ's with empty buckets #3383

Conversation

kruckenb commented Jul 18, 2015

kruckenb commented Jul 18, 2015

otoolep commented Jul 18, 2015

kruckenb commented Jul 20, 2015

pauldix commented Jul 23, 2015

kruckenb commented Jul 23, 2015

pauldix commented Jul 23, 2015

kruckenb commented Jul 23, 2015

pauldix commented Jul 23, 2015

kruckenb commented Jul 27, 2015

pauldix commented Jul 27, 2015

sebito91 commented Sep 23, 2015

sebito91 commented Sep 25, 2015

jsternberg commented Mar 16, 2016