Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

GROUP BY time(x) includes partial intervals #8244

Open
jomo opened this issue Apr 4, 2017 · 14 comments
Open

GROUP BY time(x) includes partial intervals #8244

jomo opened this issue Apr 4, 2017 · 14 comments
Labels

Comments

@jomo
Copy link

jomo commented Apr 4, 2017

Bug report

System info: InfluxDB version 1.2.1

Steps to reproduce:

  1. Insert data with an interval of 1m, but start 30s after a full minute
    Example: Insert data at 00:00:30, 00:01:30, 00:02:30, …
  2. SELECT mean(value) FROM data WHERE time <= now() GROUP BY time(1m)

Expected behavior:
I would expect InfluxDB to only group time ranges that fully fit the into the queried time range (in this case, time ranges that have already passed)

Example:

  • The last time covered by the query (now()) is 00:01:01
  • The last query result covers the range 00:00:00 - 00:01:00
  • (I'm unsure if the time field of the result should be 00:00:00 or 00:01:00)

Actual behavior:
InfluxDB groups time ranges extending the queried time, in this example meaning as soon as 00:01:00 passes, it will group 00:01:00 - 00:02:00. With data only being inserted at 00:01:30, the last result will be null (until 00:01:30 passes), although data is inserted with the same interval as used in the query.

Additional info:
Related issues: #3926 #4282 #4038 #8010

@jsternberg
Copy link
Contributor

Whenever you have a query with a time range that includes a partial interval, it will still include the partial interval, but won't include data that doesn't fit inside of the time constraints.

So in your example, if you query SELECT mean(value) FROM cpu WHERE time >= '2000-01-01T00:00:00Z' AND time < '2000-01-01:00:02:30Z' GROUP BY time(1m) you will get the 2:00 - 3:00 interval at the end, but it will only contain data between 2:00 and 2:29.999999999.

So I think this is expected behavior. Does that make sense?

@jomo
Copy link
Author

jomo commented Apr 4, 2017

In my opinion including a partial interval feels wrong. It becomes quite troublesome when combining this with fill(0), which is what I'm doing in Grafana (I want the absence of data for longer than the specified interval to be treated as 0).
The result is that the end of the graph drops to 0 and then suddenly jumps to its correct value when data is available.

I would only expect a null value when a complete interval passes with no data included.

Edit: made a gif to show what I mean
influxdb interval grafana

@jomo jomo changed the title GROUP BY time(x) extends queried time GROUP BY time(x) includes partial intervals Apr 4, 2017
@onlynone
Copy link

onlynone commented Nov 1, 2017

The big problem for this is when you're doing a count or sum without an end time constraint (or just time <= now()). The last period will always be too low and will slowly rise to the expected value over the time interval, and then drop back to 0 right after the next round time period has passed. Is there an easy way to simply exclude the last group if it's not complete (maybe only when using an open ended query)?

@dgnorton dgnorton added the 1.x label Jan 7, 2019
@gelinger777
Copy link

So how to cut off the last incomplete interval? How you solved this @jomo ?

@stale
Copy link

stale bot commented Oct 8, 2019

This issue has been automatically marked as stale because it has not had recent activity. It will be closed if no further activity occurs. Thank you for your contributions.

@stale stale bot added the wontfix label Oct 8, 2019
@jomo
Copy link
Author

jomo commented Oct 8, 2019

I guess this is still an issue?

@stale stale bot removed the wontfix label Oct 8, 2019
@crab86
Copy link

crab86 commented Oct 8, 2019

Yes, it is...

@craigyk
Copy link

craigyk commented Nov 11, 2019

oh please +1, just started using influx this week, and this has been driving me crazy...

@mmihalev
Copy link

Having exactly the same behaviour... any update on this?

@azidyn
Copy link

azidyn commented Mar 1, 2020

I had this problem, and was about to report when I found this gh issue. I'm not sure if it's a feature or a bug?
My workaround was to use calculated absolute timestamps in the SELECT query and 'snap' the absolute timestamp to the interval specified in GROUP BY.

Was a two week long headache to track down exactly why my data was being corrupted.

@MikeKulls2
Copy link

This is an issue for pretty much every aggregated graph I create in Grafana. If you do sum or mean or some other aggregator and group by a time range then the last time range will always show an oddly low, high or incorrect value. I imagine every single user of influx/grafana encounters this exact issue. It would be great to have an "exclude incomplete group" or "exclude most recent group" option. The other possibility would be to have an option of group by NOT aligned to, say, hour. So if I group by hour and the current time is 3:15, then the group will be 2:15-3:15 instead of 2:00-3:00 and 3:00-4:00

@DaisyCutter
Copy link

+1 If anyone is still paying attention. I'm monitoring a lot of volumes using COUNT() and the latest value is always too low which is an issue for alerting.

@jsgygujun
Copy link

+1

@spmvoss
Copy link

spmvoss commented Feb 4, 2022

+1, driving me absolutely nuts. I need to count over 10s and select the last value to display in a card to use it as a warning indicator if things are wrong. I am now consistently getting the wrong value.

DifferentialOrange added a commit to tarantool/grafana-dashboard that referenced this issue May 17, 2022
Statistics for CRUD module was introduced in CRUD 0.11.0 [1].
To enable statistics integrated with metrics with quantiles, call

  crud.cfg{
      stats = true,
      stats_driver = 'metrics',
      stats_quantiles = true
  }

This patch adds panels for
- tnt_crud_stats summary (+ tnt_crud_stats_count, tnt_crud_stats_sum)
- tnt_crud_tuples_fetched
- tnt_crud_tuples_lookup
- tnt_crud_map_reduces

There is a group of panels for RPS load and a group of panels with
latency, both consists of separate panels for each operation
and ok/error status. Tuples panels are displayed as average per request.
InfluxDB queries for tuple panels have minor issue: they can show
0 current value from time to time due to `fill(0)` and `GROUP BY`
including partial intervals [2]. Map reduces panel show average RPS.

1. https://github.com/tarantool/crud/releases/tag/0.11.0
2. influxdata/influxdb#8244

Closes #143
DifferentialOrange added a commit to tarantool/grafana-dashboard that referenced this issue May 17, 2022
Statistics for CRUD module was introduced in CRUD 0.11.0 [1].
To enable statistics integrated with metrics with quantiles, call

  crud.cfg{
      stats = true,
      stats_driver = 'metrics',
      stats_quantiles = true
  }

This patch adds panels for
- tnt_crud_stats summary (+ tnt_crud_stats_count, tnt_crud_stats_sum)
- tnt_crud_tuples_fetched
- tnt_crud_tuples_lookup
- tnt_crud_map_reduces

CRUD panels are stored in "CRUD module statistics" section.
There is a group of panels for RPS load and a group of panels with
latency, both consists of separate panels for each operation
and ok/error status. Tuples panels are displayed as average per request.
InfluxDB queries for tuple panels have minor issue: they can show
0 current value from time to time due to `fill(0)` and `GROUP BY`
including partial intervals [2]. Map reduces panel show average RPS.

1. https://github.com/tarantool/crud/releases/tag/0.11.0
2. influxdata/influxdb#8244

Closes #143
DifferentialOrange added a commit to tarantool/grafana-dashboard that referenced this issue May 17, 2022
Statistics for CRUD module was introduced in CRUD 0.11.0 [1].
To enable statistics integrated with metrics with quantiles, call

  crud.cfg{
      stats = true,
      stats_driver = 'metrics',
      stats_quantiles = true
  }

This patch adds panels for
- tnt_crud_stats summary (+ tnt_crud_stats_count, tnt_crud_stats_sum)
- tnt_crud_tuples_fetched
- tnt_crud_tuples_lookup
- tnt_crud_map_reduces

CRUD panels are stored in "CRUD module statistics" section.
There is a group of panels for RPS load and a group of panels with
latency, both consists of separate panels for each operation
and ok/error status. Tuples panels are displayed as average per request.
InfluxDB queries for tuple panels have minor issue: they can show
0 current value from time to time due to `fill(0)` and `GROUP BY`
including partial intervals [2]. Map reduces panel show average RPS.

1. https://github.com/tarantool/crud/releases/tag/0.11.0
2. influxdata/influxdb#8244

Closes #143
DifferentialOrange added a commit to tarantool/grafana-dashboard that referenced this issue May 17, 2022
Statistics for CRUD module was introduced in CRUD 0.11.0 [1].
To enable statistics integrated with metrics with quantiles, call

  crud.cfg{
      stats = true,
      stats_driver = 'metrics',
      stats_quantiles = true
  }

This patch adds panels for
- tnt_crud_stats summary (+ tnt_crud_stats_count, tnt_crud_stats_sum)
- tnt_crud_tuples_fetched
- tnt_crud_tuples_lookup
- tnt_crud_map_reduces

CRUD panels are stored in "CRUD module statistics" section.
There is a group of panels for RPS load and a group of panels with
latency, both consists of separate panels for each operation
and ok/error status. Tuples panels are displayed as average per request.
InfluxDB queries for tuple panels have minor issue: they can show
0 current value from time to time due to `fill(0)` and `GROUP BY`
including partial intervals [2]. Map reduces panel show average RPS.

1. https://github.com/tarantool/crud/releases/tag/0.11.0
2. influxdata/influxdb#8244

Closes #143
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
Projects
None yet
Development

No branches or pull requests