First value of query is empty. #15324

Closed
iridos opened this issue Oct 7, 2019 · 4 comments

Comments


iridos commented Oct 7, 2019

Steps to reproduce:

Run queries with GROUP BY time($interval), where $interval equals the collection interval of a metric.

Expected behavior:

  • All time slices are filled if there are no gaps in data collection.
  • A query restricted with time >= $sometime returns only timestamps at or after $sometime.

Actual behavior:

  • The first value is 0 for count() and null for mean().
  • A timestamp before the requested start time is returned.

Environment info:

  • System info: Linux 3.10.0-862.14.4.el7.x86_64 x86_64
  • InfluxDB version: InfluxDB v1.7.8 (git: 1.7 ff383cd)

Analysis of problem and reproduction:
Consider the output in [1] below.
My query starts at 1570391480000 (milliseconds since the epoch). The cpu metric is collected every 15 seconds.
(From now on, I will keep only the last seven digits of each timestamp for readability, so the sentence above would read "My query starts at 1480000.")
Hence the 1s buckets from 1480000 to 1489000 are empty; the first value was collected at 1490000. This behavior is fine, and the empty rows could be suppressed with fill(none).

The query with GROUP BY time(15s) starts at 1475000, that is, 15 seconds before the first value and 5 seconds before the start of the requested time range. This first bucket is empty and has a count of 0.
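
Where 1475000 comes from can be illustrated with a minimal sketch, assuming (as the InfluxDB 1.x documentation describes for GROUP BY time() "preset" boundaries) that buckets are aligned to multiples of the interval counted from the Unix epoch, not to the start of the WHERE time range. `bucket_start` is a hypothetical helper for illustration, not an InfluxDB function.

```python
# Sketch (assumption): InfluxDB 1.x places GROUP BY time() boundaries at
# multiples of the interval counted from the Unix epoch, optionally shifted
# by an offset, independent of where the WHERE time range begins.

def bucket_start(ts_ms: int, interval_ms: int, offset_ms: int = 0) -> int:
    """Start (in ms) of the epoch-aligned bucket that contains ts_ms."""
    return (ts_ms - offset_ms) // interval_ms * interval_ms + offset_ms


QUERY_START_MS = 1570391480000  # start of the query in [1], in milliseconds

print(bucket_start(QUERY_START_MS, 1_000))          # 1570391480000: 1s buckets coincide with the start
print(bucket_start(QUERY_START_MS, 15_000))         # 1570391475000: first 15s bucket begins 5 s early
print(bucket_start(QUERY_START_MS, 15_000, 5_000))  # 1570391480000: a 5 s offset realigns the buckets
```

Because 1480000 is a multiple of 1 s but not of 15 s, only the 15s query gets a first bucket that starts before the requested time.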

What seems to happen: InfluxDB extends the time frame backwards, to 15 seconds before the first value, so that the whole requested query interval is covered by time slices AND the returned timestamps are those of recorded data (not of buckets anchored at the requested start). But it does not extend the time range of the actual query in the same way. So the data from timestamp 1475000 is lost and not included in the result, although a row is produced for that 15s time period.
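
The mismatch between the widened bucket and the unwidened filter can be modelled in a few lines. This is a hedged sketch of the observed behavior, not InfluxDB's implementation: the WHERE clause filters raw points at the requested start, while the first bucket is epoch-aligned. The sample at 1475000 is an assumption implied by the 15 s collection interval (the query in [1] cannot show it); `grouped_counts` is a hypothetical name.

```python
from collections import Counter

INTERVAL_MS = 15_000
QUERY_START_MS = 1570391480000
# 1490000 and 1505000 appear in [1]; 1475000 is an assumed sample that the
# 15 s collection interval implies but the query never shows.
POINTS_MS = [1570391475000, 1570391490000, 1570391505000]


def bucket_start(ts_ms, interval_ms):
    """Epoch-aligned bucket start (assumed InfluxDB 1.x behaviour)."""
    return ts_ms // interval_ms * interval_ms


def grouped_counts(points, start_ms, interval_ms, limit):
    """Model of count() ... WHERE time >= start GROUP BY time(interval) LIMIT n."""
    # The WHERE clause filters raw points at the requested start...
    kept = [t for t in points if t >= start_ms]
    counts = Counter(bucket_start(t, interval_ms) for t in kept)
    # ...but the first bucket is epoch-aligned and may begin before start_ms.
    first = bucket_start(start_ms, interval_ms)
    return [(first + i * interval_ms, counts.get(first + i * interval_ms, 0))
            for i in range(limit)]


for ts, n in grouped_counts(POINTS_MS, QUERY_START_MS, INTERVAL_MS, limit=3):
    print(ts, n)
# 1570391475000 0
# 1570391490000 1
# 1570391505000 1
```

The model reproduces the 15s output in [1]: a row labelled 1475000 with count 0, even though a point at that timestamp would fall inside that bucket.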

Solution(s):
I see two, or three, possible ways to solve this and have no preference among them:
a) Skip the first data point. The first row returned by the query would then be 1490000 in the example; no earlier timestamp would be returned.
b) Also extend the query time backwards. Results would still start at 1475000, but would include the data from 1475000.
c) Start returning groups from exactly the requested time and advance in $interval steps from there. This would start at 1480000 and return the value recorded at 1490000 for 1480000.
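
The three proposals can be compared on the same sample with a small sketch. It assumes the epoch-aligned bucketing described above and the same hypothetical sample at 1475000; `grouped`, `current` and `option_a/b/c` are illustrative names only, and LIMIT 2 is used so no rows depend on samples beyond the three known timestamps.

```python
from collections import Counter

INTERVAL_MS = 15_000
QUERY_START_MS = 1570391480000
# 1490000 and 1505000 are from [1]; 1475000 is an assumed sample (15 s collection).
POINTS_MS = [1570391475000, 1570391490000, 1570391505000]


def grouped(points, filter_start, anchor, interval, limit):
    """Count points per bucket; buckets are aligned to `anchor` and only
    points with t >= filter_start pass the WHERE clause."""
    first = anchor + (filter_start - anchor) // interval * interval
    counts = Counter((t - first) // interval for t in points if t >= filter_start)
    return [(first + i * interval, counts.get(i, 0)) for i in range(limit)]


def current(limit):
    # Observed behaviour: epoch-aligned buckets (anchor 0), filter at the requested start.
    return grouped(POINTS_MS, QUERY_START_MS, 0, INTERVAL_MS, limit)


def option_a(limit):
    # a) Drop any bucket that starts before the requested start.
    rows = current(limit + 1)
    return [r for r in rows if r[0] >= QUERY_START_MS][:limit]


def option_b(limit):
    # b) Widen the filter back to the start of the first epoch-aligned bucket.
    widened = QUERY_START_MS // INTERVAL_MS * INTERVAL_MS
    return grouped(POINTS_MS, widened, 0, INTERVAL_MS, limit)


def option_c(limit):
    # c) Anchor the buckets at the requested start instead of the epoch.
    return grouped(POINTS_MS, QUERY_START_MS, QUERY_START_MS, INTERVAL_MS, limit)


for name, fn in [("current", current), ("a", option_a), ("b", option_b), ("c", option_c)]:
    print(name, fn(2))
# current [(1570391475000, 0), (1570391490000, 1)]
# a [(1570391490000, 1), (1570391505000, 1)]
# b [(1570391475000, 1), (1570391490000, 1)]
# c [(1570391480000, 1), (1570391495000, 1)]
```

In every variant each returned count matches the data actually inside its bucket; only the current behaviour reports a bucket whose contents were partly filtered away. For comparison, InfluxQL 1.x accepts an offset argument, GROUP BY time(15s,5s), which should shift the preset boundaries onto 1480000 in this example; that gives the result of c) only when the offset is picked by hand for each start time.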

[1]

====================================================
> SELECT first("usage_idle"),last("usage_idle"),count("usage_idle")  FROM "cpu" WHERE ("cpu" = 'cpu-total' AND "host" = 'n0101') AND time >= 1570391480000*1000000  GROUP BY time(1s)  LIMIT 30
name: cpu
time          first             last              count
----          -----             ----              -----
1570391480000                                     0
1570391481000                                     0
1570391482000                                     0
1570391483000                                     0
1570391484000                                     0
1570391485000                                     0
1570391486000                                     0
1570391487000                                     0
1570391488000                                     0
1570391489000                                     0
1570391490000 49.5352617545902  49.5352617545902  1
1570391491000                                     0
1570391492000                                     0
1570391493000                                     0
1570391494000                                     0
1570391495000                                     0
1570391496000                                     0
1570391497000                                     0
1570391498000                                     0
1570391499000                                     0
1570391500000                                     0
1570391501000                                     0
1570391502000                                     0
1570391503000                                     0
1570391504000                                     0
1570391505000 49.91765515315524 49.91765515315524 1
1570391506000                                     0
1570391507000                                     0
1570391508000                                     0
1570391509000                                     0
> SELECT first("usage_idle"),last("usage_idle"),count("usage_idle")  FROM "cpu" WHERE ("cpu" = 'cpu-total' AND "host" = 'n0101') AND time >= 1570391480000*1000000  GROUP BY time(15s)  LIMIT 3
name: cpu
time          first             last              count
----          -----             ----              -----
1570391475000                                     0
1570391490000 49.5352617545902  49.5352617545902  1
1570391505000 49.91765515315524 49.91765515315524 1
====================================================

iridos commented Oct 7, 2019

This seems to be essentially #8010.
There, jsternberg commented that this may not be possible in the current query engine.
Could you please reconsider the other options I provided?
It seems the last bucket is also affected.
Please consider simply dropping the first and the last bucket (possibly only when more than 10 or so buckets are returned, although I don't see how incomplete buckets are useful even for a small number of buckets).
You could even make dropping those buckets optional. There are many possible solutions, and any of them would be preferable to returning a completely wrong result for some timestamps. I cannot imagine that all of them are "impossible" with the current engine.
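
Dropping incomplete edge buckets, optionally, could look like the following sketch. It is not an existing InfluxDB option: `complete_buckets` and its `drop_partial` switch are hypothetical, the time range is treated as half-open [start, end), and the end time used below is invented for illustration because the query in [1] has no upper bound.

```python
def complete_buckets(rows, start_ms, end_ms, interval_ms, drop_partial=True):
    """Drop buckets [ts, ts + interval) not entirely within [start_ms, end_ms).

    rows: list of (bucket_start_ms, value) pairs, as a GROUP BY time() query returns them.
    drop_partial: hypothetical switch that makes the behaviour optional.
    """
    if not drop_partial:
        return list(rows)
    return [(ts, v) for ts, v in rows if ts >= start_ms and ts + interval_ms <= end_ms]


# The three 15s rows from [1]; the end time is hypothetical.
rows = [(1570391475000, 0), (1570391490000, 1), (1570391505000, 1)]
print(complete_buckets(rows, 1570391480000, 1570391510000, 15_000))
# [(1570391490000, 1)]
```

The first bucket starts before the requested start and the last one extends past the end, so only the bucket fully covered by the query survives.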

russorat added the 1.x label Oct 7, 2019
stale bot commented Jan 5, 2020

This issue has been automatically marked as stale because it has not had recent activity. It will be closed if no further activity occurs. Thank you for your contributions.

stale bot added the wontfix label Jan 5, 2020
stale bot commented Jan 12, 2020

This issue has been automatically closed because it has not had recent activity. Please reopen if this issue is still important to you. Thank you for your contributions.

stale bot closed this as completed Jan 12, 2020

iridos commented Jan 30, 2020

As nobody seems to have seen this… perhaps I should write an unstale bot that comments on this issue every week or so?

iridos changed the title from "First value of query is empty" to "First value of query is empty." Jan 30, 2020