[0.9.5] desc modifier in order clause causes records to be omitted #4944

Closed
CVTJNII opened this issue Dec 1, 2015 · 9 comments

@CVTJNII

CVTJNII commented Dec 1, 2015

When a query is sorted by time descending, the most recent records are not returned. Using a simple test script against an InfluxDB 0.9.4.2 instance that is being constantly updated:

Current time: 2015-12-01T16:48:01.586443Z
'select time,value from "riemann streams latency 0.95" order by time desc' latest time: 2015-12-01T15:22:59.663Z
'select time,value from "riemann streams latency 0.95" order by time' latest time: 2015-12-01T16:48:00.011Z
'select time,value from "riemann streams latency 0.95"' latest time: 2015-12-01T16:48:00.011Z

Note that records for 16:48 were returned by the queries without the desc modifier, whereas with the desc modifier the latest time is 15:22: over an hour's worth of records are missing. Because the sort order is descending, the latest records should come first, so this is not a pagination issue.
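
For reference, the size of the gap can be worked out from the two "latest time" values printed above. The following is only a small Python 3 sketch of that arithmetic; the variable names are illustrative and not part of the original script, and the format string assumes both timestamps carry a fractional-seconds part, as they do here.

```python
from datetime import datetime

# Format of the timestamps printed above (UTC, with fractional seconds).
FMT = "%Y-%m-%dT%H:%M:%S.%fZ"

# Latest timestamps reported for the "order by time desc" query and the
# "order by time" query, copied from the output above.
latest_desc = datetime.strptime("2015-12-01T15:22:59.663Z", FMT)
latest_asc = datetime.strptime("2015-12-01T16:48:00.011Z", FMT)

# Span of data that the descending query failed to return.
gap = latest_asc - latest_desc
print(gap)  # 1:25:00.348000
```

So the descending query stops roughly 1 hour 25 minutes short of the newest point.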

The above was generated with the following script, which does its own time sort to ensure the latest row is printed on unordered queries:

#!/usr/bin/python -tt

import argparse
import urllib
import urllib2
import json
import datetime
import sys

def parse_args():
    """Parse command line arguments"""
    parser = argparse.ArgumentParser(description='Check age of InfluxDB points for a given measurement match', formatter_class=argparse.ArgumentDefaultsHelpFormatter)

    parser.add_argument('-H', '--host', action='store', default='172.17.42.1', help='InfluxDB Host', type=str)
    parser.add_argument('-P', '--port', action='store', default='8086', help='InfluxDB Port', type=int)
    parser.add_argument('-u', '--user', action='store', default=None, help='InfluxDB Username', type=str)
    parser.add_argument('-p', '--passwd', action='store', default=None, help='InfluxDB Password', type=str)
    parser.add_argument('-d', '--database', action='store', default='riemann', help='InfluxDB Database', type=str)

    parser.add_argument('-m', '--measurement', action='store', default='/.*/', help='InfluxDB Measurement', type=str)

    return parser.parse_args()

def run_query(args, query):
    """Run the influxDB query"""
    # This uses urllib(2) instead of something like requests to avoid needing to install packages

    # There are some injection vulnerabilities here... (TODO)
    url_args = {'db': args.database,
                'q': query
                }

    # Only send the credentials that were actually supplied
    if args.user is not None:
        url_args['u'] = args.user
    if args.passwd is not None:
        url_args['p'] = args.passwd

    url = "http://%s:%s/query?%s" % (args.host, args.port, urllib.urlencode(url_args))
    try:
        api_raw_data = urllib2.urlopen(url).read()
    except Exception as exc:
        raise Exception("Unable to query InfluxDB: %s" % str(exc))

    try:
        data = json.loads(api_raw_data)
    except Exception as exc:
        raise Exception("Unable to parse API response: %s" % str(exc))

    return data['results'][0]['series'][0]

def print_query(args, query):
    resp = sorted(run_query(args, query)['values'], key = lambda r: r[0])
    print "'%s' latest time: %s" % (query, resp[-1][0])

def main():
    args = parse_args()

    print "Current time: %s" % datetime.datetime.utcnow().strftime('%Y-%m-%dT%H:%M:%S.%fZ')
#    print_query(args, ("select time,value from %s order by time desc limit 1" % args.measurement))
    print_query(args, ("select time,value from %s order by time desc" % args.measurement))
    print_query(args, ("select time,value from %s order by time" % args.measurement))
    print_query(args, ("select time,value from %s" % args.measurement))
#    print_query(args, ("select last(value) from %s" % args.measurement))

if __name__ == '__main__':
    sys.exit(main())

The above was run with measurement set to a single series; however, this behavior is seen against /.*/ as well.
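
The client-side check in the script above (sort the returned rows, take the newest) can also be sketched in Python 3 without any network access. This is a hedged illustration rather than the original script: `parse_ts` and `latest_time` are hypothetical helper names, the sample payload is made up, and the code assumes that an empty result comes back without a `series` key. It parses timestamps instead of comparing strings, because the fractional part in the output above varies in length (for example `21:22:11.64Z`).

```python
import re
from datetime import datetime, timedelta, timezone

# RFC 3339 UTC timestamp with an optional fractional part of any length.
_TS = re.compile(r"^(\d{4}-\d{2}-\d{2}T\d{2}:\d{2}:\d{2})(?:\.(\d+))?Z$")


def parse_ts(text):
    """Parse an InfluxDB-style UTC timestamp, truncating to microseconds."""
    match = _TS.match(text)
    if match is None:
        raise ValueError("unrecognised timestamp: %r" % text)
    base = datetime.strptime(match.group(1), "%Y-%m-%dT%H:%M:%S")
    # Pad or cut the fraction to exactly six digits (microseconds).
    micros = int((match.group(2) or "")[:6].ljust(6, "0"))
    return base.replace(tzinfo=timezone.utc) + timedelta(microseconds=micros)


def latest_time(response):
    """Return the newest timestamp in a decoded /query response, or None."""
    results = response.get("results") or [{}]
    series = results[0].get("series") or []
    if not series:
        # Assumption: a query that matches nothing has no 'series' key.
        return None
    times = [parse_ts(row[0]) for row in series[0].get("values", [])]
    return max(times, default=None)


# Hypothetical payload, shaped like the responses the script above decodes.
sample = {"results": [{"series": [{"values": [
    ["2015-12-01T21:22:11.64Z", 1.0],
    ["2015-12-01T21:22:11.641Z", 2.0],
]}]}]}
print(latest_time(sample))
```

Comparing `latest_time` for the ascending and descending responses then shows directly whether the descending query is missing recent points.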

@CVTJNII
Author

CVTJNII commented Dec 1, 2015

I am unable to test against the nightly build, as it (0.9.6-nightly-3de9b9b) fails to run with 'panic: close of nil channel'.

@beckettsean
Contributor

This is a duplicate of #4235, which is fixed in 0.9.5. Can you upgrade to that release?

@CVTJNII
Author

CVTJNII commented Dec 1, 2015

Still present in 0.9.5.1

Current time: 2015-12-01T21:20:46.596645Z
'select time,value from "riemann streams latency 0.95" order by time desc limit 1' latest time: 2015-12-01T21:05:51.457Z
'select time,value from "riemann streams latency 0.95" order by time desc' latest time: 2015-12-01T21:05:51.457Z
'select time,value from "riemann streams latency 0.95" order by time' latest time: 2015-12-01T21:21:31.633Z
'select time,value from "riemann streams latency 0.95"' latest time: 2015-12-01T21:22:11.64Z
'select last(value) from "riemann streams latency 0.95"' latest time: 2015-12-01T21:22:51.646Z

@CVTJNII
Author

CVTJNII commented Dec 1, 2015

Does the fix for this not work with existing data? I don't see this on 0.9.5.1 in a sandbox with no existing data but I do see it after upgrading my prod hosts from 0.9.4.2.

@beckettsean reopened this Dec 1, 2015
@beckettsean
Contributor

@CVTJNII did you change the storage engine configuration for 0.9.5.1 or are you running the default bz1 engine? You can check the engine setting in the [data] section of the config file.

@beckettsean changed the title from "desc modifier in order clause causes records to be omitted" to "[0.9.5] desc modifier in order clause causes records to be omitted" Dec 1, 2015
@CVTJNII
Author

CVTJNII commented Dec 1, 2015

I haven't intentionally tuned the engine; it should be using the defaults. This is my [data] config block:

[data]
dir = "/var/opt/influxdb/data"
max-wal-size = 104857600
query-log-enabled = false
wal-dir = "/var/opt/influxdb/wal"
wal-enable-logging = true
wal-flush-interval = "10m"
wal-partition-flush-delay = "2s"

@beckettsean
Contributor

Thanks @CVTJNII, that means you're using the default bz1 engine.
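
That conclusion follows from the [data] block having no `engine` key. As a rough Python 3 sketch of the same check, the hypothetical helper `data_engine` below reads the key from config text and falls back to bz1, the default named in this thread for 0.9.5. It handles only simple `key = value` lines and treats everything after `#` as a comment; a real TOML parser would be more robust.

```python
def data_engine(config_text, default="bz1"):
    """Return the 'engine' value from the [data] section, or the default."""
    section = None
    for raw in config_text.splitlines():
        # Drop comments and surrounding whitespace (simplification).
        line = raw.split("#", 1)[0].strip()
        if not line:
            continue
        if line.startswith("[") and line.endswith("]"):
            section = line[1:-1].strip()
            continue
        if section == "data" and "=" in line:
            key, value = line.split("=", 1)
            if key.strip() == "engine":
                return value.strip().strip('"')
    return default


# The [data] block posted above contains no engine key.
posted = """
[data]
dir = "/var/opt/influxdb/data"
max-wal-size = 104857600
query-log-enabled = false
wal-dir = "/var/opt/influxdb/wal"
wal-enable-logging = true
wal-flush-interval = "10m"
wal-partition-flush-delay = "2s"
"""
print(data_engine(posted))  # bz1
```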

@corylanou it looks like #4235 might have come back. Any ideas what might be happening?

@theoweiss

I've written an InfluxDB-based persistence service for openHAB (openhab/openhab1-addons#3610).
I ran into this "order by DESC" bug some time ago (mentioned in openhab/openhab1-addons#2748 (comment)). The bug has changed, but it still exists in 0.9.6.1:

> select value from ATestTime where  time < 1451517300s ORDER BY time ASC
name: ATestTime
---------------
time        value
1451517180  1.451517120221e+12
1451517240  1.451517180085e+12

> select value from ATestTime where  time < 1451517300s ORDER BY time DESC
> 

I used InfluxDB 0.9.6.1 from Homebrew, and before testing I deleted all InfluxDB data files and switched to tsm1:

$ grep tsm /usr/local/etc/influxdb.conf.0.9.theo3
  dir = "/Users/theo/.influxdb/datatsm1"
  engine = "tsm1"

Any hints on how to solve this?

@jsternberg
Contributor

I believe this has been fixed during the query engine refactor. If you still experience this issue, please create a new issue. Thank you.
