DISTINCT does not operate on string fields #6495

beckettsean · 2016-04-28T15:16:21Z

Bug report

System info: [Include InfluxDB version, operating system name, and other relevant details]

InfluxDB 0.12.2 on Ubuntu 14.04, fresh install

Steps to reproduce:

insert string fields
query string field with DISTINCT

Expected behavior: [What you expected to happen]

Return from DISTINCT(<string>) would be the number of distinct strings.

Actual behavior: [What actually happened]

Return was null.

Additional info: [Include gist of relevant config, logs, etc.]

> show field keys from a
name: a
-------
fieldKey
string
value

> select * from a
name: a
-------
time            string  value
1461695896546713026 12  12
1461856121749763139 11  11
1461856127357529714 10  10
1461856128790092843 10  10
1461856129686020379 10  10
1461856131452838575 11  11
1461856136245585327 13  13

> select distinct(value) from a
name: a
-------
time    distinct
0   12
0   11
0   10
0   13

> select distinct(string) from a
> select count(string) from a
name: a
-------
time    count
0   7

> format json
> select distinct(string) from a
{"results":[{}]}

The text was updated successfully, but these errors were encountered:

beckettsean · 2016-04-28T15:17:26Z

@jsternberg ^^

beckettsean · 2016-04-28T15:29:36Z

@jsternberg I created the data with the following:

> insert a value=12,string="12"
> insert a value=11,string="11"
> insert a value=10,string="10"
> insert a value=10,string="10"
> insert a value=10,string="10"
> insert a value=11,string="11"
> insert a value=13,string="13"

jsternberg · 2016-04-28T15:37:05Z

@beckettsean I'm unable to reproduce this with 0.12.2. I'm using one of the pending Docker images to ensure I have a clean slate.

> create database mydb
> use mydb
Using database mydb
> insert a value=12,string="12"
> insert a value=11,string="11"
> insert a value=10,string="10"
> insert a value=10,string="10"
> insert a value=10,string="10"
> insert a value=11,string="11"
> insert a value=13,string="13"
> select * from a
name: a
-------
time                    string  value
1461857260244815974     12      12
1461857267066315206     11      11
1461857271181150496     10      10
1461857275369303495     10      10
1461857279493054542     10      10
1461857287331499229     11      11
1461857292976276325     13      13

> select distinct(value) from a
name: a
-------
time    distinct
0       12
0       11
0       10
0       13

> select distinct(string) from a
name: a
-------
time    distinct
0       12
0       11
0       10
0       13

> select count(string) from a
name: a
-------
time    count
0       7

> format json
> select distinct(string) from a
{"results":[{"series":[{"name":"a","columns":["time","distinct"],"values":[[0,"12"],[0,"11"],[0,"10"],[0,"13"]]}]}]}

Can you confirm you're using the correct version?

beckettsean · 2016-04-28T16:09:10Z

On checking, my system is actually 0.12.1. Is there reason to believe this is fixed on 0.12.2?

# influx
Visit https://enterprise.influxdata.com to register for updates, InfluxDB server management, and monitoring.
Connected to http://localhost:8086 version 0.12.1
InfluxDB shell 0.12.1
>

beckettsean · 2016-04-28T16:12:06Z

Still happens for me on 0.12.2:

# influx
Visit https://enterprise.influxdata.com to register for updates, InfluxDB server management, and monitoring.
Connected to http://localhost:8086 version 0.12.2
InfluxDB shell 0.12.2
> use mydb
Using database mydb
> select distinct(value) from a
name: a
-------
time    distinct
0   12
0   11
0   10
0   13

> select distinct(string) from a
>

beckettsean · 2016-04-28T16:13:23Z

Replicated on new data:

> insert b value=1,string="a"
> insert b value=2,string="b"
> insert b value=2,string="b"
> insert b value=1,string="a"
> insert b value=3,string="c"
> select * from b
name: b
-------
time            string  value
1461859961560359056 a   1
1461859965408355832 b   2
1461859966311601860 b   2
1461859967615202434 a   1
1461859972269952068 c   3

> select distinct(value) from b
name: b
-------
time    distinct
0   1
0   2
0   3

> select distinct(string) from b
>

beckettsean · 2016-04-28T16:14:17Z

Not a syntax issue as far as I can tell:

> select distinct("string") from b
> select distinct('string') from b
ERR: error parsing query: expected field argument in distinct()
>

beckettsean · 2016-04-28T16:29:40Z

I spun up a brand new instance, installed 0.12.2, and DISTINCT operates as expected:

# influx
Visit https://enterprise.influxdata.com to register for updates, InfluxDB server management, and monitoring.
Connected to http://localhost:8086 version 0.12.2
InfluxDB shell 0.12.2
> create database mydb
> user mydb
ERR: error parsing query: found USER, expected SELECT, DELETE, SHOW, CREATE, DROP, GRANT, REVOKE, ALTER, SET, KILL at line 1, char 1
Warning: It is possible this error is due to not setting a database.
Please set a database with the command "use <database>".
> use mydb
Using database mydb
> insert a number=1,string="a"
> insert a number=1,string="a"
> insert a number=2,string="b"
> insert a number=2,string="b"
> insert a number=3,string="c"
> select * from a
name: a
-------
time            number  string
1461860834329907105 1   a
1461860836969349116 1   a
1461860841620218718 2   b
1461860842520594123 2   b
1461860848010369577 3   c

> select distinct(number) from a
name: a
-------
time    distinct
0   1
0   2
0   3

> select distinct(string) from a
name: a
-------
time    distinct
0   a
0   b
0   c

I'm not sure what it is about my other installations that prevents this, but it appears that it cannot be reproduced on a brand new box.

jsternberg · 2016-05-02T21:12:49Z

I looked at the underlying data set and I think it's caused by a side-effect of a bad decision in the underlying query engine. If a shard returns no iterators for a query, it returns a fake float iterator. When casting is done, floats are given priority and all non-float iterators are discarded, making it appear like there's no data.

When selecting as a raw field, this doesn't happen since the work is done through auxiliary iterators which don't have this problem. I'll start working on a fix.

jsternberg · 2016-05-02T21:13:56Z

To be clear, this should happen with any aggregation function for anything lower than a float that has differing output types (it should not affect count()).

If a shard is empty for a specific field and the field type is something other than a float, a nil iterator would get returned from one of the empty shards and cause the combined iterators to be cast to the float type and all other iterator types to be discarded (or for integers, to be cast). This is rare since most aggregates don't accept strings or booleans, but for queries like: SELECT distinct(string) FROM mydata It would result in nothing getting returned if one of the shards didn't have a value for `string`. This change modifies the query engine to return nil for the shards instead of a fake iterator and then to only use the fake iterator if the final aggregate iterator is nil (meaning that no iterators could be constructed for the field from any shard). Fixes #6495.

beckettsean added the area/functions label Apr 28, 2016

jsternberg self-assigned this Apr 28, 2016

jsternberg mentioned this issue May 3, 2016

Fix aggregate returns when data is missing from some shards #6539

Merged

jsternberg closed this as completed in #6539 May 3, 2016

jsternberg mentioned this issue May 12, 2016

DISTINCT does not return correct values #6615

Closed

jsternberg mentioned this issue May 21, 2016

FIRST & LAST do not work for string values #6438

Closed

timhallinflux added this to the 0.13.0 milestone Dec 20, 2016

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

DISTINCT does not operate on string fields #6495

DISTINCT does not operate on string fields #6495

beckettsean commented Apr 28, 2016

beckettsean commented Apr 28, 2016

beckettsean commented Apr 28, 2016

jsternberg commented Apr 28, 2016

beckettsean commented Apr 28, 2016

beckettsean commented Apr 28, 2016

beckettsean commented Apr 28, 2016

beckettsean commented Apr 28, 2016

beckettsean commented Apr 28, 2016

jsternberg commented May 2, 2016

jsternberg commented May 2, 2016

DISTINCT does not operate on string fields #6495

DISTINCT does not operate on string fields #6495

Comments

beckettsean commented Apr 28, 2016

Bug report

beckettsean commented Apr 28, 2016

beckettsean commented Apr 28, 2016

jsternberg commented Apr 28, 2016

beckettsean commented Apr 28, 2016

beckettsean commented Apr 28, 2016

beckettsean commented Apr 28, 2016

beckettsean commented Apr 28, 2016

beckettsean commented Apr 28, 2016

jsternberg commented May 2, 2016

jsternberg commented May 2, 2016