Wire up DISTINCT aggregate #1815

pauldix · 2015-03-02T23:31:28Z

Query looks like this:

select distinct id from user_events
where time > now() - 24h
group by time(1h)

You'd get back the distinct user ids for each hour bucket of time for the last 24 hours.
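To make the expected result shape concrete, here is a minimal sketch of "distinct values per hour bucket" in Python. It is not InfluxDB's implementation. The function name `distinct_per_bucket` and the sample points are hypothetical, and it assumes the `WHERE time > now() - 24h` filter has already been applied to the input.

```python
from collections import defaultdict
from datetime import datetime, timedelta, timezone


def distinct_per_bucket(points, bucket=timedelta(hours=1)):
    """Return {bucket_start: sorted distinct values} for (timestamp, value) points."""
    epoch = datetime(1970, 1, 1, tzinfo=timezone.utc)
    buckets = defaultdict(set)
    for ts, value in points:
        # Truncate the timestamp down to the start of its bucket.
        offset = (ts - epoch) // bucket  # timedelta // timedelta gives an int
        buckets[epoch + offset * bucket].add(value)
    return {start: sorted(vals) for start, vals in sorted(buckets.items())}


# Hypothetical sample data standing in for user_events (timestamp, id).
t0 = datetime(2015, 3, 2, 10, 0, tzinfo=timezone.utc)
points = [
    (t0 + timedelta(minutes=5), 1),
    (t0 + timedelta(minutes=20), 2),
    (t0 + timedelta(minutes=40), 1),   # repeat of id 1 in the same hour
    (t0 + timedelta(minutes=70), 3),   # falls into the next hour bucket
]
result = distinct_per_bucket(points)
```

Each bucket keeps a set, so a repeated id inside one hour is reported once, while the same id may still appear again in a different hour.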

With DISTINCT you can pass in either a field or a tag. The query engine will have to do very different things to get the result depending on which one is passed.

If it's a tag and no time limitation was specified, then the metastore can answer the query directly. If a time limit was specified, you'll have to run a query.

If it's a field, then you'll have to run a query and run through the whole engine.
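The three cases above amount to a small decision table. A rough sketch, assuming a hypothetical `plan_distinct` helper (the names and return strings are illustrative, not part of InfluxDB):

```python
def plan_distinct(key_kind, has_time_bound):
    """Pick where a SELECT DISTINCT could be answered, per the rules described above.

    key_kind is "tag" or "field"; has_time_bound says whether the query
    restricts time (for example, WHERE time > now() - 24h).
    """
    if key_kind == "tag" and not has_time_bound:
        # Tag values are index data, so no points need to be scanned.
        return "metastore"
    if key_kind in ("tag", "field"):
        # Time-bounded tags and all fields require reading the stored points.
        return "query engine"
    raise ValueError(f"unknown key kind: {key_kind!r}")


plans = {
    ("tag", False): plan_distinct("tag", False),
    ("tag", True): plan_distinct("tag", True),
    ("field", False): plan_distinct("field", False),
    ("field", True): plan_distinct("field", True),
}
```

Only the unbounded tag case avoids the engine; everything else has to scan data.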


corylanou · 2015-05-12T19:29:59Z

For this current iteration (and based on internal discussions), I'm proposing the following:

Supported

# SELECT DISTINCT <field key> FROM <measurement>
SELECT DISTINCT value FROM cpu

# SELECT DISTINCT <tag key> FROM <measurement>
SELECT DISTINCT host FROM cpu

# SELECT DISTINCT <tag key> FROM <measurement> WHERE time > now() - 24h
SELECT DISTINCT host FROM cpu WHERE time > now() - 24h

# SELECT DISTINCT <field key> FROM <measurement> WHERE time > now() - 1d GROUP BY time(1h)
SELECT DISTINCT value FROM cpu WHERE time > now() - 1d GROUP BY time(1h)

Not currently implemented (might be in the future, but no promises)

# SELECT DISTINCT <field key>, <field key> FROM <measurement>
SELECT DISTINCT id, value FROM cpu

# SELECT DISTINCT <tag key>, <tag key> FROM <measurement>
SELECT DISTINCT host, region FROM cpu

# No aggregate functions with a select distinct (sum, count, mean, max, etc.)

# SELECT DISTINCT <field key>, <aggregate>(<field key>) FROM <measurement>
SELECT DISTINCT id, sum(bytes) FROM network

# SELECT DISTINCT <tag key>, <aggregate>(<field key>) FROM <measurement>
SELECT DISTINCT host, sum(bytes) FROM network

# SELECT DISTINCT <tag key>, <tag key> FROM <measurement>
SELECT DISTINCT id, value FROM cpu

SELECT DISTINCT <tag values> FROM series WHERE KEY=<tag key>
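Read as a rule, the two lists above say: exactly one key after DISTINCT, and no function call next to it. A small sketch of that check follows. The helper `distinct_is_supported` is hypothetical, it treats any `name(...)` expression as an aggregate, and it only looks at the select list, so it says nothing about the `FROM series WHERE KEY=...` form.

```python
import re

# Anything shaped like name(...) is treated as a function call (an assumption).
FUNCTION_CALL = re.compile(r"^\s*\w+\s*\(.*\)\s*$")


def distinct_is_supported(select_exprs):
    """Return True if the expressions after DISTINCT match the 'Supported' list."""
    if len(select_exprs) != 1:
        return False  # multiple keys are listed as not implemented
    # A single bare key (field or tag) is fine; a function call is not.
    return FUNCTION_CALL.match(select_exprs[0]) is None


checks = {
    "value": distinct_is_supported(["value"]),
    "host": distinct_is_supported(["host"]),
    "id, value": distinct_is_supported(["id", "value"]),
    "id, sum(bytes)": distinct_is_supported(["id", "sum(bytes)"]),
}
```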

andyxning · 2016-07-26T08:14:25Z

@pauldix Is there any plan to support DISTINCT on a tag key? See #3880 for more info. We really need that functionality.

# SELECT DISTINCT <tag key> FROM <measurement> WHERE time > now() - 1d GROUP BY time(1h)
SELECT DISTINCT tag_key FROM cpu WHERE time > now() - 1d GROUP BY time(1h)

pauldix added this to the Next Point Release milestone Mar 20, 2015

pauldix modified the milestones: 0.9.0, Next Point Release May 12, 2015

pauldix assigned corylanou May 12, 2015

corylanou added the 2 - Working label May 12, 2015

corylanou mentioned this issue May 12, 2015

Wire up COUNT DISTINCT aggregate #1891

Closed

corylanou mentioned this issue May 14, 2015

Wire up SELECT DISTINCT #2568

Merged

corylanou added review and removed 2 - Working labels May 18, 2015

corylanou closed this as completed May 20, 2015

corylanou removed the review label May 20, 2015

TechniclabErdmann mentioned this issue Aug 28, 2015

Allow DISTINCT function to operate on tags #3880

Open
