Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Wire up DISTINCT aggregate #1815

Closed
pauldix opened this issue Mar 2, 2015 · 2 comments
Closed

Wire up DISTINCT aggregate #1815

pauldix opened this issue Mar 2, 2015 · 2 comments
Assignees
Milestone

Comments

@pauldix
Copy link
Member

pauldix commented Mar 2, 2015

Query looks like this:

select distinct id from user_events
where time > now() - 24h
group by time(1h)

You'd get back the distinct user ids for each hour bucket of time for the last 24 hours.

With distinct you can either pass in a field or a tag. The query engine will have to do wildly different things to get the result depending on what they do.

If it's a tag and no time limitation was specified, then the metastore can answer the query directly. If a time limit was specified, you'll have to run a query.

If it's a field, then you'll have to run a query and run through the whole engine.

@pauldix pauldix added this to the Next Point Release milestone Mar 20, 2015
@pauldix pauldix modified the milestones: 0.9.0, Next Point Release May 12, 2015
@corylanou
Copy link
Contributor

For this current iteration (and based on internal discussions), I'm proposing the following:

Supported

#SELECT DISTINCT <field key> FROM <measurement>
SELECT DISTINCT value FROM cpu

# SELECT DISTINCT <tag key> FROM <measurement>
SELECT DISTINCT host FROM cpu

#SELECT DISTINCT <tag key> FROM <measurement> WHERE time > now() - 24h
SELECT DISTINCT host FROM cpu WHERE time > now() - 24h

# SELECT DISTINCT <field key> FROM <measurement> GROUP BY time(1h) where time > now() - 1d
SELECT DISTINCT value FROM cpu GROUP BY time(1h) where time > now() - 1d

Not currently implemented (might in the future, but no promises)

#SELECT DISTINCT <field key>, <field key> FROM <measurement>
SELECT DISTINCT id, value FROM cpu

#SELECT DISTINCT <tag key>, <tag key> FROM <measurement>
SELECT DISTINCT host, region FROM cpu

# No aggregate functions with a select distinct (sum, count, mean, max, etc.)

#SELECT DISTINCT <field key>, count(<field key>) FROM <measurement>
SELECT DISTINCT id, sum(bytes) FROM network

#SELECT DISTINCT <tag key>, count(<field key>) FROM <measurement>
SELECT DISTINCT host, sum(bytes) FROM network

#SELECT DISTINCT <tag key>, <tag key> FROM <measurement>
SELECT DISTINCT id, value FROM cpu

SELECT DISTINCT <tag values> FROM series WHERE KEY=<tag key>

@andyxning
Copy link

andyxning commented Jul 26, 2016

@pauldix any plan to support distinct on tag key, see #3880 for more info. We do really need that functionality.

# SELECT DISTINCT <tag key> FROM <measurement> GROUP BY time(1h) where time > now() - 1d
SELECT DISTINCT tag_key FROM cpu GROUP BY time(1h) where time > now() - 1d

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

No branches or pull requests

3 participants