Add approx_quantile support #1538

Closed
domodwyer opened this issue Jan 10, 2022 · 0 comments · Fixed by #1539
Labels
datafusion Changes in the datafusion crate enhancement New feature or request

Comments

@domodwyer
Contributor

Is your feature request related to a problem or challenge? Please describe what you are trying to do.
I would like to efficiently aggregate approximate quantile values from a column of data, for example: "show me the 99th percentile of the latency column in the requests table".

Describe the solution you'd like
Implement TDigest (or a similar algorithm) to provide relatively cheap quantile estimates.

Describe alternatives you've considered
I've had a look at some other DBs:

  • duckdb - tdigest & reservoir sampling
  • timescaledb - tdigest & uddsketch
  • snowflake - several options, including tdigest for cheap approximations
  • presto - qdigest
  • influxdb - tdigest

For approximate results, tdigest seems popular, though the uddsketch paper is relatively new and also interesting.

Additional context
Tdigest provides quantile estimations. I imagine it would be exposed as an approx_quantile(column, quantile) aggregation, in keeping with the naming of the approx_distinct() aggregation.

Example:

SELECT approx_quantile(latency, 0.99) AS p99 FROM requests;
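
For reference, the value such an aggregation would estimate can be computed exactly by sorting the column and interpolating between the two closest ranks. The sketch below is only that exact baseline, not tdigest and not DataFusion code; the function name `exact_quantile` and the choice of linear interpolation are assumptions made for illustration. A sketch-based approach would aim to return a close estimate without sorting or holding every value in memory.

```rust
/// Exact quantile of `values` at `q` in [0, 1] (hypothetical helper, not a DataFusion API).
/// Sorts a copy of the input and linearly interpolates between the two closest ranks.
/// Returns None for empty input or when `q` is outside [0, 1] (including NaN).
fn exact_quantile(values: &[f64], q: f64) -> Option<f64> {
    if values.is_empty() || !(0.0..=1.0).contains(&q) {
        return None;
    }
    // O(n log n) time and O(n) extra memory: the cost an approximate sketch tries to avoid.
    let mut sorted = values.to_vec();
    sorted.sort_by(|a, b| a.total_cmp(b));

    // Fractional position of the requested quantile in the sorted data.
    let pos = q * (sorted.len() - 1) as f64;
    let lo = pos.floor() as usize;
    let hi = pos.ceil() as usize;
    let frac = pos - lo as f64;

    Some(sorted[lo] + (sorted[hi] - sorted[lo]) * frac)
}
```

Calling `exact_quantile(&latencies, 0.99)` on a slice of latency values gives the figure that the p99 query above would approximate.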
@domodwyer domodwyer added the enhancement New feature or request label Jan 10, 2022
@alamb alamb added the datafusion Changes in the datafusion crate label Feb 10, 2022
@alamb alamb changed the title Quantile support Add approx_quantile support Feb 10, 2022
2 participants