Add `approx_quantile` support #1538

domodwyer · 2022-01-10T14:39:38Z

Is your feature request related to a problem or challenge? Please describe what you are trying to do.
I would like to efficiently compute approximate quantile values over a column of data, for example: "show me the 99th percentile of the latency column in the requests table".

Describe the solution you'd like
Implement TDigest (or similar algorithm) to provide relatively cheap quantile values/estimations.

Describe alternatives you've considered
I've had a look at some other DBs:

duckdb - tdigest & reservoir sampling
timescaledb - tdigest & uddsketch
snowflake - several options, including tdigest for cheap approximations
presto - qdigest
influxdb - tdigest

For approximate results, tdigest seems popular, though the uddsketch paper is relatively new and also interesting.
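
To make the reservoir sampling alternative listed above concrete, here is a minimal Python sketch of the classic "Algorithm R" reservoir, followed by a quantile read from the sorted sample. It only illustrates the general technique and is not necessarily how duckdb (or any other system above) implements it. The names `reservoir_sample` and `approx_quantile_from_sample` are hypothetical.

```python
import random


def reservoir_sample(stream, k, rng=None):
    """Keep a uniform random sample of at most k items from a stream (Algorithm R)."""
    rng = rng or random.Random()
    sample = []
    for i, item in enumerate(stream):
        if i < k:
            # Fill the reservoir with the first k items.
            sample.append(item)
        else:
            # Replace a random slot with probability k / (i + 1).
            j = rng.randint(0, i)  # inclusive on both ends
            if j < k:
                sample[j] = item
    return sample


def approx_quantile_from_sample(sample, q):
    """Estimate the q-quantile (0 <= q <= 1) from a sample by nearest rank."""
    if not sample:
        raise ValueError("sample must not be empty")
    if not 0.0 <= q <= 1.0:
        raise ValueError("q must be within [0, 1]")
    ordered = sorted(sample)
    idx = min(len(ordered) - 1, int(q * len(ordered)))
    return ordered[idx]
```

The memory cost is fixed at `k` values regardless of input size, but accuracy in the tails (such as p99) depends on how many sampled values fall there, which is one reason digest-style sketches are often preferred for high quantiles.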

Additional context
TDigest provides quantile estimates. I imagine it would be exposed as an approx_quantile(column, quantile) aggregation, in keeping with the naming of the approx_distinct() aggregation.

Example:

SELECT approx_quantile(latency, 0.99) AS p99 FROM requests;
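
For reference, the exact value that such an aggregation would approximate can be computed by sorting the column and interpolating. The following Python sketch assumes linear interpolation between the two closest ranks (a continuous-percentile convention). The function name `exact_quantile` is hypothetical and is not part of the proposed API.

```python
import math


def exact_quantile(values, q):
    """Return the exact q-quantile (0 <= q <= 1) using linear interpolation."""
    if not values:
        raise ValueError("values must not be empty")
    if not 0.0 <= q <= 1.0:
        raise ValueError("q must be within [0, 1]")
    ordered = sorted(values)
    # Fractional position of the quantile within the sorted values.
    pos = q * (len(ordered) - 1)
    lo = math.floor(pos)
    hi = math.ceil(pos)
    frac = pos - lo
    # Interpolate between the neighbouring values.
    return ordered[lo] + (ordered[hi] - ordered[lo]) * frac
```

This needs the whole column in memory and a full sort, which is the cost an approximate sketch such as TDigest is meant to avoid.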

domodwyer added the enhancement New feature or request label Jan 10, 2022

domodwyer mentioned this issue Jan 10, 2022

Add approx_percentile_cont() aggregation function #1539

Merged

alamb closed this as completed in #1539 Jan 31, 2022

alamb added the datafusion Changes in the datafusion crate label Feb 10, 2022

alamb changed the title ~~Quantile support~~ Add approx_quantile support Feb 10, 2022

jychen7 mentioned this issue Mar 13, 2022

feat: ApproxPercentileCont supports sketches from data source #2004

Closed
