Add approx_percentile_cont() aggregation function #1539
Commits on Jan 25, 2022
feat: implement TDigest for approx quantile

Adds a [TDigest] implementation providing approximate quantile estimations of large inputs using a small amount of (bounded) memory.

A TDigest is most accurate near either "end" of the quantile range (that is, 0.1, 0.9, 0.95, etc) due to the use of a scaling function that increases resolution at the tails. The paper claims single-digit parts-per-million errors for q ≤ 0.001 or q ≥ 0.999 using 100 centroids, and in practice I have found accuracy to be more than acceptable for an approximate function across the entire quantile range.

The implementation is a modified copy of https://github.com/MnO2/t-digest, itself a Rust port of [Facebook's C++ implementation]. Both Facebook's implementation and MnO2's Rust port are Apache 2.0 licensed.

[TDigest]: https://arxiv.org/abs/1902.04023
[Facebook's C++ implementation]: https://github.com/facebook/folly/blob/main/folly/stats/TDigest.h
Commit 69f498e
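The tail-heavy resolution described in the TDigest commit message can be illustrated with a small sketch. It assumes a quadratic scaling function that maps a centroid index k (out of a budget of d) to the quantile where that centroid's bucket ends. The names `k_to_q` and `bucket_width`, the exact formula, and the budget of 100 are illustrative assumptions, not necessarily what the code in this PR does.

```rust
/// Hypothetical quadratic scaling function: maps centroid index `k` (0..=d)
/// to the quantile at which that centroid's bucket ends.
fn k_to_q(k: f64, d: f64) -> f64 {
    let x = k / d;
    if x >= 0.5 {
        let base = 1.0 - x;
        1.0 - 2.0 * base * base
    } else {
        2.0 * x * x
    }
}

/// Width of the quantile range covered by bucket `i` (0-based) out of `d` buckets.
fn bucket_width(i: usize, d: usize) -> f64 {
    k_to_q((i + 1) as f64, d as f64) - k_to_q(i as f64, d as f64)
}

fn main() {
    let d = 100;
    // Compare buckets at both tails with one in the middle.
    for &i in [0usize, 1, 49, 98, 99].iter() {
        println!(
            "bucket {:>2}: covers {:.4} of the quantile range",
            i,
            bucket_width(i, d)
        );
    }
}
```

Under these assumptions the buckets at either tail cover a far narrower slice of the quantile range than the buckets around the median, which is why a digest of this kind resolves extreme quantiles more finely.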
feat: approx_quantile aggregation
Adds the ApproxQuantile physical expression, plumbing, and test cases.

The function signature is:

    approx_quantile(column, quantile)

where column can be any numeric type (that can be cast to a float64) and quantile is a float64 literal between 0 and 1.
Commit d9a7be2
feat: approx_quantile dataframe function
Adds the approx_quantile() dataframe function, and exports it in the prelude.
Commit 0cbacd1
refactor: ballista approx_quantile support

Adds ballista wire encoding for approx_quantile.

Adding support for this required modifying the AggregateExprNode proto message to support propagating multiple LogicalExprNode aggregate arguments. All the existing aggregations take a single argument, so this wasn't needed before.

This commit adds "repeated" to the expr field, which I believe is backwards compatible as described here:

https://developers.google.com/protocol-buffers/docs/proto3#updating

Specifically, adding "repeated" to an existing message field:

"For ... message fields, optional is compatible with repeated"

No existing tests needed fixing, and a new roundtrip test is included that covers the change to allow multiple expr.
Commit c415178
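The compatibility argument for adding "repeated" can be sketched at the wire level: a length-delimited field written once by an old writer parses as a one-element list for a reader that treats the field as repeated. The sketch below is a hand-rolled toy decoder, not the protobuf library or ballista's generated code. The field number 1 and the byte payloads standing in for encoded LogicalExprNode messages are assumptions made for illustration.

```rust
/// Decode a base-128 varint starting at `pos`; returns (value, next position).
fn read_varint(buf: &[u8], mut pos: usize) -> Option<(u64, usize)> {
    let mut value = 0u64;
    let mut shift = 0;
    loop {
        let byte = *buf.get(pos)?;
        pos += 1;
        value |= u64::from(byte & 0x7f) << shift;
        if byte & 0x80 == 0 {
            return Some((value, pos));
        }
        shift += 7;
        if shift >= 64 {
            return None; // malformed: varint too long
        }
    }
}

/// Append one length-delimited field (wire type 2) to `out`.
/// Only handles field numbers < 16 and payloads < 128 bytes, enough for this demo.
fn write_len_field(out: &mut Vec<u8>, field: u8, payload: &[u8]) {
    assert!(field < 16 && payload.len() < 128);
    out.push((field << 3) | 2);
    out.push(payload.len() as u8);
    out.extend_from_slice(payload);
}

/// Collect every occurrence of `field`, as a reader of a repeated field would.
fn read_repeated(buf: &[u8], field: u64) -> Option<Vec<Vec<u8>>> {
    let mut items = Vec::new();
    let mut pos = 0;
    while pos < buf.len() {
        let (tag, next) = read_varint(buf, pos)?;
        if tag & 7 != 2 {
            return None; // this demo only understands length-delimited fields
        }
        let (len, start) = read_varint(buf, next)?;
        let end = start.checked_add(len as usize)?;
        let payload = buf.get(start..end)?;
        if tag >> 3 == field {
            items.push(payload.to_vec());
        }
        pos = end;
    }
    Some(items)
}

fn main() {
    // Old writer: the field is singular, so it appears once on the wire.
    let mut old_bytes = Vec::new();
    write_len_field(&mut old_bytes, 1, b"column");

    // New writer: the field is repeated, one occurrence per aggregate argument.
    let mut new_bytes = old_bytes.clone();
    write_len_field(&mut new_bytes, 1, b"quantile");

    println!("{:?}", read_repeated(&old_bytes, 1));
    println!("{:?}", read_repeated(&new_bytes, 1));
}
```

Both byte streams use the same tag for each occurrence, so the repeated reader needs no special case for data produced before the change.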
refactor: use input type as return type
Casts the calculated quantile value to the same type as the input data.
Commit 85af343
Commits on Jan 26, 2022
Commit e8f8e3f
Commits on Jan 27, 2022
Commit faa8094
Commits on Jan 29, 2022
refactor: validate quantile value
Ensures the quantile value is between 0 and 1, emitting a plan error if not.
Commit 03a5eff
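A range check of the kind this commit describes might look like the minimal sketch below. `PlanError` and `validate_quantile` are hypothetical names, not the PR's actual types, and treating both endpoints as valid is an assumption, since the commit message only says "between 0 and 1".

```rust
use std::fmt;

/// Hypothetical stand-in for a planner error type.
#[derive(Debug, PartialEq)]
struct PlanError(String);

impl fmt::Display for PlanError {
    fn fmt(&self, f: &mut fmt::Formatter<'_>) -> fmt::Result {
        write!(f, "plan error: {}", self.0)
    }
}

/// Accepts `q` only if it lies in [0, 1] (inclusive bounds are an assumption).
fn validate_quantile(q: f64) -> Result<f64, PlanError> {
    // NaN is rejected as well, because every comparison against NaN is false.
    if (0.0..=1.0).contains(&q) {
        Ok(q)
    } else {
        Err(PlanError(format!(
            "quantile must be between 0 and 1, got {}",
            q
        )))
    }
}

fn main() {
    for &q in [0.5, 1.5].iter() {
        match validate_quantile(q) {
            Ok(v) => println!("accepted {}", v),
            Err(e) => println!("{}", e),
        }
    }
}
```

Rejecting the value while the query is being planned, instead of during execution, means a bad literal fails before any data is read.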
Commit c216f48
Commits on Jan 31, 2022
Commit 3612493