Add approx_percentile_cont() aggregation function #1539

Merged 10 commits on Jan 31, 2022

Commits on Jan 25, 2022

  1. feat: implement TDigest for approx quantile

    Adds a [TDigest] implementation providing approximate quantile
    estimations of large inputs using a small amount of (bounded) memory.
    
    A TDigest is most accurate near either "end" of the quantile range (that
    is, 0.1, 0.9, 0.95, etc.) due to the use of a scaling function that
    increases resolution at the tails. The paper claims single-digit
    parts-per-million errors for q ≤ 0.001 or q ≥ 0.999 using 100 centroids,
    and in practice I have found accuracy to be more than acceptable for an
    approximate function across the entire quantile range.
    
    The implementation is a modified copy of
    https://github.com/MnO2/t-digest, itself a Rust port of [Facebook's C++
    implementation]. Both Facebook's implementation and MnO2's Rust port
    are Apache 2.0 licensed.
    
    [TDigest]: https://arxiv.org/abs/1902.04023
    [Facebook's C++ implementation]: https://github.com/facebook/folly/blob/main/folly/stats/TDigest.h
    domodwyer committed Jan 25, 2022
    69f498e
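To make the "increased resolution at the tails" point concrete, here is a minimal sketch of the k1 scale function described in the t-digest paper. It is an illustration only: the code in this commit, being derived from the folly port, may use a different or approximated scale function, and the names `k1`, `k_width` and `delta` are chosen here for the example.

```rust
use std::f64::consts::PI;

/// Scale function k1 from the t-digest paper: maps a quantile `q` in [0, 1]
/// to a position in "k space". `delta` is the compression parameter.
fn k1(q: f64, delta: f64) -> f64 {
    delta / (2.0 * PI) * (2.0 * q - 1.0).asin()
}

/// Width, in k units, of the quantile interval [lo, hi].
/// A single centroid may cover at most one unit of k, so an interval with a
/// larger k width has to be represented by more (and therefore smaller)
/// centroids.
fn k_width(lo: f64, hi: f64, delta: f64) -> f64 {
    k1(hi, delta) - k1(lo, delta)
}
```

With `delta = 100.0`, the interval [0.0, 0.01] at the tail is roughly ten times wider in k space than the equally sized interval [0.495, 0.505] around the median, which is why estimates near q = 0 and q = 1 are finer grained.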
  2. feat: approx_quantile aggregation

    Adds the ApproxQuantile physical expression, plumbing & test cases.
    
    The function signature is:
    
    	approx_quantile(column, quantile)
    
    Where column can be any numeric type (that can be cast to a float64) and
    quantile is a float64 literal between 0 and 1.
    domodwyer committed Jan 25, 2022
    d9a7be2
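For intuition about what approx_quantile(column, quantile) estimates, the following sketch computes an exact quantile over a small in-memory slice. It assumes linear interpolation between the two nearest ranks, which is only one of several common quantile definitions; the approximate result produced by the TDigest may differ slightly. `exact_quantile` is a hypothetical helper and not part of this PR.

```rust
/// Exact quantile of `values`, using linear interpolation between the two
/// nearest ranks. Returns `None` for empty input or when `q` is outside
/// [0, 1] (including NaN).
fn exact_quantile(values: &[f64], q: f64) -> Option<f64> {
    if values.is_empty() || !(0.0..=1.0).contains(&q) {
        return None;
    }
    // Sort a copy so the caller's data is left untouched.
    let mut sorted = values.to_vec();
    sorted.sort_by(|a, b| a.total_cmp(b));

    // Fractional rank of the requested quantile within the sorted data.
    let pos = q * (sorted.len() - 1) as f64;
    let lo = pos.floor() as usize;
    let hi = pos.ceil() as usize;
    let frac = pos - lo as f64;

    Some(sorted[lo] + (sorted[hi] - sorted[lo]) * frac)
}
```

Unlike this reference version, which needs every value in memory and sorts them, the TDigest keeps only a bounded number of centroids.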
  3. feat: approx_quantile dataframe function

    Adds the approx_quantile() dataframe function, and exports it in the
    prelude.
    domodwyer committed Jan 25, 2022
    0cbacd1
  4. refactor: ballista approx_quantile support

    Adds ballista wire encoding for approx_quantile.
    
    Adding support for this required modifying the AggregateExprNode proto
    message to support propagating multiple LogicalExprNode aggregate
    arguments - all the existing aggregations take a single argument, so
    this wasn't needed before.
    
    This commit adds "repeated" to the expr field, which I believe is
    backwards compatible as described here:
    
    	https://developers.google.com/protocol-buffers/docs/proto3#updating
    
    Specifically, adding "repeated" to an existing message field:
    
    	"For ... message fields, optional is compatible with repeated"
    
    No existing tests needed fixing, and a new roundtrip test is included
    that covers the change to allow multiple expr.
    domodwyer committed Jan 25, 2022
    c415178
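The compatibility claim follows from the protobuf wire format: an embedded message field and each element of a repeated message field are both written as a tag, a length and the payload bytes. The sketch below hand-rolls that encoding to show that bytes written with a single occurrence of the field decode as a one-element list. It is a simplification and an assumption-laden illustration, not the generated ballista code: the field number is hypothetical, the payloads are placeholder bytes rather than real LogicalExprNode messages, and tags and lengths are restricted to a single byte.

```rust
/// Encodes one length-delimited field (wire type 2).
/// Simplification: `field_number` < 16 and payloads shorter than 128 bytes,
/// so the tag and the length each fit in one byte.
fn encode_field(field_number: u8, payload: &[u8]) -> Vec<u8> {
    assert!(field_number < 16 && payload.len() < 128);
    let mut out = vec![(field_number << 3) | 2, payload.len() as u8];
    out.extend_from_slice(payload);
    out
}

/// Decodes the way a reader of a `repeated` field would: every occurrence of
/// `field_number` is collected in order. Assumes the buffer contains only
/// fields produced by `encode_field`.
fn decode_repeated(mut buf: &[u8], field_number: u8) -> Vec<Vec<u8>> {
    let mut found = Vec::new();
    while buf.len() >= 2 {
        let (tag, len) = (buf[0], buf[1] as usize);
        if buf.len() < 2 + len {
            break; // truncated input
        }
        if tag == (field_number << 3) | 2 {
            found.push(buf[2..2 + len].to_vec());
        }
        buf = &buf[2 + len..];
    }
    found
}
```

A message written before the change contains the field once and decodes to a list of length one; a message written after the change simply contains the same tag several times.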
  5. refactor: use input type as return type

    Casts the calculated quantile value to the same type as the input data.
    domodwyer committed Jan 25, 2022
    85af343
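As a rough illustration of what casting the result back to the input type implies, the sketch below converts an f64 quantile into a value of the column's type using plain Rust `as` casts. The `InputType` and `Value` enums are hypothetical stand-ins for the engine's real data type and scalar types, and the commit message does not say whether the actual cast truncates or rounds; this sketch truncates because that is what `as` does.

```rust
/// Hypothetical stand-in for the input column's data type.
#[derive(Debug, Clone, Copy, PartialEq)]
enum InputType {
    Int64,
    UInt8,
    Float64,
}

/// Hypothetical stand-in for a typed scalar result.
#[derive(Debug, Clone, Copy, PartialEq)]
enum Value {
    Int64(i64),
    UInt8(u8),
    Float64(f64),
}

/// Casts the f64 quantile estimate to the input column's type.
/// Float-to-integer `as` casts truncate toward zero and saturate at the
/// bounds of the target type.
fn cast_to_input_type(quantile: f64, input: InputType) -> Value {
    match input {
        InputType::Int64 => Value::Int64(quantile as i64),
        InputType::UInt8 => Value::UInt8(quantile as u8),
        InputType::Float64 => Value::Float64(quantile),
    }
}
```

One consequence worth keeping in mind is that an integer column yields an integer quantile, so the fractional part of the estimate is lost.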

Commits on Jan 26, 2022

  1. e8f8e3f

Commits on Jan 27, 2022

  1. refactor: rebase onto main

    domodwyer committed Jan 27, 2022
    faa8094

Commits on Jan 29, 2022

  1. refactor: validate quantile value

    Ensures the quantile value is between 0 and 1, emitting a plan error if
    not.
    domodwyer committed Jan 29, 2022
    03a5eff
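The check described here can be sketched as a small standalone function. This version assumes the bounds are inclusive and uses a plain `String` as the error; the real code returns the engine's plan error type, and `validate_quantile` is a name chosen for the example.

```rust
/// Returns the quantile unchanged if it lies in [0, 1], otherwise an error
/// message. NaN is rejected because it is not contained in any range.
fn validate_quantile(q: f64) -> Result<f64, String> {
    if (0.0..=1.0).contains(&q) {
        Ok(q)
    } else {
        Err(format!(
            "quantile must be between 0.0 and 1.0 inclusive, got {}",
            q
        ))
    }
}
```

Running the check at planning time means a bad literal is reported before any data is read.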
  2. c216f48

Commits on Jan 31, 2022

  1. refactor: clippy lints

    domodwyer committed Jan 31, 2022
    3612493