Tracking: Approx Percentile #17531

Open
14 of 26 tasks
kwannoel opened this issue Jul 2, 2024 · 1 comment

kwannoel commented Jul 2, 2024

Stream:

  • Approx Percentile Frontend: Two Phase Stateless Simple Agg
  • Approx Percentile Frontend: Two Phase Vnode Based Agg
  • Approx Percentile Frontend: Shuffle Simple Agg
  • Approx Percentile Frontend: Shuffle Hash Agg
  • Approx Percentile Streaming: RowMerge. Buffers all input from the lhs and rhs until a barrier arrives, then flushes the data.
  • Approx Percentile Streaming: Streaming Approx Percentile Aggregation Operator. This is for shuffle aggs.
  • Approx Percentile Streaming: Streaming Approx Percentile Aggregation Executors. This includes the partial agg and global agg executors. They are for two-phase agg and use keyed merge to merge their results with the other aggregators in the same select clause.
  • Add Approx Percentile Cache.
  • Handle empty state.
  • For simple agg, force output in every epoch that has any input at all, so that keyed merge can construct a full record for updates.
  • For vnode-based two-phase agg, force output in every epoch that has any input.
  • Ban approx percentile in force two phase group agg.
  • Ban distinct approx percentile.
  • Approx Percentile Streaming: Support multiple percentiles.
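
The RowMerge behaviour described above (buffer both inputs, flush on barrier) can be sketched roughly as follows. This is a hypothetical illustration, not the RisingWave executor: the class name `RowMergeSketch`, its methods, and the choice to concatenate the two rows are all assumptions, and it assumes each side yields at most one row per epoch, as a simple agg would.

```python
from typing import List, Optional, Tuple

Row = Tuple[object, ...]


class RowMergeSketch:
    """Hypothetical model of a barrier-aligned row merge (not the real executor)."""

    def __init__(self) -> None:
        self._lhs: Optional[Row] = None
        self._rhs: Optional[Row] = None

    def on_lhs(self, row: Row) -> None:
        # Buffer the latest row seen from the left input in this epoch.
        self._lhs = row

    def on_rhs(self, row: Row) -> None:
        # Buffer the latest row seen from the right input in this epoch.
        self._rhs = row

    def on_barrier(self) -> List[Row]:
        """Flush at the barrier, then reset the buffers for the next epoch."""
        out: List[Row] = []
        # Assumption: a full record needs both halves, so nothing is emitted
        # when either side produced no output in this epoch.
        if self._lhs is not None and self._rhs is not None:
            out.append(self._lhs + self._rhs)
        self._lhs = None
        self._rhs = None
        return out
```

In this sketch a one-sided epoch emits nothing, which is one way to see why the list asks the upstream aggs to force output in every epoch that has input.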

Batch:

  • Batch simple approx percentile.
  • Batch two-phase approx percentile.

Test / Benchmarks:

  • Bench two phase simple agg approx percentile vs shuffle simple agg approx percentile.
  • Fuzz test against percentile_cont / percentile_disc.
  • Test errors for all unsupported variants of approx percentile.
  • Test shuffle simple agg.
  • Test deletes for two phase stateless approx percentile.
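
A fuzz test against an exact percentile could look roughly like the sketch below. It assumes a log-scale bucketed histogram over strictly positive values, which this issue does not spell out; `bucket_index`, `bucket_value`, `approx_percentile`, `exact_percentile_disc` and `fuzz` are hypothetical names, and the exact reference only mimics the rank rule of percentile_disc rather than calling the database.

```python
import math
import random
from collections import Counter
from typing import Sequence


def bucket_index(value: float, relative_error: float) -> int:
    """Map a positive value to a log-scale bucket (assumed scheme)."""
    gamma = (1 + relative_error) / (1 - relative_error)
    return math.ceil(math.log(value, gamma))


def bucket_value(index: int, relative_error: float) -> float:
    """Representative value for a bucket covering (gamma^(i-1), gamma^i]."""
    gamma = (1 + relative_error) / (1 - relative_error)
    return 2 * gamma ** index / (gamma + 1)


def approx_percentile(values: Sequence[float], quantile: float, relative_error: float) -> float:
    if not values:
        raise ValueError("empty input")
    counts = Counter(bucket_index(v, relative_error) for v in values)
    rank = max(1, math.ceil(quantile * len(values)))
    seen = 0
    for index in sorted(counts):
        seen += counts[index]
        if seen >= rank:
            return bucket_value(index, relative_error)
    raise AssertionError("unreachable: rank never exceeds the total count")


def exact_percentile_disc(values: Sequence[float], quantile: float) -> float:
    """Exact reference using the same rank rule as the approximation."""
    ordered = sorted(values)
    rank = max(1, math.ceil(quantile * len(ordered)))
    return ordered[rank - 1]


def fuzz(rounds: int = 200, relative_error: float = 0.01, seed: int = 0) -> int:
    """Check the relative-error bound on random inputs; returns rounds checked."""
    rng = random.Random(seed)
    for _ in range(rounds):
        values = [rng.uniform(0.001, 1e6) for _ in range(rng.randint(1, 500))]
        quantile = rng.random()
        exact = exact_percentile_disc(values, quantile)
        approx = approx_percentile(values, quantile, relative_error)
        # Small slack absorbs floating-point rounding at bucket boundaries.
        assert abs(approx - exact) / exact <= relative_error + 1e-9
    return rounds
```

Zero and negative inputs are deliberately out of scope here; a real test would need to cover them, as well as deletes.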

Optimizations:

UX Improvement:

  • Make the approx_percentile relative_error field optional. It could default to 1%.

Docs:

  • Mention bucket sizes.
  • Mention the number of buckets.
  • Accuracy guarantees.
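
As a starting point for these docs, the relationship between relative_error, bucket size and bucket count can be sketched as below. This assumes log-scale buckets with base gamma = (1 + e) / (1 - e) for relative error e, which this issue does not confirm; the function names are hypothetical, and the 0.01 default only mirrors the 1% default proposed above.

```python
import math
from typing import Tuple


def bucket_base(relative_error: float = 0.01) -> float:
    """Growth factor between consecutive bucket boundaries (assumed scheme)."""
    if not 0.0 < relative_error < 1.0:
        raise ValueError("relative_error must be in (0, 1)")
    return (1 + relative_error) / (1 - relative_error)


def bucket_bounds(index: int, relative_error: float = 0.01) -> Tuple[float, float]:
    """Bucket `index` covers the half-open range (gamma^(index-1), gamma^index]."""
    gamma = bucket_base(relative_error)
    return gamma ** (index - 1), gamma ** index


def bucket_count(min_value: float, max_value: float, relative_error: float = 0.01) -> int:
    """Number of buckets needed to cover positive values in [min_value, max_value]."""
    if not 0.0 < min_value <= max_value:
        raise ValueError("expected 0 < min_value <= max_value")
    gamma = bucket_base(relative_error)
    lowest = math.ceil(math.log(min_value, gamma))
    highest = math.ceil(math.log(max_value, gamma))
    return highest - lowest + 1
```

Under this assumption, reporting the midpoint 2 * gamma^i / (gamma + 1) for bucket i keeps the answer within relative_error of any value in that bucket, and the bucket count grows with the logarithm of max_value / min_value rather than with the number of rows.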

kwannoel (Contributor, Author) commented:

Important parts are finished.
