Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

add quantileInterpolatedWeighted function #38252

Merged
merged 20 commits into from
Jan 16, 2023

Conversation

bharatnc
Copy link
Contributor

@bharatnc bharatnc commented Jun 21, 2022

Changelog category (leave one):

  • New Feature

Changelog entry (a user-readable short description of the changes that goes to CHANGELOG.md):

Add quantileInterpolatedWeighted/quantilesInterpolatedWeighted functions.

The current quantileExactWeighted function returns the exact quantile. Add another weighted quantile function that instead approximates the quantiles:

  • sorts the input values and corresponding weights.
  • builds a cumulative distribution based on weights.
  • performs linear interpolation b/w the weights and the values to compute the quantiles.

Information about CI checks: https://clickhouse.com/docs/en/development/continuous-integration/

@robot-clickhouse robot-clickhouse added the pr-feature Pull request with new product feature label Jun 21, 2022
@evillique evillique self-assigned this Jun 21, 2022
@bharatnc bharatnc force-pushed the ncb/weighted-quantile-approx branch from 66e8aaf to 4b727e8 Compare June 22, 2022 02:53
@alexey-milovidov
Copy link
Member

The name can be confusing, as it can be confused with approximate quantile sketches.
Better to name it quantileExactInterpolated and quantileExactInterpolatedWeighted.
Also the functions quantileExactInclusiveWeighted, quantileExactExclusiveWeighted, quantileExactLowWeighted, quantileExactHighWeighted should be added for consistency.

@bharatnc
Copy link
Contributor Author

The name can be confusing, as it can be confused with approximate quantile sketches. Better to name it quantileExactInterpolated and quantileExactInterpolatedWeighted. Also the functions quantileExactInclusiveWeighted, quantileExactExclusiveWeighted, quantileExactLowWeighted, quantileExactHighWeighted should be added for consistency.

Sure, thank you for the comment. I'm thinking of replacing quantile(s)ApproximateWeighted with quantile(s)InterpolatedWeighted in this PR. quantile(s)InterpolatedWeighted sounds a little more apt to me (without the word exact).

@bharatnc bharatnc force-pushed the ncb/weighted-quantile-approx branch 2 times, most recently from eb87eb9 to 4e7f8b0 Compare June 23, 2022 08:52
@alexey-milovidov
Copy link
Member

Ok.

@bharatnc bharatnc force-pushed the ncb/weighted-quantile-approx branch 2 times, most recently from 3a05ba7 to 95800b2 Compare June 24, 2022 05:02
@bharatnc
Copy link
Contributor Author

bharatnc commented Jun 24, 2022

  • rename quantileApproximateWeighted to quantileInterpolatedWeighted everywhere.

(I'll do it finally since it's only a rename)

@bharatnc bharatnc force-pushed the ncb/weighted-quantile-approx branch 3 times, most recently from dd4d1e8 to d0bfb4e Compare June 26, 2022 08:35
@bharatnc bharatnc changed the title add quantileApproximateWeighted function add quantileInterpolatedWeighted function Jul 9, 2022
@bharatnc
Copy link
Contributor Author

@evillique the PR is ready for your review, would appreciate your feedback.

Co-authored-by: Nikolay Degterinsky <43110995+evillique@users.noreply.github.com>
@evillique
Copy link
Member

evillique commented Dec 13, 2022

I'm sorry that it took so long to get back to this PR, could you please update this branch to the latest master and add documentation similar to this?

factory.registerFunction<FunctionConcatWithSeparator>({
R"(
Returns the concatenation strings separated by string separator. Syntax: concatWithSeparator(sep, expr1, expr2, expr3...)
)",
Documentation::Examples{{"concatWithSeparator", "SELECT concatWithSeparator('a', '1', '2', '3')"}},
Documentation::Categories{"String"}});

@bharatnc
Copy link
Contributor Author

@evillique thank you for taking a look again, sure I will rebase and update again today.

@bharatnc
Copy link
Contributor Author

@evillique thank you for taking a look again, sure I will rebase and update again today.

@evillique rebased and fixed the build failures mostly (failing ones seem unrelated but obscure to me).

For docs, I added a separate markdown for this function. (If you want me to add docs in the way you have recommended - can I do that in an another PR ? Looks like it needs to be done for other quantile and possibly for other aggregate functions as well.)

@evillique
Copy link
Member

evillique commented Dec 16, 2022

@evillique rebased and fixed the build failures mostly (failing ones seem unrelated but obscure to me).

Sometimes builders fail for unrelated reasons. When the status of the build is something similar to BuilderDebMsan failed in the report, it is unrelated and the build process should be restarted in the near future.

For docs, I added a separate markdown for this function. (If you want me to add docs in the way you have recommended - can I do that in an another PR ? Looks like it needs to be done for other quantile and possibly for other aggregate functions as well.)

For now, internal documentation for aggregate functions is not required but is recommended. If you can also do it for other aggregate functions it would be greatly appreciated.

Also, it looks like AST fuzzer (ubsan) test failure is related to the changes in this PR:

../src/AggregateFunctions/QuantileInterpolatedWeighted.h:298:56: runtime error: -1.29127e+19 is outside the range of representable values of type 'long' Received signal -3 Received signal sanitizer trap (-3)

@bharatnc
Copy link
Contributor Author

bharatnc commented Dec 16, 2022

Interesting, the AST fuzzer for ubsan passed for the previous commit - no code changes. I'll take a look.

@bharatnc
Copy link
Contributor Author

@evillique it would be nice to get this merged if the changes look good to you ? Let me know if you have any more changes that you want me to make.

Copy link
Member

@evillique evillique left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

We should probably also mention quantiles* function here: docs/en/sql-reference/aggregate-functions/reference/quantiles.md:12

@bharatnc
Copy link
Contributor Author

We should probably also mention quantiles* function here: docs/en/sql-reference/aggregate-functions/reference/quantiles.md:12

sure added.

@evillique evillique merged commit 70e79de into ClickHouse:master Jan 16, 2023
@qoega
Copy link
Member

qoega commented Feb 23, 2023

@bharatnc WDYT you think is it valid to add non integer weights as well? Like when user has % as weight with 1 as cumulative sum.

@bharatnc
Copy link
Contributor Author

bharatnc commented Feb 26, 2023

@bharatnc WDYT you think is it valid to add non integer weights as well? Like when user has % as weight with 1 as cumulative sum.

Hi, I think it'll be okay to support non integer weights depending on the use case. I chose to stick with integer weights for this implementation of quantileInterpolatedWeighted as the quantileExactWeighted also currently only supports integer weights. Perhaps it'll be nice to have both the functions allow non integer weights..

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
pr-feature Pull request with new product feature
Projects
None yet
Development

Successfully merging this pull request may close these issues.

5 participants