Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Add min, max Spark aggregate functions #9868

Closed

Conversation

zhli1142015
Copy link
Contributor

@zhli1142015 zhli1142015 commented May 20, 2024

There are two semantic differences between Presto and Spark.

  1. Nested NULLs are compared as values in Spark and as "unknown value" in
    Presto.
  2. The timestamp type represents a time instant in microsecond precision in
    Spark, but millisecond precision in Presto.

Therefore, we need to implement min and max functions for Spark. In this PR,

  1. Move Presto min and max aggregation function implements to lib folder.
  2. Add getMinFunctionFactory and getMaxFunctionFactory which allow callers
    to register max & min functions with different behaviors.

@facebook-github-bot facebook-github-bot added the CLA Signed This label is managed by the Facebook bot. Authors need to sign the CLA before a PR can be reviewed. label May 20, 2024
Copy link

netlify bot commented May 20, 2024

Deploy Preview for meta-velox canceled.

Name Link
🔨 Latest commit c9ff748
🔍 Latest deploy log https://app.netlify.com/sites/meta-velox/deploys/66a071afd5ad300008b6ba4e

@zhli1142015 zhli1142015 changed the title Add min, max aggregate Spark functions Add min, max Spark aggregate functions May 20, 2024
@zhli1142015
Copy link
Contributor Author

cc @rui-mo and @PHILO-HE , thanks.

Copy link
Collaborator

@rui-mo rui-mo left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thanks! Looks good overall.

@zhli1142015 zhli1142015 requested a review from rui-mo May 20, 2024 07:41
Copy link
Collaborator

@rui-mo rui-mo left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thanks. Looks good to me.

velox/docs/functions/spark/aggregate.rst Outdated Show resolved Hide resolved
velox/docs/functions/spark/aggregate.rst Outdated Show resolved Hide resolved
@zhli1142015
Copy link
Contributor Author

@mbasmanova and @Yuhta , could you help to review? Thanks.

@zhli1142015 zhli1142015 force-pushed the min_max_timestamp branch 3 times, most recently from d2676f3 to dfa9e24 Compare May 29, 2024 00:23
@zhli1142015
Copy link
Contributor Author

@mbasmanova and @Yuhta , could you help to review? Thanks.

Kindly ping, could you help to review? Thanks.

@zhli1142015
Copy link
Contributor Author

@jinchengchenghh , do you have more comments? Thanks.

@zhli1142015 zhli1142015 force-pushed the min_max_timestamp branch 2 times, most recently from 3e0d02d to 954f780 Compare June 3, 2024 23:50
@zhli1142015
Copy link
Contributor Author

@mbasmanova , could you please help review this PR? Thanks.

@zhli1142015 zhli1142015 force-pushed the min_max_timestamp branch 3 times, most recently from d276b70 to 8f239a3 Compare June 6, 2024 06:33
@zhli1142015
Copy link
Contributor Author

@rui-mo , is it ok to merge this PR? Thanks.

Copy link
Collaborator

@rui-mo rui-mo left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thanks. Added several nits.

@@ -290,6 +290,20 @@ struct Timestamp {
/// successful.
static int64_t calendarUtcToEpoch(const std::tm& tm);

/// Truncates a Timestamp value to the specified precision.
static Timestamp truncate(Timestamp ts, TimestampPrecision precision) {
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Nice. Thank you.

Copy link
Contributor

@mbasmanova mbasmanova left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@zhli1142015 Thanks.

@mbasmanova mbasmanova added the ready-to-merge PR that have been reviewed and are ready for merging. PRs with this tag notify the Velox Meta oncall label Jul 22, 2024
@facebook-github-bot
Copy link
Contributor

@kevinwilfong has imported this pull request. If you are a Meta employee, you can view this diff on Phabricator.

@rui-mo
Copy link
Collaborator

rui-mo commented Jul 23, 2024

@zhli1142015 Would you resolve the conflicts? Thanks.

@zhli1142015
Copy link
Contributor Author

@zhli1142015 Would you resolve the conflicts? Thanks.

Done, thanks.

@facebook-github-bot
Copy link
Contributor

@kevinwilfong has imported this pull request. If you are a Meta employee, you can view this diff on Phabricator.

@kevinwilfong
Copy link
Contributor

Sorry but could you rebase again, I'm still seeing conflicts in velox/functions/lib/aggregates/CMakeLists.txt

@facebook-github-bot
Copy link
Contributor

@kevinwilfong has imported this pull request. If you are a Meta employee, you can view this diff on Phabricator.

@facebook-github-bot
Copy link
Contributor

@kevinwilfong merged this pull request in e5671c0.

weiting-chen pushed a commit to weiting-chen/velox that referenced this pull request Nov 12, 2024
…ox/functions/prestosql/aggregates/MinMaxAggregates.cpp

Add min, max Spark aggregate functions (facebookincubator#9868)

Summary:
There are two semantic differences between Presto and Spark.
1. Nested NULLs are compared as values in Spark and as "unknown value" in
Presto.
2. The timestamp type represents a time instant in microsecond precision in
Spark, but millisecond precision in Presto.

Therefore, we need to implement min and max functions for Spark. In this PR,
1. Move Presto `min` and `max` aggregation function implements to lib folder.
2. Add `getMinFunctionFactory` and `getMaxFunctionFactory`  which allow callers
to register max & min functions with different behaviors.

Pull Request resolved: facebookincubator#9868

Reviewed By: mbasmanova

Differential Revision: D60051468

Pulled By: kevinwilfong

fbshipit-source-id: 1f056420d6909174a35d336e4e1b413a87ef7665
weiting-chen added a commit to oap-project/velox that referenced this pull request Nov 14, 2024
…ox/functions/prestosql/aggregates/MinMaxAggregates.cpp (#503)

Add min, max Spark aggregate functions (facebookincubator#9868)

Summary:
There are two semantic differences between Presto and Spark.
1. Nested NULLs are compared as values in Spark and as "unknown value" in
Presto.
2. The timestamp type represents a time instant in microsecond precision in
Spark, but millisecond precision in Presto.

Therefore, we need to implement min and max functions for Spark. In this PR,
1. Move Presto `min` and `max` aggregation function implements to lib folder.
2. Add `getMinFunctionFactory` and `getMaxFunctionFactory`  which allow callers
to register max & min functions with different behaviors.

Pull Request resolved: facebookincubator#9868

Reviewed By: mbasmanova

Differential Revision: D60051468

Pulled By: kevinwilfong

fbshipit-source-id: 1f056420d6909174a35d336e4e1b413a87ef7665

Co-authored-by: zhli1142015 <zhli@microsoft.com>
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
CLA Signed This label is managed by the Facebook bot. Authors need to sign the CLA before a PR can be reviewed. Merged ready-to-merge PR that have been reviewed and are ready for merging. PRs with this tag notify the Velox Meta oncall
Projects
None yet
Development

Successfully merging this pull request may close these issues.

6 participants