
Conversation

@brandondahler commented Aug 1, 2022

What changes were proposed in this pull request?

Adds a new array_sort overload to org.apache.spark.sql.functions that matches the comparator-accepting expression defined in SPARK-29020 and added via #25728.

Why are the changes needed?

Adds access to the new overload for users of the DataFrame API so that they don't need to use the expr escape hatch.

Does this PR introduce any user-facing change?

Yes. Users can now optionally provide a comparator function to array_sort, which makes it possible to sort in descending order as well as to sort items that aren't naturally orderable.

Example:

Old:

df.selectExpr("array_sort(a, (x, y) -> cardinality(x) - cardinality(y))");

Added:

df.select(array_sort(col("a"), (x, y) => size(x) - size(y)));
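For reference, a minimal, self-contained sketch of the new overload in use (this assumes a Spark build that already includes this change; the object name and sample data are illustrative only and not part of the PR):

import org.apache.spark.sql.{Column, SparkSession}
import org.apache.spark.sql.functions._

object ArraySortComparatorExample {
  def main(args: Array[String]): Unit = {
    // Local session for the example; any existing SparkSession works as well.
    val spark = SparkSession.builder()
      .master("local[*]")
      .appName("array_sort comparator example")
      .getOrCreate()
    import spark.implicits._

    // One row whose column "a" holds an array of arrays.
    val df = Seq(Seq(Seq(1, 2, 3), Seq(1), Seq(1, 2))).toDF("a")

    // Sort the inner arrays by size, mirroring the example above.
    df.select(array_sort($"a", (x: Column, y: Column) => size(x) - size(y)).as("by_size"))
      .show(false)

    // Flip the comparator to sort a numeric array in descending order.
    val nums = Seq(Seq(3, 1, 2)).toDF("n")
    nums.select(array_sort($"n", (x: Column, y: Column) => y - x).as("descending"))
      .show(false)

    spark.stop()
  }
}

The comparator is an ordinary (Column, Column) => Column function whose result should evaluate to a negative, zero, or positive integer, matching the semantics of the SQL lambda form.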

How was this patch tested?

Unit tests were updated to validate that the overload matches the expression's behavior.
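As an illustration of that kind of check, here is a hypothetical parity sketch (not the actual test added in this PR) comparing the new overload against the equivalent SQL expression; it assumes a SparkSession named spark, e.g. in spark-shell on a build containing this change:

import org.apache.spark.sql.Column
import org.apache.spark.sql.functions._
import spark.implicits._

val df = Seq(Seq(Seq(1, 2, 3), Seq(1), Seq(1, 2))).toDF("a")

// The same sort expressed through the new overload and through the SQL lambda syntax.
val viaFunction = df
  .select(array_sort(col("a"), (x: Column, y: Column) => size(x) - size(y)).as("sorted"))
  .collect()
val viaExpr = df
  .selectExpr("array_sort(a, (x, y) -> cardinality(x) - cardinality(y)) AS sorted")
  .collect()

// Both paths should produce identical rows.
assert(viaFunction.toSeq == viaExpr.toSeq)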

@github-actions github-actions bot added the SQL label Aug 1, 2022
@AmplabJenkins

Can one of the admins verify this patch?

@HyukjinKwon
Member

LGTM. Are you also interested in adding this in SparkR and PySpark? We can do that in a separate PR.

@brandondahler
Author

> LGTM. Are you also interested in adding this in SparkR and PySpark? We can do that in a separate PR.

I do think they should be added (I checked that they aren't already there), but I don't personally have availability to do so at this time.

@HyukjinKwon
Member

Oops, it slipped through my fingers. Mind retriggering https://github.com/brandondahler/spark/runs/7585897593?

@HyukjinKwon
Member

cc @zero323, @itholic, @zhengruifeng FYI (since we need to add PySpark and SparkR ones)

@brandondahler
Author

Clicked "Re-run all jobs" on that linked run; let me know if there was something else you meant for me to do.

@zhengruifeng
Contributor

zhengruifeng commented Aug 18, 2022

Since pyspark/sql/tests/test_functions.py checks the parity between PySpark and SQL, I think we may need to add array_sort to expected_missing_in_py.

Otherwise, LGTM.

@zero323
Copy link
Member

zero323 commented Aug 18, 2022

It seems like it has to be re-synced with upstream to address the black formatter failures.

@brandondahler force-pushed the features/ArraySortOverload branch from 49743ea to 72d799b on August 20, 2022 at 23:06
@brandondahler
Author

Rebased on the latest master changes.

@HyukjinKwon
Member

Merged to master.
