-
Notifications
You must be signed in to change notification settings - Fork 29k
[SPARK-40809] [CONNECT] [FOLLOW] Support alias() in Python client
#38631
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Conversation
alias() in Python client
connector/connect/src/main/scala/org/apache/spark/sql/connect/planner/SparkConnectPlanner.scala
Show resolved
Hide resolved
|
Can one of the admins verify this patch? |
amaliujia
left a comment
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
LGTM
Per https://spark.apache.org/contributing.html we should follow the following when adding new tests:
def test_case(self):
# SPARK-12345: a short description of the test
...
connector/connect/src/main/scala/org/apache/spark/sql/connect/planner/SparkConnectPlanner.scala
Show resolved
Hide resolved
|
@HyukjinKwon can you review the python side? |
python/pyspark/sql/tests/connect/test_connect_column_expressions.py
Outdated
Show resolved
Hide resolved
...tor/connect/src/test/scala/org/apache/spark/sql/connect/planner/SparkConnectProtoSuite.scala
Outdated
Show resolved
Hide resolved
HyukjinKwon
left a comment
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
LGTM some nits
| col0 = fun.col("a").alias("martin", metadata={"pii": True}) | ||
| plan = col0.to_plan(self.session) | ||
| self.assertIsNotNone(plan) | ||
| self.assertEqual(plan.alias.metadata, '{"pii": true}') |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
what about also adding a test for MultiAlias in python?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
See the test above.
…ns.py Co-authored-by: Hyukjin Kwon <gurwls223@gmail.com>
…planner/SparkConnectProtoSuite.scala Co-authored-by: Hyukjin Kwon <gurwls223@gmail.com>
|
Merged to master. |
|
|
||
| with self.assertRaises(grpc.RpcError) as exc: | ||
| self.connect.range(1, 10).select(col("id").alias("this", "is", "not")).collect() | ||
| self.assertIn("Buffer(this, is, not)", str(exc.exception)) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
@grundprinzip The Scala 2.13 daily test has failed in recent times, which seems to be related to this assert:
======================================================================
FAIL [0.742s]: test_alias (pyspark.sql.tests.connect.test_connect_basic.SparkConnectTests)
Testing supported and unsupported alias
----------------------------------------------------------------------
Traceback (most recent call last):
File "/__w/spark/spark/python/pyspark/sql/tests/connect/test_connect_basic.py", line 337, in test_alias
self.assertIn("Buffer(this, is, not)", str(exc.exception))
AssertionError: 'Buffer(this, is, not)' not found in '<_MultiThreadedRendezvous of RPC that terminated with:\n\tstatus = StatusCode.UNKNOWN\n\tdetails = "[INTERNAL_ERROR] Found the unresolved operator: \'Project [id#24L AS List(this, is, not)]"\n\tdebug_error_string = "UNKNOWN:Error received from peer ipv4:127.0.0.1:15002 {created_time:"2022-11-21T19:25:59.902777537+00:00", grpc_status:2, grpc_message:"[INTERNAL_ERROR] Found the unresolved operator: \\\'Project [id#24L AS List(this, is, not)]"}"\n>'
https://github.com/apache/spark/actions/runs/3517345801/jobs/5895043640
This seems to be related to the difference of Scala version. Can we simplify the assertion? such as self.assertIn("(this, is, not)", str(exc.exception))
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I made a followup at #38752 :-)
…3 test pass ### What changes were proposed in this pull request? This PR is a followup of #38631 that fixes the test to pass in Scala 2.13 by avoiding using `Buffer` that becomes `List` in Scala 2.13. ### Why are the changes needed? To fix up the Scala 2.13 build, see https://github.com/apache/spark/actions/runs/3517345801/jobs/5895043640 ### Does this PR introduce _any_ user-facing change? No, test-only. ### How was this patch tested? Manually tested. Closes #38752 from HyukjinKwon/SPARK-40809-followup. Authored-by: Hyukjin Kwon <gurwls223@apache.org> Signed-off-by: Hyukjin Kwon <gurwls223@apache.org>
### What changes were proposed in this pull request? This extends the implementation of column aliases in Spark Connect with supporting lists of column names and providing the appropriate implementation for the Python side. ### Why are the changes needed? Compatibility ### Does this PR introduce _any_ user-facing change? No ### How was this patch tested? UT in Python and Scala Closes apache#38631 from grundprinzip/SPARK-40809-f. Lead-authored-by: Martin Grund <martin.grund@databricks.com> Co-authored-by: Martin Grund <grundprinzip@gmail.com> Signed-off-by: Hyukjin Kwon <gurwls223@apache.org>
…3 test pass ### What changes were proposed in this pull request? This PR is a followup of apache#38631 that fixes the test to pass in Scala 2.13 by avoiding using `Buffer` that becomes `List` in Scala 2.13. ### Why are the changes needed? To fix up the Scala 2.13 build, see https://github.com/apache/spark/actions/runs/3517345801/jobs/5895043640 ### Does this PR introduce _any_ user-facing change? No, test-only. ### How was this patch tested? Manually tested. Closes apache#38752 from HyukjinKwon/SPARK-40809-followup. Authored-by: Hyukjin Kwon <gurwls223@apache.org> Signed-off-by: Hyukjin Kwon <gurwls223@apache.org>
### What changes were proposed in this pull request? This extends the implementation of column aliases in Spark Connect with supporting lists of column names and providing the appropriate implementation for the Python side. ### Why are the changes needed? Compatibility ### Does this PR introduce _any_ user-facing change? No ### How was this patch tested? UT in Python and Scala Closes apache#38631 from grundprinzip/SPARK-40809-f. Lead-authored-by: Martin Grund <martin.grund@databricks.com> Co-authored-by: Martin Grund <grundprinzip@gmail.com> Signed-off-by: Hyukjin Kwon <gurwls223@apache.org>
…3 test pass ### What changes were proposed in this pull request? This PR is a followup of apache#38631 that fixes the test to pass in Scala 2.13 by avoiding using `Buffer` that becomes `List` in Scala 2.13. ### Why are the changes needed? To fix up the Scala 2.13 build, see https://github.com/apache/spark/actions/runs/3517345801/jobs/5895043640 ### Does this PR introduce _any_ user-facing change? No, test-only. ### How was this patch tested? Manually tested. Closes apache#38752 from HyukjinKwon/SPARK-40809-followup. Authored-by: Hyukjin Kwon <gurwls223@apache.org> Signed-off-by: Hyukjin Kwon <gurwls223@apache.org>
### What changes were proposed in this pull request? This extends the implementation of column aliases in Spark Connect with supporting lists of column names and providing the appropriate implementation for the Python side. ### Why are the changes needed? Compatibility ### Does this PR introduce _any_ user-facing change? No ### How was this patch tested? UT in Python and Scala Closes apache#38631 from grundprinzip/SPARK-40809-f. Lead-authored-by: Martin Grund <martin.grund@databricks.com> Co-authored-by: Martin Grund <grundprinzip@gmail.com> Signed-off-by: Hyukjin Kwon <gurwls223@apache.org>
…3 test pass ### What changes were proposed in this pull request? This PR is a followup of apache#38631 that fixes the test to pass in Scala 2.13 by avoiding using `Buffer` that becomes `List` in Scala 2.13. ### Why are the changes needed? To fix up the Scala 2.13 build, see https://github.com/apache/spark/actions/runs/3517345801/jobs/5895043640 ### Does this PR introduce _any_ user-facing change? No, test-only. ### How was this patch tested? Manually tested. Closes apache#38752 from HyukjinKwon/SPARK-40809-followup. Authored-by: Hyukjin Kwon <gurwls223@apache.org> Signed-off-by: Hyukjin Kwon <gurwls223@apache.org>
What changes were proposed in this pull request?
This extends the implementation of column aliases in Spark Connect with supporting lists of column names and providing the appropriate implementation for the Python side.
Why are the changes needed?
Compatibility
Does this PR introduce any user-facing change?
No
How was this patch tested?
UT in Python and Scala