Skip to content

Conversation

@grundprinzip
Copy link
Contributor

What changes were proposed in this pull request?

This extends the implementation of column aliases in Spark Connect with supporting lists of column names and providing the appropriate implementation for the Python side.

Why are the changes needed?

Compatibility

Does this PR introduce any user-facing change?

No

How was this patch tested?

UT in Python and Scala

@grundprinzip grundprinzip changed the title [SPARK-40809] [CONNECT] [FOLLOW] [SPARK-40809] [CONNECT] [FOLLOW] Support alias() in Python client Nov 11, 2022
@grundprinzip
Copy link
Contributor Author

@AmplabJenkins
Copy link

Can one of the admins verify this patch?

Copy link
Contributor

@amaliujia amaliujia left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

LGTM

Per https://spark.apache.org/contributing.html we should follow the following when adding new tests:

def test_case(self):
    # SPARK-12345: a short description of the test
    ...

@cloud-fan
Copy link
Contributor

@HyukjinKwon can you review the python side?

Copy link
Member

@HyukjinKwon HyukjinKwon left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

LGTM some nits

col0 = fun.col("a").alias("martin", metadata={"pii": True})
plan = col0.to_plan(self.session)
self.assertIsNotNone(plan)
self.assertEqual(plan.alias.metadata, '{"pii": true}')
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

what about also adding a test for MultiAlias in python?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

See the test above.

@HyukjinKwon
Copy link
Member

Merged to master.


with self.assertRaises(grpc.RpcError) as exc:
self.connect.range(1, 10).select(col("id").alias("this", "is", "not")).collect()
self.assertIn("Buffer(this, is, not)", str(exc.exception))
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@grundprinzip The Scala 2.13 daily test has failed in recent times, which seems to be related to this assert:

======================================================================
FAIL [0.742s]: test_alias (pyspark.sql.tests.connect.test_connect_basic.SparkConnectTests)
Testing supported and unsupported alias
----------------------------------------------------------------------
Traceback (most recent call last):
  File "/__w/spark/spark/python/pyspark/sql/tests/connect/test_connect_basic.py", line 337, in test_alias
    self.assertIn("Buffer(this, is, not)", str(exc.exception))
AssertionError: 'Buffer(this, is, not)' not found in '<_MultiThreadedRendezvous of RPC that terminated with:\n\tstatus = StatusCode.UNKNOWN\n\tdetails = "[INTERNAL_ERROR] Found the unresolved operator: \'Project [id#24L AS List(this, is, not)]"\n\tdebug_error_string = "UNKNOWN:Error received from peer ipv4:127.0.0.1:15002 {created_time:"2022-11-21T19:25:59.902777537+00:00", grpc_status:2, grpc_message:"[INTERNAL_ERROR] Found the unresolved operator: \\\'Project [id#24L AS List(this, is, not)]"}"\n>'

https://github.com/apache/spark/actions/runs/3517345801/jobs/5895043640

This seems to be related to the difference of Scala version. Can we simplify the assertion? such as self.assertIn("(this, is, not)", str(exc.exception))

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I made a followup at #38752 :-)

HyukjinKwon added a commit that referenced this pull request Nov 22, 2022
…3 test pass

### What changes were proposed in this pull request?

This PR is a followup of #38631 that fixes the test to pass in Scala 2.13 by avoiding using `Buffer` that becomes `List` in Scala 2.13.

### Why are the changes needed?

To fix up the Scala 2.13 build, see https://github.com/apache/spark/actions/runs/3517345801/jobs/5895043640

### Does this PR introduce _any_ user-facing change?

No, test-only.

### How was this patch tested?
Manually tested.

Closes #38752 from HyukjinKwon/SPARK-40809-followup.

Authored-by: Hyukjin Kwon <gurwls223@apache.org>
Signed-off-by: Hyukjin Kwon <gurwls223@apache.org>
SandishKumarHN pushed a commit to SandishKumarHN/spark that referenced this pull request Dec 12, 2022
### What changes were proposed in this pull request?
This extends the implementation of column aliases in Spark Connect with supporting lists of column names and providing the appropriate implementation for the Python side.

### Why are the changes needed?
Compatibility

### Does this PR introduce _any_ user-facing change?
No

### How was this patch tested?
UT in Python and Scala

Closes apache#38631 from grundprinzip/SPARK-40809-f.

Lead-authored-by: Martin Grund <martin.grund@databricks.com>
Co-authored-by: Martin Grund <grundprinzip@gmail.com>
Signed-off-by: Hyukjin Kwon <gurwls223@apache.org>
SandishKumarHN pushed a commit to SandishKumarHN/spark that referenced this pull request Dec 12, 2022
…3 test pass

### What changes were proposed in this pull request?

This PR is a followup of apache#38631 that fixes the test to pass in Scala 2.13 by avoiding using `Buffer` that becomes `List` in Scala 2.13.

### Why are the changes needed?

To fix up the Scala 2.13 build, see https://github.com/apache/spark/actions/runs/3517345801/jobs/5895043640

### Does this PR introduce _any_ user-facing change?

No, test-only.

### How was this patch tested?
Manually tested.

Closes apache#38752 from HyukjinKwon/SPARK-40809-followup.

Authored-by: Hyukjin Kwon <gurwls223@apache.org>
Signed-off-by: Hyukjin Kwon <gurwls223@apache.org>
beliefer pushed a commit to beliefer/spark that referenced this pull request Dec 15, 2022
### What changes were proposed in this pull request?
This extends the implementation of column aliases in Spark Connect with supporting lists of column names and providing the appropriate implementation for the Python side.

### Why are the changes needed?
Compatibility

### Does this PR introduce _any_ user-facing change?
No

### How was this patch tested?
UT in Python and Scala

Closes apache#38631 from grundprinzip/SPARK-40809-f.

Lead-authored-by: Martin Grund <martin.grund@databricks.com>
Co-authored-by: Martin Grund <grundprinzip@gmail.com>
Signed-off-by: Hyukjin Kwon <gurwls223@apache.org>
beliefer pushed a commit to beliefer/spark that referenced this pull request Dec 15, 2022
…3 test pass

### What changes were proposed in this pull request?

This PR is a followup of apache#38631 that fixes the test to pass in Scala 2.13 by avoiding using `Buffer` that becomes `List` in Scala 2.13.

### Why are the changes needed?

To fix up the Scala 2.13 build, see https://github.com/apache/spark/actions/runs/3517345801/jobs/5895043640

### Does this PR introduce _any_ user-facing change?

No, test-only.

### How was this patch tested?
Manually tested.

Closes apache#38752 from HyukjinKwon/SPARK-40809-followup.

Authored-by: Hyukjin Kwon <gurwls223@apache.org>
Signed-off-by: Hyukjin Kwon <gurwls223@apache.org>
beliefer pushed a commit to beliefer/spark that referenced this pull request Dec 18, 2022
### What changes were proposed in this pull request?
This extends the implementation of column aliases in Spark Connect with supporting lists of column names and providing the appropriate implementation for the Python side.

### Why are the changes needed?
Compatibility

### Does this PR introduce _any_ user-facing change?
No

### How was this patch tested?
UT in Python and Scala

Closes apache#38631 from grundprinzip/SPARK-40809-f.

Lead-authored-by: Martin Grund <martin.grund@databricks.com>
Co-authored-by: Martin Grund <grundprinzip@gmail.com>
Signed-off-by: Hyukjin Kwon <gurwls223@apache.org>
beliefer pushed a commit to beliefer/spark that referenced this pull request Dec 18, 2022
…3 test pass

### What changes were proposed in this pull request?

This PR is a followup of apache#38631 that fixes the test to pass in Scala 2.13 by avoiding using `Buffer` that becomes `List` in Scala 2.13.

### Why are the changes needed?

To fix up the Scala 2.13 build, see https://github.com/apache/spark/actions/runs/3517345801/jobs/5895043640

### Does this PR introduce _any_ user-facing change?

No, test-only.

### How was this patch tested?
Manually tested.

Closes apache#38752 from HyukjinKwon/SPARK-40809-followup.

Authored-by: Hyukjin Kwon <gurwls223@apache.org>
Signed-off-by: Hyukjin Kwon <gurwls223@apache.org>
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Projects

None yet

Development

Successfully merging this pull request may close these issues.

7 participants