[SPARK-42210][CONNECT][PYTHON] Standardize registered pickled Python UDFs #39860

xinrong-meng · 2023-02-02T12:09:45Z

What changes were proposed in this pull request?

Standardize registered pickled Python UDFs; specifically, implement spark.udf.register() for Spark Connect.

Why are the changes needed?

To reach parity with vanilla PySpark.

Does this PR introduce any user-facing change?

Yes. spark.udf.register() is added as shown below:

>>> spark.udf
<pyspark.sql.connect.udf.UDFRegistration object at 0x7fbca0077dc0>
>>> f = spark.udf.register("f", lambda x: x+1, "int")
>>> f
<function <lambda> at 0x7fbc905e5e50>
>>> spark.sql("SELECT f(id) FROM range(2)").collect()
[Row(f(id)=1), Row(f(id)=2)]
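
To make the call shape above concrete without a running Spark session, here is a minimal pure-Python sketch. It assumes nothing about Spark internals: `ToyUDFRegistration` and its `call` method are hypothetical names invented for illustration, not part of PySpark or of this PR. It mirrors only two things visible in the example: `register` stores a function under a name and hands the same function back, and calling the registered name on the ids 0 and 1 gives 1 and 2.

```python
from typing import Any, Callable, Dict, Tuple


class ToyUDFRegistration:
    """Hypothetical stand-in for a UDF registry; NOT Spark's implementation."""

    def __init__(self) -> None:
        # Maps a registered name to (function, declared return type string).
        self._functions: Dict[str, Tuple[Callable[..., Any], str]] = {}

    def register(
        self, name: str, f: Callable[..., Any], return_type: str
    ) -> Callable[..., Any]:
        # Store the function under its name and return it to the caller,
        # mirroring how the session above shows `f` as the original lambda.
        self._functions[name] = (f, return_type)
        return f

    def call(self, name: str, *args: Any) -> Any:
        # Look the function up by name, as a query engine would by identifier.
        f, _return_type = self._functions[name]
        return f(*args)


udf = ToyUDFRegistration()
f = udf.register("f", lambda x: x + 1, "int")

# Apply the registered function to the ids produced by range(2): 0 and 1.
results = [udf.call("f", i) for i in range(2)]
print(results)
```

In the real API the function is pickled and evaluated by Spark rather than looked up in a local dictionary; the sketch only illustrates the name-to-function mapping.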

How was this patch tested?

Unit tests.

SPARK-41661

xinrong-meng · 2023-02-02T12:18:10Z

connector/connect/common/src/main/protobuf/spark/connect/commands.proto

I would suggest renaming CommonInlineUserDefinedFunction to CommonUserDefinedFunction since both registered and inline-defined pickled Python UDFs may share the same proto. CC @HyukjinKwon @zhengruifeng @grundprinzip @hvanhovell

On second thought, I would rather do the renaming in a separate PR to keep the review easier, since Scala UDFs also depend on CommonInlineUserDefinedFunction.

zhengruifeng · 2023-02-03T06:43:47Z

Can we also remove function_builder.py?

xinrong-meng · 2023-02-08T03:08:58Z

Force-pushed to adjust the PR based on the latest master.

connector/connect/common/src/main/protobuf/spark/connect/commands.proto

HyukjinKwon · 2023-02-08T11:47:49Z

Oops, mind rebasing this, please, @xinrong-meng?

xinrong-meng · 2023-02-09T01:44:09Z

Force-pushed to rebase onto the latest master.

xinrong-meng · 2023-02-09T10:18:43Z

Merged to branch-3.4 and master, thanks all!

…UDFs

### What changes were proposed in this pull request?
Standardize registered pickled Python UDFs, specifically, implement `spark.udf.register()`.

### Why are the changes needed?
To reach parity with vanilla PySpark.

### Does this PR introduce _any_ user-facing change?
Yes. `spark.udf.register()` is added as shown below:

```py
>>> spark.udf
<pyspark.sql.connect.udf.UDFRegistration object at 0x7fbca0077dc0>
>>> f = spark.udf.register("f", lambda x: x+1, "int")
>>> f
<function <lambda> at 0x7fbc905e5e50>
>>> spark.sql("SELECT f(id) FROM range(2)").collect()
[Row(f(id)=1), Row(f(id)=2)]
```

### How was this patch tested?
Unit tests.

Closes #39860 from xinrong-meng/connect_registered_udf.

Lead-authored-by: Xinrong Meng <xinrong@apache.org>
Co-authored-by: Xinrong Meng <xinrong.apache@gmail.com>
Signed-off-by: Xinrong Meng <xinrong@apache.org>
(cherry picked from commit e7eb836)
Signed-off-by: Xinrong Meng <xinrong@apache.org>

github-actions bot added CONNECT CORE PYTHON SQL labels Feb 2, 2023

xinrong-meng force-pushed the connect_registered_udf branch from b562759 to 55a47ce Compare February 3, 2023 03:40

HyukjinKwon approved these changes Feb 3, 2023

xinrong-meng force-pushed the connect_registered_udf branch from d55b8ff to da68c8a Compare February 8, 2023 03:07

xinrong-meng changed the title ~~Standardize registered pickled Python UDFs~~ [SPARK-42210][CONNECT][PYTHON] Standardize registered pickled Python UDFs Feb 8, 2023

xinrong-meng marked this pull request as ready for review February 8, 2023 03:10

HyukjinKwon approved these changes Feb 8, 2023

zhengruifeng approved these changes Feb 8, 2023

zhengruifeng reviewed Feb 8, 2023

connector/connect/common/src/main/protobuf/spark/connect/commands.proto (outdated)

xinrong-meng added 9 commits February 9, 2023 09:25

4210737 client
b363124 server
54d54c7 doc + test
33ecc1b proto
e3f54f2 gen
525d1b3 import
27e730b fix test
9e43249 fix test
3ad2b6d fix test

xinrong-meng force-pushed the connect_registered_udf branch from 568ed71 to 3ad2b6d Compare February 9, 2023 01:42

b4ba548 Share docstring

xinrong-meng closed this in e7eb836 Feb 9, 2023
