[SPARK-42210][CONNECT][PYTHON] Standardize registered pickled Python UDFs #39860
I would suggest renaming CommonInlineUserDefinedFunction to CommonUserDefinedFunction since both registered and inline-defined pickled Python UDFs may share the same proto. CC @HyukjinKwon @zhengruifeng @grundprinzip @hvanhovell
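To illustrate the point about sharing one message, here is a minimal Python sketch under stated assumptions. The class name `CommonUserDefinedFunction` (the proposed rename), the helpers `register_message` and `inline_call_message`, and the field names are hypothetical simplifications, not the generated protobuf classes. The `command` bytes are a placeholder for a pickled function. The sketch only shows that registering a UDF by name and calling it inline can use the same message type, with arguments bound only in the inline case.

```python
from dataclasses import dataclass, field
from typing import List


# Hypothetical, simplified stand-in for a shared UDF message. Field names are
# assumptions loosely modeled on the Connect proto, not the generated classes.
@dataclass
class CommonUserDefinedFunction:
    function_name: str
    deterministic: bool
    command: bytes       # placeholder for the pickled Python function
    output_type: str     # DDL-formatted return type, e.g. "int"
    arguments: List[str] = field(default_factory=list)  # argument expressions


def register_message(name: str, command: bytes, output_type: str) -> CommonUserDefinedFunction:
    # Registering by name (hypothetical helper): no arguments are bound yet.
    return CommonUserDefinedFunction(name, True, command, output_type)


def inline_call_message(
    name: str, command: bytes, output_type: str, args: List[str]
) -> CommonUserDefinedFunction:
    # Inline use inside a query (hypothetical helper): same type, plus arguments.
    return CommonUserDefinedFunction(name, True, command, output_type, list(args))


payload = b"<pickled lambda x: x + 1>"  # placeholder bytes, not a real pickle
reg = register_message("f", payload, "int")
call = inline_call_message("f", payload, "int", ["id"])
print(type(reg) is type(call), reg.arguments, call.arguments)  # True [] ['id']
```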
On second thought, I'd prefer to do the renaming in a separate PR for easier review, since Scala UDFs also depend on CommonInlineUserDefinedFunction.
can we also remove
Force-pushed to update the PR on top of the latest master.
connector/connect/common/src/main/protobuf/spark/connect/commands.proto
Oops, mind rebasing this please @xinrong-meng
Force-pushed to rebase onto the latest master.
Merged to branch-3.4 and master, thanks all!
### What changes were proposed in this pull request?
Standardize registered pickled Python UDFs; specifically, implement `spark.udf.register()` in Spark Connect.
### Why are the changes needed?
To reach parity with vanilla PySpark.
### Does this PR introduce _any_ user-facing change?
Yes. `spark.udf.register()` is added as shown below:
```py
>>> spark.udf
<pyspark.sql.connect.udf.UDFRegistration object at 0x7fbca0077dc0>
>>> f = spark.udf.register("f", lambda x: x+1, "int")
>>> f
<function <lambda> at 0x7fbc905e5e50>
>>> spark.sql("SELECT f(id) FROM range(2)").collect()
[Row(f(id)=1), Row(f(id)=2)]
```
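The REPL session above shows the user-facing contract: `register(name, f, returnType)` stores the function under a name that SQL can then call, and hands back a callable. The following is a minimal pure-Python sketch of that contract only, assuming a simple in-memory dictionary. `ToyUDFRegistration`, `call`, and `return_type` are hypothetical names, not PySpark's `UDFRegistration`. Real Spark Connect pickles the function on the client and executes it on the server; this toy does neither and does not enforce the declared return type.

```python
from typing import Any, Callable, Dict, Tuple


class ToyUDFRegistration:
    """Hypothetical in-memory registry; not PySpark's actual UDFRegistration."""

    def __init__(self) -> None:
        # name -> (function, declared return type as a DDL string)
        self._udfs: Dict[str, Tuple[Callable[..., Any], str]] = {}

    def register(self, name: str, f: Callable[..., Any], return_type: str) -> Callable[..., Any]:
        # Record the function under `name` and hand it back for direct use.
        self._udfs[name] = (f, return_type)
        return f

    def call(self, name: str, *args: Any) -> Any:
        # Look the function up by name, as a SQL query such as
        # "SELECT f(id) FROM range(2)" would, and apply it to the arguments.
        f, _ = self._udfs[name]
        return f(*args)

    def return_type(self, name: str) -> str:
        # The declared type is stored as metadata only; it is not enforced here.
        return self._udfs[name][1]


udf = ToyUDFRegistration()
f = udf.register("f", lambda x: x + 1, "int")
print([udf.call("f", i) for i in range(2)])  # [1, 2], mirroring the rows above
print(f(41))  # 42
```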
### How was this patch tested?
Unit tests.
Closes #39860 from xinrong-meng/connect_registered_udf.
Lead-authored-by: Xinrong Meng <xinrong@apache.org>
Co-authored-by: Xinrong Meng <xinrong.apache@gmail.com>
Signed-off-by: Xinrong Meng <xinrong@apache.org>
(cherry picked from commit e7eb836)
Signed-off-by: Xinrong Meng <xinrong@apache.org>
SPARK-41661