@xinrong-meng
Member

@xinrong-meng xinrong-meng commented Feb 2, 2023

What changes were proposed in this pull request?

Standardize registered pickled Python UDFs; specifically, implement spark.udf.register().

Why are the changes needed?

To reach parity with vanilla PySpark.

Does this PR introduce any user-facing change?

Yes. spark.udf.register() is added as shown below:

>>> spark.udf
<pyspark.sql.connect.udf.UDFRegistration object at 0x7fbca0077dc0>
>>> f = spark.udf.register("f", lambda x: x+1, "int")
>>> f
<function <lambda> at 0x7fbc905e5e50>
>>> spark.sql("SELECT f(id) FROM range(2)").collect()
[Row(f(id)=1), Row(f(id)=2)] 
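
For readers without a Spark Connect session at hand, the register-then-call-by-name flow shown above can be approximated in plain Python. This is only a toy sketch: `ToyUDFRegistration` and its `call` method are hypothetical names invented for illustration, nothing is pickled or sent to a server, and the return type is stored but never enforced. It is not how `pyspark.sql.connect.udf.UDFRegistration` is implemented.

```python
from typing import Any, Callable, Dict, Tuple


class ToyUDFRegistration:
    """Hypothetical stand-in for a UDF registry (not the Spark Connect class)."""

    def __init__(self) -> None:
        # Maps a registered name to (callable, declared return type string).
        self._functions: Dict[str, Tuple[Callable[..., Any], str]] = {}

    def register(self, name: str, f: Callable[..., Any], return_type: str) -> Callable[..., Any]:
        # Store the function under the given name and hand a callable back,
        # so the caller can keep using it directly as well.
        # Assumption: returning `f` unchanged is a simplification.
        self._functions[name] = (f, return_type)
        return f

    def call(self, name: str, *args: Any) -> Any:
        # Look the function up by its registered name, as a SQL engine would
        # when it meets `f(id)` in a query.
        f, _declared_type = self._functions[name]
        return f(*args)


udf = ToyUDFRegistration()
f = udf.register("f", lambda x: x + 1, "int")

# Rough analogue of `SELECT f(id) FROM range(2)`: apply "f" to ids 0 and 1.
results = [udf.call("f", i) for i in range(2)]
print(results)
```

The point of the sketch is only the contract visible in the example: registration binds a name that can later be resolved from SQL, and it also returns something callable to the client.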

How was this patch tested?

Unit tests.

SPARK-41661

Member Author


I would suggest renaming CommonInlineUserDefinedFunction to CommonUserDefinedFunction since both registered and inline-defined pickled Python UDFs may share the same proto. CC @HyukjinKwon @zhengruifeng @grundprinzip @hvanhovell

Member Author


On second thought, I would rather do the renaming in a separate PR to keep this review simple, since Scala UDFs also depend on CommonInlineUserDefinedFunction.

@xinrong-meng xinrong-meng force-pushed the connect_registered_udf branch from b562759 to 55a47ce Compare February 3, 2023 03:40
@zhengruifeng
Contributor

Can we also remove function_builder.py?

@xinrong-meng xinrong-meng force-pushed the connect_registered_udf branch from d55b8ff to da68c8a Compare February 8, 2023 03:07
@xinrong-meng
Member Author

Force-pushed to adjust the PR to the latest master.

@xinrong-meng xinrong-meng changed the title Standardize registered pickled Python UDFs [SPARK-42210][CONNECT][PYTHON] Standardize registered pickled Python UDFs Feb 8, 2023
@xinrong-meng xinrong-meng marked this pull request as ready for review February 8, 2023 03:10
@HyukjinKwon
Member

Oops, mind rebasing this, please, @xinrong-meng?

@xinrong-meng xinrong-meng force-pushed the connect_registered_udf branch from 568ed71 to 3ad2b6d Compare February 9, 2023 01:42
@xinrong-meng
Member Author

Force-pushed to rebase onto the latest master.

@xinrong-meng
Member Author

Merged to branch-3.4 and master, thanks all!

xinrong-meng added a commit that referenced this pull request Feb 9, 2023
…UDFs

### What changes were proposed in this pull request?
Standardize registered pickled Python UDFs, specifically, implement `spark.udf.register()`.

### Why are the changes needed?
To reach parity with vanilla PySpark.

### Does this PR introduce _any_ user-facing change?
Yes. `spark.udf.register()` is added as shown below:

```py
>>> spark.udf
<pyspark.sql.connect.udf.UDFRegistration object at 0x7fbca0077dc0>
>>> f = spark.udf.register("f", lambda x: x+1, "int")
>>> f
<function <lambda> at 0x7fbc905e5e50>
>>> spark.sql("SELECT f(id) FROM range(2)").collect()
[Row(f(id)=1), Row(f(id)=2)]
```

### How was this patch tested?
Unit tests.

Closes #39860 from xinrong-meng/connect_registered_udf.

Lead-authored-by: Xinrong Meng <xinrong@apache.org>
Co-authored-by: Xinrong Meng <xinrong.apache@gmail.com>
Signed-off-by: Xinrong Meng <xinrong@apache.org>
(cherry picked from commit e7eb836)
Signed-off-by: Xinrong Meng <xinrong@apache.org>
snmvaughan pushed a commit to snmvaughan/spark that referenced this pull request Jun 20, 2023
3 participants