@Yicong-Huang
Contributor

What changes were proposed in this pull request?

  • spark.python.profile: Add SQL_GROUPED_AGG_ARROW_ITER_UDF to the profiler warning list in udf.py so that when spark.python.profile is enabled, users will see appropriate warnings consistent with other iterator-based UDFs.
  • spark.sql.pyspark.udf.profiler: No changes needed. This UDF type already works correctly because it returns a scalar (not an iterator), so it uses the non-iterator profiler branch in wrap_perf_profiler and wrap_memory_profiler.
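
To illustrate the second point, here is a minimal standalone sketch of why a function that consumes an iterator but returns a scalar can be profiled by wrapping the call itself. It uses only the standard-library cProfile module. The names wrap_scalar_profiler and grouped_sum are hypothetical and are not the PySpark implementation; the real logic lives in wrap_perf_profiler and wrap_memory_profiler.

```python
import cProfile
import pstats
from typing import Callable, Iterator, List


def wrap_scalar_profiler(f: Callable, profiler: cProfile.Profile) -> Callable:
    """Hypothetical non-iterator branch: profile the entire call to f.

    This is enough when f returns a plain value, because all of the
    work happens before the call returns.
    """

    def wrapped(*args, **kwargs):
        return profiler.runcall(f, *args, **kwargs)

    return wrapped


def grouped_sum(batches: Iterator[List[float]]) -> float:
    """Stand-in for a grouped aggregate: iterator in, scalar out."""
    total = 0.0
    for batch in batches:
        total += sum(batch)
    return total


profiler = cProfile.Profile()
profiled_sum = wrap_scalar_profiler(grouped_sum, profiler)

# The input iterator is fully consumed inside the profiled call.
result = profiled_sum(iter([[1.0, 2.0], [3.0], [4.0, 5.0]]))

# Collect the names of the functions the profiler recorded.
stats = pstats.Stats(profiler)
profiled_function_names = {name for (_, _, name) in stats.stats}
```

By contrast, a function that returns an iterator does its work lazily, after the call has returned, so wrapping the call alone would miss it. That is the case the iterator branch has to handle.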

Why are the changes needed?

To make profiler support for SQL_GROUPED_AGG_ARROW_ITER_UDF consistent with other UDFs.

Does this PR introduce any user-facing change?

Yes. When users enable spark.python.profile with SQL_GROUPED_AGG_ARROW_ITER_UDF, they will now see a warning message consistent with other iterator-based UDFs.
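
As a rough illustration of this behavior, the sketch below shows the general shape of a warning-list check. The enum, the set name, the function name, and the message text are all hypothetical placeholders chosen for this example; the actual check and wording are in udf.py.

```python
import warnings
from enum import Enum


class EvalType(Enum):
    """Hypothetical stand-in for PySpark's UDF eval types."""

    SCALAR_UDF = "SCALAR_UDF"
    GROUPED_AGG_ARROW_ITER_UDF = "GROUPED_AGG_ARROW_ITER_UDF"


# Hypothetical warning list: eval types that trigger a warning
# when the legacy profiler setting is enabled.
PROFILE_WARNING_EVAL_TYPES = {EvalType.GROUPED_AGG_ARROW_ITER_UDF}


def check_legacy_profile(eval_type: EvalType, profile_enabled: bool) -> bool:
    """Emit a UserWarning for listed eval types; return True if warned."""
    if profile_enabled and eval_type in PROFILE_WARNING_EVAL_TYPES:
        warnings.warn(
            f"Placeholder message: profiling warning for {eval_type.name}.",
            UserWarning,
        )
        return True
    return False


with warnings.catch_warnings(record=True) as caught:
    warnings.simplefilter("always")
    warned_for_iter = check_legacy_profile(EvalType.GROUPED_AGG_ARROW_ITER_UDF, True)
    warned_for_scalar = check_legacy_profile(EvalType.SCALAR_UDF, True)
    warned_when_disabled = check_legacy_profile(EvalType.GROUPED_AGG_ARROW_ITER_UDF, False)
```

Only the listed eval type with the setting enabled produces a warning; every other combination passes through silently.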

How was this patch tested?

Added a test case test_perf_profiler_arrow_udf_grouped_agg_iter to verify that spark.sql.pyspark.udf.profiler works correctly with this UDF type. Also verified that the spark.python.profile profiler warning is triggered correctly in test_unsupported.

Was this patch authored or co-authored using generative AI tooling?

No.

@HyukjinKwon
Member

Merged to master.

xu20160924 pushed a commit to xu20160924/spark that referenced this pull request Dec 9, 2025
…regate UDF

Closes apache#53374 from Yicong-Huang/SPARK-54631/feat/add-profiler-support-for-arrow-grouped-agg-iter-udf.

Authored-by: Yicong-Huang <17627829+Yicong-Huang@users.noreply.github.com>
Signed-off-by: Hyukjin Kwon <gurwls223@apache.org>