Fixes #9480.
This PR adds support for launching the Map Pandas UDF on empty partitions, to align with Spark's behavior.
So far I don't see other types of Pandas UDFs being called for empty partitions.
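For context, a minimal sketch (hypothetical names, not this PR's test) of the contract being aligned: the udf passed to mapInPandas runs once per partition even when the partition supplies zero batches, so it can still emit output.

```python
import pandas as pd

def count_batches(batches):
    # Invoked once per partition. On an empty partition the iterator
    # yields no batches, but the udf body still runs and emits a row.
    n = 0
    for batch in batches:
        n += len(batch)
    yield pd.DataFrame({"rows_seen": [n]})

# usage: df.mapInPandas(count_batches, "rows_seen long")
```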
The test is copied from the example in the linked issue.
---------
Signed-off-by: Firestarman <firestarmanllc@gmail.com>
Describe the bug
With the spark-rapids plugin enabled, mapInPandas doesn't invoke the udf on an empty partition and just returns. This is problematic when the task is part of a barrier stage, resulting in a deadlock/hang: the non-empty tasks reach the barrier while the empty ones complete without ever executing the barrier code in the udf.
I think this is the empty-partition-skipping logic: https://github.com/NVIDIA/spark-rapids/blob/branch-23.12/sql-plugin/src/main/scala/org/apache/spark/sql/rapids/execution/python/GpuMapInBatchExec.scala#L111-L114
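To illustrate the hang mechanism, here is a hedged sketch (not the exact reproducer from this issue; it assumes a Spark version whose mapInPandas accepts barrier=True, per SPARK-42896):

```python
import pandas as pd
from pyspark import BarrierTaskContext
from pyspark.sql import SparkSession
from pyspark.sql.functions import lit

spark = SparkSession.builder.getOrCreate()

# Two partitions keyed on a constant column: every row hashes to the
# same partition, leaving the other partition empty.
df = spark.range(10).withColumn("const", lit(1)).repartition(2, "const")

def barrier_udf(batches):
    # Every task in the barrier stage must reach this call. If the
    # empty partition's udf is never invoked, the non-empty task
    # blocks here until the stage times out.
    BarrierTaskContext.get().barrier()
    for _ in batches:
        pass
    yield pd.DataFrame({"result": [1]})

out = df.mapInPandas(barrier_udf, "result long", barrier=True)
out.collect()
```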
Steps/Code to reproduce bug
spark-rapids plugin jar, Spark 3.4.1 (probably any 3.x is similar).
Running the same commands and code but without
--conf spark.plugins=com.nvidia.spark.SQLPlugin
i.e. baseline Spark, returns [Row(sum(result)=2)].
Note that
df.repartition(2, "const")
has two partitions, with one of them being empty.
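The original reproduction script isn't preserved above; the following is a hypothetical reconstruction consistent with the details in this report (a constant-keyed repartition into two partitions, one result row per udf invocation), run once with and once without --conf spark.plugins=com.nvidia.spark.SQLPlugin:

```python
import pandas as pd
from pyspark.sql import SparkSession
from pyspark.sql.functions import lit

spark = SparkSession.builder.getOrCreate()

df = spark.range(10).withColumn("const", lit(1))

def one_row_per_partition(batches):
    # Drain whatever batches arrive (possibly none) and emit one row,
    # so the final sum counts udf invocations.
    for _ in batches:
        pass
    yield pd.DataFrame({"result": [1]})

out = df.repartition(2, "const").mapInPandas(one_row_per_partition, "result long")
# Baseline Spark invokes the udf on both partitions: [Row(sum(result)=2)].
# If the plugin skips the empty partition, the sum would be 1.
print(out.agg({"result": "sum"}).collect())
```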
Expected behavior
This is a bit of a corner case, but the behavior should match baseline Spark and execute the udf even on empty partitions.
Environment details (please complete the following information)
Ubuntu 22.04.2, Spark 3.4.1, Python 3.9
Additional context
I think this is the root cause of hangs observed here on a toy example: NVIDIA/spark-rapids-ml#453
Maybe other Python operators have a similar issue and should match Spark's behavior with respect to empty partitions.