[SPARK-52060][SQL] Make OneRowRelationExec node #50849
Conversation
```scala
override def simpleString(maxFields: Int): String = {
  s"$nodeName${truncatedString(output, "[", ",", "]", maxFields)}"
}
```
How is this different from the default implementation?
The default implementation returns `Scan OneRowRelation`, while the existing implementation (using RDDScan) returns `Scan OneRowRelation[]`. I figured we shouldn't change this on the off chance that someone is relying on it.
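For illustration, here is a minimal plain-Scala sketch of the two renderings. The `truncated` helper below is a stand-in for Spark's `truncatedString`, not the real implementation:

```scala
// Stand-in for truncatedString: joins fields inside "[...]" delimiters.
def truncated(fields: Seq[String], maxFields: Int): String =
  fields.take(maxFields).mkString("[", ",", "]")

val nodeName = "Scan OneRowRelation"
val defaultStr = nodeName                              // default: node name only
val overriddenStr = s"$nodeName${truncated(Nil, 25)}"  // override: appends empty output list
```

Even with no output attributes, the override appends the empty `[]` suffix, matching what `RDDScanExec` produced before.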
There are failures like the ones above, which I also see in other PRs, so I think the failures are unrelated.
```scala
private val emptyRow: InternalRow = InternalRow.empty

private val rdd = session.sparkContext.parallelize(Seq(emptyRow), 1)
```
I think we can do this:

```scala
private val rdd = {
  val proj = UnsafeProjection.create(schema)
  val emptyRow = proj(InternalRow.empty)
  session.sparkContext.parallelize(Seq(emptyRow), 1)
}
```
then `doExecute()` can just return this RDD
Hmm, sure, I implemented this. But because we still want to increment the number of output rows at the time when the row is actually processed, I did not end up simply returning `rdd` from `doExecute()`... let me know if you think there's a better way.
```scala
override def inputRDD: RDD[InternalRow] = rdd

override protected val createUnsafeProjection: Boolean = true
```
If we do https://github.com/apache/spark/pull/50849/files#r2097521541 , then this can be false.
@richardc-db can you do a rebase to re-trigger the CI? We fixed an OOM test issue recently.
Force-pushed from c993f51 to eef16fe.
Hmm, @cloud-fan, seems like this causes a test to fail. I'm guessing this is because the inputRDD has unsafe rows? The test passes after undoing the change described here... do you think we should undo this change and go back to adding an unsafe projection and setting the flag as before?
@richardc-db let's change it back, didn't realize this serde issue...
@cloud-fan sounds good! I switched it back and it seems like tests pass now.
```scala
override val output: Seq[Attribute] = Nil

private val rdd: RDD[InternalRow] = session.sparkContext.parallelize(Seq(InternalRow.empty), 1)
```
Thinking about it more, I think we can avoid serializing any row:

```scala
session.sparkContext.parallelize(Nil, 1).mapPartitionsInternal { _ =>
  val proj = UnsafeProjection.create(Seq.empty[Expression])
  Iterator(proj.apply(InternalRow.empty))
}
```
Now the unsafe row is generated on the fly at the worker side, no serialization is needed.
Done, thanks for the help! I ended up doing:

```scala
private val rdd: RDD[InternalRow] = {
  val numOutputRows = longMetric("numOutputRows")
  session
    .sparkContext
    .parallelize(Seq(InternalRow()), 1)
    .mapPartitionsInternal { _ =>
      val proj = UnsafeProjection.create(Seq.empty[Expression])
      Iterator(proj.apply(InternalRow.empty)).map { r =>
        numOutputRows += 1
        r
      }
    }
}
```

to ensure the metrics are filled properly.
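The reason the increment lives inside the iterator's `map` is that Scala iterators are lazy: the counter is bumped only when a row is actually consumed. A plain-Scala sketch of that pattern, using a local `var` as a stand-in for the `numOutputRows` SQL metric:

```scala
// Stand-in for the numOutputRows SQLMetric; a plain var for illustration.
var numOutputRows = 0L
val rows = Iterator("row").map { r =>
  numOutputRows += 1  // bumped only when this row is consumed
  r
}
assert(numOutputRows == 0L)  // Iterator.map is lazy: nothing counted yet
rows.foreach(_ => ())        // consuming the iterator triggers the count
```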
LGTM except for one idea to improve the perf: https://github.com/apache/spark/pull/50849/files#r2107403483
thanks, merging to master!
What changes were proposed in this pull request?
Creates a new `OneRowRelationExec` node, which is more or less a copy of the `RDDScanExec` node.
We want a dedicated node because it makes it clearer when a one-row relation is used, e.g. for patterns like `SELECT version()`.

Why are the changes needed?
This makes it clearer in the code that a one-row relation is used, and allows us to avoid checking the hard-coded "OneRowRelation" string when pattern matching.
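As a hypothetical illustration of that benefit (the case classes below are stand-ins, not Spark's real `SparkPlan` hierarchy), compare matching on a node-name string with matching on a dedicated node type:

```scala
sealed trait Plan
case class RDDScanExec(name: String) extends Plan
case object OneRowRelationExec extends Plan

def isOneRowRelation(p: Plan): Boolean = p match {
  case RDDScanExec(name)  => name == "OneRowRelation"  // before: brittle string check
  case OneRowRelationExec => true                      // after: explicit node type
}
```

The dedicated node makes the intent explicit at the type level, so callers no longer depend on the exact spelling of the relation name.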
Does this PR introduce any user-facing change?
Yes, the plan will now be `OneRowRelationExec` rather than `RDDScanExec`. The plan string should be the same, however.

How was this patch tested?
Added UTs.
Was this patch authored or co-authored using generative AI tooling?