[SPARK-18079] [SQL] CollectLimitExec.executeToIterator should perform per-partition limits

We have an internal product that needs this. See upstream PR [1].

[1] apache#15614

Co-authored-by: Patrick Woody <pwoody@palantir.com>
Co-authored-by: Josh Casale <jcasale@palantir.com>
Co-authored-by: Will Raschkowski <wraschkowski@palantir.com>
3 people committed Mar 4, 2021
1 parent 4ba513b commit ce81128
Showing 2 changed files with 7 additions and 0 deletions.
FORK.md: 1 addition & 0 deletions
@@ -1,4 +1,5 @@
 # Difference with upstream
+* [SPARK-18079](https://issues.apache.org/jira/browse/SPARK-18079) - CollectLimitExec.executeToIterator should perform per-partition limits
 * [SPARK-20952](https://issues.apache.org/jira/browse/SPARK-20952) - ParquetFileFormat should forward TaskContext to its forkjoinpool
 * [SPARK-26626](https://issues.apache.org/jira/browse/SPARK-26626) - Limited the maximum size of repeatedly substituted aliases
 * [SPARK-25200](https://issues.apache.org/jira/browse/SPARK-25200) - Allow setting HADOOP_CONF_DIR as a spark config
@@ -45,6 +45,12 @@ case class CollectLimitExec(limit: Int, child: SparkPlan) extends LimitExec {
   override def output: Seq[Attribute] = child.output
   override def outputPartitioning: Partitioning = SinglePartition
   override def executeCollect(): Array[InternalRow] = child.executeTake(limit)
+
+  // TODO(palantir): Reopen upstream PR for SPARK-18079; we need this for an internal product (DP)
+  override def executeToIterator(): Iterator[InternalRow] = {
+    LocalLimitExec(limit, child).executeToIterator().take(limit)
+  }
+
   private val serializer: Serializer = new UnsafeRowSerializer(child.output.size)
   private lazy val writeMetrics =
     SQLShuffleWriteMetricsReporter.createShuffleWriteMetrics(sparkContext)
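
The added override delegates to LocalLimitExec, which truncates each partition to at most `limit` rows before the trailing `take(limit)` enforces the global limit on the driver, so whole partitions are never materialized just to be discarded. A minimal RDD-level sketch of the same per-partition-limit idea (not part of this commit; the session setup, partition count, and data are illustrative):

```scala
import org.apache.spark.sql.SparkSession

object PerPartitionLimitSketch {
  def main(args: Array[String]): Unit = {
    val spark = SparkSession.builder()
      .master("local[4]")
      .appName("per-partition-limit-sketch")
      .getOrCreate()

    val limit = 10
    val rdd = spark.sparkContext.parallelize(1 to 1000000, numSlices = 8)

    // Local limit: each partition yields at most `limit` rows, so at most
    // 8 * 10 = 80 rows ever leave the executors.
    val locallyLimited = rdd.mapPartitions(_.take(limit))

    // Global limit: the driver stops consuming after `limit` rows, mirroring
    // LocalLimitExec(limit, child).executeToIterator().take(limit).
    val result = locallyLimited.toLocalIterator.take(limit).toSeq

    println(result)
    spark.stop()
  }
}
```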
