
Conversation

@peter-toth
Contributor

@peter-toth peter-toth commented Jan 19, 2026

What changes were proposed in this pull request?

Currently, KeyGroupedPartitioning always groups partitions by key, regardless of whether grouping is actually needed. This behaviour decreases parallelism and can lead to slower performance.

This PR disables partition grouping of a scan with KeyGroupedPartitioning output partitioning if:

  • a shuffle is inserted above the scan, which means that grouping is not needed for the parent,
  • and grouping is not needed for the intermediate nodes either.

We can't disable partition grouping of a scan in the main query if it contributes to the output partitioning of the query result, because we don't know whether the query is cached/checkpointed and how its output will be used later. The output must keep KeyGroupedPartitioning semantics in this case.
But we can disable partition grouping in subqueries when grouping is not needed for anything in the subquery plan. This is actually necessary to make sure broadcast exchange reuse happens correctly during dynamic partition pruning.
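
The core idea, as a minimal sketch (not the actual PR diff): when a shuffle is about to be inserted directly above a scan that reports KeyGroupedPartitioning, the grouping the scan would do is wasted work and can be switched off first. The withoutKeyGrouping helper below is hypothetical; the Spark classes are real.

```scala
import org.apache.spark.sql.catalyst.plans.physical.{KeyGroupedPartitioning, Partitioning}
import org.apache.spark.sql.execution.SparkPlan
import org.apache.spark.sql.execution.datasources.v2.BatchScanExec
import org.apache.spark.sql.execution.exchange.ShuffleExchangeExec

// Sketch only: when a shuffle destroys the key-grouped layout anyway, let the
// scan emit one partition per input split instead of one per key group.
def insertShuffle(newPartitioning: Partitioning, child: SparkPlan): SparkPlan = {
  val newChild = child match {
    case scan: BatchScanExec
        if scan.outputPartitioning.isInstanceOf[KeyGroupedPartitioning] =>
      scan.withoutKeyGrouping() // hypothetical helper restoring ungrouped splits
    case other => other
  }
  ShuffleExchangeExec(newPartitioning, newChild)
}
```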

Why are the changes needed?

Improve performance.

Does this PR introduce any user-facing change?

No.

How was this patch tested?

New UT added.

Was this patch authored or co-authored using generative AI tooling?

No.

@github-actions github-actions bot added the SQL label Jan 19, 2026
@github-actions

JIRA Issue Information

=== Improvement SPARK-55092 ===
Summary: KeyGroupedPartitioning shouldn't group partitions when not needed
Assignee: None
Status: Open
Affected: ["4.2.0"]


This comment was automatically generated by GitHub Actions

@peter-toth peter-toth marked this pull request as draft January 19, 2026 18:53
@peter-toth peter-toth force-pushed the SPARK-55092-kgp-do-not-group-partitions-when-not-needed branch from 04eed9a to a0e1d99 Compare January 19, 2026 19:06
@peter-toth peter-toth changed the title [SPARK-55092][SQL] KeyGroupedPartitioning don't group partitions when not needed [WIP][SPARK-55092][SQL] KeyGroupedPartitioning don't group partitions when not needed Jan 19, 2026
@peter-toth peter-toth force-pushed the SPARK-55092-kgp-do-not-group-partitions-when-not-needed branch from a0e1d99 to 1dc157a Compare January 20, 2026 11:36
@peter-toth peter-toth changed the title [WIP][SPARK-55092][SQL] KeyGroupedPartitioning don't group partitions when not needed [WIP][SPARK-55092][SQL] Disable partition grouping in KeyGroupedPartitioning when not needed Jan 20, 2026
@peter-toth peter-toth changed the title [WIP][SPARK-55092][SQL] Disable partition grouping in KeyGroupedPartitioning when not needed [SPARK-55092][SQL] Disable partition grouping in KeyGroupedPartitioning when not needed Jan 20, 2026
@peter-toth peter-toth marked this pull request as ready for review January 20, 2026 12:02
@peter-toth peter-toth force-pushed the SPARK-55092-kgp-do-not-group-partitions-when-not-needed branch from 1dc157a to f4fbeed Compare January 20, 2026 12:05
@peter-toth peter-toth marked this pull request as draft January 20, 2026 14:28
@peter-toth peter-toth changed the title [SPARK-55092][SQL] Disable partition grouping in KeyGroupedPartitioning when not needed [WIP][SPARK-55092][SQL] Disable partition grouping in KeyGroupedPartitioning when not needed Jan 20, 2026
@peter-toth peter-toth changed the title [WIP][SPARK-55092][SQL] Disable partition grouping in KeyGroupedPartitioning when not needed [SPARK-55092][SQL] Disable partition grouping in KeyGroupedPartitioning when not needed Jan 20, 2026
@peter-toth peter-toth marked this pull request as ready for review January 20, 2026 18:17
@peter-toth
Contributor Author

cc @szehon-ho , @sunchao, @viirya, @dongjoon-hyun

@dongjoon-hyun
Member

Thank you for pinging me, @peter-toth .

@dongjoon-hyun dongjoon-hyun left a comment

Although we need to handle copyFromTag independently, the proposal itself sounds reasonable to me. Do you think you can share some supporting performance numbers based on the existing benchmark or from your production environment?

Why are the changes needed?

Improve performance.

@peter-toth
Contributor Author

peter-toth commented Jan 21, 2026

Although we need to handle copyFromTag independently, the proposal itself sounds reasonable to me. Do you think you can share some supporting performance numbers based on the existing benchmark or from your production environment?

Numbers depend heavily on the use case. In our case a customer would like to use SPJ between tables A and B. Both tables are storage-partitioned, but B is storage-partitioned by columns that don't match the join condition of the query. In this case "one side shuffle" can help if spark.sql.sources.v2.bucketing.shuffle.enabled is enabled and only B will be shuffled, but the unnecessary grouping of partitions still happens for B. And while the storage partitioning of B helps in other queries, in this particular one it significantly decreases parallelism and slows down the stage before the shuffle.

The optimization in this PR is similar to what spark.sql.sources.v2.bucketing.partiallyClusteredDistribution.enabled does to keep one side partially clustered (ungrouped) in SPJ. But this optimization kicks in in the case of "one side shuffle" SPJ and when there is no SPJ at all.
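
For illustration, the scenario could look like this (catalog, table and column names are made up; the two configs are real Spark SQL settings):

```scala
// Assumes a `spark` SparkSession in scope (e.g. spark-shell).
// Tables a and b are both storage-partitioned, but b's partition columns don't
// match the join key, so only b is shuffled ("one side shuffle") -- yet b's
// partitions were still grouped by key before this PR.
spark.conf.set("spark.sql.sources.v2.bucketing.enabled", "true")
spark.conf.set("spark.sql.sources.v2.bucketing.shuffle.enabled", "true")

spark.sql(
  """
    |SELECT * FROM testcat.ns.a JOIN testcat.ns.b ON a.id = b.id
    |""".stripMargin)
```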

@dongjoon-hyun dongjoon-hyun left a comment

+1, LGTM. Thank you, @peter-toth .

BTW, this PR has not been rebased onto master after merging #53884. Did I understand correctly?

@peter-toth peter-toth force-pushed the SPARK-55092-kgp-do-not-group-partitions-when-not-needed branch from f63091b to 4fd8026 Compare January 22, 2026 09:02
@peter-toth
Contributor Author

peter-toth commented Jan 22, 2026

BTW, this PR has not been rebased onto master after merging #53884. Did I understand correctly?

You are right. I've just rebased it. The diff should be ok now.

@dongjoon-hyun
Member

Thank you!

@peter-toth
Contributor Author

@szehon-ho , @sunchao , @viirya , do you have any concerns or comments?

@szehon-ho
Member

Hi, this seems useful. I will try to review this week, but if it looks OK to @sunchao @viirya, go ahead.

sparkSession: SparkSession,
adaptiveExecutionRule: Option[InsertAdaptiveSparkPlan] = None,
subquery: Boolean): Seq[Rule[SparkPlan]] = {
val requiredDistribution = if (subquery) {
Member

Not sure I get this; if it's not a subquery, we pass in any requiredDistribution?

Contributor Author

Yeah, let me change this tomorrow and pass in subquery directly into EnsureRequirements, that way this will be much cleaner.

Contributor Author

@peter-toth peter-toth Jan 29, 2026

Fixed in b04bb61 and added comments in c28fc3f.
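
A minimal sketch of that cleanup (the parameter names are guesses based on the discussion, not the committed code):

```scala
import org.apache.spark.sql.SparkSession
import org.apache.spark.sql.catalyst.rules.Rule
import org.apache.spark.sql.execution.SparkPlan
import org.apache.spark.sql.execution.adaptive.InsertAdaptiveSparkPlan
import org.apache.spark.sql.execution.exchange.EnsureRequirements

// Instead of precomputing a required distribution at the call site, pass the
// subquery flag through and let EnsureRequirements decide internally.
def preparations(
    sparkSession: SparkSession,
    adaptiveExecutionRule: Option[InsertAdaptiveSparkPlan] = None,
    subquery: Boolean): Seq[Rule[SparkPlan]] = {
  Seq(
    EnsureRequirements(subquery = subquery) // hypothetical `subquery` parameter
    // ... other rules elided ...
  )
}
```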

val newChild = disableKeyGroupingIfNotNeeded(c)
ShuffleExchangeExec(newPartitioning, newChild, so, ps)
case _ =>
val newChild = disableKeyGroupingIfNotNeeded(child)
Member

could we make a method createShuffleExchangeExec(..., disableGrouping: Boolean) to reduce duplication?

Contributor Author

Done in b04bb61.
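
For context, the suggested helper could look roughly like this (signature hypothetical; disableKeyGroupingIfNotNeeded is the PR's method discussed further down):

```scala
// Assumed to live in the PR's EnsureRequirements rule. Folds the duplicated
// branches into one place: optionally ungroup the child before shuffling it.
private def createShuffleExchangeExec(
    newPartitioning: Partitioning,
    child: SparkPlan,
    disableGrouping: Boolean): ShuffleExchangeExec = {
  val newChild =
    if (disableGrouping) disableKeyGroupingIfNotNeeded(child) else child
  ShuffleExchangeExec(newPartitioning, newChild)
}
```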

@szehon-ho
Member

@chirag-s-db if you also want to take a look?

}
}

private def populateNoGroupingPartitionInfo(plan: SparkPlan): SparkPlan = plan match {
Member

This looks like it can be done with the transform API?

Copy link
Contributor Author

Yes, I can change this to use transform() APIs.
I wanted to make it similar to the other 2 populate...() methods. Shall I change those as well?

Contributor Author

I modified all 3 populate...()s in b04bb61.
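
A minimal sketch of the transform-based shape (the predicate and rewrite are hypothetical placeholders; transform is the standard Spark TreeNode API that applies a partial function over the plan and recurses automatically):

```scala
private def populateNoGroupingPartitionInfo(plan: SparkPlan): SparkPlan =
  plan.transform {
    // Only the nodes of interest are rewritten; everything else is kept as-is.
    case scan: BatchScanExec if canDisableGrouping(scan) => // hypothetical predicate
      scan.withoutKeyGrouping() // hypothetical helper
  }
```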

child, values, joinKeyPositions, reducers, applyPartialClustering, replicatePartitions))
}

private def disableKeyGroupingIfNotNeeded(child: SparkPlan) = {
Member

More detailed comments on this method would be good, e.g., the conditions under which grouping can be safely disabled, etc.

Contributor Author

I added comments in b04bb61.

sparkSession: SparkSession,
adaptiveExecutionRule: Option[InsertAdaptiveSparkPlan] = None,
subquery: Boolean): Seq[Rule[SparkPlan]] = {
val requiredDistribution = if (subquery) {
Member

Can we add more detailed comments here? It looks confusing without any context when looking at the code here.

Contributor Author

I changed this and now passing in subquery and added comments in c28fc3f.

@viirya viirya left a comment

Looks like a reasonable improvement.

@chirag-s-db
Contributor

Could we also support the case where the KeyGroupedPartitioning is the output of the plan with the following approach?

  1. By default, disableGrouping is true in both the scan and in a field in the KeyGroupedPartitioning.
  2. With disableGrouping=true, a KeyGroupedPartitioning can only satisfy the requirements of an UnspecifiedDistribution (or any distribution if there is only a single partition). However, with disableGrouping=true, we don't allow a KeyGroupedPartitioning (w/ > 1 partition) to satisfy the requirements of a Clustered or Ordered distribution.
  3. In EnsureRequirements, we add a new case here that checks if the KeyGroupedPartitioning could satisfy the requirement if it were to be grouped (essentially, the current implementation of satisfies for KeyGroupedPartitioning, but taking into consideration whether grouping is enabled or disabled). If so, then we push down disableGrouping=false to both the scan and the KeyGroupedPartitioning, after which point we know that the partitioning has been used to satisfy some requirements, so we must do grouping. If we can't push down to the scan (for example, if the plan reporting KeyGroupedPartitioning is checkpointed), then we just add a shuffle as normal.

One advantage of this approach is that it allows us to avoid grouping for the (presumably not uncommon) case of a simple scan from a partitioned table, and it should still be safe for checkpointed scans (as the checkpointed scans would have a KeyGroupedPartitioning w/ disableGrouping=true, which would not satisfy most required distributions). This approach should also decrease the complexity of the EnsureRequirements changes (since we wouldn't have to catch all the cases in which a KeyGroupedPartitioning scan doesn't contribute to the output partitioning of the plan).

FYI @szehon-ho
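
A minimal sketch of step 2 of this proposal (the class and flag are hypothetical; Spark's real KeyGroupedPartitioning has no disableGrouping field):

```scala
import org.apache.spark.sql.catalyst.expressions.Expression
import org.apache.spark.sql.catalyst.plans.physical._

case class UngroupedKeyPartitioning(
    expressions: Seq[Expression],
    numPartitions: Int,
    disableGrouping: Boolean = true) extends Partitioning {
  override protected def satisfies0(required: Distribution): Boolean =
    required match {
      case UnspecifiedDistribution => true // always satisfiable
      case _ if numPartitions == 1 => true // a single partition satisfies anything
      case _: ClusteredDistribution if !disableGrouping =>
        true // placeholder for the real key-compatibility checks
      case _ => false // ungrouped data must not claim clustered/ordered layouts
    }
}
```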

@peter-toth
Contributor Author

One advantage of this approach is that it allows us to avoid grouping for the (presumably not uncommon) case of a simple scan from a partitioned table, and it should still be safe for checkpointed scans (as the checkpointed scans would have a KeyGroupedPartitioning w/ disableGrouping=true, which would not satisfy most required distributions). This approach should also decrease the complexity of the EnsureRequirements changes (since we wouldn't have to catch all the cases in which a KeyGroupedPartitioning scan doesn't contribute to the output partitioning of the plan).

My concern with this approach is that we can introduce an extra shuffle above the checkpointed (ungrouped) data.

@szehon-ho
Copy link
Member

I really like @chirag-s-db's idea, it is quite clean. Otherwise we have to guard every place where we create a shuffle to disable it. But I also see the point about missing the opportunity to avoid a shuffle for a checkpointed KeyGrouped RDD, although I guess it's not a very common case. So in short, no strong opinion either way. @sunchao @viirya, do you have any opinion?

@peter-toth
Copy link
Contributor Author

peter-toth commented Jan 30, 2026

Please note that not only checkpointed RDDs but also cached RDDs would need an extra shuffle.

Actually, I wonder whether partition grouping by key is in the right place in BatchScanExec, or whether it could be a new operator that does the grouping. The operator should reside between a consumer that requires ClusteredDistribution and a producer that provides "partitions with keys" partitioning. We could move spjParams to the new operator, and its output partitioning would be KeyGroupedPartitioning. EnsureRequirements could insert the operator if needed, similarly to how it inserts exchanges now. As it would be a new operator, it could be inserted on top of BatchScanExec, as well as LogicalRDD (checkpointed plan) and InMemoryTableScanExec (cached plan).

I’m happy to put together a POC PR; just let me know.
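
A rough sketch of what such an operator could look like (entirely hypothetical; the execution logic is elided):

```scala
import org.apache.spark.rdd.RDD
import org.apache.spark.sql.catalyst.InternalRow
import org.apache.spark.sql.catalyst.expressions.{Attribute, Expression}
import org.apache.spark.sql.catalyst.plans.physical.{KeyGroupedPartitioning, Partitioning}
import org.apache.spark.sql.execution.{SparkPlan, UnaryExecNode}

// A unary node that groups its child's partitions by key and reports
// KeyGroupedPartitioning; EnsureRequirements could insert it on demand,
// much like it inserts exchanges today.
case class KeyGroupPartitionsExec(
    keys: Seq[Expression],
    partitionValues: Seq[InternalRow],
    child: SparkPlan) extends UnaryExecNode {
  override def output: Seq[Attribute] = child.output
  override def outputPartitioning: Partitioning =
    KeyGroupedPartitioning(keys, partitionValues.length, partitionValues)
  override protected def doExecute(): RDD[InternalRow] =
    // Sketch only: would coalesce the child's partitions so that each output
    // partition holds exactly one key group.
    throw new UnsupportedOperationException("sketch only")
  override protected def withNewChildInternal(newChild: SparkPlan): SparkPlan =
    copy(child = newChild)
}
```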

@szehon-ho
Copy link
Member

yea that sounds like it would be cleaner, and cover the checkpoint/ cache case. im not sure the detail about the disableGrouping by default will work, but curious to see, thanks @peter-toth
