
Conversation

@ulysses-you
Contributor

backport #47683 to branch-3.4

What changes were proposed in this pull request?

For a sort-merge join (SMJ) with an inner join, the output partitioning just wraps the left and right children's output partitionings into a `PartitioningCollection`, so it may not satisfy the target required clustering.
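
To make the failure mode concrete, here is a deliberately simplified, self-contained Scala sketch. The case classes below only mimic the names of Spark's partitioning types in `org.apache.spark.sql.catalyst.plans.physical`, and the check is a hypothetical stand-in for the real, more involved logic in `EnsureRequirements`; it illustrates why an assertion can fire, it is not Spark source.

```scala
// Hypothetical stand-ins for Spark's partitioning classes (illustration only).
sealed trait Partitioning
case class KeyGroupedPartitioning(expressions: Seq[String], numPartitions: Int) extends Partitioning
case class PartitioningCollection(partitionings: Seq[Partitioning]) extends Partitioning

object PartitioningSketch {
  // An inner SMJ reports a collection wrapping both children's partitionings
  // rather than a single key-grouped partitioning.
  def smjOutputPartitioning(left: Partitioning, right: Partitioning): Partitioning =
    PartitioningCollection(Seq(left, right))

  // Simplified stand-in for the key-grouped compatibility check: it expects a
  // KeyGroupedPartitioning, so the wrapped collection trips the assertion.
  // The real EnsureRequirements.createKeyGroupedShuffleSpec is more involved.
  def createKeyGroupedShuffleSpec(p: Partitioning): KeyGroupedPartitioning = {
    assert(p.isInstanceOf[KeyGroupedPartitioning])
    p.asInstanceOf[KeyGroupedPartitioning]
  }

  def main(args: Array[String]): Unit = {
    val t1 = KeyGroupedPartitioning(Seq("t1.id"), numPartitions = 4)
    val t2 = KeyGroupedPartitioning(Seq("t2.id"), numPartitions = 4)
    val innerJoinOutput = smjOutputPartitioning(t1, t2)
    // Throws java.lang.AssertionError: assertion failed, mirroring the trace below.
    createKeyGroupedShuffleSpec(innerJoinOutput)
  }
}
```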

Why are the changes needed?

Fixes an exception when a query contains multiple bucketed inner joins, for example:

```sql
SELECT * FROM testcat.ns.t1
JOIN testcat.ns.t2 ON t1.id = t2.id
JOIN testcat.ns.t3 ON t1.id = t3.id
```
```
Cause: java.lang.AssertionError: assertion failed
at scala.Predef$.assert(Predef.scala:264)
at org.apache.spark.sql.execution.exchange.EnsureRequirements.createKeyGroupedShuffleSpec(EnsureRequirements.scala:642)
at org.apache.spark.sql.execution.exchange.EnsureRequirements.$anonfun$checkKeyGroupCompatible$1(EnsureRequirements.scala:385)
at scala.collection.immutable.List.map(List.scala:247)
at scala.collection.immutable.List.map(List.scala:79)
at org.apache.spark.sql.execution.exchange.EnsureRequirements.checkKeyGroupCompatible(EnsureRequirements.scala:382)
at org.apache.spark.sql.execution.exchange.EnsureRequirements.checkKeyGroupCompatible(EnsureRequirements.scala:364)
at org.apache.spark.sql.execution.exchange.EnsureRequirements.org$apache$spark$sql$execution$exchange$EnsureRequirements$$ensureDistributionAndOrdering(EnsureRequirements.scala:166)
at org.apache.spark.sql.execution.exchange.EnsureRequirements$$anonfun$1.applyOrElse(EnsureRequirements.scala:714)
at org.apache.spark.sql.execution.exchange.EnsureRequirements$$anonfun$1.applyOrElse(EnsureRequirements.scala:689)
at org.apache.spark.sql.catalyst.trees.TreeNode.$anonfun$transformUpWithPruning$4(TreeNode.scala:528)
at org.apache.spark.sql.catalyst.trees.CurrentOrigin$.withOrigin(origin.scala:84)
at org.apache.spark.sql.catalyst.trees.TreeNode.transformUpWithPruning(TreeNode.scala:528)
at org.apache.spark.sql.catalyst.trees.TreeNode.transformUp(TreeNode.scala:497)
at org.apache.spark.sql.execution.exchange.EnsureRequirements.apply(EnsureRequirements.scala:689)
at org.apache.spark.sql.execution.exchange.EnsureRequirements.apply(EnsureRequirements.scala:51)
at org.apache.spark.sql.execution.adaptive.AdaptiveSparkPlanExec$.$anonfun$applyPhysicalRules$2(AdaptiveSparkPlanExec.scala:882)
```
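
For reference, a hedged sketch of how the reproduction above might be set up as a standalone program. The `spark.sql.sources.v2.bucketing.enabled` flag and the test-only `InMemoryTableCatalog` backing the `testcat` catalog are assumptions drawn from Spark's own key-grouped partitioning tests, not something this PR adds, and the schema and bucket count are illustrative:

```scala
import org.apache.spark.sql.SparkSession

object MultiBucketedJoinRepro {
  def main(args: Array[String]): Unit = {
    // Assumes Spark's test-only InMemoryTableCatalog is on the classpath to
    // back the `testcat` catalog used by the query above.
    val spark = SparkSession.builder()
      .master("local[*]")
      .config("spark.sql.catalog.testcat",
        "org.apache.spark.sql.connector.catalog.InMemoryTableCatalog")
      .config("spark.sql.sources.v2.bucketing.enabled", "true")
      .getOrCreate()

    // Three bucketed DSv2 tables sharing the same bucket transform on `id`.
    Seq("t1", "t2", "t3").foreach { t =>
      spark.sql(
        s"CREATE TABLE testcat.ns.$t (id BIGINT, data STRING) " +
          "PARTITIONED BY (bucket(4, id))")
    }

    // Before this fix, planning the multi-join query could fail with the
    // AssertionError shown in the stack trace above.
    spark.sql(
      """SELECT * FROM testcat.ns.t1
        |JOIN testcat.ns.t2 ON t1.id = t2.id
        |JOIN testcat.ns.t3 ON t1.id = t3.id""".stripMargin
    ).collect()

    spark.stop()
  }
}
```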

Does this PR introduce any user-facing change?

Yes, it is a bug fix.

How was this patch tested?

Added a test.

Was this patch authored or co-authored using generative AI tooling?

no

…rror

Closes apache#47683 from ulysses-you/SPARK-49179.

Authored-by: ulysses-you <ulyssesyou18@gmail.com>
Signed-off-by: youxiduo <youxiduo@corp.netease.com>
(cherry picked from commit 8133294)
Signed-off-by: youxiduo <youxiduo@corp.netease.com>
@github-actions github-actions bot added the SQL label Aug 13, 2024
@ulysses-you
Contributor Author

The failed test is unrelated; merging to branch-3.4.

ulysses-you added a commit that referenced this pull request Aug 13, 2024
…tionError

backport #47683 to branch-3.4

Closes #47736 from ulysses-you/SPARK-49179-3.4.

Authored-by: ulysses-you <ulyssesyou18@gmail.com>
Signed-off-by: youxiduo <youxiduo@corp.netease.com>
@ulysses-you ulysses-you deleted the SPARK-49179-3.4 branch August 13, 2024 08:58
Member

@dongjoon-hyun dongjoon-hyun left a comment

+1, late LGTM.
Thank you, @ulysses-you and @yaooqinn.
