Join between an existing Delta table and a foreachBatch DataFrame resulting in a cross join when an inner join is needed?
I am using Apache Spark 2.4.5 with Delta Lake 0.5.0.
The error org.apache.spark.sql.AnalysisException: Detected implicit cartesian product for INNER join between logical plans happens in the following line of code:
Dataset<Row> aggregatedDf = batchDF
.selectExpr("CAST(value AS STRING) AS v1").as("srcDf")
.join(resultTable.toDF().as("new_result"), col("new_result.v2").equalTo("srcDf.v1"));
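As far as I understand, Column.equalTo with a plain String argument is interpreted as a literal value rather than a column reference, so the condition above may be comparing new_result.v2 to the literal string "srcDf.v1" instead of joining on a key. A minimal sketch of the same join with both sides passed as column references (same aliases as above assumed):
import static org.apache.spark.sql.functions.col;
// Equi-join on two column references; passing col(...) keeps "srcDf.v1" a column, not a literal string.
Dataset<Row> aggregatedDf = batchDF
    .selectExpr("CAST(value AS STRING) AS v1").as("srcDf")
    .join(resultTable.toDF().as("new_result"),
          col("new_result.v2").equalTo(col("srcDf.v1")));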
Looking at stage 4, I am not sure why the union is happening.
Here is the full execution plan from the UI.
The error:
org.apache.spark.sql.AnalysisException: Detected implicit cartesian product for INNER join between logical plans
Project
+- Filter (isnotnull(value#725) && (value#725 = result.v2))
+- SerializeFromObject [staticinvoke(class org.apache.spark.unsafe.types.UTF8String, StringType, fromString, input[0, java.lang.String, true], true, false) AS value#725]
+- ExternalRDD [obj#724]
and
Project
+- Relation[v2#727,v1#728] parquet
Join condition is missing or trivial.
Either: use the CROSS JOIN syntax to allow cartesian products between these
relations, or: enable implicit cartesian products by setting the configuration
variable spark.sql.crossJoin.enabled=true;
at org.apache.spark.sql.catalyst.optimizer.CheckCartesianProducts$$anonfun$apply$22.applyOrElse(Optimizer.scala:1295)
at org.apache.spark.sql.catalyst.optimizer.CheckCartesianProducts$$anonfun$apply$22.applyOrElse(Optimizer.scala:1292)
at org.apache.spark.sql.catalyst.trees.TreeNode$$anonfun$2.apply(TreeNode.scala:258)
at org.apache.spark.sql.catalyst.trees.TreeNode$$anonfun$2.apply(TreeNode.scala:258)
at org.apache.spark.sql.catalyst.trees.CurrentOrigin$.withOrigin(TreeNode.scala:69)
at org.apache.spark.sql.catalyst.trees.TreeNode.transformDown(TreeNode.scala:257)
at org.apache.spark.sql.catalyst.plans.logical.LogicalPlan.org$apache$spark$sql$catalyst$plans$logical$AnalysisHelper$$super$transformDown(LogicalPlan.scala:29)
at org.apache.spark.sql.catalyst.plans.logical.AnalysisHelper$class.transformDown(AnalysisHelper.scala:149)
at org.apache.spark.sql.catalyst.plans.logical.LogicalPlan.transformDown(LogicalPlan.scala:29)
at org.apache.spark.sql.catalyst.plans.logical.LogicalPlan.transformDown(LogicalPlan.scala:29)
at org.apache.spark.sql.catalyst.trees.TreeNode$$anonfun$transformDown$1.apply(TreeNode.scala:263)
at org.apache.spark.sql.catalyst.trees.TreeNode$$anonfun$transformDown$1.apply(TreeNode.scala:263)
at org.apache.spark.sql.catalyst.trees.TreeNode$$anonfun$4.apply(TreeNode.scala:328)
at org.apache.spark.sql.catalyst.trees.TreeNode.mapProductIterator(TreeNode.scala:186)
at org.apache.spark.sql.catalyst.trees.TreeNode.mapChildren(TreeNode.scala:326)
at org.apache.spark.sql.catalyst.trees.TreeNode.transformDown(TreeNode.scala:263)
at org.apache.spark.sql.catalyst.plans.logical.LogicalPlan.org$apache$spark$sql$catalyst$plans$logical$AnalysisHelper$$super$transformDown(LogicalPlan.scala:29)
at org.apache.spark.sql.catalyst.plans.logical.AnalysisHelper$class.transformDown(AnalysisHelper.scala:149)
at org.apache.spark.sql.catalyst.plans.logical.LogicalPlan.transformDown(LogicalPlan.scala:29)
at org.apache.spark.sql.catalyst.plans.logical.LogicalPlan.transformDown(LogicalPlan.scala:29)
at org.apache.spark.sql.catalyst.trees.TreeNode.transform(TreeNode.scala:247)
at org.apache.spark.sql.catalyst.optimizer.CheckCartesianProducts$.apply(Optimizer.scala:1292)
at org.apache.spark.sql.catalyst.optimizer.CheckCartesianProducts$.apply(Optimizer.scala:1274)
at org.apache.spark.sql.catalyst.rules.RuleExecutor$$anonfun$execute$1$$anonfun$apply$1.apply(RuleExecutor.scala:87)
at org.apache.spark.sql.catalyst.rules.RuleExecutor$$anonfun$execute$1$$anonfun$apply$1.apply(RuleExecutor.scala:84)
at scala.collection.IndexedSeqOptimized$class.foldl(IndexedSeqOptimized.scala:57)
at scala.collection.IndexedSeqOptimized$class.foldLeft(IndexedSeqOptimized.scala:66)
at scala.collection.mutable.WrappedArray.foldLeft(WrappedArray.scala:35)
at org.apache.spark.sql.catalyst.rules.RuleExecutor$$anonfun$execute$1.apply(RuleExecutor.scala:84)
at org.apache.spark.sql.catalyst.rules.RuleExecutor$$anonfun$execute$1.apply(RuleExecutor.scala:76)
at scala.collection.immutable.List.foreach(List.scala:392)
at org.apache.spark.sql.catalyst.rules.RuleExecutor.execute(RuleExecutor.scala:76)
at org.apache.spark.sql.execution.QueryExecution.optimizedPlan$lzycompute(QueryExecution.scala:67)
at org.apache.spark.sql.execution.QueryExecution.optimizedPlan(QueryExecution.scala:67)
at org.apache.spark.sql.execution.QueryExecution.sparkPlan$lzycompute(QueryExecution.scala:73)
at org.apache.spark.sql.execution.QueryExecution.sparkPlan(QueryExecution.scala:69)
at org.apache.spark.sql.execution.QueryExecution.executedPlan$lzycompute(QueryExecution.scala:78)
at org.apache.spark.sql.execution.QueryExecution.executedPlan(QueryExecution.scala:78)
at org.apache.spark.sql.Dataset.withAction(Dataset.scala:3365)
at org.apache.spark.sql.Dataset.count(Dataset.scala:2835)
at Test.lambda$main$3dd8454f$1(Test.java:37)
at org.apache.spark.sql.streaming.DataStreamWriter$$anonfun$foreachBatch$1.apply(DataStreamWriter.scala:390)
at org.apache.spark.sql.streaming.DataStreamWriter$$anonfun$foreachBatch$1.apply(DataStreamWriter.scala:390)
at org.apache.spark.sql.execution.streaming.sources.ForeachBatchSink.addBatch(ForeachBatchSink.scala:35)
at org.apache.spark.sql.execution.streaming.MicroBatchExecution$$anonfun$org$apache$spark$sql$execution$streaming$MicroBatchExecution$$runBatch$5$$anonfun$apply$17.apply(MicroBatchExecution.scala:537)
at org.apache.spark.sql.execution.SQLExecution$$anonfun$withNewExecutionId$1.apply(SQLExecution.scala:80)
at org.apache.spark.sql.execution.SQLExecution$.withSQLConfPropagated(SQLExecution.scala:127)
at org.apache.spark.sql.execution.SQLExecution$.withNewExecutionId(SQLExecution.scala:75)
at org.apache.spark.sql.execution.streaming.MicroBatchExecution$$anonfun$org$apache$spark$sql$execution$streaming$MicroBatchExecution$$runBatch$5.apply(MicroBatchExecution.scala:535)
at org.apache.spark.sql.execution.streaming.ProgressReporter$class.reportTimeTaken(ProgressReporter.scala:351)
at org.apache.spark.sql.execution.streaming.StreamExecution.reportTimeTaken(StreamExecution.scala:58)
at org.apache.spark.sql.execution.streaming.MicroBatchExecution.org$apache$spark$sql$execution$streaming$MicroBatchExecution$$runBatch(MicroBatchExecution.scala:534)
at org.apache.spark.sql.execution.streaming.MicroBatchExecution$$anonfun$runActivatedStream$1$$anonfun$apply$mcZ$sp$1.apply$mcV$sp(MicroBatchExecution.scala:198)
at org.apache.spark.sql.execution.streaming.MicroBatchExecution$$anonfun$runActivatedStream$1$$anonfun$apply$mcZ$sp$1.apply(MicroBatchExecution.scala:166)
at org.apache.spark.sql.execution.streaming.MicroBatchExecution$$anonfun$runActivatedStream$1$$anonfun$apply$mcZ$sp$1.apply(MicroBatchExecution.scala:166)
at org.apache.spark.sql.execution.streaming.ProgressReporter$class.reportTimeTaken(ProgressReporter.scala:351)
at org.apache.spark.sql.execution.streaming.StreamExecution.reportTimeTaken(StreamExecution.scala:58)
at org.apache.spark.sql.execution.streaming.MicroBatchExecution$$anonfun$runActivatedStream$1.apply$mcZ$sp(MicroBatchExecution.scala:166)
at org.apache.spark.sql.execution.streaming.ProcessingTimeExecutor.execute(TriggerExecutor.scala:56)
at org.apache.spark.sql.execution.streaming.MicroBatchExecution.runActivatedStream(MicroBatchExecution.scala:160)
at org.apache.spark.sql.execution.streaming.StreamExecution.org$apache$spark$sql$execution$streaming$StreamExecution$$runStream(StreamExecution.scala:281)
at org.apache.spark.sql.execution.streaming.StreamExecution$$anon$1.run(StreamExecution.scala:193)
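The message itself offers two workarounds: rewrite the query with explicit CROSS JOIN syntax, or enable spark.sql.crossJoin.enabled. For completeness, a sketch of the config route (assuming a SparkSession variable named spark); note this only suppresses the check and still yields a cartesian product rather than an inner equi-join:
// Workaround only: allow implicit cartesian products for this session.
spark.conf().set("spark.sql.crossJoin.enabled", "true");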