Hi, I am currently using Spark 3.2 and considering upgrading to Spark 3.4.
While testing our current production queries on Spark 3.4, I encountered an error when aggregating a field with the max() function.
After many tests, I concluded that the error occurs when max() is applied to a nested field that contains null values.
To reproduce the issue, I created a test table:
CREATE TABLE iceberg.test_db.aggregation_test (
col0 int,
col1 struct<col2:int>
)
USING iceberg;
INSERT INTO iceberg.test_db.aggregation_test VALUES (1709168400, named_struct('col2', 1709168400));
INSERT INTO iceberg.test_db.aggregation_test VALUES (1709175600, named_struct('col2', 1709175600));
INSERT INTO iceberg.test_db.aggregation_test VALUES (1709172000, named_struct('col2', 1709172000));
INSERT INTO iceberg.test_db.aggregation_test VALUES (null, named_struct('col2', null));
Then, when I run the SQL below, I get an error:
SELECT max(col1.col2) FROM iceberg.test_db.aggregation_test;
java.lang.ClassCastException: java.lang.Integer cannot be cast to org.apache.iceberg.StructLike
at org.apache.iceberg.Accessors$WrappedPositionAccessor.get(Accessors.java:168)
at org.apache.iceberg.Accessors$WrappedPositionAccessor.get(Accessors.java:157)
at org.apache.iceberg.expressions.BoundReference.eval(BoundReference.java:40)
at org.apache.iceberg.expressions.ValueAggregate.eval(ValueAggregate.java:39)
at org.apache.iceberg.expressions.MaxAggregate.eval(MaxAggregate.java:28)
at org.apache.iceberg.expressions.BoundAggregate$NullSafeAggregator.update(BoundAggregate.java:143)
at org.apache.iceberg.expressions.AggregateEvaluator.update(AggregateEvaluator.java:82)
at org.apache.iceberg.spark.source.SparkScanBuilder.pushAggregation(SparkScanBuilder.java:242)
at org.apache.spark.sql.execution.datasources.v2.V2ScanRelationPushDown$.org$apache$spark$sql$execution$datasources$v2$V2ScanRelationPushDown$$rewriteAggregate(V2ScanRelationPushDown.scala:176)
at org.apache.spark.sql.execution.datasources.v2.V2ScanRelationPushDown$$anonfun$pushDownAggregates$1.applyOrElse(V2ScanRelationPushDown.scala:96)
at org.apache.spark.sql.execution.datasources.v2.V2ScanRelationPushDown$$anonfun$pushDownAggregates$1.applyOrElse(V2ScanRelationPushDown.scala:94)
at org.apache.spark.sql.catalyst.trees.TreeNode.$anonfun$transformDownWithPruning$1(TreeNode.scala:512)
at org.apache.spark.sql.catalyst.trees.CurrentOrigin$.withOrigin(TreeNode.scala:104)
at org.apache.spark.sql.catalyst.trees.TreeNode.transformDownWithPruning(TreeNode.scala:512)
at org.apache.spark.sql.catalyst.plans.logical.LogicalPlan.org$apache$spark$sql$catalyst$plans$logical$AnalysisHelper$$super$transformDownWithPruning(LogicalPlan.scala:31)
at org.apache.spark.sql.catalyst.plans.logical.AnalysisHelper.transformDownWithPruning(AnalysisHelper.scala:267)
at org.apache.spark.sql.catalyst.plans.logical.AnalysisHelper.transformDownWithPruning$(AnalysisHelper.scala:263)
at org.apache.spark.sql.catalyst.plans.logical.LogicalPlan.transformDownWithPruning(LogicalPlan.scala:31)
at org.apache.spark.sql.catalyst.plans.logical.LogicalPlan.transformDownWithPruning(LogicalPlan.scala:31)
at org.apache.spark.sql.catalyst.trees.TreeNode.transformDown(TreeNode.scala:488)
at org.apache.spark.sql.catalyst.trees.TreeNode.transform(TreeNode.scala:456)
at org.apache.spark.sql.execution.datasources.v2.V2ScanRelationPushDown$.pushDownAggregates(V2ScanRelationPushDown.scala:94)
at org.apache.spark.sql.execution.datasources.v2.V2ScanRelationPushDown$.$anonfun$apply$4(V2ScanRelationPushDown.scala:45)
at org.apache.spark.sql.execution.datasources.v2.V2ScanRelationPushDown$.$anonfun$apply$8(V2ScanRelationPushDown.scala:51)
at scala.collection.LinearSeqOptimized.foldLeft(LinearSeqOptimized.scala:126)
at scala.collection.LinearSeqOptimized.foldLeft$(LinearSeqOptimized.scala:122)
at scala.collection.immutable.List.foldLeft(List.scala:91)
at org.apache.spark.sql.execution.datasources.v2.V2ScanRelationPushDown$.apply(V2ScanRelationPushDown.scala:50)
at org.apache.spark.sql.execution.datasources.v2.V2ScanRelationPushDown$.apply(V2ScanRelationPushDown.scala:37)
at org.apache.spark.sql.catalyst.rules.RuleExecutor.$anonfun$execute$2(RuleExecutor.scala:222)
at scala.collection.LinearSeqOptimized.foldLeft(LinearSeqOptimized.scala:126)
at scala.collection.LinearSeqOptimized.foldLeft$(LinearSeqOptimized.scala:122)
at scala.collection.immutable.List.foldLeft(List.scala:91)
at org.apache.spark.sql.catalyst.rules.RuleExecutor.$anonfun$execute$1(RuleExecutor.scala:219)
at org.apache.spark.sql.catalyst.rules.RuleExecutor.$anonfun$execute$1$adapted(RuleExecutor.scala:211)
at scala.collection.immutable.List.foreach(List.scala:431)
at org.apache.spark.sql.catalyst.rules.RuleExecutor.execute(RuleExecutor.scala:211)
at org.apache.spark.sql.catalyst.rules.RuleExecutor.$anonfun$executeAndTrack$1(RuleExecutor.scala:182)
at org.apache.spark.sql.catalyst.QueryPlanningTracker$.withTracker(QueryPlanningTracker.scala:88)
at org.apache.spark.sql.catalyst.rules.RuleExecutor.executeAndTrack(RuleExecutor.scala:182)
at org.apache.spark.sql.execution.QueryExecution.$anonfun$optimizedPlan$1(QueryExecution.scala:143)
at org.apache.spark.sql.catalyst.QueryPlanningTracker.measurePhase(QueryPlanningTracker.scala:111)
at org.apache.spark.sql.execution.QueryExecution.$anonfun$executePhase$2(QueryExecution.scala:202)
at org.apache.spark.sql.execution.QueryExecution$.withInternalError(QueryExecution.scala:526)
at org.apache.spark.sql.execution.QueryExecution.$anonfun$executePhase$1(QueryExecution.scala:202)
at org.apache.spark.sql.SparkSession.withActive(SparkSession.scala:827)
at org.apache.spark.sql.execution.QueryExecution.executePhase(QueryExecution.scala:201)
at org.apache.spark.sql.execution.QueryExecution.optimizedPlan$lzycompute(QueryExecution.scala:139)
at org.apache.spark.sql.execution.QueryExecution.optimizedPlan(QueryExecution.scala:135)
at org.apache.spark.sql.execution.QueryExecution.assertOptimized(QueryExecution.scala:153)
at org.apache.spark.sql.execution.QueryExecution.executedPlan$lzycompute(QueryExecution.scala:171)
at org.apache.spark.sql.execution.QueryExecution.executedPlan(QueryExecution.scala:168)
at org.apache.spark.sql.execution.QueryExecution.simpleString(QueryExecution.scala:221)
at org.apache.spark.sql.execution.QueryExecution.org$apache$spark$sql$execution$QueryExecution$$explainString(QueryExecution.scala:266)
at org.apache.spark.sql.execution.QueryExecution.explainString(QueryExecution.scala:235)
at org.apache.spark.sql.execution.SQLExecution$.$anonfun$withNewExecutionId$6(SQLExecution.scala:112)
at org.apache.spark.sql.execution.SQLExecution$.withSQLConfPropagated(SQLExecution.scala:195)
at org.apache.spark.sql.execution.SQLExecution$.$anonfun$withNewExecutionId$1(SQLExecution.scala:103)
at org.apache.spark.sql.SparkSession.withActive(SparkSession.scala:827)
at org.apache.spark.sql.execution.SQLExecution$.withNewExecutionId(SQLExecution.scala:65)
at org.apache.spark.sql.Dataset.withAction(Dataset.scala:4204)
at org.apache.spark.sql.Dataset.head(Dataset.scala:3200)
at org.apache.spark.sql.Dataset.take(Dataset.scala:3421)
at org.apache.spark.sql.Dataset.getRows(Dataset.scala:283)
at org.apache.spark.sql.Dataset.showString(Dataset.scala:322)
at org.apache.spark.sql.Dataset.show(Dataset.scala:809)
at org.apache.spark.sql.Dataset.show(Dataset.scala:786)
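Reading the trace, the failure happens while Spark's V2ScanRelationPushDown rule asks Iceberg to push the max() down into the scan (SparkScanBuilder.pushAggregation), and the accessor for the nested field (Accessors$WrappedPositionAccessor) receives an Integer where it expects a StructLike. As a temporary workaround, turning off Iceberg's aggregate pushdown should avoid that code path entirely; I believe the session property is spark.sql.iceberg.aggregate-push-down.enabled (please correct me if the name differs in 1.3.1):
-- Assumed property name; the pushdown is on by default, this disables it for the session.
SET spark.sql.iceberg.aggregate-push-down.enabled = false;
SELECT max(col1.col2) FROM iceberg.test_db.aggregation_test;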
With a top-level (1-depth) field, it works without any problem:
SELECT max(col0) FROM iceberg.test_db.aggregation_test;
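Likewise, if I rebuild the same table without the row containing nulls, the nested max() works, which is consistent with the tests that led me to the conclusion above (the table name here is just for illustration):
CREATE TABLE iceberg.test_db.aggregation_test_no_nulls (
col0 int,
col1 struct<col2:int>
)
USING iceberg;
INSERT INTO iceberg.test_db.aggregation_test_no_nulls VALUES (1709168400, named_struct('col2', 1709168400));
INSERT INTO iceberg.test_db.aggregation_test_no_nulls VALUES (1709175600, named_struct('col2', 1709175600));
INSERT INTO iceberg.test_db.aggregation_test_no_nulls VALUES (1709172000, named_struct('col2', 1709172000));
-- no null row this time, so the nested aggregate does not fail
SELECT max(col1.col2) FROM iceberg.test_db.aggregation_test_no_nulls;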
Can you help me?
Apache Iceberg version: 1.3.1
Query engine: Spark