
[Spark 3.4] java.lang.Integer cannot be cast to org.apache.iceberg.StructLike #9831

Closed
bluzy opened this issue Feb 29, 2024 · 2 comments
Labels: bug (Something isn't working)

Comments


bluzy commented Feb 29, 2024

Apache Iceberg version

1.3.1

Query engine

Spark

Please describe the bug 🐞

Hi, I am currently using Spark 3.2 and considering upgrading to Spark 3.4.

When I tested our current production queries on Spark 3.4, I encountered an error when aggregating a field with the max() function.

After many tests, I concluded that the error occurs when max() is applied to a nested field that contains null values.

To demonstrate the issue, I created a test table.

CREATE TABLE iceberg.test_db.aggregation_test (
    col0 int,
    col1 struct<col2:int>
)
USING iceberg;

INSERT INTO iceberg.test_db.aggregation_test VALUES (1709168400, named_struct('col2', 1709168400));
INSERT INTO iceberg.test_db.aggregation_test VALUES (1709175600, named_struct('col2', 1709175600));
INSERT INTO iceberg.test_db.aggregation_test VALUES (1709172000, named_struct('col2', 1709172000));
INSERT INTO iceberg.test_db.aggregation_test VALUES (null, named_struct('col2', null));

Then, when I ran the SQL below, I got an error.

SELECT max(col1.col2) FROM iceberg.test_db.aggregation_test;

java.lang.ClassCastException: java.lang.Integer cannot be cast to org.apache.iceberg.StructLike
	at org.apache.iceberg.Accessors$WrappedPositionAccessor.get(Accessors.java:168)
	at org.apache.iceberg.Accessors$WrappedPositionAccessor.get(Accessors.java:157)
	at org.apache.iceberg.expressions.BoundReference.eval(BoundReference.java:40)
	at org.apache.iceberg.expressions.ValueAggregate.eval(ValueAggregate.java:39)
	at org.apache.iceberg.expressions.MaxAggregate.eval(MaxAggregate.java:28)
	at org.apache.iceberg.expressions.BoundAggregate$NullSafeAggregator.update(BoundAggregate.java:143)
	at org.apache.iceberg.expressions.AggregateEvaluator.update(AggregateEvaluator.java:82)
	at org.apache.iceberg.spark.source.SparkScanBuilder.pushAggregation(SparkScanBuilder.java:242)
	at org.apache.spark.sql.execution.datasources.v2.V2ScanRelationPushDown$.org$apache$spark$sql$execution$datasources$v2$V2ScanRelationPushDown$$rewriteAggregate(V2ScanRelationPushDown.scala:176)
	at org.apache.spark.sql.execution.datasources.v2.V2ScanRelationPushDown$$anonfun$pushDownAggregates$1.applyOrElse(V2ScanRelationPushDown.scala:96)
	at org.apache.spark.sql.execution.datasources.v2.V2ScanRelationPushDown$$anonfun$pushDownAggregates$1.applyOrElse(V2ScanRelationPushDown.scala:94)
	at org.apache.spark.sql.catalyst.trees.TreeNode.$anonfun$transformDownWithPruning$1(TreeNode.scala:512)
	at org.apache.spark.sql.catalyst.trees.CurrentOrigin$.withOrigin(TreeNode.scala:104)
	at org.apache.spark.sql.catalyst.trees.TreeNode.transformDownWithPruning(TreeNode.scala:512)
	at org.apache.spark.sql.catalyst.plans.logical.LogicalPlan.org$apache$spark$sql$catalyst$plans$logical$AnalysisHelper$$super$transformDownWithPruning(LogicalPlan.scala:31)
	at org.apache.spark.sql.catalyst.plans.logical.AnalysisHelper.transformDownWithPruning(AnalysisHelper.scala:267)
	at org.apache.spark.sql.catalyst.plans.logical.AnalysisHelper.transformDownWithPruning$(AnalysisHelper.scala:263)
	at org.apache.spark.sql.catalyst.plans.logical.LogicalPlan.transformDownWithPruning(LogicalPlan.scala:31)
	at org.apache.spark.sql.catalyst.plans.logical.LogicalPlan.transformDownWithPruning(LogicalPlan.scala:31)
	at org.apache.spark.sql.catalyst.trees.TreeNode.transformDown(TreeNode.scala:488)
	at org.apache.spark.sql.catalyst.trees.TreeNode.transform(TreeNode.scala:456)
	at org.apache.spark.sql.execution.datasources.v2.V2ScanRelationPushDown$.pushDownAggregates(V2ScanRelationPushDown.scala:94)
	at org.apache.spark.sql.execution.datasources.v2.V2ScanRelationPushDown$.$anonfun$apply$4(V2ScanRelationPushDown.scala:45)
	at org.apache.spark.sql.execution.datasources.v2.V2ScanRelationPushDown$.$anonfun$apply$8(V2ScanRelationPushDown.scala:51)
	at scala.collection.LinearSeqOptimized.foldLeft(LinearSeqOptimized.scala:126)
	at scala.collection.LinearSeqOptimized.foldLeft$(LinearSeqOptimized.scala:122)
	at scala.collection.immutable.List.foldLeft(List.scala:91)
	at org.apache.spark.sql.execution.datasources.v2.V2ScanRelationPushDown$.apply(V2ScanRelationPushDown.scala:50)
	at org.apache.spark.sql.execution.datasources.v2.V2ScanRelationPushDown$.apply(V2ScanRelationPushDown.scala:37)
	at org.apache.spark.sql.catalyst.rules.RuleExecutor.$anonfun$execute$2(RuleExecutor.scala:222)
	at scala.collection.LinearSeqOptimized.foldLeft(LinearSeqOptimized.scala:126)
	at scala.collection.LinearSeqOptimized.foldLeft$(LinearSeqOptimized.scala:122)
	at scala.collection.immutable.List.foldLeft(List.scala:91)
	at org.apache.spark.sql.catalyst.rules.RuleExecutor.$anonfun$execute$1(RuleExecutor.scala:219)
	at org.apache.spark.sql.catalyst.rules.RuleExecutor.$anonfun$execute$1$adapted(RuleExecutor.scala:211)
	at scala.collection.immutable.List.foreach(List.scala:431)
	at org.apache.spark.sql.catalyst.rules.RuleExecutor.execute(RuleExecutor.scala:211)
	at org.apache.spark.sql.catalyst.rules.RuleExecutor.$anonfun$executeAndTrack$1(RuleExecutor.scala:182)
	at org.apache.spark.sql.catalyst.QueryPlanningTracker$.withTracker(QueryPlanningTracker.scala:88)
	at org.apache.spark.sql.catalyst.rules.RuleExecutor.executeAndTrack(RuleExecutor.scala:182)
	at org.apache.spark.sql.execution.QueryExecution.$anonfun$optimizedPlan$1(QueryExecution.scala:143)
	at org.apache.spark.sql.catalyst.QueryPlanningTracker.measurePhase(QueryPlanningTracker.scala:111)
	at org.apache.spark.sql.execution.QueryExecution.$anonfun$executePhase$2(QueryExecution.scala:202)
	at org.apache.spark.sql.execution.QueryExecution$.withInternalError(QueryExecution.scala:526)
	at org.apache.spark.sql.execution.QueryExecution.$anonfun$executePhase$1(QueryExecution.scala:202)
	at org.apache.spark.sql.SparkSession.withActive(SparkSession.scala:827)
	at org.apache.spark.sql.execution.QueryExecution.executePhase(QueryExecution.scala:201)
	at org.apache.spark.sql.execution.QueryExecution.optimizedPlan$lzycompute(QueryExecution.scala:139)
	at org.apache.spark.sql.execution.QueryExecution.optimizedPlan(QueryExecution.scala:135)
	at org.apache.spark.sql.execution.QueryExecution.assertOptimized(QueryExecution.scala:153)
	at org.apache.spark.sql.execution.QueryExecution.executedPlan$lzycompute(QueryExecution.scala:171)
	at org.apache.spark.sql.execution.QueryExecution.executedPlan(QueryExecution.scala:168)
	at org.apache.spark.sql.execution.QueryExecution.simpleString(QueryExecution.scala:221)
	at org.apache.spark.sql.execution.QueryExecution.org$apache$spark$sql$execution$QueryExecution$$explainString(QueryExecution.scala:266)
	at org.apache.spark.sql.execution.QueryExecution.explainString(QueryExecution.scala:235)
	at org.apache.spark.sql.execution.SQLExecution$.$anonfun$withNewExecutionId$6(SQLExecution.scala:112)
	at org.apache.spark.sql.execution.SQLExecution$.withSQLConfPropagated(SQLExecution.scala:195)
	at org.apache.spark.sql.execution.SQLExecution$.$anonfun$withNewExecutionId$1(SQLExecution.scala:103)
	at org.apache.spark.sql.SparkSession.withActive(SparkSession.scala:827)
	at org.apache.spark.sql.execution.SQLExecution$.withNewExecutionId(SQLExecution.scala:65)
	at org.apache.spark.sql.Dataset.withAction(Dataset.scala:4204)
	at org.apache.spark.sql.Dataset.head(Dataset.scala:3200)
	at org.apache.spark.sql.Dataset.take(Dataset.scala:3421)
	at org.apache.spark.sql.Dataset.getRows(Dataset.scala:283)
	at org.apache.spark.sql.Dataset.showString(Dataset.scala:322)
	at org.apache.spark.sql.Dataset.show(Dataset.scala:809)
	at org.apache.spark.sql.Dataset.show(Dataset.scala:786)

With a top-level (depth-1) field, it works without any problem.

SELECT max(col0) FROM iceberg.test_db.aggregation_test;
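
Judging from the stack trace, the exception is thrown during query planning, inside the V2ScanRelationPushDown rule where Iceberg tries to evaluate the aggregate from metadata (SparkScanBuilder.pushAggregation). So even a plain EXPLAIN of the failing query should reproduce it without scanning any data:

EXPLAIN SELECT max(col1.col2) FROM iceberg.test_db.aggregation_test;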

Can you help me?

bluzy added the bug label on Feb 29, 2024
@amogh-jahagirdar (Contributor)

@bluzy this should be fixed in Iceberg 1.5; here's the PR that should address it: #9176
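
In the meantime, disabling Iceberg's aggregate pushdown should avoid the failing planning path, at the cost of computing the aggregate in Spark rather than from table metadata. A minimal sketch, assuming the spark.sql.iceberg.aggregate-push-down.enabled session property (which, if I recall correctly, the Iceberg Spark runtime has exposed since 1.2):

-- Disable metadata-based aggregate pushdown for this session
SET spark.sql.iceberg.aggregate-push-down.enabled = false;

-- The same query should now aggregate the scanned rows instead
SELECT max(col1.col2) FROM iceberg.test_db.aggregation_test;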

@amogh-jahagirdar (Contributor)

I'm going to close this for now, but if you still see this problem after upgrading to 1.5 (not yet released, but it should be very soon), please reopen.
