
[Spark 3.4] java.lang.Integer cannot be cast to org.apache.iceberg.StructLike #9831

Closed
bluzy opened this issue Feb 29, 2024 · 2 comments
Labels: bug (Something isn't working)

Comments


bluzy commented Feb 29, 2024

Apache Iceberg version

1.3.1

Query engine

Spark

Please describe the bug 🐞

Hi, I am currently using Spark 3.2 and considering upgrading to Spark 3.4.

When I tested our current production queries on Spark 3.4, I encountered an error when aggregating a field with the max() function.

After many tests, I concluded that the error occurs when max() is applied to a nested field that contains null values.

To demonstrate the issue, I created a test table.

CREATE TABLE iceberg.test_db.aggregation_test (
    col0 int,
    col1 struct<col2:int>
)
USING iceberg;

INSERT INTO iceberg.test_db.aggregation_test VALUES (1709168400, named_struct('col2', 1709168400));
INSERT INTO iceberg.test_db.aggregation_test VALUES (1709175600, named_struct('col2', 1709175600));
INSERT INTO iceberg.test_db.aggregation_test VALUES (1709172000, named_struct('col2', 1709172000));
INSERT INTO iceberg.test_db.aggregation_test VALUES (null, named_struct('col2', null));

Then, when I ran the SQL below, I got an error.

SELECT max(col1.col2) FROM iceberg.test_db.aggregation_test;

java.lang.ClassCastException: java.lang.Integer cannot be cast to org.apache.iceberg.StructLike
	at org.apache.iceberg.Accessors$WrappedPositionAccessor.get(Accessors.java:168)
	at org.apache.iceberg.Accessors$WrappedPositionAccessor.get(Accessors.java:157)
	at org.apache.iceberg.expressions.BoundReference.eval(BoundReference.java:40)
	at org.apache.iceberg.expressions.ValueAggregate.eval(ValueAggregate.java:39)
	at org.apache.iceberg.expressions.MaxAggregate.eval(MaxAggregate.java:28)
	at org.apache.iceberg.expressions.BoundAggregate$NullSafeAggregator.update(BoundAggregate.java:143)
	at org.apache.iceberg.expressions.AggregateEvaluator.update(AggregateEvaluator.java:82)
	at org.apache.iceberg.spark.source.SparkScanBuilder.pushAggregation(SparkScanBuilder.java:242)
	at org.apache.spark.sql.execution.datasources.v2.V2ScanRelationPushDown$.org$apache$spark$sql$execution$datasources$v2$V2ScanRelationPushDown$$rewriteAggregate(V2ScanRelationPushDown.scala:176)
	at org.apache.spark.sql.execution.datasources.v2.V2ScanRelationPushDown$$anonfun$pushDownAggregates$1.applyOrElse(V2ScanRelationPushDown.scala:96)
	at org.apache.spark.sql.execution.datasources.v2.V2ScanRelationPushDown$$anonfun$pushDownAggregates$1.applyOrElse(V2ScanRelationPushDown.scala:94)
	at org.apache.spark.sql.catalyst.trees.TreeNode.$anonfun$transformDownWithPruning$1(TreeNode.scala:512)
	at org.apache.spark.sql.catalyst.trees.CurrentOrigin$.withOrigin(TreeNode.scala:104)
	at org.apache.spark.sql.catalyst.trees.TreeNode.transformDownWithPruning(TreeNode.scala:512)
	at org.apache.spark.sql.catalyst.plans.logical.LogicalPlan.org$apache$spark$sql$catalyst$plans$logical$AnalysisHelper$$super$transformDownWithPruning(LogicalPlan.scala:31)
	at org.apache.spark.sql.catalyst.plans.logical.AnalysisHelper.transformDownWithPruning(AnalysisHelper.scala:267)
	at org.apache.spark.sql.catalyst.plans.logical.AnalysisHelper.transformDownWithPruning$(AnalysisHelper.scala:263)
	at org.apache.spark.sql.catalyst.plans.logical.LogicalPlan.transformDownWithPruning(LogicalPlan.scala:31)
	at org.apache.spark.sql.catalyst.plans.logical.LogicalPlan.transformDownWithPruning(LogicalPlan.scala:31)
	at org.apache.spark.sql.catalyst.trees.TreeNode.transformDown(TreeNode.scala:488)
	at org.apache.spark.sql.catalyst.trees.TreeNode.transform(TreeNode.scala:456)
	at org.apache.spark.sql.execution.datasources.v2.V2ScanRelationPushDown$.pushDownAggregates(V2ScanRelationPushDown.scala:94)
	at org.apache.spark.sql.execution.datasources.v2.V2ScanRelationPushDown$.$anonfun$apply$4(V2ScanRelationPushDown.scala:45)
	at org.apache.spark.sql.execution.datasources.v2.V2ScanRelationPushDown$.$anonfun$apply$8(V2ScanRelationPushDown.scala:51)
	at scala.collection.LinearSeqOptimized.foldLeft(LinearSeqOptimized.scala:126)
	at scala.collection.LinearSeqOptimized.foldLeft$(LinearSeqOptimized.scala:122)
	at scala.collection.immutable.List.foldLeft(List.scala:91)
	at org.apache.spark.sql.execution.datasources.v2.V2ScanRelationPushDown$.apply(V2ScanRelationPushDown.scala:50)
	at org.apache.spark.sql.execution.datasources.v2.V2ScanRelationPushDown$.apply(V2ScanRelationPushDown.scala:37)
	at org.apache.spark.sql.catalyst.rules.RuleExecutor.$anonfun$execute$2(RuleExecutor.scala:222)
	at scala.collection.LinearSeqOptimized.foldLeft(LinearSeqOptimized.scala:126)
	at scala.collection.LinearSeqOptimized.foldLeft$(LinearSeqOptimized.scala:122)
	at scala.collection.immutable.List.foldLeft(List.scala:91)
	at org.apache.spark.sql.catalyst.rules.RuleExecutor.$anonfun$execute$1(RuleExecutor.scala:219)
	at org.apache.spark.sql.catalyst.rules.RuleExecutor.$anonfun$execute$1$adapted(RuleExecutor.scala:211)
	at scala.collection.immutable.List.foreach(List.scala:431)
	at org.apache.spark.sql.catalyst.rules.RuleExecutor.execute(RuleExecutor.scala:211)
	at org.apache.spark.sql.catalyst.rules.RuleExecutor.$anonfun$executeAndTrack$1(RuleExecutor.scala:182)
	at org.apache.spark.sql.catalyst.QueryPlanningTracker$.withTracker(QueryPlanningTracker.scala:88)
	at org.apache.spark.sql.catalyst.rules.RuleExecutor.executeAndTrack(RuleExecutor.scala:182)
	at org.apache.spark.sql.execution.QueryExecution.$anonfun$optimizedPlan$1(QueryExecution.scala:143)
	at org.apache.spark.sql.catalyst.QueryPlanningTracker.measurePhase(QueryPlanningTracker.scala:111)
	at org.apache.spark.sql.execution.QueryExecution.$anonfun$executePhase$2(QueryExecution.scala:202)
	at org.apache.spark.sql.execution.QueryExecution$.withInternalError(QueryExecution.scala:526)
	at org.apache.spark.sql.execution.QueryExecution.$anonfun$executePhase$1(QueryExecution.scala:202)
	at org.apache.spark.sql.SparkSession.withActive(SparkSession.scala:827)
	at org.apache.spark.sql.execution.QueryExecution.executePhase(QueryExecution.scala:201)
	at org.apache.spark.sql.execution.QueryExecution.optimizedPlan$lzycompute(QueryExecution.scala:139)
	at org.apache.spark.sql.execution.QueryExecution.optimizedPlan(QueryExecution.scala:135)
	at org.apache.spark.sql.execution.QueryExecution.assertOptimized(QueryExecution.scala:153)
	at org.apache.spark.sql.execution.QueryExecution.executedPlan$lzycompute(QueryExecution.scala:171)
	at org.apache.spark.sql.execution.QueryExecution.executedPlan(QueryExecution.scala:168)
	at org.apache.spark.sql.execution.QueryExecution.simpleString(QueryExecution.scala:221)
	at org.apache.spark.sql.execution.QueryExecution.org$apache$spark$sql$execution$QueryExecution$$explainString(QueryExecution.scala:266)
	at org.apache.spark.sql.execution.QueryExecution.explainString(QueryExecution.scala:235)
	at org.apache.spark.sql.execution.SQLExecution$.$anonfun$withNewExecutionId$6(SQLExecution.scala:112)
	at org.apache.spark.sql.execution.SQLExecution$.withSQLConfPropagated(SQLExecution.scala:195)
	at org.apache.spark.sql.execution.SQLExecution$.$anonfun$withNewExecutionId$1(SQLExecution.scala:103)
	at org.apache.spark.sql.SparkSession.withActive(SparkSession.scala:827)
	at org.apache.spark.sql.execution.SQLExecution$.withNewExecutionId(SQLExecution.scala:65)
	at org.apache.spark.sql.Dataset.withAction(Dataset.scala:4204)
	at org.apache.spark.sql.Dataset.head(Dataset.scala:3200)
	at org.apache.spark.sql.Dataset.take(Dataset.scala:3421)
	at org.apache.spark.sql.Dataset.getRows(Dataset.scala:283)
	at org.apache.spark.sql.Dataset.showString(Dataset.scala:322)
	at org.apache.spark.sql.Dataset.show(Dataset.scala:809)
	at org.apache.spark.sql.Dataset.show(Dataset.scala:786)

With a top-level (depth-1) field, it works without any problem.

SELECT max(col0) FROM iceberg.test_db.aggregation_test;
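
Judging from the stack trace, the exception is thrown during query planning, inside the V2ScanRelationPushDown rule where Iceberg tries to evaluate the aggregate from metadata (SparkScanBuilder.pushAggregation). So even a plain EXPLAIN of the failing query should reproduce it without scanning any data:

EXPLAIN SELECT max(col1.col2) FROM iceberg.test_db.aggregation_test;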

Can you help me?

bluzy added the bug label on Feb 29, 2024
@amogh-jahagirdar (Contributor)

@bluzy this should be fixed in Iceberg 1.5; here's the PR that should address it: #9176
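
In the meantime, disabling Iceberg's aggregate pushdown should avoid the failing planning path, at the cost of computing the aggregate in Spark rather than from table metadata. A minimal sketch, assuming the spark.sql.iceberg.aggregate-push-down.enabled session property (which, if I recall correctly, the Iceberg Spark runtime has exposed since 1.2):

-- Disable metadata-based aggregate pushdown for this session
SET spark.sql.iceberg.aggregate-push-down.enabled = false;

-- The same query should now aggregate the scanned rows instead
SELECT max(col1.col2) FROM iceberg.test_db.aggregation_test;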

@amogh-jahagirdar (Contributor)

I'm going to close this for now, but if you still see this problem after upgrading to 1.5 (not yet released, but it should be very soon), please reopen.
