Skip to content

[SUPPORT] HoodieSparkSessionExtension and IcebergSparkSessionExtensions conflict #12808

@pravin1406

Description

@pravin1406

I have a use case where i may have to call spark procedures or metadata queries on hudi tables as well as iceberg tables within same spark job.
I have to run all my uses cases from spark sql...so can't call java api's directly in this case, hence using sql procedures.
But when i set
spark.sql.extensions=org.apache.spark.sql.hudi.HoodieSparkSessionExtension,org.apache.iceberg.spark.extensions.IcebergSparkSessionExtensions

It does not matter what value i set to spark_catalog. None of the combination work.
i have tried keeping hoodie extension last and then set spark_catalog to below..but still no help.

spark.sql.catalog.spark_catalog = org.apache.iceberg.spark.SparkSessionCatalog
spark.sql.catalog.spark_catalog.type = hive
spark.sql.catalog.spark_catalog.uri = thrift://localhost:9083

Only iceberg procedure works...if i reverse the order of extension only the latter extension procedures works.
I get following error

org.apache.spark.sql.catalyst.analysis.NoSuchProcedureException: Procedure default.show_commits not found
and
org.apache.spark.sql.AnalysisException: procedure: rewrite_data_files is not exists

Is there a way to make both extension work? or even when i add more extensions which i do intend to.

Steps to reproduce

Run spark-shell with
following 2 properties set
spark.sql.extensions=org.apache.spark.sql.hudi.HoodieSparkSessionExtension,org.apache.iceberg.spark.extensions.IcebergSparkSessionExtensions
spark.jars=iceberg-spark-runtime-3.2_2.12-1.4.3.2.jar,hudi-spark3.2-bundle_2.12-0.14.1.4.jar

Expected behavior

A clear and concise description of what you expected to happen.

Environment Description

  • Hudi version : 0.14.1

  • Spark version : 3.3.2

  • Hadoop version : 3.1.3

  • Storage (HDFS/S3/GCS..) : HDFS

  • Running on Docker? (yes/no) : no

Sample stacktrace in both cases
scala> spark.sql("call show_commits(table => 'default.18nove_try1_rt', limit => 100)").show(false) ANTLR Tool version 4.9.3 used for code generation does not match the current runtime version 4.8ANTLR Runtime version 4.9.3 used for parser compilation does not match the current runtime version 4.8ANTLR Tool version 4.9.3 used for code generation does not match the current runtime version 4.8ANTLR Runtime version 4.9.3 used for parser compilation does not match the current runtime version 4.8 org.apache.spark.sql.catalyst.analysis.NoSuchProcedureException: Procedure default.show_commits not found spark.sql.catalyst.analysis.NoSuchProcedureException: Procedure default.show_commits not found at org.apache.iceberg.spark.BaseCatalog.loadProcedure(BaseCatalog.java:57) at org.apache.iceberg.spark.SparkSessionCatalog.loadProcedure(SparkSessionCatalog.java:56) at org.apache.spark.sql.catalyst.analysis.ResolveProcedures$$anonfun$apply$1.applyOrElse(ResolveProcedures.scala:47) at org.apache.spark.sql.catalyst.analysis.ResolveProcedures$$anonfun$apply$1.applyOrElse(ResolveProcedures.scala:45) at org.apache.spark.sql.catalyst.plans.logical.AnalysisHelper.$anonfun$resolveOperatorsDownWithPruning$2(AnalysisHelper.scala:170) at org.apache.spark.sql.catalyst.trees.CurrentOrigin$.withOrigin(TreeNode.scala:82) at org.apache.spark.sql.catalyst.plans.logical.AnalysisHelper.$anonfun$resolveOperatorsDownWithPruning$1(AnalysisHelper.scala:170) at org.apache.spark.sql.catalyst.plans.logical.AnalysisHelper$.allowInvokingTransformsInAnalyzer(AnalysisHelper.scala:323) at org.apache.spark.sql.catalyst.plans.logical.AnalysisHelper.resolveOperatorsDownWithPruning(AnalysisHelper.scala:168) at org.apache.spark.sql.catalyst.plans.logical.AnalysisHelper.resolveOperatorsDownWithPruning$(AnalysisHelper.scala:164) at org.apache.spark.sql.catalyst.plans.logical.LogicalPlan.resolveOperatorsDownWithPruning(LogicalPlan.scala:30) at org.apache.spark.sql.catalyst.plans.logical.AnalysisHelper.resolveOperatorsWithPruning(AnalysisHelper.scala:99) at org.apache.spark.sql.catalyst.plans.logical.AnalysisHelper.resolveOperatorsWithPruning$(AnalysisHelper.scala:96) at org.apache.spark.sql.catalyst.plans.logical.LogicalPlan.resolveOperatorsWithPruning(LogicalPlan.scala:30) at org.apache.spark.sql.catalyst.plans.logical.AnalysisHelper.resolveOperators(AnalysisHelper.scala:76) at org.apache.spark.sql.catalyst.plans.logical.AnalysisHelper.resolveOperators$(AnalysisHelper.scala:75) at org.apache.spark.sql.catalyst.plans.logical.LogicalPlan.resolveOperators(LogicalPlan.scala:30) at org.apache.spark.sql.catalyst.analysis.ResolveProcedures.apply(ResolveProcedures.scala:45) at org.apache.spark.sql.catalyst.analysis.ResolveProcedures.apply(ResolveProcedures.scala:41) at org.apache.spark.sql.catalyst.rules.RuleExecutor.$anonfun$execute$2(RuleExecutor.scala:211) at scala.collection.LinearSeqOptimized.foldLeft(LinearSeqOptimized.scala:126) at scala.collection.LinearSeqOptimized.foldLeft$(LinearSeqOptimized.scala:122) at scala.collection.immutable.List.foldLeft(List.scala:91) at org.apache.spark.sql.catalyst.rules.RuleExecutor.$anonfun$execute$1(RuleExecutor.scala:208) at org.apache.spark.sql.catalyst.rules.RuleExecutor.$anonfun$execute$1$adapted(RuleExecutor.scala:200) at scala.collection.immutable.List.foreach(List.scala:431) at org.apache.spark.sql.catalyst.rules.RuleExecutor.execute(RuleExecutor.scala:200) at org.apache.spark.sql.catalyst.analysis.Analyzer.org$apache$spark$sql$catalyst$analysis$Analyzer$$executeSameContext(Analyzer.scala:215) at org.apache.spark.sql.catalyst.analysis.Analyzer.execute(Analyzer.scala:209) at org.apache.spark.sql.catalyst.analysis.Analyzer.execute(Analyzer.scala:172) at org.apache.spark.sql.catalyst.rules.RuleExecutor.$anonfun$executeAndTrack$1(RuleExecutor.scala:179) at org.apache.spark.sql.catalyst.QueryPlanningTracker$.withTracker(QueryPlanningTracker.scala:88)

scala> spark.sql("CALL spark_catalog.system.rewrite_data_files(table => 'iceberg_staging.iceberg_6nov_try1', options => map('max-file-size-bytes','1073741824'))").show(false) org.apache.spark.sql.AnalysisException: procedure: rewrite_data_files is not exists at org.apache.spark.sql.hudi.analysis.ResolveImplementations.loadProcedure(HoodieAnalysis.scala:472) at org.apache.spark.sql.hudi.analysis.ResolveImplementations.$anonfun$apply$3(HoodieAnalysis.scala:436) at org.apache.spark.sql.catalyst.plans.logical.AnalysisHelper$.allowInvokingTransformsInAnalyzer(AnalysisHelper.scala:323) at org.apache.spark.sql.hudi.analysis.ResolveImplementations.apply(HoodieAnalysis.scala:405) at org.apache.spark.sql.hudi.analysis.ResolveImplementations.apply(HoodieAnalysis.scala:401) at org.apache.spark.sql.catalyst.rules.RuleExecutor.$anonfun$execute$2(RuleExecutor.scala:211) at scala.collection.LinearSeqOptimized.foldLeft(LinearSeqOptimized.scala:126) at scala.collection.LinearSeqOptimized.foldLeft$(LinearSeqOptimized.scala:122) at scala.collection.immutable.List.foldLeft(List.scala:91) at org.apache.spark.sql.catalyst.rules.RuleExecutor.$anonfun$execute$1(RuleExecutor.scala:208) at org.apache.spark.sql.catalyst.rules.RuleExecutor.$anonfun$execute$1$adapted(RuleExecutor.scala:200) at scala.collection.immutable.List.foreach(List.scala:431) at org.apache.spark.sql.catalyst.rules.RuleExecutor.execute(RuleExecutor.scala:200) at org.apache.spark.sql.catalyst.analysis.Analyzer.org$apache$spark$sql$catalyst$analysis$Analyzer$$executeSameContext(Analyzer.scala:215) at org.apache.spark.sql.catalyst.analysis.Analyzer.execute(Analyzer.scala:209) at org.apache.spark.sql.catalyst.analysis.Analyzer.execute(Analyzer.scala:172) at org.apache.spark.sql.catalyst.rules.RuleExecutor.$anonfun$executeAndTrack$1(RuleExecutor.scala:179) at org.apache.spark.sql.catalyst.QueryPlanningTracker$.withTracker(QueryPlanningTracker.scala:88) at org.apache.spark.sql.catalyst.rules.RuleExecutor.executeAndTrack(RuleExecutor.scala:179) at org.apache.spark.sql.catalyst.analysis.Analyzer.$anonfun$executeAndCheck$1(Analyzer.scala:193) at org.apache.spark.sql.catalyst.plans.logical.AnalysisHelper$.markInAnalyzer(AnalysisHelper.scala:330) at org.apache.spark.sql.catalyst.analysis.Analyzer.executeAndCheck(Analyzer.scala:192) at org.apache.spark.sql.execution.QueryExecution.$anonfun$analyzed$1(QueryExecution.scala:88) at org.apache.spark.sql.catalyst.QueryPlanningTracker.measurePhase(QueryPlanningTracker.scala:111) at org.apache.spark.sql.execution.QueryExecution.$anonfun$executePhase$1(QueryExecution.scala:196) at org.apache.spark.sql.SparkSession.withActive(SparkSession.scala:775) at org.apache.spark.sql.execution.QueryExecution.executePhase(QueryExecution.scala:196) at org.apache.spark.sql.execution.QueryExecution.analyzed$lzycompute(QueryExecution.scala:88) at org.apache.spark.sql.execution.QueryExecution.analyzed(QueryExecution.scala:86) at org.apache.spark.sql.execution.QueryExecution.assertAnalyzed(QueryExecution.scala:78)

Metadata

Metadata

Assignees

No one assigned

    Labels

    Type

    Projects

    Status

    👤 User Action

    Milestone

    No milestone

    Relationships

    None yet

    Development

    No branches or pull requests

    Issue actions