Support INSERT SQL statement with a subset of columns in Spark 3.4 #16805

@hudi-bot

Description

The new tests in TestInsertTable ("Test Insert Into with subset of columns" and "Test Insert Into with subset of columns on Parquet table") fail on Spark 3.4 because of the validation introduced in HoodieSpark34CatalystPlanUtils by [https://github.com//pull/11568]. Before that change, INSERT INTO with a subset of columns worked.

{code:scala}
override def unapplyInsertIntoStatement(plan: LogicalPlan): Option[(LogicalPlan, Seq[String], Map[String, Option[String]], LogicalPlan, Boolean, Boolean)] = {
  plan match {
    case insert: InsertIntoStatement =>
      // apache/spark#36077
      // First: since that PR, Spark 3.4 supports default values for INSERT INTO and
      // regenerates the user-specified columns itself, so Hudi does not need to handle them.
      // Second: the same PR appends the Hoodie meta fields with default values, which is
      // buggy; this appears to be fixed in Spark 3.5 (apache/spark#41262). Until then,
      // users who want to specify columns must disable the default-column feature.
      if (SQLConf.get.enableDefaultColumns) {
        if (insert.userSpecifiedCols.nonEmpty) {
          throw new AnalysisException("hudi not support specified cols when enable default columns, " +
            "please disable 'spark.sql.defaultColumn.enabled'")
        }
        Some((insert.table, Seq.empty, insert.partitionSpec, insert.query, insert.overwrite, insert.ifPartitionNotExists))
      } else {
        Some((insert.table, insert.userSpecifiedCols, insert.partitionSpec, insert.query, insert.overwrite, insert.ifPartitionNotExists))
      }
    case _ =>
      None
  }
}
{code}
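For reference, the branching above can be exercised without a Spark session. The sketch below is a hypothetical standalone helper (InsertColsValidation and resolveUserCols are names invented here, not part of Hudi or Spark) that mirrors the same three outcomes: pass the columns through when the default-column feature is off, clear them when it is on and none were specified, and fail when both are set.

{code:scala}
// Hypothetical sketch of the validation logic above, with no Spark dependencies.
object InsertColsValidation {
  // Returns the column list Hudi would forward, or fails when Spark 3.4's
  // default-column feature would regenerate the user-specified columns.
  def resolveUserCols(defaultColumnsEnabled: Boolean,
                      userSpecifiedCols: Seq[String]): Seq[String] = {
    if (defaultColumnsEnabled) {
      if (userSpecifiedCols.nonEmpty) {
        throw new IllegalArgumentException(
          "specified cols are not supported when default columns are enabled; " +
            "please disable 'spark.sql.defaultColumn.enabled'")
      }
      Seq.empty
    } else {
      userSpecifiedCols
    }
  }
}
{code}

For example, resolveUserCols(defaultColumnsEnabled = false, Seq("id", "name")) returns the columns unchanged, while resolveUserCols(true, Seq("id")) fails, matching the AnalysisException path in the real method.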