Conversation

@KnightChess (Contributor) commented Jul 4, 2024

As #11552 describes, Hudi INSERT does not support a user-specified column list. This PR adds support for that case:

  • users can specify the column order in INSERT INTO / INSERT OVERWRITE (see the example below)
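
For illustration, a minimal usage sketch (the table and column names here are hypothetical, not taken from this PR):

// Hypothetical table; the column list in the INSERT may now differ from the declared order.
spark.sql("create table t1 (id int, name string, price double) using hudi tblproperties (primaryKey = 'id')")
spark.sql("insert into t1 (name, price, id) values ('a1', 10.0, 1)")
spark.sql("insert overwrite t1 (price, id, name) values (11.0, 1, 'a2')")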

Change Logs

Only Spark versions >= 3.2 are supported.

  • if the user specifies a column list, reorder the plan with a Project (see the sketch below)

Spark 3.4 is not supported when the default-value feature is enabled (apache/spark#36077):

  • first: in that Spark PR, Spark 3.4 adds default-value support for INSERT INTO and regenerates the user-specified columns itself, so there is no need to handle it on the Hudi side
  • second: that PR appends the Hudi meta fields with default values, which has a bug; it looks to be fixed in Spark 3.5 ([SPARK-43742][SQL] Refactor default column value resolution spark#41262), so users who want to specify columns on Spark 3.4 need to disable the default-value feature
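
A minimal sketch of the reordering idea (names are illustrative, not the exact code in this PR; it assumes for simplicity that every target column is specified and that names match case-sensitively):

import org.apache.spark.sql.catalyst.expressions.Alias
import org.apache.spark.sql.catalyst.plans.logical.{LogicalPlan, Project}

def reorderBySpecifiedCols(
    targetCols: Seq[String],          // target table schema order
    userSpecifiedCols: Seq[String],   // order written in the INSERT statement
    query: LogicalPlan): LogicalPlan = {
  // The i-th specified column name pairs with the i-th output attribute of the query.
  val byName = userSpecifiedCols.zip(query.output).toMap
  // Re-emit the query's output in table-schema order, aliased to the target column names.
  val projectList = targetCols.map { col =>
    val attr = byName.getOrElse(col, throw new IllegalArgumentException(s"Column '$col' is not specified"))
    Alias(attr, col)()
  }
  Project(projectList, query)
}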

Impact

none

Risk level (write none, low medium or high below)

none

Documentation Update

none

Contributor's checklist

  • Read through contributor's guide
  • Change Logs and Impact were stated clearly
  • Adequate tests were added if applicable
  • CI passed

github-actions bot added the size:L label (PR with lines of changes in (300, 1000]) on Jul 4, 2024
 * changes in Spark 3.3
 */
- def unapplyInsertIntoStatement(plan: LogicalPlan): Option[(LogicalPlan, Map[String, Option[String]], LogicalPlan, Boolean, Boolean)]
+ def unapplyInsertIntoStatement(plan: LogicalPlan): Option[(LogicalPlan, Seq[String], Map[String, Option[String]], LogicalPlan, Boolean, Boolean)]

Contributor: Can we add a doc for the new param?

Contributor Author: Done.
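
For context, the added doc presumably reads something like this (illustrative wording, not necessarily what was merged):

/**
 * Decomposes an INSERT INTO / INSERT OVERWRITE statement across Spark versions.
 *
 * @return (table, userSpecifiedCols, partitionSpec, query, overwrite, ifPartitionNotExists),
 *         where userSpecifiedCols is the column list written by the user,
 *         or Seq.empty if no column list was specified.
 */
def unapplyInsertIntoStatement(plan: LogicalPlan): Option[(LogicalPlan, Seq[String], Map[String, Option[String]], LogicalPlan, Boolean, Boolean)]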

case lr: LogicalRelation =>
  // Create a project if this is an INSERT INTO query with specified cols.
  val projectByUserSpecified = if (userSpecifiedCols.nonEmpty) {
    assert(lr.catalogTable.isDefined, "Missing catalog table")

Contributor: Use ValidationUtils.checkState instead.

Contributor Author: Done.
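
The fixed line presumably becomes something like the following (illustrative; ValidationUtils.checkState throws IllegalStateException with the given message when the condition is false):

import org.apache.hudi.common.util.ValidationUtils

ValidationUtils.checkState(lr.catalogTable.isDefined, "Missing catalog table")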

plan match {
  case InsertIntoTable(table, partition, query, overwrite, ifPartitionNotExists) =>
-   Some((table, partition, query, overwrite, ifPartitionNotExists))
+   Some((table, Seq.empty, partition, query, overwrite, ifPartitionNotExists))

Contributor: Do you think we should log some msg here?

Contributor Author: I think it's unnecessary, because it is not supported at the SQL grammar level.


- override def unapplyInsertIntoStatement(plan: LogicalPlan): Option[(LogicalPlan, Map[String, Option[String]], LogicalPlan, Boolean, Boolean)] = {
-   plan match {
-     case insert: InsertIntoStatement =>

Contributor: Why remove this impl?

Contributor Author: Every subclass has its own impl, so I removed it.

override def unapplyInsertIntoStatement(plan: LogicalPlan): Option[(LogicalPlan, Seq[String], Map[String, Option[String]], LogicalPlan, Boolean, Boolean)] = {
  plan match {
    case insert: InsertIntoStatement =>
      Some((insert.table, Seq.empty, insert.partitionSpec, insert.query, insert.overwrite, insert.ifPartitionNotExists))

Contributor: Should we log some msg here?

@hudi-bot (Collaborator) commented Jul 6, 2024

CI report:

Bot commands: @hudi-bot supports the following commands:
  • @hudi-bot run azure: re-run the last Azure build

@danny0405 (Contributor):

@leesf Do you have interest in reviewing this PR?

spark.sessionState.conf.unsetConf("hoodie.datasource.write.operation")
}

test("Test insert into with special cols") {

Contributor: special -> specified
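
For reference, the (renamed) test presumably follows this suite's usual pattern, roughly like the sketch below (illustrative only; it assumes the suite's generateTableName helper and checkAnswer assertion):

test("Test insert into with specified cols") {
  val tableName = generateTableName
  spark.sql(
    s"""create table $tableName (id int, name string, price double, ts long)
       |using hudi tblproperties (primaryKey = 'id', preCombineField = 'ts')""".stripMargin)
  // The column order intentionally differs from the table schema.
  spark.sql(s"insert into $tableName (name, ts, price, id) values ('a1', 1000, 10.0, 1)")
  checkAnswer(s"select id, name, price, ts from $tableName")(Seq(1, "a1", 10.0, 1000))
}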

}
}

test("Test insert overwrite with special cols") {

Contributor: ditto

@leesf (Contributor) commented Jul 7, 2024

LGTM, with minor comments.

@codope (Member) commented Jul 11, 2024

@KnightChess Looks like this PR does not handle the partition spec? Can you please check HUDI-7964?

codope self-assigned this on Jul 11, 2024

@KnightChess (Contributor Author):

@codope Hi, this logic is only triggered when the columns are specified, e.g. insert into aaa (id, day, price, name, hour) values (2, '01', 12.2, 'bbb', '02'), but your case does not specify columns. I also tested with this PR rolled back and the problem you point out still occurs, so it is not caused by this change.

@codope (Member) commented Jul 11, 2024

@KnightChess Yes, it is not due to this PR. I just tested by creating a parquet table and it's still the same behavior. So the issue is unrelated to Hudi. You could try as well:

spark-sql> drop table if exists test_table;
Time taken: 0.459 seconds
spark-sql>
         > create table test_table (
         >     ts BIGINT,
         >     id STRING,
         >     rider STRING,
         >     driver STRING,
         >     fare DOUBLE,
         >     city STRING,
         >     state STRING
         > )
         > USING parquet
         > PARTITIONED BY (state, city)
         > location 'file:///tmp/hudi_test_table';

spark-sql> INSERT INTO test_table VALUES (1695159649,'trip1','rider-A','driver-K',19.10,'san_francisco','california');

spark-sql> INSERT INTO test_table VALUES (1695091554,'trip2','rider-C','driver-M',27.70,'austin','texas');

This is what the directory structure looks like under the base path:
[screenshot: directory structure under the base path]

KnightChess deleted the insert-into-cols-specified branch on July 11, 2024 at 13:56
@KnightChess (Contributor Author):

@codope got it o( ̄▽ ̄)o

@KnightChess (Contributor Author):

@codope If the columns are specified, it looks like it works well.
[screenshot: result of an INSERT with a specified column list]
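
Presumably a statement along these lines against the table above (reconstructed for illustration, since the original evidence is a screenshot):

// Hypothetical, matching codope's table definition above:
spark.sql(
  """INSERT INTO test_table (ts, id, rider, driver, fare, city, state)
    |VALUES (1695159649, 'trip1', 'rider-A', 'driver-K', 19.10, 'san_francisco', 'california')""".stripMargin)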
