[SPARK-38707][SQL] Allow user to insert into only certain columns of a table #36020

morvenhuang · 2022-03-31T03:32:26Z

What changes were proposed in this pull request?

Update the column-count check for INSERT INTO statements so that it no longer requires every column of the table to be present when the user has specified a column list in the statement.

Why are the changes needed?

When running an INSERT INTO statement, users commonly want to insert into only certain columns, especially when the remaining columns have default values.

Currently, Spark allows a column list in an INSERT INTO statement only when the list contains all columns of the table. If the user does not provide a complete column list, the statement fails with an AnalysisException.

This patch allows users to insert into only certain columns of a table, which helps when executing INSERT INTO statements, especially against an RDBMS.

Does this PR introduce any user-facing change?

Yes.

How was this patch tested?

New test case added.

AmplabJenkins · 2022-03-31T16:00:04Z

Can one of the admins verify this patch?

morvenhuang · 2022-04-01T01:44:59Z

Can one of the admins verify this patch?

There are some failed tests; I'm looking into them.

HyukjinKwon · 2022-04-01T04:48:08Z

@morvenhuang it would be great to assess this further if you add some references to other DBMSes that support this syntax. Does Hive support this?

morvenhuang · 2022-04-01T06:59:24Z

@HyukjinKwon Hi Hyukjin, thank you for your time. I believe this is part of the SQL-92 standard, and most RDBMSes I know of support inserting into only certain columns of a table, such as MySQL, Oracle, MS SQL Server, and Teradata. And yes, Hive also supports this.
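As a small illustration of the partial-column INSERT behavior described above, here is a minimal sketch using Python's built-in sqlite3 module. SQLite is not mentioned in this thread and is only an assumed, convenient stand-in for the RDBMSes listed; the table layout and the default value 7 are made up for the example.

```python
import sqlite3

# Assumption: SQLite stands in for a typical RDBMS; it is not one of the
# systems named in the thread.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE t1 (c1 INT, c2 INT DEFAULT 7, c3 INT)")

# Only c1 is listed. c2 falls back to its declared DEFAULT, and c3,
# which has no default, is stored as NULL.
conn.execute("INSERT INTO t1 (c1) VALUES (100)")

rows = conn.execute("SELECT c1, c2, c3 FROM t1").fetchall()
print(rows)
conn.close()
```

The statement succeeds even though the column list names only one of the three columns, which is the behavior this PR asks for in Spark.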

HyukjinKwon · 2022-04-03T00:25:07Z

cc @dtenedor FYI. It seems this is actually already implemented via the spark.sql.defaultColumn.useNullsForMissingDefautValues configuration?


morvenhuang · 2022-04-06T10:09:06Z

@HyukjinKwon Hi Hyukjin, thanks for the comment. That implementation is great work, although there is a small issue caused by an unnecessary column-count check. Say we have a table t1 with two columns, c1 int and c2 int: insert into t1(c1) values(100) still fails even when useNullsForMissingDefautValues is enabled.

I believe the column-count check is no longer necessary when this option is enabled, since all omitted columns are added back to the query automatically during parsing. That means insert into t1(c1) values(100) effectively becomes insert into t1(c1, c2) values(100, null) after parsing.

I've made a commit that removes the check, so that the insert into t1(c1) values(100) statement works.
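The reasoning above can be sketched in plain Python. This is only a toy model under stated assumptions, not Spark's actual implementation: the function expand_insert, the exception ColumnCountMismatch, and the flag fill_missing_with_null are hypothetical names introduced for illustration, with the flag standing in for the option discussed in this thread.

```python
from typing import Any, List, Sequence


class ColumnCountMismatch(Exception):
    """Hypothetical stand-in for the AnalysisException mentioned in the thread."""


def expand_insert(
    table_cols: Sequence[str],
    user_cols: Sequence[str],
    values: Sequence[Any],
    fill_missing_with_null: bool,
) -> List[Any]:
    """Return one full row in table-column order (toy model, not Spark code)."""
    # The number of values must always match the user-specified column list.
    if len(user_cols) != len(values):
        raise ColumnCountMismatch("column list and VALUES have different lengths")

    unknown = [c for c in user_cols if c not in table_cols]
    if unknown:
        raise ValueError(f"unknown columns: {unknown}")

    # The check in question: it is only enforced when omitted columns are
    # NOT filled in automatically.
    if not fill_missing_with_null and len(user_cols) != len(table_cols):
        raise ColumnCountMismatch("column list must contain all table columns")

    # Omitted columns are added back as None (modeling SQL NULL).
    provided = dict(zip(user_cols, values))
    return [provided.get(col) for col in table_cols]


# insert into t1(c1) values(100) on a table with columns c1, c2
row = expand_insert(["c1", "c2"], ["c1"], [100], fill_missing_with_null=True)
print(row)
```

With fill_missing_with_null=False the same call raises ColumnCountMismatch, which mirrors the failure described above; with the flag enabled the check is skipped because the row is completed before it is written.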

morvenhuang · 2022-04-07T09:24:12Z

@HyukjinKwon Hi Hyukjin, could you please help to verify this? Many thanks.

dtenedor · 2022-04-07T16:17:18Z

FYI I also have this relevant PR for developing INSERT INTO support out for review as well :)

#36077

morvenhuang · 2022-04-08T01:31:49Z

@dtenedor that's great, thank you for pointing that out. I'm going to close this one.

dtenedor · 2022-04-08T01:47:12Z

@morvenhuang looking forward to working more on Apache Spark with you and the community!

SPARK-38707 Allow user to insert into only certain columns of a table

42fd529

github-actions bot added the SQL label Mar 31, 2022

morvenhuang changed the title ~~SPARK-38707 Allow user to insert into only certain columns of a table~~ [SPARK-38707][SQL] Allow user to insert into only certain columns of a table Mar 31, 2022

Merge branch 'master' into SPARK-38707

6a6cd96

morvenhuang added 2 commits April 6, 2022 15:30

Merge branch 'master' into SPARK-38707

300c197

SPARK-38707 revert previous change of this jira, instead, make a minor change to avoid unnecessary column size check when USE_NULLS_FOR_MISSING_DEFAULT_COLUMN_VALUES is enabled, since all missing default value(s) will be added to query automatically.

e8e0917

SPARK-38707 comments

85eb874

morvenhuang closed this Apr 8, 2022
