[SPARK-38707][SQL] Allow user to insert into only certain columns of a table #36020
|
Can one of the admins verify this patch? |
There are some failed tests; I'm looking into them. |
|
@morvenhuang it would be great to assess this further if you could add some references to other DBMSes that support this syntax. Does Hive support this? |
|
@HyukjinKwon Hi Hyukjin, thank you for your time. I believe this is part of the SQL-92 standard; most RDBMSes I know of support inserting into only certain columns of a table, such as MySQL, Oracle, MS SQL Server, and Teradata. And yes, Hive also supports this. |
|
cc @dtenedor FYI. Seems like actually it's already implemented with |
…r change to avoid unnecessary column size check when USE_NULLS_FOR_MISSING_DEFAULT_COLUMN_VALUES is enabled, since all missing default value(s) will be added to query automatically.
|
@HyukjinKwon Hi Hyukjin, thanks for the comment. That implementation is great work, although there's a small issue caused by an unnecessary column number check. Say we have a table t1 with 2 columns, c1 int and c2 int. I believe the column number check is no longer necessary when this option is enabled, since all omitted columns are added back to the query automatically during parsing. I've made a commit to get rid of the check, so that the |
|
@HyukjinKwon Hi Hyukjin, could you please help to verify this? Many thanks. |
|
FYI I also have this relevant PR for developing |
|
@dtenedor that's great, thank you for pointing that out. I'm going to close this one. |
|
@morvenhuang looking forward to working more on Apache Spark with you and the community! |
What changes were proposed in this pull request?
Update the column number check logic for the INSERT INTO statement so that it no longer checks against all columns of the table when the user has specified a column list in the statement.
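The intended behavior can be illustrated with a simplified model. This is only a sketch, not Spark's analyzer code: the function name `check_insert_columns` and its parameters are hypothetical and exist purely to show the difference between checking against the full table and checking against the user's column list.

```python
from typing import Optional, Sequence


def check_insert_columns(
    table_columns: Sequence[str],
    query_width: int,
    column_list: Optional[Sequence[str]] = None,
) -> None:
    """Hypothetical model of the relaxed check; not Spark's actual code."""
    if column_list is None:
        # No column list: the query must supply a value for every table column.
        expected = len(table_columns)
    else:
        unknown = [c for c in column_list if c not in table_columns]
        if unknown:
            raise ValueError(f"unknown columns: {unknown}")
        # Column list given: compare against the list, not the whole table.
        expected = len(column_list)
    if query_width != expected:
        raise ValueError(f"expected {expected} values, got {query_width}")


# Table t1(c1, c2): one value for column list (c1) passes the relaxed check.
check_insert_columns(["c1", "c2"], query_width=1, column_list=["c1"])
```

Under this model, a statement with no column list still has to provide a value for every column of the table, so only statements that name their columns are affected.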
Why are the changes needed?
When running an INSERT INTO statement, it's quite common for the user to want to insert into only certain columns, especially when the remaining columns have default values.
Currently, Spark allows the user to specify a column list for an INSERT INTO statement only when the column list contains all columns of the table. If the user does not provide a complete list of columns, the statement fails with an AnalysisException.
This patch allows the user to insert into only certain columns of the table, which helps when executing INSERT INTO statements, especially when executing them against an RDBMS.
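For reference, this is how a partial-column insert behaves in an engine that already supports it. The sketch below uses SQLite through Python's standard `sqlite3` module, which is an assumption made for illustration only (SQLite is not one of the systems named in this PR); the table `t1` and the default value 42 are likewise made up.

```python
import sqlite3


def insert_partial():
    # In-memory database, so the example leaves nothing behind.
    conn = sqlite3.connect(":memory:")
    # c2 has a default, so it may be omitted from the INSERT column list.
    conn.execute("CREATE TABLE t1 (c1 INT, c2 INT DEFAULT 42)")
    # Only c1 is listed; the engine fills c2 with its default.
    conn.execute("INSERT INTO t1 (c1) VALUES (1)")
    rows = conn.execute("SELECT c1, c2 FROM t1").fetchall()
    conn.close()
    return rows


rows = insert_partial()
print(rows)  # [(1, 42)]
```

This is the behavior the patch aims to make available for Spark's INSERT INTO with a partial column list.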
Does this PR introduce any user-facing change?
Yes.
How was this patch tested?
New test case added.