[SPARK-16036][SPARK-16037][SQL] fix various table insertion problems #13754
Test build #60759 has finished for PR 13754 at commit
      queryString.replace("../../data", testDataPath))
    val containsCommands = originalQuery.analyzed.collectFirst {
      case _: Command => ()
      case _: InsertIntoTable => ()
We should not have InsertIntoTable inside the plan tree when running a Hive query. It looks like this PR breaks something; I need some more time to investigate it.
Ah, it's because I removed the PreInsertionCasts rule, which turned InsertIntoTable into InsertIntoHiveTable. This conversion doesn't matter, as the Hive planner will plan InsertIntoTable into the physical InsertIntoHiveTable.
So adding a case here is a reasonable fix.
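For readers unfamiliar with the check being discussed, here is a minimal sketch of how a `collectFirst` over a plan tree can flag either node type. The `Plan`, `Relation`, `Project`, `Command`, and `InsertIntoTable` classes below are simplified, hypothetical stand-ins written for this illustration, not Spark's Catalyst classes, and `PlanCheck` is an assumed helper name.

```scala
// Hypothetical, simplified plan nodes for illustration only (not Catalyst's classes).
sealed trait Plan {
  def children: Seq[Plan]

  // Pre-order search: return the result for the first node on which `pf` is defined.
  def collectFirst[B](pf: PartialFunction[Plan, B]): Option[B] =
    pf.lift(this).orElse(
      children.foldLeft(Option.empty[B])((acc, c) => acc.orElse(c.collectFirst(pf))))
}

final case class Relation(name: String) extends Plan {
  def children: Seq[Plan] = Nil
}
final case class Project(child: Plan) extends Plan {
  def children: Seq[Plan] = Seq(child)
}
final case class Command(text: String) extends Plan {
  def children: Seq[Plan] = Nil
}
final case class InsertIntoTable(table: Plan, query: Plan) extends Plan {
  def children: Seq[Plan] = Seq(table, query)
}

object PlanCheck {
  // True if the tree contains a command or a logical insert node anywhere.
  def containsCommands(plan: Plan): Boolean =
    plan.collectFirst {
      case _: Command         => ()
      case _: InsertIntoTable => ()
    }.isDefined
}
```

With these stand-ins, a plan rooted at `InsertIntoTable` is treated as containing a command even though no rule has rewritten it to a Hive-specific node, which is the effect of the extra `case` in the diff.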
  }
}

test("SPARK-3810: PreInsertionCasts dynamic partitioning support") {
This rule has been removed.
But we have a new rule to replace it, right? It seems it would be good to keep the tests.
Test build #60772 has finished for PR 13754 at commit
Test build #60770 has finished for PR 13754 at commit
Test build #60774 has finished for PR 13754 at commit
  .partition(a => partition.keySet.contains(a.name))
Some(dataColumns ++ partitionColumns.takeRight(numDynamicPartitions))
val staticPartCols = partition.filter(_._2.isDefined).keySet
Some(table.output.filterNot(a => staticPartCols.contains(a.name)))
It seems this `contains` does not work for case-insensitive resolution. We can fix it in a separate PR.
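To illustrate the concern, here is a rough sketch of how the static-partition filter could take a name resolver instead of relying on exact string matching. `Attr`, `Resolver`, `caseSensitive`, `caseInsensitive`, and `expectedColumns` are hypothetical names chosen for this example. This is not the code that was merged, and it only assumes that a resolver is a function comparing two identifiers.

```scala
object StaticPartitionFilter {
  // Hypothetical stand-in for a table output attribute; only the name matters here.
  final case class Attr(name: String)

  // A resolver decides whether two identifiers refer to the same column.
  type Resolver = (String, String) => Boolean
  val caseSensitive: Resolver = (a, b) => a == b
  val caseInsensitive: Resolver = (a, b) => a.equalsIgnoreCase(b)

  // Columns the query must produce: every table column except partition
  // columns that were given a static value in the PARTITION clause.
  def expectedColumns(
      tableOutput: Seq[Attr],
      partition: Map[String, Option[String]],
      resolver: Resolver): Seq[Attr] = {
    val staticPartCols = partition.filter(_._2.isDefined).keySet
    // `exists` with the resolver replaces the exact-match `contains`.
    tableOutput.filterNot(a => staticPartCols.exists(resolver(_, a.name)))
  }
}
```

With a table whose partition column is declared as `P1` and a statement that writes `PARTITION (p1 = 1)`, the exact-match version keeps `P1` in the expected columns, while the resolver-based version drops it when the case-insensitive resolver is passed.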
Overall LGTM. I am merging this to master and branch 2.0. I will take care of those comments in my PR.
## What changes were proposed in this pull request?

The current table insertion has some weird behaviours:

1. inserting into a partitioned table with mismatch columns has confusing error message for hive table, and wrong result for datasource table
2. inserting into a partitioned table without partition list has wrong result for hive table.

This PR fixes these 2 problems.

## How was this patch tested?

new test in hive `SQLQuerySuite`

Author: Wenchen Fan <wenchen@databricks.com>

Closes #13754 from cloud-fan/insert2.

(cherry picked from commit 3d010c8)
Signed-off-by: Yin Huai <yhuai@databricks.com>
…ases for by position resolution

## What changes were proposed in this pull request?

This PR migrates some test cases introduced in apache#12313 as a follow-up of apache#13754 and apache#13766. These test cases cover `DataFrameWriter.insertInto()`, while the former two only cover SQL `INSERT` statements.

Note that the `testPartitionedTable` utility method tests both Hive SerDe tables and data source tables.

## How was this patch tested?

N/A

Author: Cheng Lian <lian@databricks.com>

Closes apache#13810 from liancheng/spark-16037-follow-up-tests.
…ases for by position resolution

## What changes were proposed in this pull request?

This PR migrates some test cases introduced in #12313 as a follow-up of #13754 and #13766. These test cases cover `DataFrameWriter.insertInto()`, while the former two only cover SQL `INSERT` statements.

Note that the `testPartitionedTable` utility method tests both Hive SerDe tables and data source tables.

## How was this patch tested?

N/A

Author: Cheng Lian <lian@databricks.com>

Closes #13810 from liancheng/spark-16037-follow-up-tests.

(cherry picked from commit f4a3d45)
Signed-off-by: Yin Huai <yhuai@databricks.com>
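What changes were proposed in this pull request?
The current table insertion has some weird behaviours:
1. Inserting into a partitioned table with mismatched columns gives a confusing error message for Hive tables and a wrong result for data source tables.
2. Inserting into a partitioned table without a partition list gives a wrong result for Hive tables.
This PR fixes these 2 problems.
How was this patch tested?
New test in the Hive `SQLQuerySuite`.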