[SPARK-16036][SPARK-16037][SQL] fix various table insertion problems #13754

cloud-fan · 2016-06-18T04:57:48Z

What changes were proposed in this pull request?

Table insertion currently has some weird behaviours:

1. Inserting into a partitioned table with mismatched columns gives a confusing error message for Hive tables, and a wrong result for data source tables.
2. Inserting into a partitioned table without a partition list gives a wrong result for Hive tables.

This PR fixes these two problems.

How was this patch tested?

New test in the Hive `SQLQuerySuite`.

cloud-fan · 2016-06-18T04:58:08Z

cc @yhuai @marmbrus @rxin

SparkQA · 2016-06-18T06:17:32Z

Test build #60759 has finished for PR 13754 at commit 52e67d4.

This patch fails Spark unit tests.
This patch merges cleanly.
This patch adds no public classes.

cloud-fan · 2016-06-18T07:16:56Z

sql/hive/src/test/scala/org/apache/spark/sql/hive/execution/HiveComparisonTest.scala

                queryString.replace("../../data", testDataPath))
              val containsCommands = originalQuery.analyzed.collectFirst {
                case _: Command => ()
+                case _: InsertIntoTable => ()


We should not have InsertIntoTable inside the plan tree when running a Hive query. It looks like this PR breaks something; I need some more time to investigate.

Ah, it's because I removed the PreInsertionCasts rule, which turned InsertIntoTable into InsertIntoHiveTable. This conversion doesn't matter, as the Hive planner will plan InsertIntoTable into the physical InsertIntoHiveTable.

So adding a case here is a reasonable fix.
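To illustrate why the extra `case` changes the outcome, here is a minimal, self-contained sketch of a `collectFirst`-style search over a plan tree. The `Plan` hierarchy and the `containsCommands` helper are hypothetical stand-ins written for this example, not Catalyst's real classes; only the shape of the pattern match follows the diff above.

```scala
object PlanSearchSketch {
  // Hypothetical, heavily simplified stand-ins for Catalyst's logical plan nodes.
  sealed trait Plan {
    def children: Seq[Plan]

    // Pre-order search: return the first result the partial function produces.
    def collectFirst[B](pf: PartialFunction[Plan, B]): Option[B] =
      pf.lift(this).orElse(
        children.iterator.map(_.collectFirst(pf)).collectFirst { case Some(b) => b })
  }

  final case class Command(name: String) extends Plan {
    val children: Seq[Plan] = Nil
  }
  final case class Relation(table: String) extends Plan {
    val children: Seq[Plan] = Nil
  }
  final case class Project(child: Plan) extends Plan {
    val children: Seq[Plan] = Seq(child)
  }
  final case class InsertIntoTable(table: String, child: Plan) extends Plan {
    val children: Seq[Plan] = Seq(child)
  }

  // Same pattern as the test harness: any matching node marks the query as
  // "containing commands". The second case is the one this PR adds.
  def containsCommands(plan: Plan): Boolean =
    plan.collectFirst {
      case _: Command => ()
      case _: InsertIntoTable => ()
    }.nonEmpty

  def main(args: Array[String]): Unit = {
    val select = Project(Relation("src"))
    val insert = InsertIntoTable("dest", Project(Relation("src")))
    println(s"select contains commands: ${containsCommands(select)}")
    println(s"insert contains commands: ${containsCommands(insert)}")
  }
}
```

In this sketch a plain projection is not flagged, while a plan whose root is still an unconverted `InsertIntoTable` is, which matches the intent of the one-line change.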

cloud-fan · 2016-06-18T08:01:01Z

sql/hive/src/test/scala/org/apache/spark/sql/hive/execution/HiveQuerySuite.scala

-    }
-  }
-
-  test("SPARK-3810: PreInsertionCasts dynamic partitioning support") {


This rule has been removed.

But we have a new rule to replace it, right? It seems it would be good to keep the tests.

SparkQA · 2016-06-18T08:58:13Z

Test build #60772 has finished for PR 13754 at commit ee51757.

This patch fails Spark unit tests.
This patch merges cleanly.
This patch adds no public classes.

SparkQA · 2016-06-18T09:16:00Z

Test build #60770 has finished for PR 13754 at commit 4185484.

This patch fails Spark unit tests.
This patch does not merge cleanly.
This patch adds no public classes.

SparkQA · 2016-06-18T09:48:02Z

Test build #60774 has finished for PR 13754 at commit 9590725.

This patch passes all tests.
This patch merges cleanly.
This patch adds no public classes.

yhuai · 2016-06-18T16:51:55Z

...alyst/src/main/scala/org/apache/spark/sql/catalyst/plans/logical/basicLogicalOperators.scala

-          .partition(a => partition.keySet.contains(a.name))
-      Some(dataColumns ++ partitionColumns.takeRight(numDynamicPartitions))
+      val staticPartCols = partition.filter(_._2.isDefined).keySet
+      Some(table.output.filterNot(a => staticPartCols.contains(a.name)))


It seems this `contains` does not work for case-insensitive resolution. We can fix it in a separate PR.
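The concern can be reproduced outside Spark with a small sketch. `Column` is a hypothetical stand-in for a Catalyst attribute, and `expectedColumnsWithResolver` is an assumed variant written for illustration, not the fix that was eventually merged; `expectedColumns` mirrors the two added lines in the diff.

```scala
object ExpectedColumnsSketch {
  // Hypothetical stand-in for a Catalyst attribute; only the name matters here.
  final case class Column(name: String)

  // Mirrors the added lines: partition columns that were given a static value
  // are excluded from the columns the query has to supply.
  def expectedColumns(
      tableOutput: Seq[Column],
      partition: Map[String, Option[String]]): Seq[Column] = {
    val staticPartCols = partition.filter(_._2.isDefined).keySet
    tableOutput.filterNot(c => staticPartCols.contains(c.name))
  }

  // Assumed variant: compare names through a resolver function so that a
  // case-insensitive session can still match `PART1` against `part1`.
  def expectedColumnsWithResolver(
      tableOutput: Seq[Column],
      partition: Map[String, Option[String]],
      resolver: (String, String) => Boolean): Seq[Column] = {
    val staticPartCols = partition.collect { case (name, Some(_)) => name }
    tableOutput.filterNot(c => staticPartCols.exists(resolver(_, c.name)))
  }

  def main(args: Array[String]): Unit = {
    val output = Seq(Column("a"), Column("b"), Column("part1"), Column("part2"))
    // Models PARTITION (PART1 = '1', part2): PART1 is static, part2 is dynamic.
    val spec = Map[String, Option[String]]("PART1" -> Some("1"), "part2" -> None)

    // Exact string comparison fails to drop `part1`, so all four columns remain.
    println(expectedColumns(output, spec).map(_.name))
    // Case-insensitive comparison drops the static partition column.
    println(expectedColumnsWithResolver(output, spec, _.equalsIgnoreCase(_)).map(_.name))
  }
}
```

With exact matching the static partition column written as `PART1` is not recognised, so the query would be expected to supply one column too many; the resolver-based variant avoids that in this toy model.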

yhuai · 2016-06-18T17:30:52Z

Overall LGTM. I am merging this to master and branch 2.0. I will take care of those comments in my PR.

## What changes were proposed in this pull request?

The current table insertion has some weird behaviours:

1. inserting into a partitioned table with mismatch columns has confusing error message for hive table, and wrong result for datasource table
2. inserting into a partitioned table without partition list has wrong result for hive table.

This PR fixes these 2 problems.

## How was this patch tested?

new test in hive `SQLQuerySuite`

Author: Wenchen Fan <wenchen@databricks.com>

Closes #13754 from cloud-fan/insert2.

(cherry picked from commit 3d010c8)
Signed-off-by: Yin Huai <yhuai@databricks.com>

…ases for by position resolution

## What changes were proposed in this pull request?

This PR migrates some test cases introduced in #12313 as a follow-up of #13754 and #13766. These test cases cover `DataFrameWriter.insertInto()`, while the former two only cover SQL `INSERT` statements.

Note that the `testPartitionedTable` utility method tests both Hive SerDe tables and data source tables.

## How was this patch tested?

N/A

Author: Cheng Lian <lian@databricks.com>

Closes #13810 from liancheng/spark-16037-follow-up-tests.

(cherry picked from commit f4a3d45)
Signed-off-by: Yin Huai <yhuai@databricks.com>

fix table insertion semantics

52e67d4

cloud-fan changed the title ~~[SPARK-16036][SPARK-16037][SQL] fix various table insertion semantics~~ [SPARK-16036][SPARK-16037][SQL] fix various table insertion problems Jun 18, 2016

fix tests

4185484

cloud-fan reviewed Jun 18, 2016

cloud-fan added 2 commits June 18, 2016 00:43

Merge remote-tracking branch 'origin/master' into insert2

ee51757

remove 2 tests

9590725

cloud-fan reviewed Jun 18, 2016

yhuai reviewed Jun 18, 2016

asfgit closed this in 3d010c8 Jun 18, 2016

yhuai mentioned this pull request Jun 18, 2016

[SPARK-16034][SQL] Checks the partition columns when calling dataFrame.write.mode("append").saveAsTable #13749

Closed

liancheng mentioned this pull request Jun 21, 2016

[SPARK-16037][SQL] Follow-up: add DataFrameWriter.insertInto() test cases for by position resolution #13810

Closed

gatorsmile mentioned this pull request Jul 4, 2016

[SPARK-16358] [SQL] Remove InsertIntoHiveTable From Logical Plan #14037

Closed

JoshRosen mentioned this pull request Sep 7, 2016

[SPARK-15667][SQL]Throw exception if columns number of outputs mismatch the inputs #13409

Closed
