Skip to content

Conversation

@cloud-fan
Copy link
Contributor

What changes were proposed in this pull request?

The current table insertion has some weird behaviours:

  1. inserting into a partitioned table with mismatch columns has confusing error message for hive table, and wrong result for datasource table
  2. inserting into a partitioned table without partition list has wrong result for hive table.

This PR fixes these 2 problems.

How was this patch tested?

new test in hive SQLQuerySuite

@cloud-fan
Copy link
Contributor Author

cc @yhuai @marmbrus @rxin

@cloud-fan cloud-fan changed the title [SPARK-16036][SPARK-16037][SQL] fix various table insertion semantics [SPARK-16036][SPARK-16037][SQL] fix various table insertion problems Jun 18, 2016
@SparkQA
Copy link

SparkQA commented Jun 18, 2016

Test build #60759 has finished for PR 13754 at commit 52e67d4.

  • This patch fails Spark unit tests.
  • This patch merges cleanly.
  • This patch adds no public classes.

queryString.replace("../../data", testDataPath))
val containsCommands = originalQuery.analyzed.collectFirst {
case _: Command => ()
case _: InsertIntoTable => ()
Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

We should not have InsertIntoTable inside plan tree when run hive query, looks like this PR breaks something, need some more time to investigate it.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

ah, it's because I removed the PreInsertionCasts rule, which turns InsertIntoTable to InsertIntoHiveTable. This conversion doesn't matter, as hive planner will plan InsertIntoTable into physical InsertIntoHiveTable.

So adding a case here is a reasonable fix.

}
}

test("SPARK-3810: PreInsertionCasts dynamic partitioning support") {
Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This rule is removed

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

But, we have a new rule to replace this, right? Seems it will be good to have the tests.

@SparkQA
Copy link

SparkQA commented Jun 18, 2016

Test build #60772 has finished for PR 13754 at commit ee51757.

  • This patch fails Spark unit tests.
  • This patch merges cleanly.
  • This patch adds no public classes.

@SparkQA
Copy link

SparkQA commented Jun 18, 2016

Test build #60770 has finished for PR 13754 at commit 4185484.

  • This patch fails Spark unit tests.
  • This patch does not merge cleanly.
  • This patch adds no public classes.

@SparkQA
Copy link

SparkQA commented Jun 18, 2016

Test build #60774 has finished for PR 13754 at commit 9590725.

  • This patch passes all tests.
  • This patch merges cleanly.
  • This patch adds no public classes.

.partition(a => partition.keySet.contains(a.name))
Some(dataColumns ++ partitionColumns.takeRight(numDynamicPartitions))
val staticPartCols = partition.filter(_._2.isDefined).keySet
Some(table.output.filterNot(a => staticPartCols.contains(a.name)))
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Seems this contains does not work for case-insensitive resolution. We can fix is in a separate PR.

@yhuai
Copy link
Contributor

yhuai commented Jun 18, 2016

Overall LGTM. I am merging this to master and branch 2.0. I will take care those comments in my PR.

asfgit pushed a commit that referenced this pull request Jun 18, 2016
## What changes were proposed in this pull request?

The current table insertion has some weird behaviours:

1. inserting into a partitioned table with mismatch columns has confusing error message for hive table, and wrong result for datasource table
2. inserting into a partitioned table without partition list has wrong result for hive table.

This PR fixes these 2 problems.

## How was this patch tested?

new test in hive `SQLQuerySuite`

Author: Wenchen Fan <wenchen@databricks.com>

Closes #13754 from cloud-fan/insert2.

(cherry picked from commit 3d010c8)
Signed-off-by: Yin Huai <yhuai@databricks.com>
@asfgit asfgit closed this in 3d010c8 Jun 18, 2016
ghost pushed a commit to dbtsai/spark that referenced this pull request Jun 21, 2016
…ases for by position resolution

## What changes were proposed in this pull request?

This PR migrates some test cases introduced in apache#12313 as a follow-up of apache#13754 and apache#13766. These test cases cover `DataFrameWriter.insertInto()`, while the former two only cover SQL `INSERT` statements.

Note that the `testPartitionedTable` utility method tests both Hive SerDe tables and data source tables.

## How was this patch tested?

N/A

Author: Cheng Lian <lian@databricks.com>

Closes apache#13810 from liancheng/spark-16037-follow-up-tests.
asfgit pushed a commit that referenced this pull request Jun 21, 2016
…ases for by position resolution

## What changes were proposed in this pull request?

This PR migrates some test cases introduced in #12313 as a follow-up of #13754 and #13766. These test cases cover `DataFrameWriter.insertInto()`, while the former two only cover SQL `INSERT` statements.

Note that the `testPartitionedTable` utility method tests both Hive SerDe tables and data source tables.

## How was this patch tested?

N/A

Author: Cheng Lian <lian@databricks.com>

Closes #13810 from liancheng/spark-16037-follow-up-tests.

(cherry picked from commit f4a3d45)
Signed-off-by: Yin Huai <yhuai@databricks.com>
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

3 participants