Contributor

@clockfly clockfly commented Jun 18, 2016

What changes were proposed in this pull request?

DataFrameWriter can be used to append data to existing data source tables. It becomes tricky when partition columns used in DataFrameWriter.partitionBy(columns) don't match the actual partition columns of the underlying table. This pull request enforces the check so that the partition columns of these two always match.
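
The check described above can be sketched in plain Scala, without a Spark dependency. This is only an illustration under stated assumptions, not the code in the patch: `PartitionMismatch` is a hypothetical stand-in for Spark's `AnalysisException`, `PartitionCheck.check` is a made-up helper name, and the error wording is illustrative. The comparison is ordered and lower-cased, mirroring the diff fragment discussed later in this thread.

```scala
// Hypothetical stand-in for Spark's AnalysisException so the sketch runs without Spark.
final case class PartitionMismatch(message: String) extends Exception(message)

object PartitionCheck {
  /** Throws if the table already has partition columns and the requested
    * ones differ from them (compared in order, ignoring case). */
  def check(table: String, existing: Seq[String], requested: Seq[String]): Unit = {
    val sameColumns = existing.map(_.toLowerCase) == requested.map(_.toLowerCase)
    // An empty `existing` list is treated as "nothing to compare against".
    if (existing.nonEmpty && !sameColumns) {
      throw PartitionMismatch(
        s"Requested partitioning does not match existing partitioning for table $table. " +
          s"Existing: ${existing.mkString(", ")}; requested: ${requested.mkString(", ")}.")
    }
  }
}
```

With this sketch, `PartitionCheck.check("t", Seq("p1"), Seq("P1"))` returns normally, while `PartitionCheck.check("t", Seq("p1"), Seq("p2"))` throws.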

How was this patch tested?

Unit test.

@clockfly clockfly force-pushed the SPARK-16034 branch 4 times, most recently from 9ac949c to f6b0fad, on June 18, 2016 01:19

SparkQA commented Jun 18, 2016

Test build #60741 has finished for PR 13749 at commit 44a22dd.

  • This patch fails Spark unit tests.
  • This patch merges cleanly.
  • This patch adds no public classes.


SparkQA commented Jun 18, 2016

Test build #60742 has finished for PR 13749 at commit c6a7773.

  • This patch fails Spark unit tests.
  • This patch merges cleanly.
  • This patch adds no public classes.


SparkQA commented Jun 18, 2016

Test build #60743 has finished for PR 13749 at commit 8bacffb.

  • This patch fails Spark unit tests.
  • This patch merges cleanly.
  • This patch adds no public classes.


SparkQA commented Jun 18, 2016

Test build #60745 has finished for PR 13749 at commit f6b0fad.

  • This patch fails Spark unit tests.
  • This patch merges cleanly.
  • This patch adds no public classes.


SparkQA commented Jun 18, 2016

Test build #60750 has finished for PR 13749 at commit 72fdeaf.

  • This patch fails Scala style tests.
  • This patch merges cleanly.
  • This patch adds no public classes.

@clockfly clockfly changed the title on Jun 18, 2016
from: [SPARK-16034][SQL][WIP] Checks the partition columns when calling dataFrame.write.mode("append").saveAsTable
to: [SPARK-16034][SQL] Checks the partition columns when calling dataFrame.write.mode("append").saveAsTable

SparkQA commented Jun 18, 2016

Test build #60753 has finished for PR 13749 at commit 5224802.

  • This patch passes all tests.
  • This patch merges cleanly.
  • This patch adds no public classes.


SparkQA commented Jun 18, 2016

Test build #60754 has finished for PR 13749 at commit 7a4293b.

  • This patch passes all tests.
  • This patch merges cleanly.
  • This patch adds no public classes.

  existingColumns.map(_.toLowerCase) == partitionColumns.map(_.toLowerCase)
if (existingColumns.size > 0 && !sameColumns) {
  throw new AnalysisException(
    s"""Requested partitioning does not match existing partitioning.
Contributor

Can you add the table name to the message, e.g. "Requested partitioning does not match existing partitioning for table $table"?

Contributor Author

Thanks, updated


SparkQA commented Jun 18, 2016

Test build #60776 has finished for PR 13749 at commit 611545c.

  • This patch fails Spark unit tests.
  • This patch merges cleanly.
  • This patch adds no public classes.

@clockfly
Contributor Author

retest this please.


SparkQA commented Jun 18, 2016

Test build #60783 has finished for PR 13749 at commit 611545c.

  • This patch passes all tests.
  • This patch merges cleanly.
  • This patch adds no public classes.

  case ex: AnalysisException =>
    logError(s"Failed to write to table ${tableIdent.identifier} in $mode mode", ex)
    throw ex
}
Contributor

This log entry is mainly for catching the table name and mode, right?

Contributor

yhuai commented Jun 18, 2016

LGTM. Let's address the case-sensitivity issue in a separate PR (together with the issue found in #13754). I will take care of the minor comments (i.e. variable naming).

Merging to master and branch 2.0.
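
For context on the case-sensitivity issue mentioned above: the merged check lower-cases both column lists, so it always compares names case-insensitively. One way to make that configurable is to pass in a resolver function, which is the shape of Spark's `Resolver` type (`(String, String) => Boolean`). The sketch below is plain Scala and only an assumption about how such a comparison could look; `ColumnResolution` and `samePartitioning` are hypothetical names, and this is not necessarily what the follow-up PR does.

```scala
object ColumnResolution {
  // A resolver decides whether two column names refer to the same column.
  type Resolver = (String, String) => Boolean

  val caseSensitive: Resolver = (a, b) => a == b
  val caseInsensitive: Resolver = (a, b) => a.equalsIgnoreCase(b)

  /** True when both lists have the same length and each pair of names,
    * taken in order, is accepted by the resolver. */
  def samePartitioning(
      existing: Seq[String],
      requested: Seq[String],
      resolver: Resolver): Boolean =
    existing.length == requested.length &&
      existing.zip(requested).forall { case (e, r) => resolver(e, r) }
}
```

Under the case-insensitive resolver `Seq("p1")` and `Seq("P1")` count as the same partitioning; under the case-sensitive one they do not.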

asfgit pushed a commit that referenced this pull request Jun 18, 2016
…e.write.mode("append").saveAsTable

## What changes were proposed in this pull request?

`DataFrameWriter` can be used to append data to existing data source tables. It becomes tricky when partition columns used in `DataFrameWriter.partitionBy(columns)` don't match the actual partition columns of the underlying table. This pull request enforces the check so that the partition columns of these two always match.

## How was this patch tested?

Unit test.

Author: Sean Zhong <seanzhong@databricks.com>

Closes #13749 from clockfly/SPARK-16034.

(cherry picked from commit ce3b98b)
Signed-off-by: Yin Huai <yhuai@databricks.com>
@asfgit asfgit closed this in ce3b98b Jun 18, 2016
    s"$ex != ${partitionColumns.toSet}.")
}
val existingColumns = Try {
  resolveRelation()
Contributor

Actually, the returned partitioning columns are the user-provided ones, not the existing dataset's partitioning columns.

Contributor

Also, this triggers a partitioning discovery. We should avoid it.

asfgit pushed a commit that referenced this pull request Jun 20, 2016
…and improvement

## What changes were proposed in this pull request?
This PR is the follow-up PR for https://github.com/apache/spark/pull/13754/files and #13749. I will comment inline to explain my changes.

## How was this patch tested?
Existing tests.

Author: Yin Huai <yhuai@databricks.com>

Closes #13766 from yhuai/caseSensitivity.

(cherry picked from commit 6d0f921)
Signed-off-by: Yin Huai <yhuai@databricks.com>