-
Notifications
You must be signed in to change notification settings - Fork 29k
[SPARK-16034][SQL] Checks the partition columns when calling dataFrame.write.mode("append").saveAsTable #13749
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Conversation
9ac949c to
f6b0fad
Compare
|
Test build #60741 has finished for PR 13749 at commit
|
|
Test build #60742 has finished for PR 13749 at commit
|
|
Test build #60743 has finished for PR 13749 at commit
|
|
Test build #60745 has finished for PR 13749 at commit
|
|
Test build #60750 has finished for PR 13749 at commit
|
|
Test build #60753 has finished for PR 13749 at commit
|
|
Test build #60754 has finished for PR 13749 at commit
|
| existingColumns.map(_.toLowerCase) == partitionColumns.map(_.toLowerCase) | ||
| if (existingColumns.size > 0 && !sameColumns) { | ||
| throw new AnalysisException( | ||
| s"""Requested partitioning does not match existing partitioning. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
can you add "Requested partitioning does not match existing partitioning for table $table" ?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Thanks, updated
|
Test build #60776 has finished for PR 13749 at commit
|
|
retest this please. |
|
Test build #60783 has finished for PR 13749 at commit
|
| case ex: AnalysisException => | ||
| logError(s"Failed to write to table ${tableIdent.identifier} in $mode mode", ex) | ||
| throw ex | ||
| } |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This log entry is mainly for catching the table name and mode, right?
|
LGTM. Let's address the case-sensitivity issue in a separate PR (together with issue found in #13754). I will take care the minor comments (i.e. variable naming). Merging to master and branch 2.0. |
…e.write.mode("append").saveAsTable
## What changes were proposed in this pull request?
`DataFrameWriter` can be used to append data to existing data source tables. It becomes tricky when partition columns used in `DataFrameWriter.partitionBy(columns)` don't match the actual partition columns of the underlying table. This pull request enforces the check so that the partition columns of these two always match.
## How was this patch tested?
Unit test.
Author: Sean Zhong <seanzhong@databricks.com>
Closes #13749 from clockfly/SPARK-16034.
(cherry picked from commit ce3b98b)
Signed-off-by: Yin Huai <yhuai@databricks.com>
| s"$ex != ${partitionColumns.toSet}.") | ||
| } | ||
| val existingColumns = Try { | ||
| resolveRelation() |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Actually, the returned partitioning columns are user-provided instead of existing dataset's partitioning columns.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Also, this triggers a partitioning discovery. We should avoid it.
…and improvement ## What changes were proposed in this pull request? This PR is the follow-up PR for https://github.com/apache/spark/pull/13754/files and #13749. I will comment inline to explain my changes. ## How was this patch tested? Existing tests. Author: Yin Huai <yhuai@databricks.com> Closes #13766 from yhuai/caseSensitivity. (cherry picked from commit 6d0f921) Signed-off-by: Yin Huai <yhuai@databricks.com>
…and improvement ## What changes were proposed in this pull request? This PR is the follow-up PR for https://github.com/apache/spark/pull/13754/files and #13749. I will comment inline to explain my changes. ## How was this patch tested? Existing tests. Author: Yin Huai <yhuai@databricks.com> Closes #13766 from yhuai/caseSensitivity.
What changes were proposed in this pull request?
DataFrameWritercan be used to append data to existing data source tables. It becomes tricky when partition columns used inDataFrameWriter.partitionBy(columns)don't match the actual partition columns of the underlying table. This pull request enforces the check so that the partition columns of these two always match.How was this patch tested?
Unit test.