[SPARK-24860][SQL] Support setting of partitionOverWriteMode in output options for writing DataFrame #21818
Conversation
Test build #93290 has finished for PR 21818 at commit

Test build #93295 has finished for PR 21818 at commit
val enableDynamicOverwrite = parameters.get("partitionOverwriteMode")
  .map(mode => PartitionOverwriteMode.withName(mode.toUpperCase))
  .getOrElse(sparkSession.sessionState.conf.partitionOverwriteMode) ==
  PartitionOverwriteMode.DYNAMIC
nit: this is too long
val partitionOverwriteMode = parameters.get("partitionOverwriteMode")....
val enableDynamicOverwrite = partitionOverwriteMode == PartitionOverwriteMode.DYNAMIC
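Spelled out, the reviewer's suggested split might look like the following (a sketch only; it assumes the surrounding `InsertIntoHadoopFsRelationCommand` context shown in the diff above):

```scala
// Sketch of the suggested refactor: resolve the mode first, then derive
// the boolean. Names come from the diff; the two-val split is the
// reviewer's suggestion, not the merged code verbatim.
val partitionOverwriteMode = parameters.get("partitionOverwriteMode")
  // a per-write output option takes precedence over the session conf
  .map(mode => PartitionOverwriteMode.withName(mode.toUpperCase))
  .getOrElse(sparkSession.sessionState.conf.partitionOverwriteMode)
val enableDynamicOverwrite =
  partitionOverwriteMode == PartitionOverwriteMode.DYNAMIC
```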
}
}
}
test("SPARK-24860: dynamic partition overwrite specified per source without catalog table") {
nit: add a blank line before this new test case
}
test("SPARK-24860: dynamic partition overwrite specified per source without catalog table") {
withTempPath { path =>
Seq((1, 1, 1)).toDF("i", "part1", "part2")
Can we simplify the test? Ideally we only need one partition column. Write some initial data to the table, then do an overwrite with partitionOverwriteMode=dynamic, and another overwrite with partitionOverwriteMode=static.
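A sketch of the simplified test the reviewer is asking for (assumes `InsertSuite` helpers such as `withTempPath`, `checkAnswer`, and the implicit `toDF` conversion; the exact rows are illustrative):

```scala
withTempPath { path =>
  // initial data: one row each in partitions part=1 and part=2
  Seq((1, 1), (2, 2)).toDF("i", "part")
    .write.partitionBy("part").parquet(path.getAbsolutePath)

  // dynamic mode: only partitions present in the new data are replaced
  Seq((3, 1)).toDF("i", "part")
    .write.option("partitionOverwriteMode", "dynamic")
    .mode("overwrite").partitionBy("part").parquet(path.getAbsolutePath)
  checkAnswer(spark.read.parquet(path.getAbsolutePath),
    Row(3, 1) :: Row(2, 2) :: Nil)

  // static mode: all existing partitions are dropped before the write
  Seq((4, 1)).toDF("i", "part")
    .write.option("partitionOverwriteMode", "static")
    .mode("overwrite").partitionBy("part").parquet(path.getAbsolutePath)
  checkAnswer(spark.read.parquet(path.getAbsolutePath), Row(4, 1) :: Nil)
}
```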
Test build #93349 has finished for PR 21818 at commit

retest this please
"mode to keep the same behavior of Spark prior to 2.3. Note that this config doesn't " +
"affect Hive serde tables, as they are always overwritten with dynamic mode.")
"affect Hive serde tables, as they are always overwritten with dynamic mode. This can " +
"also be set as an output option for a data source using key partitionOverwriteMode, " +
Also need to explain the precedence between the option and the SQLConf (the per-write option takes precedence over the session configuration).
Test build #93530 has finished for PR 21818 at commit

Test build #93550 has finished for PR 21818 at commit

LGTM Thanks! Merged to master
What changes were proposed in this pull request?
Besides the session-wide setting spark.sql.sources.partitionOverwriteMode, also allow setting partitionOverwriteMode per write, as an output option on the data source.
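A minimal usage sketch of the proposed option (the DataFrame `df`, partition column `part`, and output path are hypothetical):

```scala
// The per-write option overrides spark.sql.sources.partitionOverwriteMode
// for this one write only; other writes keep the session's setting.
df.write
  .mode("overwrite")
  .option("partitionOverwriteMode", "dynamic")
  .partitionBy("part")
  .parquet("/tmp/output")
```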
How was this patch tested?
Added unit test in InsertSuite