-
Notifications
You must be signed in to change notification settings - Fork 29k
[SPARK-16463][SQL] Support truncate option in Overwrite mode for JDBC DataFrameWriter
#14086
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Changes from all commits
8f6224c
25decbb
2f8cbe3
15280c3
986d2a6
e989b3e
8b452cb
File filter
Filter by extension
Conversations
Jump to
Diff view
Diff view
There are no files selected for viewing
| Original file line number | Diff line number | Diff line change |
|---|---|---|
|
|
@@ -40,6 +40,14 @@ class JDBCWriteSuite extends SharedSQLContext with BeforeAndAfter { | |
| properties.setProperty("password", "testPass") | ||
| properties.setProperty("rowId", "false") | ||
|
|
||
| val testH2Dialect = new JdbcDialect { | ||
| override def canHandle(url: String) : Boolean = url.startsWith("jdbc:h2") | ||
| override def getCatalystType( | ||
| sqlType: Int, typeName: String, size: Int, md: MetadataBuilder): Option[DataType] = | ||
| Some(StringType) | ||
| override def isCascadingTruncateTable(): Option[Boolean] = Some(false) | ||
| } | ||
|
|
||
| before { | ||
| Utils.classForName("org.h2.Driver") | ||
| conn = DriverManager.getConnection(url) | ||
|
|
@@ -145,14 +153,25 @@ class JDBCWriteSuite extends SharedSQLContext with BeforeAndAfter { | |
| assert(2 === spark.read.jdbc(url, "TEST.APPENDTEST", new Properties()).collect()(0).length) | ||
| } | ||
|
|
||
| test("CREATE then INSERT to truncate") { | ||
| test("Truncate") { | ||
| JdbcDialects.registerDialect(testH2Dialect) | ||
| val df = spark.createDataFrame(sparkContext.parallelize(arr2x2), schema2) | ||
| val df2 = spark.createDataFrame(sparkContext.parallelize(arr1x2), schema2) | ||
| val df3 = spark.createDataFrame(sparkContext.parallelize(arr2x3), schema3) | ||
|
|
||
| df.write.jdbc(url1, "TEST.TRUNCATETEST", properties) | ||
| df2.write.mode(SaveMode.Overwrite).jdbc(url1, "TEST.TRUNCATETEST", properties) | ||
| df2.write.mode(SaveMode.Overwrite).option("truncate", true) | ||
| .jdbc(url1, "TEST.TRUNCATETEST", properties) | ||
| assert(1 === spark.read.jdbc(url1, "TEST.TRUNCATETEST", properties).count()) | ||
| assert(2 === spark.read.jdbc(url1, "TEST.TRUNCATETEST", properties).collect()(0).length) | ||
|
|
||
| val m = intercept[SparkException] { | ||
|
Member
There was a problem hiding this comment. Choose a reason for hiding this commentThe reason will be displayed to describe this comment to others. Learn more. To check my understanding here, this overwrites the table with a different schema (new column I guess it would be nice to test the case where the truncate works at least, though, we can't really test whether it truncates vs drops. Could you for example just repeat the code on line 163-166 here to verify that overwriting just results in the same results?
Member
Author
There was a problem hiding this comment. Choose a reason for hiding this commentThe reason will be displayed to describe this comment to others. Learn more. Sure, that would be better.
Member
There was a problem hiding this comment. Choose a reason for hiding this commentThe reason will be displayed to describe this comment to others. Learn more. Ah, nevermind my last comment. You test the truncate succeeds path already. |
||
| df3.write.mode(SaveMode.Overwrite).option("truncate", true) | ||
| .jdbc(url1, "TEST.TRUNCATETEST", properties) | ||
| }.getMessage | ||
| assert(m.contains("Column \"seq\" not found")) | ||
| assert(0 === spark.read.jdbc(url1, "TEST.TRUNCATETEST", properties).count()) | ||
| JdbcDialects.unregisterDialect(testH2Dialect) | ||
| } | ||
|
|
||
| test("Incompatible INSERT to append") { | ||
|
|
||
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
If
truncateTablefailed due to a non fatal exception, should we fall back to the previous way (i.e., drop and create)? This is a design decision. CC @srowen @rxinThere was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I'd say no, because user has explicitly specified truncate. They can turn if off themselves.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
: ) Sure. Then, the next design question to @rxin and @srowen
Should we still truncate the table if the table schema does not match the schema of new table?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
We should do whatever we do with drop.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I see. Then, the current implementation looks good to me.
@dongjoon-hyun Could you summarize the previous discussion and design decision we made? Document them in the PR description. Thanks!
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
For my understanding, I will ask one question.
Literally, we should not do whatever we do with drop, e.g., we should not drop INDEX, right?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Sure. I'll update the document and PR description more clearly.
Thank you for guidance, @rxin and @gatorsmile .
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I think it is OK to raise an exception here, but check whether the exception message is meaningful or not.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Nope, dropping index does not make sense here.
Uh oh!
There was an error while loading. Please reload this page.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
The current exception message is "Column
xxxnot found".