
Conversation

@gatorsmile
Member

@gatorsmile gatorsmile commented Jul 6, 2016

What changes were proposed in this pull request?

Before this PR, we were unable to call the save API of DataFrameWriter when the source is JDBC. For example:

df.write
  .format("jdbc")
  .option("url", url1)
  .option("dbtable", "TEST.TRUNCATETEST")
  .option("user", "testUser")
  .option("password", "testPass")
  .save() 

The error message users get looks like this:

org.apache.spark.sql.execution.datasources.jdbc.JdbcRelationProvider does not allow create table as select.
java.lang.RuntimeException: org.apache.spark.sql.execution.datasources.jdbc.JdbcRelationProvider does not allow create table as select.

At the same time, users can do this for other data sources, such as Parquet.

This PR implements createRelation of CreatableRelationProvider for the JDBC source. After the change, we can use the save API of DataFrameWriter.
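The hook being implemented is the second createRelation overload sketched below. This is only a rough sketch of the provider shape, not the actual diff; the write-side body is elided.

import org.apache.spark.sql.{DataFrame, SQLContext, SaveMode}
import org.apache.spark.sql.sources.{BaseRelation, CreatableRelationProvider, RelationProvider}

// Sketch only: the real class lives in org.apache.spark.sql.execution.datasources.jdbc.
class JdbcRelationProvider extends RelationProvider with CreatableRelationProvider {

  // Read path (already existed): build the JDBC relation from the options.
  override def createRelation(
      sqlContext: SQLContext,
      parameters: Map[String, String]): BaseRelation = {
    ??? // existing read-side logic, elided
  }

  // Write path added by this PR: this is what DataFrameWriter.save() resolves to.
  override def createRelation(
      sqlContext: SQLContext,
      mode: SaveMode,
      parameters: Map[String, String],
      data: DataFrame): BaseRelation = {
    // Write `data` to the target table according to `mode` (details elided),
    // then return the read-side relation for that table.
    createRelation(sqlContext, parameters)
  }
}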

Closes #12601

How was this patch tested?

Added test cases

@dongjoon-hyun
Member

Hi, @gatorsmile. Sorry to bother you. :)
May I ask a question? I'm a little confused because I previously made a PR touching JDBCWriteSuite.scala.
For me, writing to an H2 in-memory DB through JDBC seemed to work.
According to this PR, do you mean that it was not working until now?
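For reference, the kind of call I had in mind is roughly the following (the URL and table name are just illustrative, and df is assumed to be an existing DataFrame):

import java.util.Properties

// Direct DataFrameWriter.jdbc path against an H2 in-memory database.
df.write.jdbc("jdbc:h2:mem:testdb0;user=testUser;password=testPass", "TEST.SOMETABLE", new Properties())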

@gatorsmile
Member Author

gatorsmile commented Jul 6, 2016

@dongjoon-hyun I assume what you are describing is the insertInto API. This PR implements the save API.

@SparkQA

SparkQA commented Jul 6, 2016

Test build #61867 has finished for PR 14077 at commit 50c9de8.

  • This patch passes all tests.
  • This patch merges cleanly.
  • This patch adds no public classes.

@HyukjinKwon
Member

I guess this is a duplicate of #12601. Maybe we should fix the title and add Closes #12601 in the PR description (I think this one is cleaner than the original one anyway).

@gatorsmile
Member Author

@HyukjinKwon uh... I did not realize such a PR exists. I think the implementation in this PR is much simpler. We can close #12601 after this one is merged. Thanks!

gatorsmile changed the title from "[SPARK-16402] [SQL] JDBC Source: Implement save API" to "[SPARK-16402] [SQL] JDBC Source: Implement save API of DataFrameWriter" on Jul 7, 2016
@JustinPihony
Contributor

This may seem simpler, but that's because it takes some shortcuts to avoid refactoring. It currently creates a cycle along the lines of df.save -> df.jdbc. Wouldn't it be better to fix the code than to work around it? Additionally, is moving the copy appropriate? Maybe it was put on the outside in error and moving it is correct, but I'm not sure it can be moved without researching it properly.
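Concretely, the shape I'm describing is roughly the following. This is only an illustrative sketch of the round trip, not the exact code in the diff:

import java.util.Properties
import org.apache.spark.sql.{DataFrame, SQLContext, SaveMode}
import org.apache.spark.sql.sources.{BaseRelation, CreatableRelationProvider}

// Illustrative: a write path that starts at save() and loops back through DataFrameWriter.
class CyclicJdbcProvider extends CreatableRelationProvider {
  override def createRelation(
      sqlContext: SQLContext,
      mode: SaveMode,
      parameters: Map[String, String],
      data: DataFrame): BaseRelation = {
    // df.write.save() resolved the source and landed here...
    // ...and now the provider calls back into DataFrameWriter to do the actual write.
    data.write.mode(mode).jdbc(parameters("url"), parameters("dbtable"), new Properties())
    ??? // return a read-side relation for the written table (elided)
  }
}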

@gatorsmile
Member Author

@JustinPihony Thanks for your review! Let me try to answer your concerns.

@JustinPihony
Contributor

JustinPihony commented Jul 7, 2016

@gatorsmile If copy is a bug, then that is fine with me (I just commented my findings on this and will be curious to hear back). That said, it would make my implementation simpler. I'd be fine with simplifying it down to a basic save; however, I am still not OK with the circular reference. It adds confusion to debugging and future refactorings. And to fix that, you end up back at my PR, which makes this one a duplicate.

@gatorsmile
Member Author

gatorsmile commented Jul 7, 2016

@JustinPihony Thank you for confirming that it is a bug in another PR.

Regarding the solution in this PR, it is not a true circular reference. The solution in this PR minimizes duplicate code. I also think it makes sense to move the common logic from the jdbc API to the createRelation implementation of CreatableRelationProvider. JDBC-specific logic should not be exposed to the DataFrameWriter APIs.
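Roughly, the direction I have in mind is the following. This is a sketch only; jdbcViaSave is a hypothetical standalone helper used for illustration, not the real DataFrameWriter.jdbc method:

import java.util.Properties
import scala.collection.JavaConverters._
import org.apache.spark.sql.DataFrame

// Once createRelation owns the JDBC write logic, the jdbc() convenience call is
// nothing more than the generic save() path plus options.
def jdbcViaSave(df: DataFrame, url: String, table: String, props: Properties): Unit = {
  val writer = df.write.format("jdbc").option("url", url).option("dbtable", table)
  props.stringPropertyNames().asScala.foreach { key =>
    writer.option(key, props.getProperty(key))
  }
  writer.save()
}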

If you want to do it in your PR, that is also fine with me. Please minimize the code changes and add the test cases introduced in this PR. Thanks!

@HyukjinKwon
Member

HyukjinKwon commented Jul 7, 2016

(Personally, I hope this does not get delayed, because this usage was shown in a Spark Summit presentation and I guess users will try to use this API.)

@JustinPihony
Contributor

Then the best course of action would be to use my current implementation, as it works no matter the position of copy. I can add the additional tests if that would make it more amenable. Otherwise I'll push a reduced code set in the morning, but it would rely on the copy-location-move PR.

@gatorsmile
Member Author

@JustinPihony I do not care which PR is merged in the end. Clean up your PR as best you can, and I will review it when it is ready. Thanks for your work! Please continue to submit more PRs to improve Spark.

To reduce the code changes in your PR, I think we should not extend SchemaRelationProvider. Now, I think you can assume the copy location has been fixed.

Since this is related to Data Source APIs, CC @rxin @yhuai

@JustinPihony
Contributor

JustinPihony commented Jul 7, 2016

Thanks. I will have to wait until SPARK-16401 is resolved, though, or else the code will not pass tests. I also pinged Reynold in JIRA, since he had suggested implementing CreatableRelationProvider; however, that suggestion was due to the regression.

@gatorsmile
Member Author

@JustinPihony How about first moving the copy function in your PR now? Then we can review your PR before SPARK-16401 is resolved.

@JustinPihony
Contributor

@gatorsmile As I said above, I actually think it might be better to keep the work that was already done, and I am waiting for Reynold's feedback.

@SparkQA

SparkQA commented Sep 1, 2016

Test build #64805 has finished for PR 14077 at commit 2e799ce.

  • This patch fails Spark unit tests.
  • This patch merges cleanly.
  • This patch adds no public classes.

@SparkQA

SparkQA commented Sep 1, 2016

Test build #64806 has finished for PR 14077 at commit 07e3168.

  • This patch passes all tests.
  • This patch merges cleanly.
  • This patch adds no public classes.

@HyukjinKwon
Member

@gatorsmile Just a reminder that we might be able to close this.

@gatorsmile
Member Author

Sure, let me close it now.

@gatorsmile gatorsmile closed this Sep 8, 2016