
Conversation

@gatorsmile
Member

@gatorsmile gatorsmile commented Jul 6, 2016

What changes were proposed in this pull request?

Before this PR, we were unable to call the save API of DataFrameWriter when the source is JDBC. For example:

df.write
  .format("jdbc")
  .option("url", url1)
  .option("dbtable", "TEST.TRUNCATETEST")
  .option("user", "testUser")
  .option("password", "testPass")
  .save() 

The error message users get looks like this:

org.apache.spark.sql.execution.datasources.jdbc.JdbcRelationProvider does not allow create table as select.
java.lang.RuntimeException: org.apache.spark.sql.execution.datasources.jdbc.JdbcRelationProvider does not allow create table as select.

At the same time, users can do this for other data sources, such as Parquet.

This PR implements createRelation of CreatableRelationProvider for the JDBC source. After the change, we can use the save API of DataFrameWriter.
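The hook being implemented is the second createRelation overload sketched below. This is only a rough sketch of the provider shape, not the actual diff; the write-side body is elided.

import org.apache.spark.sql.{DataFrame, SQLContext, SaveMode}
import org.apache.spark.sql.sources.{BaseRelation, CreatableRelationProvider, RelationProvider}

// Sketch only: the real class lives in org.apache.spark.sql.execution.datasources.jdbc.
class JdbcRelationProvider extends RelationProvider with CreatableRelationProvider {

  // Read path (already existed): build the JDBC relation from the options.
  override def createRelation(
      sqlContext: SQLContext,
      parameters: Map[String, String]): BaseRelation = {
    ??? // existing read-side logic, elided
  }

  // Write path added by this PR: this is what DataFrameWriter.save() resolves to.
  override def createRelation(
      sqlContext: SQLContext,
      mode: SaveMode,
      parameters: Map[String, String],
      data: DataFrame): BaseRelation = {
    // Write `data` to the target table according to `mode` (details elided),
    // then return the read-side relation for that table.
    createRelation(sqlContext, parameters)
  }
}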

Closes #12601

How was this patch tested?

Added test cases

@dongjoon-hyun
Member

Hi, @gatorsmile. Sorry to bother you. :)
May I ask a question? I'm a little confused because I previously made a PR touching JDBCWriteSuite.scala.
For me, writing to an H2 in-memory DB through JDBC seemed to work.
According to this PR, do you mean that it was not working until now?
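For reference, the kind of call I had in mind is roughly the following (the URL and table name are just illustrative, and df is assumed to be an existing DataFrame):

import java.util.Properties

// Direct DataFrameWriter.jdbc path against an H2 in-memory database.
df.write.jdbc("jdbc:h2:mem:testdb0;user=testUser;password=testPass", "TEST.SOMETABLE", new Properties())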

@gatorsmile
Member Author

gatorsmile commented Jul 6, 2016

@dongjoon-hyun I assume what you are describing is the insertInto API. This PR implements the save API.

@SparkQA

SparkQA commented Jul 6, 2016

Test build #61867 has finished for PR 14077 at commit 50c9de8.

  • This patch passes all tests.
  • This patch merges cleanly.
  • This patch adds no public classes.

@HyukjinKwon
Member

I guess this is a duplicate of #12601. Maybe we should fix the title and add Closes #12601 in the PR description (I think this one is cleaner than the original one anyway).

@gatorsmile
Member Author

@HyukjinKwon uh... I did not realize such a PR exists. I think the implementation in this PR is much simpler. We can close #12601 after this one is merged. Thanks!

gatorsmile changed the title from "[SPARK-16402] [SQL] JDBC Source: Implement save API" to "[SPARK-16402] [SQL] JDBC Source: Implement save API of DataFrameWriter" on Jul 7, 2016
@JustinPihony
Contributor

This may seem simpler, but that's because it takes some shortcuts to avoid refactoring. It currently creates a cycle along the lines of df.save -> df.jdbc. Wouldn't it be better to fix the code than to work around it? Additionally, is moving the copy appropriate? Maybe it was put on the outside in error and moving it is correct, but I'm not sure it can be moved without researching it properly.
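Concretely, the shape I'm describing is roughly the following. This is only an illustrative sketch of the round trip, not the exact code in the diff:

import java.util.Properties
import org.apache.spark.sql.{DataFrame, SQLContext, SaveMode}
import org.apache.spark.sql.sources.{BaseRelation, CreatableRelationProvider}

// Illustrative: a write path that starts at save() and loops back through DataFrameWriter.
class CyclicJdbcProvider extends CreatableRelationProvider {
  override def createRelation(
      sqlContext: SQLContext,
      mode: SaveMode,
      parameters: Map[String, String],
      data: DataFrame): BaseRelation = {
    // df.write.save() resolved the source and landed here...
    // ...and now the provider calls back into DataFrameWriter to do the actual write.
    data.write.mode(mode).jdbc(parameters("url"), parameters("dbtable"), new Properties())
    ??? // return a read-side relation for the written table (elided)
  }
}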

@gatorsmile
Member Author

@JustinPihony Thanks for your review! Let me try to answer your concerns.

@JustinPihony
Contributor

JustinPihony commented Jul 7, 2016

@gatorsmile If copy is a bug, then that is fine with me (I just commented my findings on this and will be curious to hear back). That said, it would make my implementation simpler. I'd be fine with simplifying it down to a basic save; however, I am still not OK with the circular reference. It adds confusion to debugging and future refactorings. And to fix that, you end up back at my PR, which makes this one a duplicate.

@gatorsmile
Member Author

gatorsmile commented Jul 7, 2016

@JustinPihony Thank you for confirming that it is a bug in another PR.

Regarding the solution in this PR, it is not a true circular reference. The solution in this PR minimizes duplicate code. I also think it makes sense to move the common logic from the jdbc API to the createRelation implementation of CreatableRelationProvider. JDBC-specific logic should not be exposed to the DataFrameWriter APIs.
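Roughly, the direction I have in mind is the following. This is a sketch only; jdbcViaSave is a hypothetical standalone helper used for illustration, not the real DataFrameWriter.jdbc method:

import java.util.Properties
import scala.collection.JavaConverters._
import org.apache.spark.sql.DataFrame

// Once createRelation owns the JDBC write logic, the jdbc() convenience call is
// nothing more than the generic save() path plus options.
def jdbcViaSave(df: DataFrame, url: String, table: String, props: Properties): Unit = {
  val writer = df.write.format("jdbc").option("url", url).option("dbtable", table)
  props.stringPropertyNames().asScala.foreach { key =>
    writer.option(key, props.getProperty(key))
  }
  writer.save()
}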

If you want to do it in your PR, that is also fine with me. Please minimize the code changes and add the test cases introduced in this PR. Thanks!

@HyukjinKwon
Member

HyukjinKwon commented Jul 7, 2016

(Personally, I hope this does not get delayed, because this usage was shown in a Spark Summit presentation and I guess users will try to use this API.)

@JustinPihony
Contributor

Then the best course of action would be to use my current implementation, as it works no matter the position of copy. I can add the additional tests if that would make it more amenable. Otherwise I'll push a reduced code set in the morning, but it would rely on the copy-location-move PR.

@gatorsmile
Member Author

@JustinPihony I do not care which PR is merged in the end. Clean up your PR as best you can, and I will review it when it is ready. Thanks for your work! Please continue to submit more PRs to improve Spark.

To reduce the code changes in your PR, I think we should not extend SchemaRelationProvider. Now, I think you can assume the copy location has been fixed.

Since this is related to Data Source APIs, CC @rxin @yhuai

@JustinPihony
Contributor

JustinPihony commented Jul 7, 2016

Thanks. I will have to wait until SPARK-16401 is resolved, though, or else the code will not pass tests. I also pinged Reynold in JIRA, since he had suggested implementing CreatableRelationProvider; however, that suggestion was due to the regression.

@gatorsmile
Member Author

@JustinPihony How about first moving the copy function in your PR now? Then we can review your PR before SPARK-16401 is resolved.

@JustinPihony
Contributor

@gatorsmile As I said above, I actually think it might be better to keep the work that was already done, and I am waiting for Reynold's feedback.

@SparkQA

SparkQA commented Sep 1, 2016

Test build #64805 has finished for PR 14077 at commit 2e799ce.

  • This patch fails Spark unit tests.
  • This patch merges cleanly.
  • This patch adds no public classes.

@SparkQA

SparkQA commented Sep 1, 2016

Test build #64806 has finished for PR 14077 at commit 07e3168.

  • This patch passes all tests.
  • This patch merges cleanly.
  • This patch adds no public classes.

@HyukjinKwon
Member

@gatorsmile Just a reminder that we might be able to close this.

@gatorsmile
Member Author

Sure, let me close it now.

@gatorsmile gatorsmile closed this Sep 8, 2016