[SPARK-28554][SQL] implement basic catalog functionalities for JDBC v2 with a DS v1 fallback API #25291
Conversation
```scala
InsertIntoDataSourceCommand(fallbackToV1Relation(t, output), query, overwrite = false)

case OverwriteByExpression(
    DataSourceV2Relation(t: DataSourceV1Table, output, _), Literal(true, _), query, false) =>
```
The Data Source V1 API doesn't support static partition overwrite (e.g. `INSERT OVERWRITE t PARTITION(a=1) SELECT ...`). We must fully migrate JDBC to Data Source V2 before supporting it.
Here we only need to match `OverwriteByExpression(..., deleteExpr = Literal(true))`, as this is the only thing DS v1 supports.
maybe also add a @TODO here?
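Taking that suggestion, a minimal sketch of the match arm with a TODO in place. Only the plan and table types visible in the diff above are assumed from this PR; the `overwrite = true` flag is an inference for the overwrite case, not code from the diff:

```scala
case OverwriteByExpression(
    DataSourceV2Relation(t: DataSourceV1Table, output, _), Literal(true, _), query, false) =>
  // TODO: support static partition overwrite (e.g. INSERT OVERWRITE t PARTITION(a=1) SELECT ...)
  // once JDBC is fully migrated to Data Source V2. DS v1 can only overwrite the whole table,
  // hence only deleteExpr = Literal(true) is matched here.
  InsertIntoDataSourceCommand(fallbackToV1Relation(t, output), query, overwrite = true)
```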
Test build #108333 has finished for PR 25291 at commit
Test build #108334 has finished for PR 25291 at commit
@cloud-fan @brkyvz @rdblue @gengliangwang Thanks for adding me here. I understand that #25211 is still relevant and I should continue that work (as of now it implements a basic read/write for JDBC V2). Can you please review #25211 and let me know if I am on the right path, or should I change something in view of #25291?
IIUC #25211 is invisible to end-users: the way to use the JDBC data source stays the same. But it does have value in the long term: code cleanup, more optimization opportunities, better performance, etc. This PR is to allow users to use the JDBC source in the Data Source V2 way, i.e. register a custom catalog and create/access tables directly. It only implements the basic catalog functionalities. In general we need both PRs, but this one is smaller and visible to end-users, so it should probably be reviewed first.
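For context, a hedged sketch of the end-user flow this PR targets. The catalog class name and option keys below are placeholders for illustration; only the `spark.sql.catalog.<name>` catalog-plugin convention is assumed from Spark itself:

```scala
import org.apache.spark.sql.SparkSession

// Register a JDBC catalog under the name "mydb" (the implementation class is hypothetical).
val spark = SparkSession.builder()
  .config("spark.sql.catalog.mydb", "org.example.JDBCTableCatalog")
  .config("spark.sql.catalog.mydb.url", "jdbc:postgresql://localhost/test")
  .getOrCreate()

// Create and access JDBC tables directly, instead of going through spark.read.jdbc(...).
spark.sql("CREATE TABLE mydb.t1 (id INT, data STRING)")
spark.sql("INSERT INTO mydb.t1 VALUES (1, 'a')")
spark.sql("SELECT * FROM mydb.t1").show()
```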
Test build #108340 has finished for PR 25291 at commit
Test build #108379 has finished for PR 25291 at commit
Test build #108387 has finished for PR 25291 at commit
@cloud-fan, why does this convert from v2 plans to v1 plans? I don't think that v1 plans should be supported. To use v1 plans, the existing v1 code paths should be used; there's no need to mix v2 into it. What I think is the goal of this effort is to provide a v2 implementation that calls a v1 writer for append and read operations. I suggest clarifying the goal of this and possibly discussing it in the next v2 sync.
@rdblue The biggest issue is that V1 writers take logical plans and V2 writers take SparkPlans (which is much better IMO, because then you can actually see the executed query in the SQL tab). I can see us adding … Having separate physical nodes seems reasonable to me, but it is certainly more work (mostly copy-paste though) than adding LogicalPlans to V2 nodes. What do you think?
@brkyvz, I don't understand. I thought the idea was not to use v1 plans, but to use the v1 read/write API. Does the API require using a v1 plan?
The API doesn't require V1 plans. V1 writers just need LogicalPlans as inputs rather than SparkPlans.
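To make the mismatch concrete, a simplified sketch; these trait names are illustrative stand-ins, not actual Spark interfaces:

```scala
import org.apache.spark.sql.catalyst.plans.logical.LogicalPlan
import org.apache.spark.sql.execution.SparkPlan

// DS v1 style: the writer receives a logical plan (typically wrapped in a DataFrame),
// so the write runs outside the main physical plan and is invisible in the SQL tab.
trait V1StyleWriter {
  def insert(query: LogicalPlan, overwrite: Boolean): Unit
}

// DS v2 style: the write exec node consumes the already-planned SparkPlan,
// so the executed query shows up under the write node in the SQL tab.
trait V2StyleWriter {
  def write(plannedQuery: SparkPlan): Unit
}
```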
@brkyvz, then why does this add analyzer rules that convert v2 plans to v1 plans? If we just need to pass a plan to the source, then we don't need to convert the whole plan back to v1, right?
I wasn't talking about the current state of the PR. We don't need these analyzer rewrites to rewrite to V1 plans.
FYI, I made an alternative proposal in #25348
Note that this PR extends the Data Source V1 API to make it support part of the V2 functionalities (catalog APIs). As a result, this PR only touches the DS v1 code path. The only exception is in the v2 strategy: table creation is a v2 API and table insertion is a v1 API, so I have to create a special physical plan that calls both v1 and v2 APIs.

#25348 goes in a different direction. It extends the DS v2 API to make it support some v1 functionalities (writing a logical plan directly). As a result, it only touches the DS v2 code path. It updates the v2 physical plan to include the input logical plan in order to call v1 APIs.

I don't have a strong preference here. I picked the first direction because DS v1 is in sql/core, which can access DS v2 directly. If we all agree that the second direction is better, I'm happy to go with that too.
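A hedged sketch of that "special physical plan": everything except the Spark imports is illustrative, `Dataset.ofRows` is `private[sql]`, and the actual PR wires the catalog call differently, so treat this as a shape rather than an implementation:

```scala
import org.apache.spark.rdd.RDD
import org.apache.spark.sql.{Dataset, SparkSession}
import org.apache.spark.sql.catalyst.InternalRow
import org.apache.spark.sql.catalyst.expressions.Attribute
import org.apache.spark.sql.catalyst.plans.logical.LogicalPlan
import org.apache.spark.sql.execution.LeafExecNode
import org.apache.spark.sql.sources.InsertableRelation

// Creates the table through the v2 catalog API, then writes through the v1 API.
case class CreateTableWithV1FallbackExec(
    createTableViaV2Catalog: () => InsertableRelation, // stand-in for TableCatalog.createTable
    query: LogicalPlan) extends LeafExecNode {

  override def output: Seq[Attribute] = Nil

  override protected def doExecute(): RDD[InternalRow] = {
    val v1Relation = createTableViaV2Catalog()          // v2 API: table creation
    val df = Dataset.ofRows(SparkSession.active, query) // v1 writers take plans, not SparkPlans
    v1Relation.insert(df, overwrite = false)            // v1 API: table insertion
    sparkContext.emptyRDD[InternalRow]
  }
}
```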
@cloud-fan, can you be more specific about what you are proposing for the v1 update, and separate it from JDBC? Mixing those concerns together doesn't seem like a good path forward to me.
I put JDBC v2 here because I want to give a concrete example and prove that my proposal works. I can surely create a new PR for the API changes only, as soon as we agree on the approach. More specifically, JDBC v2 wants to: …
The proposal here is: …
How it works: …
`CatalogTableAsV2` wraps a `CatalogTable`.
Any catalog can return a table instance with this type, if it is public. Why does this require changing the TableCatalog interface?
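A minimal sketch of such a wrapper, assuming the Spark 3 `connector.catalog.Table` interface (which provides defaults for the remaining methods); the empty capability set reflects the fallback idea that reads and writes go through the v1 relation:

```scala
import java.util.Collections

import org.apache.spark.sql.catalyst.catalog.CatalogTable
import org.apache.spark.sql.connector.catalog.{Table, TableCapability}
import org.apache.spark.sql.types.StructType

// Exposes a v1 CatalogTable through the v2 Table interface so a catalog can return it.
case class CatalogTableAsV2(v1Table: CatalogTable) extends Table {
  override def name(): String = v1Table.identifier.unquotedString
  override def schema(): StructType = v1Table.schema
  override def capabilities(): java.util.Set[TableCapability] = Collections.emptySet()
}
```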
I don't think v2 plans should be converted to v1 plans. As we discussed in the last sync, v1 read and write interfaces can be called from v2, but v2 should always provide v2 behavior guarantees. Converting back to v1 plans is not the right way to add compatibility. |
It's nothing about `TableCatalog` …
Because the CTAS plan does not hold a …
I think the approach in #25348 is the right way to move forward because it calls the v1 API only when we can guarantee that the behavior is the same.
```scala
/**
 * A special Data Source V2 `Table`, which doesn't need to implement the read/write capabilities.
 * Spark will fallback the read/write requests to the v1 relation.
 */
```
consistent upper case for V1 and V2 wording?
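Based on that scaladoc, a hedged sketch of the contract being described; the `v1Relation` member and the exact supertype are assumptions, not code from this PR:

```scala
import org.apache.spark.sql.connector.catalog.Table
import org.apache.spark.sql.sources.BaseRelation

// A v2 Table with no read/write implementation of its own; Spark is expected to
// route read/write requests to the v1 relation it exposes.
trait DataSourceV1Table extends Table {
  def v1Relation: BaseRelation
}
```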
```scala
_name = name
val map = options.asCaseSensitiveMap().asScala.toMap
// The `JDBCOptions` checks the existence of the table option. This is required by JDBC v1, but
// JDBC V2 only knows the table option when loading a table. Here we put a table option with a
```
same upper case V1 and V2?
@cloud-fan, do you want to close this since #25348 was merged?
This PR is for JDBC v2. I'll continue working on it after the v1 read fallback API is done.
We're closing this PR because it hasn't been updated in a while. If you'd like to revive this PR, please reopen it!
What changes were proposed in this pull request?
This PR implements the basic catalog functionalities for JDBC v2, so that users can register a JDBC catalog and create/access JDBC tables directly.
To make the PR small, this PR introduces a data source v1 fallback API so that we don't need to implement `ScanBuilder` and `WriteBuilder` for JDBC v2 right now.
Note that this PR is orthogonal to #25211, which implements `ScanBuilder` and `WriteBuilder` for JDBC v2.
How was this patch tested?
A new test suite.